Image encoding / decoding apparatus and data transmitting apparatus

The LFNST index coding method solves the problem of efficient compression and transmission of high-resolution, high-quality images/videos, improves coding efficiency, and is suitable for image/video encoding and decoding devices in image coding systems.

CN117560507B · Active · Publication Date: 2026-06-23 · NOKIA TECHNOLOGIES OY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NOKIA TECHNOLOGIES OY
Filing Date
2020-09-18
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies incur high transmission and storage costs because high-resolution, high-quality images/videos carry more information, and they lack effective compression and coding methods, particularly for broadcasting images with the characteristics of immersive media and virtual reality/augmented reality content.

Method used

An LFNST index coding method is adopted. While deriving the modified transform coefficients, the method determines whether to parse the LFNST index based on the width, height, tree type and color format of the current block, and it codes the index of the LFNST matrix so that LFNST can be applied to sub-partition transform blocks.

Benefits of technology

It improves image/video compression efficiency and transform index coding efficiency, making it suitable for efficient compression and transmission of high-resolution and high-quality images/videos.



Abstract

The disclosure relates to an image encoding / decoding apparatus and a device for transmitting data. An image decoding method according to the present document includes a step for deriving modified transform coefficients, wherein the step for deriving modified transform coefficients includes a step of determining whether to parse an LFNST index based on whether a width and a height of a current block satisfy a condition on whether an LFNST can be applied, and whether the condition on whether the LFNST can be applied is determined based on a tree type and a color format of the current block and whether an ISP is applied to the current block.

Description

[0001] This application is a divisional application of the original invention patent application No. 202080078903.5 (International Application No.: PCT / KR2020 / 012655, Application Date: September 18, 2020, Invention Title: Transformation-Based Image Coding Method and Apparatus).

Technical Field

[0002] This disclosure relates to an image coding technique, and more specifically, to a method and apparatus for coding images based on transforms in an image coding system.

Background Technology

[0003] Today, the demand for high-resolution and high-quality images / videos, such as 4K, 8K, or even higher Ultra High Definition (UHD) images / videos, is constantly growing across various fields. As image / video data becomes higher resolution and higher quality, the amount of information or bits transmitted increases compared to traditional image data. Therefore, transmission and storage costs increase when using media such as traditional wired / wireless broadband lines to transmit image data or when using existing storage media to store image / video data.

[0004] In addition, there is increasing interest in and demand for immersive media such as virtual reality (VR) and augmented reality (AR) content or holograms, and broadcasting of images / videos whose characteristics differ from those of real images, such as game images, is on the rise.

[0005] Therefore, there is a need for efficient image / video compression techniques to effectively compress, transmit, store, and reproduce the information of high-resolution, high-quality images / videos that have the various characteristics described above.

Summary of the Invention

[0006] Technical issues

[0007] One aspect of this disclosure is to provide a method and apparatus for increasing image coding efficiency.

[0008] Another technical aspect of this disclosure is to provide a method and apparatus for increasing the efficiency of transform index coding.

[0009] Another technical aspect of this disclosure is to provide an image encoding method and apparatus using LFNST.

[0010] Another aspect of this disclosure is to provide a method and apparatus for encoding an image for applying LFNST to a sub-partition transform block.

[0011] Technical solution

[0012] According to embodiments of this specification, an image decoding method performed by a decoding device is provided herein. The method may include the step of deriving modified transform coefficients, wherein the step of deriving modified transform coefficients may include the step of determining whether to parse an LFNST index based on whether the width and height of the current block satisfy the conditions for applying LFNST, and whether the conditions for applying LFNST are satisfied may be determined based on the tree type and color format of the current block and whether an ISP is applied to the current block.

[0013] When the tree type of the current block is dual-tree chroma, the LFNST index can be parsed when the height and width of the chroma component block of the current block are 4 or greater.

[0014] When the tree type of the current block is single-tree or dual-tree luminance, the LFNST index can be parsed when the height and width of the luminance component block of the current block are 4 or greater.

[0015] When the ISP is applied to the current block, the LFNST index can be parsed when the height and width of each sub-partition block are 4 or greater.

[0016] When the tree type of the current block is dual-tree luminance or single-tree, the LFNST index can be parsed when the height and width of the sub-partition blocks of the luminance component block of the current block are 4 or greater.

[0017] The current block is a coding unit, and the LFNST index can be parsed when the width and height of the coding unit are equal to or less than the maximum luminance transform size available for the transform.
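The conditions in [0013] to [0017] can be read as a single size check. The following is a minimal sketch under stated assumptions, not the claimed method itself: the `Block` fields, the string labels, the default chroma subsampling factors (2, 2, as for a 4:2:0 color format), the way an ISP split divides one dimension by the number of sub-partitions, and the default maximum transform size of 64 are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    width: int                 # luminance width of the coding unit
    height: int                # luminance height of the coding unit
    tree_type: str             # "SINGLE_TREE", "DUAL_TREE_LUMA" or "DUAL_TREE_CHROMA"
    sub_width_c: int = 2       # assumed horizontal chroma subsampling (4:2:0)
    sub_height_c: int = 2      # assumed vertical chroma subsampling (4:2:0)
    isp_split: str = "NONE"    # "NONE", "HOR" or "VER" (hypothetical labels)
    num_isp_parts: int = 1     # number of ISP sub-partitions


def may_parse_lfnst_index(b: Block, max_tb_size_y: int = 64) -> bool:
    """Return True when the size conditions for parsing the LFNST index hold."""
    if b.tree_type == "DUAL_TREE_CHROMA":
        # Dual-tree chroma: test the chroma component block size.
        w, h = b.width // b.sub_width_c, b.height // b.sub_height_c
    elif b.isp_split == "VER":
        # ISP with a vertical split: each sub-partition is narrower.
        w, h = b.width // b.num_isp_parts, b.height
    elif b.isp_split == "HOR":
        # ISP with a horizontal split: each sub-partition is shorter.
        w, h = b.width, b.height // b.num_isp_parts
    else:
        # Single-tree or dual-tree luminance without ISP.
        w, h = b.width, b.height
    fits_lfnst = min(w, h) >= 4
    fits_transform = max(b.width, b.height) <= max_tb_size_y
    return fits_lfnst and fits_transform
```

For instance, under these assumptions a 4×8 dual-tree chroma block in 4:2:0 maps to a 2×4 chroma block, so the index would not be parsed, while an 8×8 single-tree block passes the check.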

[0018] According to embodiments of this specification, an image encoding method performed by an encoding device is provided herein. The method may include the following steps: deriving modified transform coefficients from transform coefficients by applying LFNST; and encoding quantized residual information and LFNST indices indicating the LFNST matrix applied to the LFNST, wherein the LFNST indices may be encoded based on whether the width and height of the current block satisfy the conditions for applying LFNST, and wherein satisfying the conditions for applying LFNST may be determined based on the tree type and color format of the current block and whether ISP is applied to the current block.

[0019] According to another embodiment of the present disclosure, a digital storage medium may be provided that stores image data including a bitstream and encoded image information generated according to an image encoding method performed by an encoding device.

[0020] According to another embodiment of the present disclosure, a digital storage medium can be provided that stores image data including encoded image information and bitstreams to enable a decoding device to perform an image decoding method.

[0021] Beneficial effects

[0022] According to this disclosure, the overall image / video compression efficiency can be increased.

[0023] According to this disclosure, the efficiency of transformation index encoding can be increased.

[0024] The technical aspects disclosed herein can provide image encoding methods and devices using LFNST.

[0025] The technical aspects of this disclosure provide methods and apparatus for encoding images to apply LFNST to sub-partition transform blocks.

[0026] The effects achievable through the specific examples of this disclosure are not limited to those listed above. For example, various technical effects may exist that can be understood or derived from this disclosure by one of ordinary skill in the art. Therefore, the specific effects of this disclosure are not limited to those expressly described herein, but may include various effects that can be understood or derived from the technical features of this disclosure.

Attached Figure Description

[0027] Figure 1 schematically illustrates an example of a video / image coding system to which this disclosure can be applied.

[0028] Figure 2 is a diagram that schematically illustrates the configuration of a video / image encoding device to which this disclosure can be applied.

[0029] Figure 3 is a diagram that schematically illustrates the configuration of a video / image decoding device to which this disclosure can be applied.

[0030] Figure 4 illustrates the structure of a content streaming system to which this disclosure is applied.

[0031] Figure 5 schematically illustrates the multiple transform technique according to an embodiment of this document.

[0032] Figure 6 schematically shows the intra-frame directional modes for 65 prediction directions.

[0033] Figure 7 is a diagram used to explain RST according to an embodiment of this document.

[0034] Figure 8 is a diagram illustrating the order in which the output data of a forward primary transform are arranged into a one-dimensional vector, according to an example.

[0035] Figure 9 is a diagram illustrating the order in which the output data of a forward secondary transform are arranged into a two-dimensional block, according to an example.

[0036] Figure 10 is a diagram illustrating wide-angle intra-frame prediction modes according to an embodiment of this specification.

[0037] Figure 11 is a diagram illustrating the block shapes to which LFNST is applied.

[0038] Figure 12 is a diagram illustrating the arrangement of the output data of the forward LFNST according to an embodiment.

[0039] Figure 13 is a diagram illustrating that the number of output data of the forward LFNST is limited to a maximum of 16, according to an example.

[0040] Figure 14 is a diagram illustrating zero-out in a block to which a 4×4 LFNST is applied, according to an example.

[0041] Figure 15 is a diagram illustrating zero-out in a block to which an 8×8 LFNST is applied, according to an example.

[0042] Figure 16 is a diagram illustrating zero-out in a block to which an 8×8 LFNST is applied, according to another example.

[0043] Figure 17 is a diagram illustrating an example of how a coding block is divided into sub-blocks.

[0044] Figure 18 is a diagram illustrating another example of how a coding block is divided into sub-blocks.

[0045] Figure 19 is a diagram illustrating the symmetry between M×2 (M×1) blocks and 2×M (1×M) blocks according to an embodiment.

[0046] Figure 20 is a diagram illustrating an example of transposing a 2×M block according to an embodiment.

[0047] Figure 21 illustrates the scanning order of 8×2 or 2×8 regions according to an embodiment.

[0048] Figure 22 is a flowchart illustrating the operation of a video decoding device according to an embodiment of this specification.

[0049] Figure 23 is a flowchart illustrating the operation of a video encoding device according to an embodiment of this specification.

Detailed Implementation

[0050] While this disclosure may be readily modified and includes various embodiments, specific embodiments thereof have been illustrated by way of example in the accompanying drawings and will now be described in detail. However, this is not intended to limit this disclosure to the specific embodiments disclosed herein. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the technical concept of this disclosure. The singular form may include the plural form unless the context clearly indicates otherwise. Terms such as “comprising” and “having” are intended to indicate the presence of the features, numbers, steps, operations, elements, components, or combinations thereof used in the following description, and should therefore not be construed as pre-excluding the possibility of the presence or addition of one or more different features, numbers, steps, operations, elements, components, or combinations thereof.

[0051] Furthermore, for ease of description of their different features and functions, the components in the accompanying drawings described herein are illustrated independently; however, this does not imply that each component is implemented by a separate piece of hardware or software. For example, any two or more of these components may be combined to form a single component, and any single component may be divided into multiple components. Embodiments in which components are combined and / or divided will fall within the scope of this disclosure, provided they do not depart from the spirit of this disclosure.

[0052] In the following description, preferred embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Furthermore, in the drawings, the same reference numerals are used for the same components, and repeated descriptions of the same components will be omitted.

[0053] This document relates to video / image coding. For example, the methods / examples disclosed in this document may relate to the VVC (Versatile Video Coding) standard (ITU-T Rec. H.266), next-generation video / image coding standards after VVC, or other video coding-related standards (e.g., the HEVC (High Efficiency Video Coding) standard (ITU-T Rec. H.265), the EVC (Essential Video Coding) standard, the AVS2 standard, etc.).

[0054] This document provides various implementations related to video / image coding, and these implementations may be combined with each other unless otherwise specified.

[0055] In this document, video can refer to a series of images over time. Typically, an image (picture) is a unit representing a single image at a specific time, while a slice / tile is a unit that constitutes a part of an image. A slice / tile can include one or more coding tree units (CTUs). An image can consist of one or more slices / tiles. An image can consist of one or more tile groups. A tile group can include one or more tiles.

[0056] A pixel or pel can refer to the smallest unit that makes up a picture (or image). Alternatively, "sample" can be used as the term corresponding to a pixel. A sample can typically represent a pixel or a pixel value, and can represent only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chrominance component. Alternatively, a sample can refer to a pixel value in the spatial domain, or, when the pixel value is transformed to the frequency domain, it can refer to a transform coefficient in the frequency domain.

[0057] A unit can represent the basic unit of image processing. A unit may include at least one of a specific region of an image and information associated with that region. A unit may include a luminance block and two chrominance (e.g., Cb, Cr) blocks. Depending on the context, the term "unit" may be used interchangeably with terms such as "block" and "region". Typically, an M×N block may include a set (or array) of samples or transform coefficients consisting of M columns and N rows.

[0058] In this document, the terms " / " and "," should be interpreted as indicating "and / or". For example, the expression "A / B" can mean "A and / or B". Additionally, "A, B" can mean "A and / or B". Furthermore, "A / B / C" can mean "at least one of A, B, and / or C". Additionally, "A, B, C" can mean "at least one of A, B, and / or C".

[0059] Additionally, in this document, the term "or" should be interpreted as indicating "and / or". For example, the expression "A or B" could include 1) only A, 2) only B, and / or 3) both A and B. In other words, the term "or" in this document should be interpreted as indicating "additionally or alternatively".

[0060] In this disclosure, "at least one of A and B" can mean "only A", "only B" or "both A and B". Furthermore, in this disclosure, the expression "at least one of A or B" or "at least one of A and / or B" can be interpreted as "at least one of A and B".

[0061] Furthermore, in this disclosure, "at least one of A, B, and C" may mean "A only", "B only", "C only" or "any combination of A, B, and C". Additionally, "at least one of A, B, or C" or "at least one of A, B, and / or C" may mean "at least one of A, B, and C".

[0062] Additionally, the parentheses used in this disclosure can indicate "for example". Specifically, when indicated as "prediction (intra-frame prediction)", it can mean that "intra-frame prediction" is proposed as an example of "prediction". In other words, "prediction" in this disclosure is not limited to "intra-frame prediction", and "intra-frame prediction" is proposed as an example of "prediction". Furthermore, when indicated as "prediction (i.e., intra-frame prediction)", this can also mean that "intra-frame prediction" is proposed as an example of "prediction".

[0063] The technical features described individually in one of the accompanying drawings of this disclosure may be implemented individually or simultaneously.

[0064] Figure 1 schematically illustrates an example of a video / image coding system to which this disclosure can be applied.

[0065] Referring to Figure 1, a video / image coding system may include a first device (source device) and a second device (receiving device). The source device may transmit encoded video / image information or data to the receiving device in the form of a file or stream via a digital storage medium or network.

[0066] The source device may include a video source, an encoding device, and a transmitter. The receiving device may include a receiver, a decoding device, and a renderer. The encoding device may be referred to as a video / image encoding device, and the decoding device may be referred to as a video / image decoding device. The transmitter may be included in the encoding device. The receiver may be included in the decoding device. The renderer may include a display, and the display may be configured as a separate device or an external component.

[0067] The video source can acquire video / images through a process of capturing, synthesizing, or generating them. The video source may include a video / image capture device and / or a video / image generation device. Video / image capture devices may include, for example, one or more cameras, video / image archives including previously captured video / images, etc. Video / image generation devices may include, for example, computers, tablets, and smartphones, and can generate video / images (electronically). For example, virtual video / images can be generated by computers, etc. In this case, the video / image capture process can be replaced by a process that generates related data.

[0068] Encoding devices can encode input video / images. They can perform a series of processes such as prediction, transformation, and quantization for compression and coding efficiency. The encoded data (encoded video / image information) can be output as a bitstream.

[0069] A transmitter can send encoded video / image information or data, output in bitstream form, to a receiver in a receiving device via a digital storage medium or network, either as a file or a stream. Digital storage media can include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmitter can include elements for generating media files according to a predetermined file format and may include elements for transmission via a broadcast / communication network. The receiver can receive / extract the bitstream and send the received / extracted bitstream to a decoding device.

[0070] Decoding devices can decode video / images by performing a series of processes such as dequantization, inverse transform, and prediction, which correspond to the operations of encoding devices.

[0071] The renderer can render decoded video / images. The rendered video / images can then be displayed on a monitor.

[0072] Figure 2 is a diagram that schematically illustrates the configuration of a video / image encoding apparatus to which this disclosure may be applied. In the following, the term "video encoding apparatus" may include an image encoding apparatus.

[0073] Referring to Figure 2, the encoding device 200 may include an image partitioner 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270. The predictor 220 may include an inter-frame predictor 221 and an intra-frame predictor 222. The residual processor 230 may include a transformer 232, a quantizer 233, a dequantizer 234, and an inverse transformer 235. The residual processor 230 may further include a subtractor 231. The adder 250 may be referred to as a reconstructor or a reconstruction block generator. According to embodiments, the image partitioner 210, predictor 220, residual processor 230, entropy encoder 240, adder 250, and filter 260 described above may be constituted by one or more hardware components (e.g., an encoder chipset or processor). Furthermore, the memory 270 may include a decoded picture buffer (DPB) and may be constituted by a digital storage medium. The hardware components may further include the memory 270 as an internal / external component.

[0074] Image partitioner 210 can divide an input image (or picture or frame) input to encoding device 200 into one or more processing units. As an example, a processing unit may be referred to as a coding unit (CU). In this case, starting from a coding tree unit (CTU) or a largest coding unit (LCU), the coding units can be recursively partitioned according to a quad-tree binary-tree ternary-tree (QTBTTT) structure. For example, based on a quad-tree structure, a binary-tree structure, and / or a ternary-tree structure, a coding unit can be partitioned into multiple coding units of deeper depth. In this case, for example, a quad-tree structure can be applied first, and a binary-tree structure and / or a ternary-tree structure can be applied later. Alternatively, a binary-tree structure can be applied first. The encoding process according to this disclosure can be performed based on the final coding unit, which is not partitioned further. In this case, the largest coding unit can be directly used as the final coding unit based on the encoding efficiency according to the image characteristics. Alternatively, the coding units can be recursively partitioned into coding units of deeper depth as needed, thereby allowing the optimally sized coding unit to be used as the final coding unit. Here, the encoding process may include processes such as prediction, transformation, and reconstruction, which will be described later. As another example, the processing unit may further include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may each be split or partitioned from the final coding unit described above. The prediction unit may be a unit for predicting samples, and the transform unit may be a unit for deriving the transform coefficients and / or a unit for deriving the residual signal from the transform coefficients.
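The recursive partitioning described above can be sketched as follows. This is an illustration under stated assumptions rather than the encoder's actual decision logic: the `decide` callback, the split labels ("QT", "BT_V", "BT_H", "TT_V", "TT_H"), and the 1:2:1 ratio used for the ternary split are introduced here for the example only. A real encoder would choose splits by comparing coding cost.

```python
def partition(x, y, w, h, decide):
    """Recursively split a block and return the final coding units as (x, y, w, h)."""
    mode = decide(x, y, w, h)          # hypothetical callback: returns a split label or None
    if mode is None:
        return [(x, y, w, h)]          # leaf: used as a final coding unit
    if mode == "QT":                   # quad-tree: four equal quadrants
        hw, hh = w // 2, h // 2
        parts = [(x, y, hw, hh), (x + hw, y, hw, hh),
                 (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    elif mode == "BT_V":               # binary-tree, vertical split line
        parts = [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    elif mode == "BT_H":               # binary-tree, horizontal split line
        parts = [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    elif mode == "TT_V":               # ternary-tree, assumed 1:2:1 widths
        q = w // 4
        parts = [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    elif mode == "TT_H":               # ternary-tree, assumed 1:2:1 heights
        q = h // 4
        parts = [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    else:
        raise ValueError(f"unknown split mode: {mode}")
    leaves = []
    for part in parts:
        leaves.extend(partition(*part, decide))
    return leaves


def example_decide(x, y, w, h):
    """Toy rule: quad-split the 128x128 CTU, then binary-split its top-left quadrant."""
    if w == 128 and h == 128:
        return "QT"
    if (x, y, w, h) == (0, 0, 64, 64):
        return "BT_V"
    return None


coding_units = partition(0, 0, 128, 128, example_decide)
```

With the toy rule, the quad-tree split is applied first and a binary-tree split afterwards, which matches the ordering mentioned in the paragraph; the leaves tile the whole CTU without overlap.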

[0075] Depending on the context, the term "unit" may be used interchangeably with terms such as "block" and "region". Typically, an M×N block can represent a set of samples or transform coefficients consisting of M columns and N rows. Samples can typically represent pixels or pixel values, and can represent only the pixel / pixel value of the luminance component, or only the pixel / pixel value of the chrominance component. "Sample" can be used as a term corresponding to a pixel or pel of a picture (or image).

[0076] Subtractor 231 subtracts the predicted signal (predicted block, predicted sample array) output from predictor 220 from the input image signal (original block, original sample array) to generate a residual signal (residual block, residual sample array), and the generated residual signal is sent to transformer 232. Predictor 220 can perform prediction on the processing target block (hereinafter referred to as "current block") and can generate a prediction block that includes the prediction samples of the current block. Predictor 220 can determine whether to apply intra-frame prediction or inter-frame prediction on a current-block or CU basis. As discussed later in the description of each prediction mode, the predictor can generate various prediction-related information such as prediction mode information and send the generated information to entropy encoder 240. The prediction information can be encoded in entropy encoder 240 and output as a bitstream.
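The subtraction performed by the subtractor can be illustrated with a short sketch. The function name and the sample values are hypothetical; blocks are represented as plain lists of rows.

```python
def residual_block(original, predicted):
    """Sample-wise difference: residual = original block - predicted block."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, predicted)]


original = [[52, 55], [61, 59]]     # illustrative original samples
predicted = [[50, 50], [60, 60]]    # illustrative predicted samples
residual = residual_block(original, predicted)
```

The residual is what the transformer receives; the better the prediction, the smaller the residual values tend to be.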

[0077] Intra-predictor 222 can predict the current block by referencing samples in the current image. Depending on the prediction mode, the reference samples can be located near or separate from the current block. In intra-prediction, the prediction mode can include multiple non-directional modes and multiple directional modes. Non-directional modes can include, for example, DC mode and planar mode. Depending on the level of detail in the prediction direction, the directional modes can include, for example, 33 or 65 directional prediction modes. However, this is just an example, and more or fewer directional prediction modes can be used depending on the settings. Intra-predictor 222 can determine the prediction mode to be applied to the current block by using the prediction modes applied to neighboring blocks.

[0078] Inter-frame predictor 221 can derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference image. In this case, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted in units of blocks, sub-blocks, or samples, based on the correlation of motion information between neighboring blocks and the current block. Motion information may include motion vectors and reference image indices. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in the reference image. The reference image including the reference block and the reference image including the temporally neighboring block may be the same as or different from each other. The temporally neighboring block may be referred to as a collocated reference block, a collocated CU (colCU), etc., and the reference image including the temporally neighboring block may be referred to as a collocated image (colPic). For example, inter-frame predictor 221 can configure a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and / or reference image index of the current block. Inter-frame prediction can be performed based on various prediction modes. For example, in skip mode and merge mode, the inter-frame predictor 221 can use motion information of neighboring blocks as motion information of the current block. In skip mode, unlike merge mode, the residual signal may not be transmitted. In motion vector prediction (MVP) mode, motion vectors of neighboring blocks can be used as motion vector predictors, and the motion vector of the current block can be indicated by signaling the motion vector difference.
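The signaling of a motion vector as a predictor plus a difference can be sketched as below. The function names, the candidate list contents, and the encoder's selection rule (smallest absolute difference) are assumptions made for illustration; the text only states that a candidate is selected and that the difference is signaled.

```python
def choose_mvp(candidates, mv):
    """Encoder side: pick the candidate closest to mv and compute the difference."""
    def cost(i):
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])
    idx = min(range(len(candidates)), key=cost)   # assumed selection rule
    mvd = (mv[0] - candidates[idx][0], mv[1] - candidates[idx][1])
    return idx, mvd


def reconstruct_mv(candidates, idx, mvd):
    """Decoder side: motion vector = selected predictor + signaled difference."""
    px, py = candidates[idx]
    return (px + mvd[0], py + mvd[1])


candidates = [(4, -2), (0, 0)]      # predictors taken from neighboring blocks (illustrative)
idx, mvd = choose_mvp(candidates, (5, -1))
decoded = reconstruct_mv(candidates, idx, mvd)
```

Because both sides build the same candidate list, only the candidate index and the (typically small) difference need to be carried in the bitstream.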

[0079] Predictor 220 can generate prediction signals based on various prediction methods. For example, the predictor can apply intra-frame prediction or inter-frame prediction to the prediction of a block, and can also apply intra-frame prediction and inter-frame prediction simultaneously. This can be referred to as combined inter-frame and intra-frame prediction (CIIP). Additionally, the predictor can perform prediction on a block based on an intra-block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for coding content images / videos such as games. Although IBC essentially performs prediction within the current image, it can be performed similarly to inter-frame prediction in that it derives a reference block within the current image. That is, IBC can use at least one of the inter-frame prediction techniques described in this disclosure.

[0080] The predicted signals generated by the inter-frame predictor 221 and / or the intra-frame predictor 222 can be used to generate the reconstructed signal or the residual signal. The transformer 232 can generate transform coefficients by applying transform techniques to the residual signal. For example, the transform techniques can include at least one of Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen-Loève Transform (KLT), Graph-Based Transform (GBT), or Conditionally Non-linear Transform (CNT). Here, GBT refers to a transform obtained from a graph when the relationship information between pixels is represented as a graph. CNT refers to a transform obtained based on a predicted signal generated using all previously reconstructed pixels. Furthermore, the transform processing can be applied to square pixel blocks of the same size, or to non-square blocks of variable size.
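As a concrete instance of one of the listed transforms, the sketch below applies a separable two-dimensional DCT-II to a residual block. It is a floating-point, orthonormal textbook formulation chosen for readability and is an assumption of this example; practical codecs use integer approximations of the transform kernels.

```python
import math


def dct2_1d(x):
    """Orthonormal 1-D DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out


def dct2_2d(block):
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct2_1d(row) for row in block]
    cols = [dct2_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]   # transpose back to row-major order


flat_residual = [[10] * 4 for _ in range(4)]   # constant 4x4 block (illustrative)
coeffs = dct2_2d(flat_residual)
```

For a constant block, all of the energy ends up in the top-left (DC) coefficient, which is why smooth residuals compress well after the transform. Because rows and columns are handled independently, the same code also works for non-square blocks.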

[0081] Quantizer 233 quantizes the transform coefficients and sends them to entropy encoder 240, which encodes the quantized signal (information about the quantized transform coefficients) and outputs the encoded signal in a bitstream. The information about the quantized transform coefficients can be referred to as residual information. Quantizer 233 can rearrange the block-type quantized transform coefficients into a one-dimensional vector based on the coefficient scan order and generate information about the quantized transform coefficients based on this one-dimensional vector form. Entropy encoder 240 can perform various encoding methods such as exponential Golomb coding, context-adaptive variable-length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). Entropy encoder 240 can encode information required for video / image reconstruction other than the quantized transform coefficients (e.g., values of syntax elements), either together or separately. The encoded information (e.g., encoded video / image information) can be transmitted or stored in bitstream form in units of network abstraction layer (NAL) units. The video / image information may also include information about various parameter sets such as the Adaptation Parameter Set (APS), Picture Parameter Set (PPS), Sequence Parameter Set (SPS), and Video Parameter Set (VPS). Additionally, the video / image information may include general constraint information. In this disclosure, information and / or syntax elements sent / signaled from the encoding device to the decoding device may be included in the video / image information. The video / image information can be encoded using the encoding process described above and included in the bitstream. The bitstream can be transmitted over a network or stored in a digital storage medium. Here, the network may include broadcast networks, communication networks, and / or the like, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmitter (not shown) that sends the signal output from the entropy encoder 240 or a memory (not shown) that stores it may be configured as an internal / external element of the encoding device 200, or the transmitter may be included in the entropy encoder 240.
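Two of the steps named above, rearranging a coefficient block into a one-dimensional vector and exponential Golomb coding, can be sketched as follows. The anti-diagonal scan order used here is only one possible order and is an assumption of this example, as are the function names; the exponential Golomb code shown is the order-0 code for unsigned integers.

```python
def diagonal_scan(block):
    """Flatten a 2-D coefficient block along anti-diagonals (illustrative order)."""
    rows, cols = len(block), len(block[0])
    order = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: (rc[0] + rc[1], -rc[0]))
    return [block[r][c] for r, c in order]


def exp_golomb_ue(value):
    """Order-0 exponential Golomb code word for an unsigned integer, as a bit string."""
    bits = bin(value + 1)[2:]              # binary representation of value + 1
    return "0" * (len(bits) - 1) + bits    # prefix of leading zeros, then the bits


vector = diagonal_scan([[1, 2], [3, 4]])
codewords = [exp_golomb_ue(v) for v in (0, 1, 2, 3)]
```

Small values receive short code words, which suits syntax elements whose small values are the most frequent.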

[0082] The quantized transform coefficients output from quantizer 233 can be used to generate a prediction signal. For example, by applying dequantization and inverse transform to the quantized transform coefficients via dequantizer 234 and inverse transformer 235, the residual signal (residual block or residual samples) can be reconstructed. Adder 250 adds the reconstructed residual signal to the prediction signal output from inter-frame predictor 221 or intra-frame predictor 222, thereby generating a reconstructed signal (reconstructed image, reconstructed block, reconstructed sample array). When there is no residual for the processing target block, as when skip mode is applied, the prediction block can be used as the reconstructed block. Adder 250 can be referred to as a reconstructor or reconstructed block generator. The generated reconstructed signal can be used for intra-frame prediction of the next processing target block in the current image and, after filtering as described later, for inter-frame prediction of the next image.

[0083] In addition, luminance mapping with chroma scaling (LMCS) can be applied in image encoding and / or reconstruction processing.

[0084] Filter 260 can improve subjective / objective video quality by applying filtering to the reconstructed signal. For example, filter 260 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image, and the modified reconstructed image can be stored in memory 270, specifically in the DPB of memory 270. The various filtering methods can include, for example, deblocking filtering, sample adaptive offset, adaptive loop filtering, and bilateral filtering. As discussed later in the description of each filtering method, filter 260 can generate various filtering-related information and send it to entropy encoder 240. The filtering-related information can be encoded in entropy encoder 240 and output in the bitstream.

[0085] The modified reconstructed image sent to memory 270 can be used as a reference image in inter-frame predictor 221. Accordingly, when inter-frame prediction is applied, the encoding device can avoid prediction mismatch between the encoding device 200 and the decoding device, and can also improve encoding efficiency.

[0086] The DPB of the memory 270 can store modified reconstructed images for use as reference images in the inter-frame predictor 221. The memory 270 can store motion information of blocks in the current image from which motion information has been derived (or encoded) and / or motion information of blocks in already reconstructed images. The stored motion information can be sent to the inter-frame predictor 221 to be used as motion information of spatially neighboring blocks or temporally neighboring blocks. The memory 270 can store reconstructed samples of reconstructed blocks in the current image and send them to the intra-frame predictor 222.

[0087] Figure 3 is a diagram that schematically illustrates the configuration of a video / image decoding device to which this disclosure can be applied.

[0088] Referring to Figure 3, the video decoding device 300 may include an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The predictor 330 may include an intra-frame predictor 331 and an inter-frame predictor 332. The residual processor 320 may include a dequantizer 321 and an inverse transformer 322. According to embodiments, the entropy decoder 310, residual processor 320, predictor 330, adder 340, and filter 350 described above may be constituted by one or more hardware components (e.g., a decoder chipset or processor). Additionally, the memory 360 may include a decoded picture buffer (DPB) and may be constituted by a digital storage medium. The hardware components may also include the memory 360 as an internal / external component.

[0089] When a bitstream containing video / image information is input, the decoding device 300 can reconstruct the image in a manner corresponding to the process by which the video / image information was processed in the encoding device of Figure 2. For example, the decoding device 300 can derive units / blocks based on information related to block partitioning obtained from the bitstream. The decoding device 300 can perform decoding by using the processing unit applied in the encoding device. Therefore, the processing unit for decoding can be, for example, a coding unit, which can be partitioned from a coding tree unit or a largest coding unit along a quadtree structure, binary tree structure, and / or ternary tree structure. One or more transform units can be derived from the coding unit. The reconstructed image signal decoded and output by the decoding device 300 can be reproduced by a reproduction device.

[0090] The decoding device 300 can receive the signal output from the encoding device of Figure 2 in the form of a bitstream, and the received signal can be decoded by the entropy decoder 310. For example, the entropy decoder 310 can parse the bitstream to derive information (e.g., video / image information) required for image reconstruction (or picture reconstruction). The video / image information may also include information about various parameter sets such as the Adaptive Parameter Set (APS), Picture Parameter Set (PPS), Sequence Parameter Set (SPS), and Video Parameter Set (VPS). In addition, the video / image information may include general constraint information. The decoding device can further decode the picture based on the information about the parameter sets and / or the general constraint information. In this disclosure, the signaled / received information and / or syntax elements described later can be decoded through the decoding process and obtained from the bitstream. For example, the entropy decoder 310 can decode the information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and can output the values of the syntax elements required for image reconstruction and the quantized values of the transform coefficients of the residual. More specifically, the CABAC entropy decoding method can receive a bin corresponding to each syntax element in the bitstream, determine a context model using information about the syntax element to be decoded, decoding information of neighboring blocks and the block to be decoded, or information about symbols / bins decoded in a previous step, predict the probability of occurrence of the bin according to the determined context model, and perform arithmetic decoding on the bin to generate a symbol corresponding to the value of each syntax element. 
Here, after determining the context model, the CABAC entropy decoding method can update the context model using information about the decoded symbol / bin for use as the context model of the next symbol / bin. Among the information decoded in the entropy decoder 310, prediction-related information can be provided to the predictors (inter-frame predictor 332 and intra-frame predictor 331), and the residual values (i.e., quantized transform coefficients) and associated parameter information on which entropy decoding has been performed in the entropy decoder 310 can be input to the residual processor 320. The residual processor 320 can derive residual signals (residual blocks, residual samples, residual sample arrays). Additionally, among the information decoded in the entropy decoder 310, filtering-related information can be provided to the filter 350. Furthermore, a receiver (not shown) that receives the signal output from the encoding device can be further configured as an internal / external component of the decoding device 300, or the receiver can be a component of the entropy decoder 310. Additionally, the decoding device according to this disclosure can be referred to as a video / image / picture decoding device, and the decoding device can be divided into an information decoder (video / image / picture information decoder) and a sample decoder (video / image / picture sample decoder). The information decoder may include the entropy decoder 310, and the sample decoder may include at least one of the dequantizer 321, inverse transformer 322, adder 340, filter 350, memory 360, inter-frame predictor 332, and intra-frame predictor 331.
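
Of the entropy coding methods named above, zero-order unsigned exponential Golomb coding is simple enough to illustrate in a few lines. The following is a sketch only: the bit stream is modeled as a Python string of '0' and '1' characters, and the function names `encode_ue` and `decode_ue` are hypothetical, not taken from this disclosure.

```python
def encode_ue(value):
    """Zero-order unsigned Exp-Golomb code word for a non-negative integer.

    The code word is the binary form of (value + 1), preceded by as many
    zero bits as there are bits after its leading one.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def decode_ue(bits, pos=0):
    """Decode one code word starting at bits[pos]; return (value, next_pos)."""
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    end = pos + 2 * leading_zeros + 1
    value = int(bits[pos + leading_zeros:end], 2) - 1
    return value, end


if __name__ == "__main__":
    stream = "".join(encode_ue(v) for v in (0, 1, 3))
    pos, decoded = 0, []
    while pos < len(stream):
        value, pos = decode_ue(stream, pos)
        decoded.append(value)
    print(stream, decoded)
```

Because each code word announces its own length through the run of leading zeros, syntax element values can be read back one after another without separators.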

[0091] The dequantizer 321 can output transform coefficients by dequantizing the quantized transform coefficients. The dequantizer 321 can rearrange the quantized transform coefficients into two-dimensional blocks. In this case, the rearrangement can be performed based on the order of coefficient scans already performed in the encoding device. The dequantizer 321 can perform dequantization on the quantized transform coefficients using quantization parameters (e.g., quantization step size information) and obtain the transform coefficients.

[0092] The inverse transformer 322 obtains the residual signal (residual block, residual sample array) by performing an inverse transform on the transform coefficients.

[0093] The predictor can perform predictions on the current block and generate a prediction block that includes prediction samples for the current block. The predictor can determine whether to apply intra-frame prediction or inter-frame prediction to the current block based on information about the prediction output from the entropy decoder 310, and specifically, can determine the intra-frame / inter-frame prediction mode.

[0094] The predictor can generate a prediction signal based on various prediction methods. For example, the predictor can apply intra-frame prediction or inter-frame prediction to the prediction of a block, and can also apply intra-frame prediction and inter-frame prediction simultaneously. This can be referred to as combined inter-frame and intra-frame prediction (CIIP). Additionally, the predictor can perform intra-block copying (IBC) for the prediction of a block. Intra-block copying can be used for content image / video coding of games and the like, for example screen content coding (SCC). Although IBC essentially performs prediction within the current image, it is similar to inter-frame prediction in that it derives a reference block within the current image. That is, IBC can use at least one of the inter-frame prediction techniques described in this disclosure.

[0095] The intra-predictor 331 can predict the current block by referencing samples in the current image. Depending on the prediction mode, the reference samples can be located near or separate from the current block. In intra-prediction, the prediction mode can include multiple non-directional modes and multiple directional modes. The intra-predictor 331 can determine the prediction mode applied to the current block by using the prediction modes applied to neighboring blocks.

[0096] Inter-frame predictor 332 can deduce the predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference image. In this case, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted based on the correlation between motion information of neighboring blocks and the current block, on a block, sub-block, or sample basis. Motion information may include motion vectors and reference image indices. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in the reference image. For example, inter-frame predictor 332 can configure a motion information candidate list based on neighboring blocks and deduce the motion vector and / or reference image index of the current block based on received candidate selection information. Inter-frame prediction can be performed based on various prediction modes, and the information about the prediction may include information indicating the mode of inter-frame prediction for the current block.

[0097] Adder 340 can generate a reconstructed signal (reconstructed image, reconstructed block, reconstructed sample array) by adding the obtained residual signal to the prediction signal (prediction block, prediction sample array) output from predictor 330. When there is no residual for the processing target block, as when skip mode is applied, the prediction block can be used as the reconstructed block.

[0098] Adder 340 can be referred to as a reconstructor or reconstructed block generator. The generated reconstructed signal can be used for intra-frame prediction of the next processing target block in the current image and, as described later, can be output after filtering or used for inter-frame prediction of the next image.

[0099] In addition, luminance mapping with chroma scaling (LMCS) can be applied in image decoding processing.

[0100] Filter 350 can improve subjective / objective video quality by applying filtering to the reconstructed signal. For example, filter 350 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image, and the modified reconstructed image can be sent to memory 360, specifically to the DPB of memory 360. The various filtering methods can include, for example, deblocking filtering, sample adaptive offset, adaptive loop filtering, and bilateral filtering.

[0101] The (modified) reconstructed image stored in the DPB of memory 360 can be used as a reference image in inter-frame predictor 332. Memory 360 can store motion information of blocks in the current image from which motion information has been derived (or decoded) and / or motion information of blocks in a reconstructed image. The stored motion information can be sent to inter-frame predictor 332 to be used as motion information of neighboring blocks or temporally neighboring blocks. Memory 360 can store reconstructed samples of reconstructed blocks in the current image and send them to intra-frame predictor 331.

[0102] In this specification, the embodiments described for the predictor 330, dequantizer 321, inverse transformer 322, and filter 350 of the decoding device 300 can be applied in the same or a corresponding manner to the predictor 220, dequantizer 234, inverse transformer 235, and filter 260 of the encoding device 200, respectively.

[0103] As described above, prediction is performed to improve compression efficiency during video encoding. Accordingly, a prediction block can be generated that includes prediction samples for the current block, which is the target block for encoding. Here, the prediction block includes prediction samples in the spatial domain (or pixel domain). The prediction block can be derived identically in both the encoding and decoding devices, and the encoding device can improve image encoding efficiency by signaling to the decoding device information about the residual between the original block and the prediction block (residual information), not the original sample values ​​of the original block itself. The decoding device can derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by adding the residual block to the prediction block, and generate a reconstructed image including the reconstructed block.

[0104] Residual information can be generated through transform and quantization processes. For example, an encoding device can derive a residual block between the original block and the prediction block, derive transform coefficients by performing a transform process on the residual samples (residual sample array) included in the residual block, and derive quantized transform coefficients by performing a quantization process on the transform coefficients. This allows it to signal the associated residual information to the decoding device (via a bitstream). Here, the residual information can include value information and position information of the quantized transform coefficients, the transform technique, the transform kernel, quantization parameters, and the like. The decoding device can perform dequantization / inverse transform processes based on the residual information and derive residual samples (or a residual block). The decoding device can generate a reconstructed block based on the prediction block and the residual block. The encoding device can also derive a residual block by performing dequantization / inverse transform on the quantized transform coefficients, for reference in inter-frame prediction of a subsequent image, and can generate a reconstructed image based on this.
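
The residual signaling loop described above can be sketched as follows, under stated simplifications. The transform stage is omitted (treated as identity), quantization is modeled as uniform scalar rounding with a single step size, and the names `quantize`, `dequantize`, `encode_residual`, and `decode_block` are hypothetical; none of these choices is taken from this disclosure.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization (an assumed, simplified quantizer)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Inverse of the assumed quantizer: scale each level by the step size."""
    return [[level * step for level in row] for row in levels]


def encode_residual(original, prediction, step):
    """Encoder side: residual = original - prediction, then quantize.

    The transform between these two stages is left out of this sketch.
    """
    residual = [[o - p for o, p in zip(orow, prow)]
                for orow, prow in zip(original, prediction)]
    return quantize(residual, step)


def decode_block(levels, prediction, step):
    """Decoder side: dequantize the levels and add them to the prediction."""
    residual = dequantize(levels, step)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


if __name__ == "__main__":
    original = [[100, 104], [98, 97]]
    prediction = [[101, 101], [101, 101]]
    levels = encode_residual(original, prediction, step=4)
    print(levels)
    print(decode_block(levels, prediction, step=4))
```

Running `decode_block` inside the encoder as well gives both devices the same reconstructed block, which is why the encoding device also performs dequantization and inverse transform on its own quantized coefficients.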

[0105] Figure 4 illustrates the structure of a content streaming system to which this disclosure is applied.

[0106] Furthermore, the content streaming system using this disclosure can generally include an encoding server, a streaming server, a web server, a media storage device, a user device, and a multimedia input device.

[0107] An encoding server is used to compress content input from multimedia input devices such as smartphones, cameras, and camcorders into digital data to generate a bitstream, and then sends it to a streaming server. As another example, in cases where the multimedia input device, such as a smartphone, camera, or camcorder, directly generates the bitstream, the encoding server can be omitted. The bitstream can be generated by applying the encoding method or bitstream generation method disclosed herein. Furthermore, the streaming server can temporarily store the bitstream during the sending or receiving process.

[0108] The streaming server sends multimedia data to the user's device via a web server based on the user's request. The web server acts as a tool to notify the user of available services. When a user requests a desired service, the web server transmits the request to the streaming server, and the streaming server sends the multimedia data to the user. In this context, the content streaming system may include a separate control server, which in this case controls the commands / responses between the corresponding devices within the content streaming system.

[0109] A streaming server can receive content from media storage devices and / or encoding servers. For example, when receiving content from an encoding server, the content can be received in real time. In this case, to provide a smooth streaming service, the streaming server can store the bitstream for a predetermined period of time.

[0110] For example, user devices may include mobile phones, smartphones, laptop computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigators, board-type PCs, tablet PCs, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, head-mounted displays (HMDs)), digital TVs, desktop computers, digital signage, etc. The servers in the content streaming system can operate as distributed servers, and in this case, data received by each server can be processed in a distributed manner.

[0111] Figure 5 schematically illustrates a multiple transform technique according to an embodiment of the present disclosure.

[0112] Referring to Figure 5, the transformer can correspond to the transformer in the encoding device of Figure 2 described above, and the inverse transformer can correspond to the inverse transformer in the encoding device of Figure 2 described above or to the inverse transformer in the decoding device of Figure 3.

[0113] The transformer can derive (primary) transform coefficients by performing a primary transform based on the residual samples (residual sample array) in the residual block (S510). This primary transform can be called a core transform. Here, the primary transform can be based on multiple transform selection (MTS), and when multiple transforms are applied as the primary transform, it can be called a multi-core transform.

[0114] The multi-core transform can represent a method of performing a transform by additionally using Discrete Cosine Transform (DCT) Type 2, Discrete Sine Transform (DST) Type 7, DCT Type 8, and / or DST Type 1. In other words, the multi-core transform can represent a method of transforming a spatial-domain residual signal (or residual block) into frequency-domain transform coefficients (or primary transform coefficients) based on multiple transform kernels selected from DCT Type 2, DST Type 7, DCT Type 8, and DST Type 1. Here, from the perspective of the transformer, the primary transform coefficients can be referred to as temporary transform coefficients.

[0115] In other words, when a conventional transform method is applied, transform coefficients can be generated by applying a spatial-to-frequency domain transform to the residual signal (or residual block) based on DCT Type 2. In contrast, when the multi-core transform is applied, transform coefficients (or primary transform coefficients) can be generated by applying a spatial-to-frequency domain transform to the residual signal (or residual block) based on DCT Type 2, DST Type 7, DCT Type 8, and / or DST Type 1. Here, DCT Type 2, DST Type 7, DCT Type 8, and DST Type 1 can be referred to as transform types, transform kernels, or transform cores. These DCT / DST transform types can be defined based on basis functions.

[0116] When performing a multi-core transform, a vertical transform kernel and a horizontal transform kernel can be selected from the transform kernels for the target block. A vertical transform can be performed on the target block based on the vertical transform kernel, and a horizontal transform can be performed on the target block based on the horizontal transform kernel. Here, the horizontal transform can indicate the transform of the horizontal components of the target block, and the vertical transform can indicate the transform of the vertical components of the target block. The vertical transform kernel / horizontal transform kernel can be adaptively determined based on the prediction mode and / or transform index of the target block (CU or sub-block) that includes the residual block.

[0117] Furthermore, according to an example, when a transform is performed by applying MTS, the mapping relationship of the transform kernels can be set by setting specific basis functions to predetermined values and combining the basis functions to be applied in the vertical or horizontal transform. For example, when the horizontal transform kernel is denoted as trTypeHor and the vertical transform kernel is denoted as trTypeVer, a trTypeHor or trTypeVer value of 0 can be set to DCT2, a value of 1 to DST7, and a value of 2 to DCT8.
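
The separable horizontal / vertical transform described above can be sketched in floating point as follows. The orthonormal basis-function formulas used for DCT Type 2, DST Type 7, and DCT Type 8 are the textbook definitions and are an assumption here, since this text does not list them; practical codecs use scaled integer approximations instead. The helper names (`dct2_matrix`, `forward_transform`, and so on) are hypothetical.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT Type 2 basis; row i is the i-th basis function."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def dst7_matrix(n):
    """Orthonormal DST Type 7 basis."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def dct8_matrix(n):
    """Orthonormal DCT Type 8 basis."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]


# trTypeHor / trTypeVer value -> kernel builder (0: DCT2, 1: DST7, 2: DCT8)
KERNELS = {0: dct2_matrix, 1: dst7_matrix, 2: dct8_matrix}


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(block, tr_type_hor, tr_type_ver):
    """Vertical kernel acts on columns, horizontal kernel on rows: V * X * H^T."""
    hor = KERNELS[tr_type_hor](len(block[0]))
    ver = KERNELS[tr_type_ver](len(block))
    return matmul(matmul(ver, block), transpose(hor))


def inverse_transform(coeffs, tr_type_hor, tr_type_ver):
    """Inverse of forward_transform, valid because the kernels are orthonormal."""
    hor = KERNELS[tr_type_hor](len(coeffs[0]))
    ver = KERNELS[tr_type_ver](len(coeffs))
    return matmul(matmul(transpose(ver), coeffs), hor)
```

For example, `inverse_transform(forward_transform(block, 1, 2), 1, 2)` returns the input block up to floating-point rounding, for square and non-square blocks alike.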

[0118] In this case, MTS index information can be encoded and signaled to the decoding device to indicate one of the multiple transform kernel sets. For example, MTS index 0 can indicate that both trTypeHor and trTypeVer are 0, MTS index 1 can indicate that both trTypeHor and trTypeVer are 1, MTS index 2 can indicate that trTypeHor is 2 and trTypeVer is 1, MTS index 3 can indicate that trTypeHor is 1 and trTypeVer is 2, and MTS index 4 can indicate that both trTypeHor and trTypeVer are 2.

[0119] In one example, the transformation kernel set based on MTS index information is shown in the table below.

[0120] [Table 1]

[0121]
tu_mts_idx[x0][y0]    0    1    2    3    4
trTypeHor             0    1    2    1    2
trTypeVer             0    1    1    2    2
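
Table 1 amounts to a small lookup, which can be sketched as follows. The dictionary and function names are hypothetical; the index-to-kernel values are copied from Table 1 and from the 0 / 1 / 2 assignment to DCT2 / DST7 / DCT8 given earlier.

```python
# tu_mts_idx -> (trTypeHor, trTypeVer), as listed in Table 1
MTS_KERNELS = {0: (0, 0), 1: (1, 1), 2: (2, 1), 3: (1, 2), 4: (2, 2)}

# trType value -> transform name
TRANSFORM_NAMES = {0: "DCT2", 1: "DST7", 2: "DCT8"}


def kernels_for_mts_index(tu_mts_idx):
    """Return (horizontal kernel name, vertical kernel name) for an MTS index."""
    tr_type_hor, tr_type_ver = MTS_KERNELS[tu_mts_idx]
    return TRANSFORM_NAMES[tr_type_hor], TRANSFORM_NAMES[tr_type_ver]


if __name__ == "__main__":
    for idx in sorted(MTS_KERNELS):
        print(idx, kernels_for_mts_index(idx))
```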

[0122] The transformer can perform a secondary transform based on the (primary) transform coefficients to derive modified (secondary) transform coefficients (S520). The primary transform is a transform from the spatial domain to the frequency domain, while the secondary transform converts the (primary) transform coefficients into a more compact representation by exploiting the correlation that exists between them. The secondary transform can include a non-separable transform. In this case, the secondary transform can be called a non-separable secondary transform (NSST) or a mode-dependent non-separable secondary transform (MDNSST). The NSST can represent a transform that applies a secondary transform, based on a non-separable transform matrix, to the (primary) transform coefficients derived from the primary transform to generate modified transform coefficients (or secondary transform coefficients) for the residual signal. Here, based on the non-separable transform matrix, the transform can be applied to the (primary) transform coefficients at once, without applying the vertical and horizontal transforms separately (or independently). In other words, the NSST is not applied separately to the (primary) transform coefficients in the vertical and horizontal directions, but can represent, for example, a transform method that rearranges a two-dimensional signal (transform coefficients) into a one-dimensional signal along a specific predetermined direction (e.g., row-first or column-first) and then generates modified transform coefficients (or secondary transform coefficients) based on a non-separable transform matrix. For example, in row-first order an M × N block is arranged in a line in the order of the first row, the second row, ..., and the Nth row, while in column-first order an M × N block is arranged in a line in the order of the first column, the second column, ..., and the Mth column. 
The NSST can be applied to the upper-left region of a block containing (primary) transform coefficients (hereinafter referred to as a transform coefficient block). For example, when both the width (W) and height (H) of the transform coefficient block are 8 or greater, an 8 × 8 NSST can be applied to the upper-left 8 × 8 region of the transform coefficient block. Furthermore, when both the width (W) and height (H) of the transform coefficient block are 4 or greater, and the width (W) or height (H) of the transform coefficient block is less than 8, a 4 × 4 NSST can be applied to the upper-left min(8,W) × min(8,H) region of the transform coefficient block. However, the implementation is not limited to this. For example, even if only the condition that the width (W) or height (H) of the transform coefficient block is 4 or greater is met, a 4 × 4 NSST can be applied to the upper-left min(8,W) × min(8,H) region of the transform coefficient block.
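
The size rule in the first example above can be written as a short selection function. This is a sketch of that one example only (the text notes that other conditions are possible), and the name `nsst_region` and its return convention are assumptions made for illustration.

```python
def nsst_region(width, height):
    """Pick the NSST size and the upper-left region it covers.

    Returns (nsst_size, region_width, region_height), or None when the block
    is too small for either rule of the first example in the text.
    """
    if width >= 8 and height >= 8:
        # 8 x 8 NSST on the upper-left 8 x 8 region
        return 8, 8, 8
    if width >= 4 and height >= 4:
        # 4 x 4 NSST on the upper-left min(8, W) x min(8, H) region
        return 4, min(8, width), min(8, height)
    return None


if __name__ == "__main__":
    for size in ((16, 16), (4, 16), (8, 4), (2, 8)):
        print(size, nsst_region(*size))
```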

[0123] Specifically, for example, if a 4×4 input block is used, the non-separable secondary transform can be performed as follows.

[0124] A 4×4 input block X can be represented as follows.

[0125] [Equation 1]

[0126]
$$X = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} \\ X_{10} & X_{11} & X_{12} & X_{13} \\ X_{20} & X_{21} & X_{22} & X_{23} \\ X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}$$

[0127] If X is represented as a vector, then the vector $\vec{X}$ can be represented as follows.

[0128] [Equation 2]

[0129]
$$\vec{X} = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} & X_{10} & X_{11} & X_{12} & X_{13} & X_{20} & X_{21} & X_{22} & X_{23} & X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}^{T}$$

[0130] In Equation 2, the vector $\vec{X}$ is a one-dimensional vector obtained by rearranging the two-dimensional block X of Equation 1 in row-first order.

[0131] In this case, the non-separable secondary transform can be calculated as follows.

[0132] [Equation 3]

[0133]
$$\vec{F} = T \cdot \vec{X}$$

[0134] In this equation, $\vec{F}$ represents the transform coefficient vector, while T represents the 16×16 (non-separable) transform matrix.

[0135] Using Equation 3 above, the 16×1 transform coefficient vector $\vec{F}$ can be derived, and $\vec{F}$ can be reorganized into a 4×4 block according to a scan order (horizontal, vertical, diagonal, etc.). However, the above calculation is an example, and the hypercube-Givens transform (HyGT) and similar methods can also be used to calculate the non-separable secondary transform in order to reduce its computational complexity.
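
The calculation above (flatten the 4×4 block in row-first order, multiply by a 16×16 matrix, reorganize the result into a 4×4 block) can be sketched as follows. The matrix used in the demonstration is a placeholder permutation, not an actual transform kernel, and the reorganization uses a plain horizontal (row-first) order; both choices, like the function names, are assumptions made for illustration.

```python
def flatten_row_first(block):
    """Rearrange a 2-D block into a 1-D list, first row then second row, etc."""
    return [value for row in block for value in row]


def nonseparable_transform(block, matrix):
    """Multiply the matrix by the row-first vector of the block."""
    x = flatten_row_first(block)
    if any(len(row) != len(x) for row in matrix):
        raise ValueError("matrix width must equal the number of block samples")
    return [sum(t * v for t, v in zip(row, x)) for row in matrix]


def reshape_row_first(vector, width):
    """Reorganize a 1-D coefficient list into rows of the given width."""
    return [vector[i:i + width] for i in range(0, len(vector), width)]


if __name__ == "__main__":
    block = [[4 * r + c for c in range(4)] for r in range(4)]
    # Placeholder 16 x 16 matrix that merely reverses the vector; a real
    # codec would use a trained non-separable kernel here.
    placeholder = [[1 if j == 15 - i else 0 for j in range(16)]
                   for i in range(16)]
    f = nonseparable_transform(block, placeholder)
    print(reshape_row_first(f, 4))
```

A reduced kernel, such as a matrix with 8 rows and 16 columns, fits the same function: it simply yields fewer output coefficients than input samples.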

[0136] Furthermore, in the non-separable secondary transform, the transform kernel (or transform type) can be selected in a mode-dependent manner. In this case, the mode can include an intra-frame prediction mode and / or an inter-frame prediction mode.

[0137] As described above, the non-separable secondary transform can be performed based on an 8×8 transform or a 4×4 transform determined by the width (W) and height (H) of the transform coefficient block. The 8×8 transform is a transform applicable to an 8×8 region contained within the transform coefficient block when both W and H are equal to or greater than 8, and this 8×8 region can be the top-left 8×8 region within the transform coefficient block. Similarly, the 4×4 transform is a transform applicable to a 4×4 region contained within the transform coefficient block when both W and H are equal to or greater than 4, and this 4×4 region can be the top-left 4×4 region within the transform coefficient block. For example, the 8×8 transform kernel matrix can be a 64×64 / 16×64 matrix, while the 4×4 transform kernel matrix can be a 16×16 / 8×16 matrix.

[0138] Here, to select mode-dependent transform kernels, two non-separable secondary transform kernels can be configured for each transform set of the non-separable secondary transform, for both the 8×8 and 4×4 transforms, and there can be four transform sets. That is, four transform sets can be configured for the 8×8 transform, and four transform sets can be configured for the 4×4 transform. In this case, each of the four transform sets for the 8×8 transform can include two 8×8 transform kernels, and each of the four transform sets for the 4×4 transform can include two 4×4 transform kernels.

[0139] However, as the size of the transformation (i.e., the size of the region to which the transformation is applied) can be, for example, a size other than 8×8 or 4×4, the number of sets can be n, and the number of transformation kernels in each set can be k.

[0140] The transform set can be referred to as the NSST set or the LFNST set. A specific set among the transform sets can be selected, for example, based on the intra-prediction mode of the current block (CU or sub-block). The Low-Frequency Non-Separable Transform (LFNST) can be an example of a reduced non-separable transform, which will be described later, and represents a non-separable transform for low-frequency components.

[0141] For reference, for example, intra-prediction modes may include two non-directional (or non-angular) intra-prediction modes and 65 directional (or angular) intra-prediction modes. Non-directional intra-prediction modes may include planar intra-prediction mode number 0 and DC intra-prediction mode number 1, and directional intra-prediction modes may include 65 intra-prediction modes numbered 2 through 66. However, this is an example, and this document can be applied even if the number of intra-prediction modes differs. Furthermore, in some cases, intra-prediction mode number 67 may be used, and intra-prediction mode number 67 may represent a linear model (LM) mode.

[0142] Figure 6 shows an example of intra-frame directional modes with 65 prediction directions.

[0143] Referring to Figure 6, intra-prediction modes can be divided into intra-prediction modes with horizontal directionality and intra-prediction modes with vertical directionality, based on intra-prediction mode 34, which has a top-left diagonal prediction direction. In Figure 6, H and V denote horizontal and vertical directionality, respectively, and the numbers -32 to 32 indicate displacements in units of 1 / 32 at the sample grid positions. These numbers can represent offsets to the mode index value. Intra-prediction modes 2 to 33 have horizontal directionality, and intra-prediction modes 34 to 66 have vertical directionality. Strictly speaking, intra-prediction mode 34 can be considered neither horizontal nor vertical, but it can be classified as having horizontal directionality when determining the transform set of the secondary transform. This is because the input data is transposed for the vertical-direction modes that are symmetric about intra-prediction mode 34, and the input data arrangement method for the horizontal modes is used for intra-prediction mode 34. Transposing the input data means switching the rows and columns of two-dimensional M×N block data to form N×M data. Intra-prediction modes 18 and 50 can represent the horizontal and vertical intra-prediction modes, respectively, and intra-prediction mode 2 can be called the upper-right diagonal intra-prediction mode because it uses left reference pixels and predicts in the upper-right direction. Similarly, intra-prediction mode 34 can be referred to as the bottom-right diagonal intra-prediction mode, and intra-prediction mode 66 as the bottom-left diagonal intra-prediction mode.
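
The transposition rule described above can be illustrated with a small sketch. It assumes that the horizontal counterpart of a vertical mode m (35 to 66) is mode 68 − m, which follows from modes 18 and 50, and modes 2 and 66, being mirror images about mode 34; that formula and the names `transpose` and `arrange_input` are assumptions for illustration rather than statements from this text.

```python
def transpose(block):
    """Switch rows and columns: M x N block data becomes N x M data."""
    return [list(col) for col in zip(*block)]


def arrange_input(block, intra_mode):
    """Return (arranged block, mode treated as horizontal-type).

    Modes 35..66 are mirrored about mode 34 and their input is transposed;
    mode 34 itself and all other modes use the block as it is.
    """
    if 35 <= intra_mode <= 66:
        return transpose(block), 68 - intra_mode
    return block, intra_mode


if __name__ == "__main__":
    coeffs = [[1, 2, 3], [4, 5, 6]]
    print(arrange_input(coeffs, 50))  # vertical mode: transposed, mirrored
    print(arrange_input(coeffs, 34))  # diagonal mode: left unchanged
```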

[0144] Based on the example, four transform sets can be mapped according to the intra-frame prediction mode, as shown in the table below.

[0145] [Table 2]

[0146]
lfnstPredModeIntra                  lfnstTrSetIdx
lfnstPredModeIntra < 0              1
0 <= lfnstPredModeIntra <= 1        0
2 <= lfnstPredModeIntra <= 12       1
13 <= lfnstPredModeIntra <= 23      2
24 <= lfnstPredModeIntra <= 44      3
45 <= lfnstPredModeIntra <= 55      2
56 <= lfnstPredModeIntra <= 80      1
81 <= lfnstPredModeIntra <= 83      0

[0147] As shown in Table 2, any one of the four transform sets, i.e., lfnstTrSetIdx, can be mapped to any one of the four indices (i.e., 0 to 3) according to the intra-prediction mode.
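
The range lookup of Table 2 can be sketched in Python as follows. This is only an illustration: the function name `lfnst_tr_set_idx` is hypothetical and does not appear in the disclosure, and the ranges are copied from Table 2.

```python
def lfnst_tr_set_idx(lfnst_pred_mode_intra: int) -> int:
    """Return lfnstTrSetIdx (0..3) for a given lfnstPredModeIntra per Table 2."""
    m = lfnst_pred_mode_intra
    if m < 0:
        return 1
    if m <= 1:
        return 0
    if m <= 12:
        return 1
    if m <= 23:
        return 2
    if m <= 44:
        return 3
    if m <= 55:
        return 2
    if m <= 80:
        return 1
    if m <= 83:
        return 0
    raise ValueError("lfnstPredModeIntra is outside the range covered by Table 2")


print(lfnst_tr_set_idx(34))  # 3
print(lfnst_tr_set_idx(-3))  # 1
```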

[0148] When a specific set is determined to be used for the non-separable secondary transform, one of the k transform kernels in that set can be selected using the non-separable secondary transform index. The encoding device can derive the index indicating the specific transform kernel based on a rate-distortion (RD) check and can signal it to the decoding device. The decoding device can select one of the k transform kernels in the specific set based on the index. For example, lfnst index value 0 can refer to the first non-separable secondary transform kernel, lfnst index value 1 to the second, and lfnst index value 2 to the third. Alternatively, lfnst index value 0 can indicate that the non-separable secondary transform is not applied to the target block, and lfnst index values 1 through 3 can indicate the three transform kernels.

[0149] The transformer can perform the non-separable secondary transform based on the selected transform kernel and obtain modified (secondary) transform coefficients. As described above, the modified transform coefficients can be derived as transform coefficients quantized by the quantizer, encoded and signaled to the decoding device, and passed to the dequantizer / inverse transformer in the encoding device.

[0150] Furthermore, as mentioned above, if the second transformation is omitted, the (first) transformation coefficients, which are the output of the first (separable) transformation, can be derived as the transformation coefficients quantized by the quantizer as described above, and can be encoded and signaled to the decoding device, and transmitted to the dequantizer / inverse transformer in the encoding device.

[0151] The inverse transformer can perform a series of processes in the reverse order of those already executed in the aforementioned transformers. The inverse transformer can receive (dequantized) transform coefficients and derive (first) transform coefficients by performing a second (inverse) transform (S550), and obtain residual blocks (residual samples) by performing a first (inverse) transform on the (first) transform coefficients (S560). In this regard, from the perspective of the inverse transformer, the first transform coefficients can be referred to as modified transform coefficients. As described above, the encoding and decoding devices can generate reconstructed blocks based on the residual blocks and prediction blocks, and can generate reconstructed images based on the reconstructed blocks.

[0152] The decoding device may also include a second-order inverse transform application determiner (or a component for determining whether to apply the second-order inverse transform) and a second-order inverse transform determiner (or a component for determining the second-order inverse transform). The second-order inverse transform application determiner can determine whether to apply the second-order inverse transform. For example, the second-order inverse transform can be NSST, RST, or LFNST, and the second-order inverse transform application determiner can determine whether to apply the second-order inverse transform based on a second-order transform flag obtained by parsing the bitstream. In another example, the second-order inverse transform application determiner can determine whether to apply the second-order inverse transform based on the transform coefficients of the residual block.

[0153] A second-order inverse transform determiner can determine the second-order inverse transform. In this case, the second-order inverse transform determiner can determine the second-order inverse transform applied to the current block based on the LFNST (NSST or RST) transform set specified according to the intra-prediction mode. In an implementation, the second-order transform determination method can be determined depending on the first-order transform determination method. Various combinations of the first and second-order transforms can be determined based on the intra-prediction mode. Furthermore, in the example, the second-order inverse transform determiner can determine the region where the second-order inverse transform is applied based on the size of the current block.

[0154] Furthermore, as mentioned above, if the second (inverse) transform is omitted, the (dequantized) transform coefficients can be received, a first (separable) inverse transform can be performed, and a residual block (residual sample) can be obtained. As mentioned above, the encoding and decoding devices can generate a reconstructed block based on the residual block and the prediction block, and can generate a reconstructed image based on the reconstructed block.

[0155] Furthermore, in this disclosure, a reduced secondary transform (RST), in which the size of the transform matrix (kernel) is reduced, can be applied within the concept of NSST in order to reduce the computation and memory required for the non-separable secondary transform.

[0156] Furthermore, the transform kernel, the transform matrix, and the coefficients constituting the transform kernel matrix described in this disclosure, i.e., kernel coefficients or matrix coefficients, can be represented in 8 bits. This is practical for implementation in decoding and encoding devices: compared to the existing 9-bit or 10-bit representations, it reduces the memory required to store the transform kernels while keeping the performance degradation reasonably acceptable. Additionally, representing the kernel matrix in 8 bits allows the use of smaller multipliers and is better suited to Single Instruction Multiple Data (SIMD) instructions used for optimized software implementations.

[0157] In this specification, the term "RST" can refer to a transform performed on the residual samples of a target block based on a transform matrix whose size is reduced according to a reduction factor. When a reduced transform is performed, the computation required for the transform decreases because the transform matrix is smaller. In other words, RST can be used to address the computational complexity that arises when transforming a large block or performing a non-separable transform.

[0158] RST can be referred to by various terms such as reduced transform, reduced secondary transform, reduction transform, simplified transform, and simple transform, and the names by which RST can be called are not limited to these examples. Alternatively, since RST is performed mainly in the low-frequency region of the transform block that includes non-zero coefficients, it can be called the low-frequency non-separable transform (LFNST). The transform index can be called the LFNST index.

[0159] Furthermore, when performing a second inverse transform based on RST, the inverse transformer 235 of the encoding device 200 and the inverse transformer 322 of the decoding device 300 may include: an inverse reduced second transformer that derives modified transform coefficients based on the inverse RST of the transform coefficients; and an inverse first transformer that derives the residual samples of the target block based on the inverse first transform of the modified transform coefficients. An inverse first transform refers to the inverse transform of a first transform applied to the residuals. In this disclosure, deriving transform coefficients based on a transform can mean deriving the transform coefficients by applying a transform.

[0160] Figure 7 is a schematic diagram of an RST according to an embodiment of the present disclosure.

[0161] In this disclosure, "target block" may refer to the current block, residual block, or transform block to be encoded.

[0162] In the example RST, an N-dimensional vector can be mapped to an R-dimensional vector located in another space, which determines the reduced transform matrix, where R is less than N. N can refer to the square of the side length of the block to which the transform is applied, or to the total number of transform coefficients corresponding to that block, and the reduction factor can refer to the value R / N. The reduction factor can also be called a reduced factor, simplification factor, simple factor, or various other terms. Furthermore, R can be called a reduction coefficient, but depending on the situation, the reduction factor can refer to R. Depending on the situation, the reduction factor can also refer to the value N / R.

[0163] In this example, the reduction factor or reduction coefficient can be signaled via a bitstream, but the example is not limited to this. For instance, a predetermined value for the reduction factor or reduction coefficient can be stored in each of the encoding device 200 and the decoding device 300, and in this case, the reduction factor or reduction coefficient does not need to be signaled separately.

[0164] The size of the reduced transformation matrix, as shown in the example, can be less than N×N (the size of the regular transformation matrix) and can be R×N, as defined in Equation 4 below.

[0165] [Formula 4]

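[0166]
$$T_{R\times N} = \begin{bmatrix} t_{1,1} & t_{1,2} & t_{1,3} & \cdots & t_{1,N} \\ t_{2,1} & t_{2,2} & t_{2,3} & \cdots & t_{2,N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{R,1} & t_{R,2} & t_{R,3} & \cdots & t_{R,N} \end{bmatrix}$$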

[0167] The matrix T in the reduced transform block shown in Figure 7(a) can refer to the matrix $T_{R\times N}$ of Equation 4. As shown in Figure 7(a), when the reduced transform matrix $T_{R\times N}$ is multiplied by the residual samples of the target block, the transform coefficients of the target block can be derived.

[0168] In the example, if the size of the block to which the transform is applied is 8×8 and R = 16 (i.e., R / N = 16 / 64 = 1 / 4), the RST according to Figure 7(a) can be expressed as the matrix operation shown in Equation 5. In this case, storage and the number of multiplications can be reduced to approximately 1 / 4 by the reduction factor.

[0169] In this disclosure, a matrix operation can be understood as obtaining a column vector by multiplying a column vector by a matrix placed to its left.

[0170] [Formula 5]

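[0171]
$$\begin{bmatrix} t_{1,1} & t_{1,2} & t_{1,3} & \cdots & t_{1,64} \\ t_{2,1} & t_{2,2} & t_{2,3} & \cdots & t_{2,64} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{16,1} & t_{16,2} & t_{16,3} & \cdots & t_{16,64} \end{bmatrix} \times \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_{64} \end{bmatrix}$$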

[0172] In Equation 5, $r_1$ to $r_{64}$ can represent the residual samples of the target block; specifically, they can be the transform coefficients generated by applying the first transform. As a result of the calculation in Equation 5, the transform coefficients $c_i$ of the target block can be derived, and the process of deriving $c_i$ can be as shown in Equation 6.

[0173] [Formula 6]

[0174]
$$c_i = \sum_{j=1}^{N} t_{i,j}\, r_j, \qquad i = 1, \ldots, R$$

[0175] As a result of Equation 6, the transform coefficients $c_1$ to $c_R$ of the target block can be derived. In other words, when R = 16, the transform coefficients $c_1$ to $c_{16}$ of the target block can be derived. If a regular transform were applied instead of the RST, so that a 64×64 (N×N) transform matrix were multiplied by the 64×1 (N×1) residual samples, 64 (N) transform coefficients would be derived for the target block; because the RST is applied, only 16 (R) transform coefficients are derived. Since the total number of transform coefficients for the target block is reduced from N to R, the amount of data sent from the encoding device 200 to the decoding device 300 is reduced, which improves the transmission efficiency between the encoding device 200 and the decoding device 300.

[0176] In terms of the size of the transform matrix, a regular transform matrix is 64×64 (N×N), whereas the reduced transform matrix is 16×64 (R×N). Therefore, compared to performing a regular transform, the memory used when performing an RST is reduced to the ratio R / N. Likewise, compared to the N×N multiplications needed with a regular transform matrix, the reduced transform matrix needs R×N multiplications, which is a reduction to the ratio R / N.
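
As a minimal sketch of the reduced forward transform for the 8×8, R = 16 case discussed above, the following Python code multiplies an R×N matrix by an N×1 input and compares the multiplication counts. The function name `forward_rst` is hypothetical, and the kernel is a placeholder selection matrix chosen only so the example runs; it is not an actual RST kernel.

```python
def forward_rst(t, r):
    """Multiply an R x N reduced transform matrix t by an N-element input r.

    Returns the R output coefficients c_i = sum_j t[i][j] * r[j].
    """
    n = len(r)
    if any(len(row) != n for row in t):
        raise ValueError("every row of t must have len(r) entries")
    return [sum(t_ij * r_j for t_ij, r_j in zip(row, r)) for row in t]


N, R = 64, 16  # 8x8 block, reduction factor R / N = 1 / 4

# Placeholder kernel for illustration only (row i simply picks input i).
t = [[1 if j == i else 0 for j in range(N)] for i in range(R)]
r = list(range(1, N + 1))  # stand-in for the 64 input samples r_1..r_64

c = forward_rst(t, r)
reduced_mults = R * N  # multiplications with the reduced R x N matrix
regular_mults = N * N  # multiplications with a regular N x N matrix
print(len(c), reduced_mults, regular_mults)  # 16 1024 4096
```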

[0177] In the example, the transformer 232 of the encoding device 200 can derive the transform coefficients of the target block by performing a first transform and an RST-based second transform on the residual samples of the target block. These transform coefficients can be passed to the inverse transformer of the decoding device 300, and the inverse transformer 322 of the decoding device 300 can derive the modified transform coefficients based on the inverse reduced second transform (RST) for the transform coefficients, and can derive the residual samples of the target block based on the inverse first transform for the modified transform coefficients.

[0178] The inverse RST matrix $T_{N\times R}$ in the example has size N×R, which is smaller than the size N×N of the regular inverse transform matrix, and it is the transpose of the reduced transform matrix $T_{R\times N}$ shown in Equation 4.

[0179] The matrix $T^{t}$ in the reduced inverse transform block shown in Figure 7(b) can refer to the inverse RST matrix $T_{R\times N}^{T}$ (the superscript T denotes the transpose). As shown in Figure 7(b), when the inverse RST matrix $T_{R\times N}^{T}$ is multiplied by the transform coefficients of the target block, the modified transform coefficients of the target block or the residual samples of the target block can be derived. The inverse RST matrix $T_{R\times N}^{T}$ can be expressed as $(T_{R\times N})^{T}_{N\times R}$.

[0180] More specifically, when the inverse RST is used as the second inverse transform, the modified transform coefficients of the target block can be derived by multiplying the inverse RST matrix $T_{R\times N}^{T}$ by the transform coefficients of the target block. The inverse RST can also be used as the inverse first-order transform, in which case the residual samples of the target block can be derived by multiplying the inverse RST matrix $T_{R\times N}^{T}$ by the transform coefficients of the target block.

[0181] In the example, if the size of the block to which the inverse transform is applied is 8×8 and R = 16 (i.e., R / N = 16 / 64 = 1 / 4), the RST according to Figure 7(b) can be expressed as the matrix operation shown in Equation 7.

[0182] [Formula 7]

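[0183]
$$\begin{bmatrix} t_{1,1} & t_{2,1} & \cdots & t_{16,1} \\ t_{1,2} & t_{2,2} & \cdots & t_{16,2} \\ t_{1,3} & t_{2,3} & \cdots & t_{16,3} \\ \vdots & \vdots & \ddots & \vdots \\ t_{1,64} & t_{2,64} & \cdots & t_{16,64} \end{bmatrix} \times \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{16} \end{bmatrix}$$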

[0184] In Equation 7, $c_1$ to $c_{16}$ can represent the transform coefficients of the target block. As a result of the calculation in Equation 7, $r_j$, representing the modified transform coefficients of the target block or the residual samples of the target block, can be derived, and the process of deriving $r_j$ can be as shown in Equation 8.

[0185] [Formula 8]

[0186]
$$r_j = \sum_{i=1}^{R} t_{i,j}\, c_i, \qquad j = 1, \ldots, N$$

[0187] As a result of Equation 8, $r_1$ to $r_N$, representing the modified transform coefficients of the target block or the residual samples of the target block, can be derived. In terms of the size of the inverse transform matrix, a regular inverse transform matrix is 64×64 (N×N), whereas the reduced inverse transform matrix is 64×16 (N×R). Therefore, compared to performing a regular inverse transform, the memory used when performing the inverse RST is reduced to the ratio R / N. Likewise, compared to the N×N multiplications needed with a regular inverse transform matrix, the reduced inverse transform matrix needs N×R multiplications, which is a reduction to the ratio R / N.
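
The transposed multiplication described for the inverse RST can be sketched as below. The function name `inverse_rst` is hypothetical, and the kernel is again a placeholder (one 1 per row) rather than a real RST kernel; with such a kernel the first R outputs simply reproduce the inputs and the remaining N - R outputs are zero.

```python
def inverse_rst(t, c):
    """Apply the transpose of the R x N matrix t to the R coefficients c.

    Returns N values r_j = sum_i t[i][j] * c[i].
    """
    if len(t) != len(c):
        raise ValueError("t must have one row per coefficient in c")
    n = len(t[0])
    return [sum(t[i][j] * c[i] for i in range(len(c))) for j in range(n)]


N, R = 64, 16
# Placeholder kernel (illustration only): row i has a single 1 in column i.
t = [[1 if j == i else 0 for j in range(N)] for i in range(R)]
c = list(range(1, R + 1))  # stand-in for coefficients c_1..c_16

r = inverse_rst(t, c)
print(len(r))        # 64
print(r[:4], r[16])  # [1, 2, 3, 4] 0
```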

[0188] The transform set configuration shown in Table 2 can also be applied to the 8×8 RST. That is, the 8×8 RST can be applied based on the transform sets in Table 2. Since a transform set includes two or three transforms (kernels) depending on the intra-prediction mode, it can be configured to select one of up to four transforms, including the case where no secondary transform is applied. The case where no secondary transform is applied can be regarded as applying an identity matrix. Assuming indices 0, 1, 2, and 3 are assigned to the four transforms respectively (for example, index 0 can be assigned to the identity matrix, i.e., the case where no secondary transform is applied), a transform index or lfnst index can be signaled as a syntax element for each transform coefficient block, thereby specifying the transform to be applied. That is, for the top-left 8×8 block, the 8×8 NSST can be specified via the transform index in the RST configuration, or the 8×8 lfnst can be specified when LFNST is applied. 8×8 lfnst and 8×8 RST refer to transforms applied to the 8×8 region within a transform coefficient block when both W and H of the target block are equal to or greater than 8, and the 8×8 region can be the top-left 8×8 region within the transform coefficient block. Similarly, 4×4 lfnst and 4×4 RST refer to transforms applied to the 4×4 region within a transform coefficient block when both W and H of the target block are equal to or greater than 4, and the 4×4 region can be the top-left 4×4 region within the transform coefficient block.

[0189] According to embodiments of this disclosure, for the transformation during the encoding process, only 48 data points can be selected, and a maximum 16×48 transformation kernel matrix can be applied to them, instead of applying a 16×64 transformation kernel matrix to the 64 data points forming an 8×8 region. Here, "maximum" means that m has a maximum value of 16 in the m×48 transformation kernel matrix to generate m coefficients. That is, when performing RST by applying an m×48 transformation kernel matrix (m≤16) to an 8×8 region, 48 data points are input, and m coefficients are generated. When m is 16, 48 data points are input, and 16 coefficients are generated. That is, assuming 48 data points form a 48×1 vector, the 16×48 matrix and the 48×1 vector are multiplied sequentially, thereby generating a 16×1 vector. Here, the 48 data points forming the 8×8 region can be appropriately arranged to form a 48×1 vector. For example, a 48×1 vector can be constructed based on 48 data points constituting the region other than the lower right 4×4 region within the 8×8 region. Here, when matrix operations are performed by applying a maximum 16×48 transformation kernel matrix, 16 modified transformation coefficients are generated. These 16 modified transformation coefficients can be arranged in the upper left 4×4 region according to the scan order, and the upper right 4×4 region and the lower left 4×4 region can be filled with zeros.

[0190] For the inverse transform in the decoding process, the transpose of the aforementioned transform kernel matrix can be used. That is, when performing inverse RST or LFNST during the inverse transform performed by the decoding device, the input coefficient data for applying inverse RST is arranged in a one-dimensional vector according to a predetermined arrangement order, and the modified coefficient vector obtained by multiplying the one-dimensional vector with the corresponding inverse RST matrix to the left of the one-dimensional vector is arranged in a two-dimensional block according to a predetermined arrangement order.

[0191] In summary, during the transformation process, when RST or LFNST is applied to an 8×8 region, matrix operations are performed on the 48 transformation coefficients in the upper left, upper right, and lower left regions of the 8×8 region (excluding the lower right region) with a 16×48 transformation kernel matrix. For matrix operations, the 48 transformation coefficients are input as a one-dimensional array. When performing matrix operations, 16 modified transformation coefficients are derived, and these modified coefficients can be arranged in the upper left region of the 8×8 region.
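
The 48-input forward operation summarized above can be sketched as follows. The helper names `gather_48` and `apply_kernel` are hypothetical, the row-by-row reading order is an assumption (the exact order is given by Figure 8, which is not reproduced here), and the 16×48 kernel is a placeholder rather than an actual LFNST kernel.

```python
def gather_48(region):
    """Flatten an 8x8 region row by row, skipping the lower-right 4x4 quadrant."""
    if len(region) != 8 or any(len(row) != 8 for row in region):
        raise ValueError("expected an 8x8 region")
    return [region[y][x] for y in range(8) for x in range(8)
            if not (y >= 4 and x >= 4)]


def apply_kernel(kernel, x):
    """Matrix-vector product: (m x 48 kernel) * (48 x 1 vector) -> m x 1 vector."""
    return [sum(k * v for k, v in zip(row, x)) for row in kernel]


region = [[8 * y + x for x in range(8)] for y in range(8)]  # dummy coefficients
x48 = gather_48(region)

# Placeholder 16x48 kernel (illustration only): row i picks input i.
kernel = [[1 if j == i else 0 for j in range(48)] for i in range(16)]
y16 = apply_kernel(kernel, x48)
print(len(x48), len(y16))  # 48 16
```

In a real codec the 16 outputs would then be written back into the upper left 4×4 region; that placement step is left out here.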

[0192] Conversely, in the inverse transform process, when the inverse RST or LFNST is applied to an 8×8 region, the 16 transform coefficients corresponding to the upper left region of the 8×8 region can be input as a one-dimensional array according to the scan order, and matrix operations can be performed with a 48×16 transform kernel matrix. That is, the matrix operation can be expressed as (48×16 matrix) * (16×1 transform coefficient vector) = (48×1 modified transform coefficient vector). Here, an n×1 vector can be interpreted as having the same meaning as an n×1 matrix, and therefore can be represented as an n×1 column vector. Furthermore, * denotes matrix multiplication. When performing matrix operations, 48 modified transform coefficients can be derived, and these 48 modified transform coefficients can be arranged in the upper left, upper right, and lower left regions of the 8×8 region, excluding the lower right region.

[0193] When the inverse secondary transform is based on the RST, the inverse transformer 235 of the encoding device 200 and the inverse transformer 322 of the decoding device 300 may include an inverse reduced secondary transformer for deriving modified transform coefficients based on the inverse RST of the transform coefficients, and an inverse first-order transformer for deriving residual samples of the target block based on the inverse first-order transform of the modified transform coefficients. The inverse first-order transform refers to the inverse of the first-order transform applied to the residuals. In this disclosure, deriving transform coefficients based on a transform may refer to deriving transform coefficients by applying the transform.

[0194] The non-separable transform (LFNST) described above is described in detail below. LFNST may include a forward transform performed by the encoding device and an inverse transform performed by the decoding device.

[0195] The encoding device receives the result (or part of the result) derived after applying the first (core) transform as input and applies the forward secondary transform to it.

[0196] [Formula 9]

[0197] $y = G^{T} x$

[0198] In Equation 9, x and y are the input and output of the secondary transform, respectively, and G is the matrix representing the secondary transform, whose column vectors are the transform basis vectors. When the dimensions of the transform matrix G for the inverse LFNST are expressed as [number of rows × number of columns], the dimensions for the forward LFNST are those of the transpose $G^{T}$.

[0199] For the inverse LFNST, the dimensions of matrix G are [48×16], [48×8], [16×16], and [16×8], and the [48×8] and [16×8] matrices are partial matrices consisting of the 8 transform basis vectors taken from the left of the [48×16] and [16×16] matrices, respectively.

[0200] On the other hand, for the forward LFNST, the dimensions of matrix $G^{T}$ are [16×48], [8×48], [16×16], and [8×16], and the [8×48] and [8×16] matrices are partial matrices obtained by taking the 8 transform basis vectors from the upper part of the [16×48] and [16×16] matrices, respectively.
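
The relationship between the partial matrices can be checked with a small sketch: keeping the 8 leftmost columns of G and then transposing gives the same result as transposing G and keeping the 8 uppermost rows. The helper names and the dummy matrix contents below are assumptions made for illustration only.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def left_columns(g, k):
    """Keep the k leftmost columns (basis vectors) of g."""
    return [row[:k] for row in g]


# Dummy [48x16] inverse matrix G with distinct entries (not a real kernel).
g = [[16 * row + col for col in range(16)] for row in range(48)]

g_48x8 = left_columns(g, 8)    # [48x8]: the 8 leftmost basis vectors of G
gt_16x48 = transpose(g)        # forward matrix G^T, [16x48]
gt_8x48 = transpose(g_48x8)    # forward partial matrix, [8x48]

print(len(gt_16x48), len(gt_16x48[0]))  # 16 48
print(gt_8x48 == gt_16x48[:8])          # True
```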

[0201] Therefore, in the case of forward LFNST, a [48×1] vector or a [16×1] vector can be used as input x, and a [16×1] vector or an [8×1] vector can be used as output y. In video encoding and decoding, the output of the forward first transform is two-dimensional (2D) data. Therefore, in order to construct a [48×1] vector or a [16×1] vector as input x, it is necessary to construct a one-dimensional vector by properly arranging the 2D data as the output of the forward transform.

[0202] Figure 8 is a diagram illustrating the order in which the output data of the forward first transform is arranged into a one-dimensional vector, according to an example. The left diagrams of Figure 8(a) and (b) show the order used to construct the [48×1] vector, and the right diagrams of Figure 8(a) and (b) show the order used to construct the [16×1] vector. In the case of LFNST, the one-dimensional vector x can be obtained by sequentially arranging the 2D data in the order shown in Figure 8(a) and (b).

[0203] The arrangement order of the output data of the forward first transform can be determined according to the intra-prediction mode of the current block. For example, when the intra-prediction mode of the current block is in the horizontal direction relative to the diagonal direction, the output data of the forward first transform can be arranged in the order of Figure 8(a), and when the intra-prediction mode of the current block is in the vertical direction relative to the diagonal direction, the output data can be arranged in the order of Figure 8(b).
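
A minimal sketch of this mode-dependent arrangement for the [16×1] case is given below. It rests on two assumptions that the text does not spell out: that order (a) reads the 4×4 block row by row and order (b) reads it column by column (i.e., transposed), and that modes up to and including 34 use order (a). The function name `to_vector_4x4` is hypothetical.

```python
def to_vector_4x4(block, pred_mode_intra):
    """Arrange a 4x4 block into a 16-element vector depending on the intra mode.

    Assumed convention: modes <= 34 read row by row, modes > 34 read
    column by column, which equals transposing the block first.
    """
    if len(block) != 4 or any(len(row) != 4 for row in block):
        raise ValueError("expected a 4x4 block")
    if pred_mode_intra <= 34:
        return [block[y][x] for y in range(4) for x in range(4)]
    return [block[y][x] for x in range(4) for y in range(4)]


block = [[4 * y + x for x in range(4)] for y in range(4)]
print(to_vector_4x4(block, 18)[:4])  # [0, 1, 2, 3]
print(to_vector_4x4(block, 50)[:4])  # [0, 4, 8, 12]
```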

[0204] According to an example, an arrangement order different from those of Figure 8(a) and (b) can be applied. In that case, to derive the same result (y vector) as when the arrangement orders of Figure 8(a) and (b) are applied, the column vectors of matrix G can be rearranged according to the arrangement order. That is, the column vectors of G can be rearranged such that each element constituting the x vector is always multiplied by the same transform basis vector.

[0205] Since the output y derived by Equation 9 is a one-dimensional vector, when two-dimensional data is required as input data in the process of using the result of the forward quadratic transform as input (e.g., in the process of performing quantization or residual coding), the output y vector of Equation 9 needs to be properly arranged as 2D data again.

[0206] Figure 9 is a diagram illustrating the order in which the output data of a forward secondary transform is arranged into a two-dimensional block, according to an example.

[0207] In the case of LFNST, the output values can be arranged in a 2D block according to a predetermined scan order. Figure 9(a) shows that, when the output y is a [16×1] vector, the output values are arranged at 16 positions of the 2D block according to the diagonal scan order. Figure 9(b) shows that, when the output y is an [8×1] vector, the output values are arranged at 8 positions of the 2D block according to the diagonal scan order, and the remaining 8 positions are filled with zeros. In Figure 9(b), X indicates a position filled with zero.
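
The placement described for Figure 9 can be sketched as follows. The exact diagonal scan is defined by the figure, which is not reproduced here, so the code assumes an up-right diagonal scan in which each anti-diagonal is traversed from its bottom-left sample to its top-right sample. The function names are hypothetical.

```python
def diag_scan_positions(size=4):
    """Return (x, y) positions of a size x size block in the assumed diagonal scan."""
    positions = []
    for d in range(2 * size - 1):                # anti-diagonal index, x + y == d
        for y in range(min(d, size - 1), -1, -1):
            x = d - y
            if x < size:
                positions.append((x, y))
    return positions


def place_diagonal(y_vec, size=4):
    """Place the outputs along the scan; positions not reached stay zero."""
    block = [[0] * size for _ in range(size)]
    for value, (x, y) in zip(y_vec, diag_scan_positions(size)):
        block[y][x] = value
    return block


block8 = place_diagonal([1, 2, 3, 4, 5, 6, 7, 8])  # [8x1] output case
for row in block8:
    print(row)
# [1, 3, 6, 0]
# [2, 5, 0, 0]
# [4, 8, 0, 0]
# [7, 0, 0, 0]
```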

[0208] According to another example, since the order in which the output vector y is processed during quantization or residual coding can be preset, the output vector y does not need to be arranged in a 2D block as shown in Figure 9. However, in the case of residual coding, data coding can be performed in units of 2D blocks (e.g., 4×4) such as a CG (coefficient group), and in this case the data are arranged in a specific order, such as the diagonal scan order of Figure 9.

[0209] Furthermore, the decoding device can configure the one-dimensional input vector y by arranging the two-dimensional data output from the dequantization process according to a preset scan order used for the inverse transform. The input vector y can be output as the output vector x using the following formula.

[0210] [Formula 10]

[0211] x = Gy

[0212] In the case of inverse LFNST, the output vector x can be derived by multiplying the input vector y, which is a [16×1] vector or an [8×1] vector, by the G matrix. For inverse LFNST, the output vector x can be a [48×1] vector or a [16×1] vector.

[0213] The output vector x is arranged in a two-dimensional block according to the order shown in Figure 8 and, as two-dimensional data, becomes the input data (or part of the input data) for the inverse first transform.

[0214] Therefore, the inverse quadratic transform is the opposite of the forward quadratic transform process in general, and in the case of the inverse transform, unlike in the forward direction, the inverse quadratic transform is applied first, followed by the inverse first transform.

[0215] In the inverse LFNST, one of eight [48×16] matrices and eight [16×16] matrices can be chosen as the transformation matrix G. Whether to apply the [48×16] matrix or the [16×16] matrix depends on the size and shape of the block.

[0216] Additionally, the eight matrices can be derived from the four transform sets shown in Table 2 above, and each transform set can consist of two matrices. Which of the four transform sets to use is determined based on the intra-prediction mode, and more specifically, based on the intra-prediction mode value extended by taking wide-angle intra-prediction (WAIP) into account. Which of the two matrices constituting the selected transform set to use is derived via index signaling. More specifically, 0, 1, and 2 can be transmitted as index values; 0 can indicate that LFNST is not applied, and 1 and 2 can each indicate one of the two transform matrices constituting the transform set selected based on the intra-prediction mode value.
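
The two-level selection (transform set from the intra-prediction mode, matrix from the signaled index) can be sketched as below. The function name, the string placeholders standing in for matrices, and the assumption that index 1 selects the first matrix of the set and index 2 the second are all illustrative choices, not taken from the disclosure.

```python
def select_lfnst_kernel(transform_sets, tr_set_idx, lfnst_idx):
    """Return the kernel chosen by (tr_set_idx, lfnst_idx), or None if LFNST is off."""
    if lfnst_idx == 0:
        return None  # index 0: LFNST is not applied
    if lfnst_idx not in (1, 2):
        raise ValueError("lfnst_idx must be 0, 1, or 2")
    return transform_sets[tr_set_idx][lfnst_idx - 1]


# Four transform sets of two kernels each; strings stand in for real matrices.
transform_sets = [[f"set{s}_kernel{k}" for k in (1, 2)] for s in range(4)]

print(select_lfnst_kernel(transform_sets, 3, 0))  # None
print(select_lfnst_kernel(transform_sets, 3, 2))  # set3_kernel2
```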

[0217] Figure 10 is a diagram illustrating wide-angle intra-prediction modes according to an implementation of this document.

[0218] Typical intra-prediction mode values can range from 0 to 66 and from 81 to 83, and intra-prediction mode values extended due to WAIP can range from -14 to 83 as shown. Values from 81 to 83 indicate the CCLM (Cross-Component Linear Model) modes, and values from -14 to -1 and from 67 to 80 indicate the intra-prediction modes extended due to WAIP application.

[0219] When the width of the current prediction block is greater than its height, the top reference pixel is typically closer to the interior of the block to be predicted. Therefore, prediction in the lower left direction is more accurate than prediction in the upper right direction. Conversely, when the height of the block is greater than its width, the left reference pixel is typically closer to the interior of the block to be predicted. Therefore, prediction in the upper right direction is more accurate than prediction in the lower left direction. Thus, applying remapping (i.e., mode index modification) to the index of the wide-angle intra-frame prediction mode can be advantageous.

[0220] When wide-angle intra-prediction is applied, information about the existing intra-prediction mode can be signaled, and after the information is parsed, it can be remapped to the index of the wide-angle intra-prediction mode. Therefore, the total number of intra-prediction modes available for a specific block (e.g., a non-square block of a specific size) can remain unchanged, that is, 67, and the coding of the intra-prediction mode for the specific block can remain unchanged.

[0221] Table 3 below illustrates the process of deriving the modified intra-frame mode by remapping the intra-frame prediction mode to the wide-angle intra-frame prediction mode.

[0222] [Table 3]

[0223]

[0224] In Table 3, the extended intra-prediction mode value is ultimately stored in the `predModeIntra` variable, and `ISP_NO_SPLIT` indicates that the CU block is not divided into sub-partitions by the intra sub-partitions (ISP) technique currently adopted in the VVC standard. The `cIdx` variable values of 0, 1, and 2 indicate the luma, Cb, and Cr components, respectively. The `Log2` function shown in Table 3 returns the base-2 logarithm, and the `Abs` function returns the absolute value.

[0225] The variable `predModeIntra`, which indicates the intra-prediction mode, and the height and width of the transform block are the inputs to the wide-angle intra-prediction mode mapping process, and the output is the modified intra-prediction mode `predModeIntra`. The height and width of the transform block or coding block can be used as the height and width of the current block for intra-prediction mode remapping. In this case, the variable `whRatio`, which reflects the width-to-height ratio, can be set to `Abs(Log2(nW / nH))`.

[0226] For non-square blocks, the intra-prediction mode can be divided into two cases and modified accordingly.

[0227] First, the intra-prediction mode is set to a value 65 greater than predModeIntra [predModeIntra is set equal to (predModeIntra + 65)] if all of conditions (1) to (3) are met: (1) the width of the current block is greater than its height; (2) the intra-prediction mode before modification is equal to or greater than 2; and (3) the intra-prediction mode is less than (8 + 2 * whRatio) when whRatio is greater than 1, or less than 8 when whRatio is less than or equal to 1 [predModeIntra is less than (whRatio > 1) ? (8 + 2 * whRatio) : 8].

[0228] Otherwise, the intra-prediction mode is set to a value 67 smaller than predModeIntra [predModeIntra is set equal to (predModeIntra - 67)] if all of conditions (1) to (3) are met: (1) the height of the current block is greater than its width; (2) the intra-prediction mode before modification is less than or equal to 66; and (3) the intra-prediction mode is greater than (60 - 2 * whRatio) when whRatio is greater than 1, or greater than 60 when whRatio is less than or equal to 1 [predModeIntra is greater than (whRatio > 1) ? (60 - 2 * whRatio) : 60].
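The two remapping cases above can be sketched in Python as follows. This is a minimal illustration of the conditions stated in the text, not the normative derivation of Table 3: the function name `remap_wide_angle` is hypothetical, block sides are assumed to be powers of two, and the `ISP_NO_SPLIT` and `cIdx` handling that Table 3 uses to choose nW and nH is omitted.

```python
from math import log2


def remap_wide_angle(pred_mode_intra: int, n_w: int, n_h: int) -> int:
    """Remap a signaled intra mode to a wide-angle mode for a non-square block.

    Hypothetical helper: n_w and n_h are the block width and height
    (assumed to be powers of two, so log2 is exact).
    """
    if n_w == n_h:
        return pred_mode_intra  # square blocks are not remapped

    wh_ratio = abs(log2(n_w / n_h))  # whRatio = Abs(Log2(nW / nH))

    if n_w > n_h:
        # Case 1: wide block, low-numbered directional modes move up by 65.
        upper = 8 + 2 * wh_ratio if wh_ratio > 1 else 8
        if 2 <= pred_mode_intra < upper:
            return pred_mode_intra + 65
    else:
        # Case 2: tall block, high-numbered directional modes move down by 67.
        lower = 60 - 2 * wh_ratio if wh_ratio > 1 else 60
        if lower < pred_mode_intra <= 66:
            return pred_mode_intra - 67

    return pred_mode_intra  # no condition met: mode is unchanged


if __name__ == "__main__":
    for mode, w, h in [(2, 8, 4), (11, 16, 4), (66, 4, 8), (34, 8, 8)]:
        print(mode, (w, h), "->", remap_wide_angle(mode, w, h))
```

With a 16:1 aspect ratio (whRatio = 4) the two thresholds become 16 and 52, so the remapped modes span 67 to 80 and -14 to -1, which matches the extended range given earlier.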

[0229] Table 2 above shows how the transform set is selected in LFNST based on the intra-prediction mode value extended by WAIP. As shown in Figure 10, modes 14 to 33 and modes 35 to 80 are symmetric with respect to the prediction direction around mode 34. For example, modes 14 and 54 are symmetric with respect to the direction corresponding to mode 34. Therefore, the same transform set is applied to modes located in mutually symmetric directions, and this symmetry is also reflected in Table 2.

[0230] Furthermore, it is assumed that the forward LFNST input data for mode 54 is symmetric to the forward LFNST input data for mode 14. For example, for modes 14 and 54, the two-dimensional data is rearranged into one-dimensional data according to the arrangement orders shown in Figure 8(a) and Figure 8(b), respectively. It can also be seen that the patterns of the orders shown in Figure 8(a) and Figure 8(b) are symmetric with respect to the direction indicated by mode 34 (the diagonal direction).

[0231] Furthermore, as mentioned above, the size and shape of the target block determine which transformation matrix, either the [48×16] matrix or the [16×16] matrix, will be applied to the LFNST.

[0232] Figure 11 is a diagram illustrating the block shapes to which LFNST is applied. Figure 11(a) shows a 4×4 block, Figure 11(b) shows 4×8 and 8×4 blocks, Figure 11(c) shows a 4×N or N×4 block where N is 16 or greater, Figure 11(d) shows an 8×8 block, and Figure 11(e) shows an M×N block where M≥8, N≥8, and N>8 or M>8.

[0233] In Figure 11, blocks with thick boundaries indicate the regions to which LFNST is applied. For the blocks of Figure 11(a) and (b), LFNST is applied to the top-left 4×4 region, and for the block of Figure 11(c), LFNST is applied individually to two consecutively arranged top-left 4×4 regions. In Figure 11(a), (b), and (c), since LFNST is applied in units of 4×4 regions, this LFNST is referred to as "4×4 LFNST" below. In terms of the matrix dimension of G, a [16×16] or [16×8] matrix can be applied as the corresponding transform matrix.

[0234] More specifically, the [16×8] matrix is applied to the 4×4 block (4×4 TU or 4×4 CU) of Figure 11(a), and the [16×16] matrix is applied to the blocks of Figure 11(b) and (c). This is to adjust the worst-case computational complexity to 8 multiplications per sample.

[0235] For Figure 11(d) and (e), LFNST is applied to the top-left 8×8 region, and this LFNST is referred to as "8×8 LFNST" below. As the corresponding transform matrix, a [48×16] or [48×8] matrix can be applied. In the forward LFNST, since a [48×1] vector (the X vector in Equation 9) is the input data, not all sample values of the top-left 8×8 region are used as inputs. That is, as can be seen from the left-hand order of Figure 8(a) or Figure 8(b), the [48×1] vector is constructed from the samples belonging to the other three 4×4 blocks, while the bottom-right 4×4 block is left as is.

[0236] The [48×8] matrix can be applied to the 8×8 block (8×8 TU or 8×8 CU) of Figure 11(d), and the [48×16] matrix can be applied to the 8×8 region of Figure 11(e). This is also to adjust the worst-case computational complexity to 8 multiplications per sample.
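The matrix selection described above, and the resulting multiplication count, can be checked with a short Python sketch. The function names are hypothetical. The [inputs×outputs] reading of the bracket notation follows the text ([48×1] input, [8×1] or [16×1] output), and the count covers only the LFNST matrix multiplications, divided by the number of samples in the whole block.

```python
def lfnst_matrix_shape(width: int, height: int) -> tuple:
    """Return (inputs, outputs) of the forward LFNST matrix for a block.

    Follows the [inputs x outputs] notation of the text, e.g. (16, 8)
    stands for the [16x8] matrix.
    """
    if width < 4 or height < 4:
        raise ValueError("this sketch only covers blocks with both sides >= 4")
    if width == 4 and height == 4:
        return (16, 8)    # Figure 11(a): 4x4 block
    if min(width, height) == 4:
        return (16, 16)   # Figure 11(b), (c): 4xN / Nx4 blocks
    if width == 8 and height == 8:
        return (48, 8)    # Figure 11(d): 8x8 block
    return (48, 16)       # Figure 11(e): larger blocks


def multiplications_per_sample(width: int, height: int) -> float:
    """LFNST multiplications divided by the number of samples in the block."""
    n_in, n_out = lfnst_matrix_shape(width, height)
    # Figure 11(c): 4xN / Nx4 blocks with N >= 16 apply the 4x4 LFNST twice.
    regions = 2 if min(width, height) == 4 and max(width, height) >= 16 else 1
    return regions * n_in * n_out / (width * height)


if __name__ == "__main__":
    for w, h in [(4, 4), (4, 8), (4, 16), (8, 8), (8, 16)]:
        print((w, h), lfnst_matrix_shape(w, h), multiplications_per_sample(w, h))
```

Under these assumptions no listed block shape exceeds 8 multiplications per sample, whereas applying the [48×16] matrix to an 8×8 block would cost 768 / 64 = 12.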

[0237] Depending on the block shape, when the corresponding forward LFNST (4×4 LFNST or 8×8 LFNST) is applied, 8 or 16 output data (the Y vector in Equation 9, an [8×1] or [16×1] vector) are generated. In the forward LFNST, due to the characteristics of the matrix G^T, the number of output data is equal to or less than the number of input data.

[0238] Figure 12 is a diagram illustrating the arrangement of the output data of the forward LFNST according to an example, showing the block in which the output data of the forward LFNST is arranged for each block shape.

[0239] In Figure 12, the shaded area at the top left of each block corresponds to the region where the output data of the forward LFNST is located. The positions marked with 0 indicate samples filled with a value of 0, and the remaining areas represent regions that are not changed by the forward LFNST. In regions not changed by LFNST, the output data of the forward primary transform remains unchanged.

[0240] As mentioned above, since the dimension of the applied transform matrix varies depending on the shape of the block, the number of output data also varies. As shown in Figure 12, the output data of the forward LFNST may not completely fill the top-left 4×4 block. In the cases of Figure 12(a) and (d), the [16×8] matrix and the [48×8] matrix are applied, respectively, to the block indicated by the thick line or to a partial region inside the block, and an [8×1] vector is generated as the output of the forward LFNST. That is, following the scan order shown in Figure 8(b), only 8 output data are filled, as shown in Figure 12(a) and (d), and zeros can be filled in the remaining 8 positions. For the LFNST-applied block of Figure 11(d), as shown in Figure 12(d), the two 4×4 blocks adjacent to the top-left 4×4 block (the top-right and bottom-left blocks) are also filled with the value 0.

[0241] As described above, signaling the LFNST index basically specifies whether LFNST is applied and which transform matrix is applied. As shown in Figure 12, when LFNST is applied, since the number of output data of the forward LFNST can be equal to or less than the number of input data, the following regions filled with zero values occur.

[0242] 1) As shown in Figure 12(a), the samples after the eighth position in the scan order within the top-left 4×4 block, that is, the samples from the ninth to the sixteenth position.

[0243] 2) As shown in Figure 12(d) and (e), when the [48×16] matrix or the [48×8] matrix is applied, the two 4×4 blocks adjacent to the top-left 4×4 block, that is, the second and third 4×4 blocks in the scan order.

[0244] Therefore, if non-zero data is found when regions 1) and 2) are checked, it is certain that LFNST has not been applied, so signaling of the corresponding LFNST index can be omitted.

[0245] According to an example, as in the case of LFNST adopted in the VVC standard, since signaling of the LFNST index is performed after residual coding, the encoding device can know from the residual coding whether non-zero data (significant coefficients) exists at every position within the TU or CU block. Therefore, the encoding device can determine whether to signal the LFNST index based on the presence of non-zero data, and the decoding device can determine whether to parse the LFNST index. The LFNST index is signaled when no non-zero data exists in the regions specified in 1) and 2) above.

[0246] Because a truncated unary code is used as the binarization method for the LFNST index, the LFNST index consists of up to two bins, and 0, 10, and 11 are assigned as the binary codes for the possible LFNST index values 0, 1, and 2, respectively. In the LFNST currently adopted in VVC, context-based CABAC coding (regular coding) is applied to the first bin, and bypass coding is applied to the second bin. The total number of contexts for the first bin is 2: one context is assigned when (DCT-2, DCT-2) is applied as the primary transform pair for the horizontal and vertical directions and the luma and chroma components are coded in a dual-tree type, and the other context is applied in the remaining cases. The coding of the LFNST index is shown in the table below.

[0247] [Table 4]

[0248]
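The truncated unary binarization of the LFNST index described above can be illustrated with a small Python sketch. The function names are hypothetical, and only the mapping between index values and bin strings is shown; the CABAC context selection and arithmetic coding of the bins are not modeled.

```python
def truncated_unary(value: int, c_max: int = 2) -> str:
    """Binarize value with a truncated unary code whose largest value is c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    bins = "1" * value
    if value < c_max:
        bins += "0"  # the terminating 0 is dropped only for the largest value
    return bins


def parse_truncated_unary(bins: str, c_max: int = 2) -> int:
    """Recover the value from a truncated unary bin string."""
    value = 0
    for b in bins:
        if b == "0":
            break
        value += 1
        if value == c_max:
            break  # no terminating bin follows the largest value
    return value


if __name__ == "__main__":
    for idx in range(3):
        code = truncated_unary(idx)
        print(idx, code, parse_truncated_unary(code))
```

With `c_max = 2` the three index values map to at most two bins each, which is why one context-coded bin and one bypass-coded bin are enough.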

[0249] In addition, the following simplification methods can be applied to LFNST.

[0250] (i) According to an example, the number of output data of the forward LFNST can be limited to a maximum of 16.

[0251] In the case of Figure 11(c), the 4×4 LFNST can be applied to each of the two adjacent top-left 4×4 regions, and in this case a maximum of 32 LFNST output data can be generated. When the number of output data of the forward LFNST is limited to a maximum of 16, for 4×N / N×4 (N≥16) blocks (TU or CU) the 4×4 LFNST is applied only to the single top-left 4×4 region, so that LFNST is applied only once for every block of Figure 11. This simplifies the implementation of image coding.

[0252] Figure 13 shows that the number of output data of the forward LFNST is limited to a maximum of 16 according to an example. As shown in Figure 13, when LFNST is applied to the top-left 4×4 region of a 4×N or N×4 block (where N is 16 or greater), the number of output data of the forward LFNST becomes 16.

[0253] (ii) According to an example, zeroing can additionally be applied to regions to which LFNST is not applied. In this document, zeroing means filling all positions belonging to a specific region with the value 0. That is, zeroing can be applied to regions that are not changed by LFNST and that retain the result of the forward primary transform. As mentioned above, since LFNST is divided into 4×4 LFNST and 8×8 LFNST, zeroing can be divided into two types as follows ((ii)-(A) and (ii)-(B)).

[0254] (ii)-(A) When 4×4 LFNST is applied, the region to which 4×4 LFNST is not applied can be zeroed. Figure 14 is a diagram illustrating zeroing in a block to which 4×4 LFNST is applied, according to an example.

[0255] As shown in Figure 14, for the blocks to which 4×4 LFNST is applied, that is, for all the blocks of Figure 12(a), (b), and (c), the entire region to which LFNST is not applied can be filled with zeros.

[0256] On the other hand, Figure 14(d) shows that, when the maximum number of output data of the forward LFNST is limited to 16 (as shown in Figure 13), zeroing is performed on the remaining blocks to which 4×4 LFNST is not applied.

[0257] (ii)-(B) When 8×8 LFNST is applied, the region to which 8×8 LFNST is not applied can be zeroed. Figure 15 is a diagram illustrating zeroing in a block to which 8×8 LFNST is applied, according to an example.

[0258] As shown in Figure 15, for the blocks to which 8×8 LFNST is applied, that is, for all the blocks of Figure 12(d) and (e), the entire region to which LFNST is not applied can be filled with zeros.
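As a rough illustration of the zeroing in (ii), the following Python sketch clears every coefficient outside a kept top-left region. It assumes that the forward LFNST output occupies only the top-left 4×4 region (that is, that simplification (i) is in effect); the function name and the list-of-rows block representation are hypothetical. The case with only 8 outputs, where the second half of the 4×4 region in scan order is also zero, is not modeled because it depends on the scan order.

```python
def zero_out_outside_top_left(coeffs, keep_w=4, keep_h=4):
    """Return a copy of coeffs with all positions outside the
    top-left keep_w x keep_h region set to 0.

    coeffs is a list of rows (coeffs[y][x]); the input is not modified.
    """
    return [
        [v if (y < keep_h and x < keep_w) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(coeffs)
    ]


if __name__ == "__main__":
    block = [[1] * 8 for _ in range(8)]  # dummy 8x8 coefficient block
    for row in zero_out_outside_top_left(block):
        print(row)
```

Because the regions outside the kept area are known to be zero, an encoder following this scheme would not need to compute the primary transform output for them in the first place.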

[0259] (iii) Due to the zeroing proposed in (ii) above, the zero-filled region may differ from the one that arises when only LFNST is applied. Therefore, the presence of non-zero data can be checked over a region wider than in the LFNST case of Figure 12, according to the zeroing proposed in (ii).

[0260] For example, when (ii)-(B) is applied, the presence of non-zero data can be checked not only in the zero-filled regions of Figure 12(d) and (e) but also in the regions additionally filled with 0 in Figure 15, and signaling of the LFNST index can be performed only if no non-zero data exists.

[0261] Of course, even when the zeroing proposed in (ii) is applied, the presence of non-zero data can be checked in the same way as in the existing LFNST index signaling. That is, LFNST index signaling can be applied after checking whether non-zero data exists in the zero-filled blocks of Figure 12. In this case, only the encoding device performs zeroing and the decoding device does not assume zeroing; that is, the decoding device can parse the LFNST index after checking for non-zero data only in the regions explicitly marked as 0 in Figure 12.

[0262] Alternatively, according to another example, zeroing can be performed as shown in Figure 16. Figure 16 is a diagram illustrating zeroing in a block to which 8×8 LFNST is applied, according to another example.

[0263] As shown in Figures 14 and 15, zeroing can be applied to all regions other than the region to which LFNST is applied, or it can be applied only to a partial region, as shown in Figure 16. In Figure 16, zeroing is applied only to the regions outside the top-left 8×8 region and is not applied to the bottom-right 4×4 block within the top-left 8×8 region.

[0264] Various implementations applying combinations of the simplification methods for LFNST ((i), (ii)-(A), (ii)-(B), (iii)) can be derived. Of course, the combinations of the above simplification methods are not limited to the following implementation, and any combination can be applied to LFNST.

[0265] Implementation

[0266] - Limit the number of output data of the forward LFNST to a maximum of 16 → (i)

[0267] - When 4×4 LFNST is applied, zero all regions to which 4×4 LFNST is not applied → (ii)-(A)

[0268] - When 8×8 LFNST is applied, zero all regions to which 8×8 LFNST is not applied → (ii)-(B)

[0269] - After checking whether non-zero data exists both in the existing zero-filled regions and in the regions filled with zeros by the additional zeroing ((ii)-(A), (ii)-(B)), signal the LFNST index only if no non-zero data exists → (iii)

[0270] In this implementation, when LFNST is applied, the region in which non-zero output data can exist is limited to the inside of the top-left 4×4 region. More specifically, in the cases of Figure 14(a) and Figure 15(a), the eighth position in the scan order is the last position where non-zero data can exist. In the cases of Figure 14(b) and (c) and Figure 15(b), the sixteenth position in the scan order (i.e., the position at the bottom-right corner of the top-left 4×4 block) is the last position where non-zero data can exist.

[0271] Therefore, after LFNST is applied, whether to signal the LFNST index can be determined by checking whether non-zero data exists at a position not allowed by the residual coding process (a position beyond the last position).
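The last-position check described above can be sketched in Python as follows. This is an illustration under stated assumptions: the coefficients are passed already arranged in scan order, so the scan itself is not modeled; the limit of 8 positions is assumed to apply to 4×4 and 8×8 blocks (the shapes that use the [16×8] and [48×8] matrices) and 16 positions to all other shapes; and both function names are hypothetical. Other conditions a real codec may impose on LFNST index signaling are not covered.

```python
def max_lfnst_outputs(width: int, height: int) -> int:
    """Number of scan positions that may hold non-zero data after LFNST."""
    return 8 if (width, height) in ((4, 4), (8, 8)) else 16


def lfnst_index_may_be_signaled(coeffs_in_scan_order, width, height) -> bool:
    """True if no non-zero coefficient lies beyond the last allowed position.

    coeffs_in_scan_order: all coefficients of the block, listed in scan order.
    """
    limit = max_lfnst_outputs(width, height)
    return all(c == 0 for c in coeffs_in_scan_order[limit:])


if __name__ == "__main__":
    ok = [3, -1, 2, 0, 0, 1, 0, 4] + [0] * 8    # 4x4 block, data in first 8
    bad = [3, -1, 2, 0, 0, 1, 0, 4, 7] + [0] * 7  # non-zero at position 9
    print(lfnst_index_may_be_signaled(ok, 4, 4))
    print(lfnst_index_may_be_signaled(bad, 4, 4))
```

A decoder can run the same check on the parsed residual data to decide whether an LFNST index is present in the bitstream at all.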

[0272] With the zeroing method proposed in (ii), the number of data finally generated when both the primary transform and LFNST are applied is reduced, so the amount of computation required to perform the entire transform process can be reduced. That is, when LFNST is applied, zeroing is applied even to regions that hold output data of the forward primary transform but to which LFNST is not applied, so it is not necessary to generate data for the regions that will be zeroed when performing the forward primary transform. Therefore, the amount of computation required to generate that data can be saved. The additional effects of the zeroing method proposed in (ii) are summarized below.

[0273] First, as mentioned above, the amount of computation required to perform the entire transform process is reduced.

[0274] In particular, when (ii)-(B) is applied, the worst-case amount of computation is reduced, making the transform process lighter. In other words, a large primary transform generally requires a large amount of computation, but by applying (ii)-(B), the number of data derived as a result of performing the forward LFNST can be reduced to 16 or fewer. Furthermore, as the size of the entire block (TU or CU) increases, the reduction in the amount of transform computation becomes greater.

[0275] Second, because the amount of computation required for the entire transform process is reduced, the power consumption required to perform the transform is reduced.

[0276] Third, the delay involved in the transform process is reduced.

[0277] Secondary transforms, such as LFNST, add computational complexity to existing primary transforms, thus increasing the overall latency involved in performing the transform. Specifically, in the case of intra-frame prediction, the increased latency due to secondary transforms during encoding leads to an increase in latency until reconstruction because reconstructed data from adjacent blocks is used during prediction. This can result in an increase in the overall latency of intra-frame predictive coding.

[0278] However, if the zeroing proposed in (ii) is applied, the delay of performing the primary transform can be greatly reduced when LFNST is applied, so the delay of the entire transform is maintained or reduced, making the encoding device simpler to implement.

[0279] In conventional intra-prediction, the block to be coded is treated as a single coding unit and coded without division. In contrast, intra sub-partitions (ISP) coding performs intra-prediction coding by dividing the block to be coded horizontally or vertically. In this case, a reconstructed block can be generated by performing encoding / decoding in units of divided blocks, and the reconstructed block can be used as a reference block for the next divided block. According to an implementation, in ISP coding, one coding block can be divided into two or four sub-blocks and coded, and within each sub-block, intra-prediction is performed with reference to the reconstructed pixel values of the adjacent left or upper sub-block. Hereinafter, "coding" is used as a concept encompassing both encoding performed by an encoding device and decoding performed by a decoding device.

[0280] Table 5 shows the number of sub-blocks divided according to the block size when applying ISP, and the sub-partitions divided according to ISP can be called transform blocks (TU).

[0281] [Table 5]

[0282]
Block size (CU) | Number of divisions
4×4 | Not available
4×8, 8×4 | 2
All other cases | 4

[0283] ISP divides a block predicted in luma intra mode into two or four sub-partitions in the vertical or horizontal direction, depending on the block size. For example, the minimum block size to which ISP can be applied is 4×8 or 8×4. When the block size is larger than 4×8 or 8×4, the block is divided into four sub-partitions.

[0284] Figures 17 and 18 illustrate examples of how a coding block is divided into sub-blocks. More specifically, Figure 17 shows an example of division when the coding block (width (W) × height (H)) is a 4×8 block or an 8×4 block, and Figure 18 shows an example of division when the coding block is not a 4×8 block, 8×4 block, or 4×4 block.

[0285] When applying ISP, sub-blocks are encoded sequentially from left to right or top to bottom (e.g., horizontally or vertically) according to the partitioning type. After reconstruction processing via inverse transform and intra-prediction for one sub-block, the encoding of the next sub-block can be performed. For the leftmost or topmost sub-block, reconstructed pixels of already encoded blocks are referenced, as in conventional intra-prediction methods. Furthermore, when each side of a subsequent internal sub-block is not adjacent to the previous sub-block, reconstructed pixels of already encoded adjacent blocks are referenced to derive reference pixels adjacent to the corresponding side, as in conventional intra-prediction methods.

[0286] In ISP coding mode, all sub-blocks can be coded with the same intra-prediction mode, and information indicating whether ISP coding is used and in which direction (horizontal or vertical) the block is divided can be signaled. As shown in Figures 17 and 18, the number of sub-blocks can be set to 2 or 4 depending on the block shape. When the size (width × height) of a sub-block is less than 16, division into such sub-blocks can be disallowed, or ISP coding itself can be restricted from being applied.

[0287] In ISP prediction mode, a coding unit is divided into two or four partition blocks (i.e., sub-blocks) and prediction is performed, and the same intra-prediction mode is applied to the two or four partition blocks.

[0288] As described above, both horizontal and vertical partitioning directions are possible. When an M×N coding unit (with horizontal and vertical lengths M and N, respectively) is partitioned horizontally, it is divided into M×(N / 2) blocks if divided into two and into M×(N / 4) blocks if divided into four. When an M×N coding unit is partitioned vertically, it is divided into (M / 2)×N blocks if divided into two and into (M / 4)×N blocks if divided into four. When the coding unit is partitioned horizontally, the partition blocks are coded in top-to-bottom order, and when it is partitioned vertically, the partition blocks are coded in left-to-right order. The current partition block can be predicted with reference to the reconstructed pixel values of the upper partition block in the case of horizontal division, or of the left partition block in the case of vertical division.
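The division rules of Table 5 and the partition sizes described above can be combined into a short Python sketch. The function name `isp_sub_partitions` is hypothetical, sizes are given as (width, height), and the additional restriction on sub-blocks with fewer than 16 samples mentioned earlier is not applied here.

```python
def isp_sub_partitions(width: int, height: int, horizontal: bool):
    """Return the (width, height) of each ISP sub-partition in coding order.

    horizontal=True splits the block into stacked rows (coded top to bottom);
    horizontal=False splits it into side-by-side columns (coded left to right).
    An empty list means ISP is not available for this block size.
    """
    if (width, height) == (4, 4):
        return []  # Table 5: ISP not available for 4x4
    count = 2 if (width, height) in ((4, 8), (8, 4)) else 4
    if horizontal:
        return [(width, height // count)] * count
    return [(width // count, height)] * count


if __name__ == "__main__":
    print(isp_sub_partitions(8, 4, horizontal=True))     # two 8x2 partitions
    print(isp_sub_partitions(16, 16, horizontal=False))  # four 4x16 partitions
    print(isp_sub_partitions(4, 16, horizontal=False))   # four 1x16 partitions
```

The last call shows how partition blocks with a side shorter than 4 (here 1×16) can arise, which is the situation the N×2 and N×1 LFNST discussion later in this section addresses.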

[0289] Transforms can be applied, in units of partition blocks, to the residual signal generated by the ISP prediction method. Multiple transform selection (MTS) based on the DST-7 / DCT-8 combination, as well as the existing DCT-2, can be applied to the forward primary transform (core transform), and a forward low-frequency non-separable transform (LFNST) can be applied to the transform coefficients generated by the primary transform to generate the final modified transform coefficients.

[0290] In other words, LFNST can be applied to partitions divided by applying the ISP prediction mode, and the same intra-prediction mode is applied to the partitioned partitions, as described above. Therefore, when selecting an LFNST set derived based on the intra-prediction mode, the derived LFNST set can be applied to all partitions. That is, because the same intra-prediction mode is applied to all partitions, the same LFNST set can be applied to all partitions.

[0291] According to the implementation, LFNST can be applied only to transform blocks with both horizontal and vertical lengths of 4 or greater. Therefore, when the horizontal or vertical length of a partition block divided according to the ISP prediction method is less than 4, LFNST is not applied and no LFNST index is signaled. Furthermore, when applying LFNST to each partition block, the corresponding partition block can be considered as a transform block. When the ISP prediction method is not applied, LFNST can be applied to the coded block.

[0292] The method of applying LFNST to each partition block will be described in detail.

[0293] According to the implementation method, after applying the forward LFNST to each partition block, only a maximum of 16 (8 or 16) coefficients are left in the upper left 4×4 region in the order of scanning the transform coefficients, and then zeroing can be applied, in which the remaining positions and regions are all filled with 0.

[0294] Alternatively, according to the implementation, when the length of one side of the partition block is 4, LFNST is applied only to the upper left 4×4 region, and when the length (i.e., width and height) of all sides of the partition block is 8 or greater, LFNST can be applied to the remaining 48 coefficients in the upper left 8×8 region, excluding the lower right 4×4 region.

[0295] Alternatively, according to the implementation method, in order to adjust the worst-case computational complexity to 8 multiplications per sample, when each partition is 4×4 or 8×8, only 8 transformation coefficients can be output after applying the forward LFNST. That is, when the partition is 4×4, an 8×16 matrix can be used as the transformation matrix, and when the partition is 8×8, an 8×48 matrix can be used as the transformation matrix.

[0296] In the current VVC standard, LFNST index signaling is performed in units of coding units. Therefore, in ISP prediction mode, when LFNST is applied to all partition blocks, the same LFNST index value can be applied to all of them. That is, once the LFNST index value is transmitted at the coding-unit level, that LFNST index can be applied to all partition blocks within the coding unit. As mentioned above, the LFNST index can take the values 0, 1, and 2, where 0 indicates that LFNST is not applied, and 1 and 2 indicate the two transform matrices in one LFNST set when LFNST is applied.

[0297] As mentioned above, the LFNST set is determined by the intra-prediction mode, and in the case of ISP prediction mode, since all partition blocks in the coding unit are predicted in the same intra-prediction mode, the partition blocks can refer to the same LFNST set.

[0298] As another example, LFNST index signaling is still performed in units of coding units, but in ISP prediction mode it is not predetermined whether LFNST is applied uniformly to all partition blocks. Instead, for each partition block, whether to apply the LFNST index value signaled at the coding-unit level, and thus whether to apply LFNST, can be determined by a separate condition. Here, the separate condition can be signaled via the bitstream in the form of a flag for each partition block. When the flag value is 1, the LFNST index value signaled at the coding-unit level is applied, and when the flag value is 0, LFNST may not be applied.

[0299] An example of applying LFNST when one side of the partition block is shorter than 4, in a coding unit to which ISP mode is applied, is described below.

[0300] First, when the size of the partition block is N×2 (2×N), LFNST can be applied to the upper left M×2 (2×M) region (where M≤N). For example, when M=8, the upper left region becomes 8×2 (2×8), so the region with 16 residual signals can be the input of the forward LFNST, and an R×16 (R≤16) forward transformation matrix can be applied.

[0301] Here, the forward LFNST matrix can be a separate, additional matrix besides those included in the current VVC standard. Furthermore, for worst-case complexity control, an 8×16 matrix, where only the top 8 rows of the 16×16 matrix are sampled, can be used for the transformation. The complexity control method will be described in detail later.

[0302] Secondly, when the size of the partition block is N×1 (1×N), LFNST can be applied to the upper left M×1 (1×M) region (where M≤N). For example, when M=16, the upper left region becomes 16×1 (1×16), so the region with 16 residual signals can be the input of the forward LFNST, and an R×16 (R≤16) forward transformation matrix can be applied.

[0303] Here, the corresponding forward LFNST matrix can be a separate additional matrix besides those included in the current VVC standard. Furthermore, to control worst-case complexity, an 8×16 matrix, where only the top 8 rows of the 16×16 matrix are sampled, can be used for the transformation. The complexity control method will be described in detail later.
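As a rough illustration of the R×16 forward transform and the 8×16 complexity-controlled variant mentioned above, the following Python sketch multiplies a 16-element input vector by the top rows of a 16×16 matrix. The kernel values used in the example (an identity matrix) are a placeholder assumption, since the actual LFNST kernels are not given in this section, and the function name is hypothetical.

```python
def forward_lfnst(kernel, x, num_outputs=None):
    """Multiply an R x len(x) kernel (list of rows) by the input vector x.

    If num_outputs is given, only the top num_outputs rows of the kernel
    are used, e.g. 8 rows of a 16x16 matrix give the 8x16 variant.
    """
    rows = kernel if num_outputs is None else kernel[:num_outputs]
    return [sum(k * v for k, v in zip(row, x)) for row in rows]


if __name__ == "__main__":
    # Placeholder kernel: a 16x16 identity matrix, NOT a real LFNST kernel.
    identity = [[1 if r == c else 0 for c in range(16)] for r in range(16)]
    residual = list(range(16))  # 16 inputs from an 8x2 or 16x1 region
    print(forward_lfnst(identity, residual))     # 16 outputs
    print(forward_lfnst(identity, residual, 8))  # 8 outputs, 8x16 variant
```

The 8×16 variant performs 8 × 16 = 128 multiplications for 16 input samples, that is, 8 multiplications per sample, whereas the full 16×16 matrix would need 16 per sample.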

[0304] The first and second embodiments can be applied simultaneously, or either one of the two can be applied. In particular, in the case of the second embodiment, because a one-dimensional transform is considered in the LFNST, experiments have shown that the compression performance improvement obtainable from the existing LFNST is relatively small compared to the LFNST index signaling cost. In the case of the first embodiment, however, a compression performance improvement similar to that obtainable from the conventional LFNST is observed. That is, in the case of ISP, the contribution of applying LFNST for 2×N and N×2 blocks to the actual compression performance can be confirmed experimentally.

[0305] In the LFNST of the current VVC, symmetry between intra-prediction modes is applied. The same LFNST set is applied to two directional modes placed symmetrically around mode 34 (prediction in the 45-degree diagonal direction toward the bottom right); for example, the same LFNST set is applied to mode 18 (horizontal prediction mode) and mode 50 (vertical prediction mode). However, for modes 35 through 66, the input data is transposed before the forward LFNST is applied.

[0306] VVC supports Wide-Angle Intra-Prediction (WAIP) modes. Considering WAIP, the LFNST set is derived based on the modified intra-prediction mode. For the modes extended by WAIP, the LFNST set is determined using symmetry, just as for the general intra-prediction directional modes. For example, because mode -1 is symmetric to mode 67, the same LFNST set is applied to both, and because mode -14 is symmetric to mode 80, the same LFNST set is applied to both. For modes 67 through 80, the input data is transposed before the forward LFNST is applied.

[0307] When applying LFNST to the top-left M×2 (M×1) block, symmetry with respect to LFNST cannot be applied because the block to which LFNST is applied is not square. Therefore, instead of applying symmetry based on intra-prediction mode, as shown in Table 2 for LFNST, symmetry between M×2 (M×1) and 2×M (1×M) blocks can be applied.

[0308] Figure 19 is a diagram illustrating the symmetry between an M×2 (M×1) block and a 2×M (1×M) block according to an embodiment.

[0309] As shown in Figure 19, since mode 2 in the M×2 (M×1) block can be considered symmetric to mode 66 in the 2×M (1×M) block, the same LFNST set can be applied to both the 2×M (1×M) block and the M×2 (M×1) block.

[0310] In this case, in order to apply the LFNST set applied to the M×2 (M×1) block to the 2×M (1×M) block, the LFNST set is selected based on mode 2 instead of mode 66. That is, the LFNST can be applied after transposing the input data of the 2×M (1×M) block, before applying the forward LFNST.

[0311] Figure 20 is a diagram illustrating an example of transposing a 2×M block according to an implementation.

[0312] Figure 20(a) illustrates how LFNST can be applied by reading the input data of a 2×M block in column-major order, and Figure 20(b) illustrates how LFNST can be applied by reading the input data of an M×2 (M×1) block in row-major order. The method for applying LFNST to the top-left M×2 (M×1) or 2×M (1×M) block is described below.

[0313] 1. First, as shown in Figure 20(a) and (b), the input data is arranged to form the input vector of the forward LFNST. For example, referring to Figure 19, for an M×2 block predicted in mode 2, the input data is arranged in the order of Figure 20(b), and for a 2×M block predicted in mode 66, the input data is arranged in the order of Figure 20(a); the LFNST set for mode 2 can then be applied.

[0314] 2. For an M×2 (M×1) block, considering WAIP, the LFNST set is determined based on the modified intra-prediction mode. As mentioned above, a preset mapping relationship is established between the intra-prediction mode and the LFNST set, which can be represented by the mapping table shown in Table 2.

[0315] For a 2×M (1×M) block, again taking WAIP into account, the mode symmetric to the modified intra-prediction mode about the prediction mode in the 45-degree downward diagonal direction (mode 34 in the VVC standard) can be obtained. The LFNST set is then determined from this symmetric mode and the mapping table. The symmetric mode (y) about mode 34 can be derived from the modified intra-prediction mode (x) using the following equation. The mapping table will be described in more detail below.

[0316] [Equation 11]

[0317] If 2 ≤ x ≤ 66, then y = 68 - x.

[0318] Otherwise (x≤-1 or x≥67), y=66-x

[0319] 3. When applying forward LFNST, the transform coefficients can be derived by multiplying the input data prepared in process 1 by the LFNST kernel. The LFNST kernel can be selected based on the LFNST set determined in process 2 and the predetermined LFNST index.

[0320] For example, when M=8 and a 16×16 matrix is used as the LFNST kernel, 16 transform coefficients can be generated by multiplying the matrix by the 16 input data. The generated transform coefficients can be arranged in the top-left 8×2 or 2×8 region according to the scan order used in the VVC standard.

[0321] Figure 21 illustrates the scan order of an 8×2 or 2×8 region according to an embodiment.

[0322] All regions other than the top-left 8×2 or 2×8 region can be filled with zero values (zeroed out), or the existing transform coefficients to which only the primary transform has been applied can be left as they are. The predetermined LFNST index can be one of the LFNST index values (0, 1, 2) that are tried when the RD cost is calculated while changing the LFNST index value during encoding.

[0323] When the worst-case computational complexity is to be kept at or below a certain level (e.g., 8 multiplications/sample), for example, only 8 transform coefficients can be generated by multiplying by an 8×16 matrix consisting of the top 8 rows of the 16×16 matrix; these transform coefficients can then be arranged according to the scan order of Figure 21, and zeroing-out can be applied to the remaining coefficient region. Worst-case complexity control will be described later.

[0324] 4. When applying the inverse LFNST, a preset number (e.g., 16) of transform coefficients are set as the input vector, and the LFNST kernel (e.g., a 16×16 matrix) derived from the LFNST set obtained in process 2 and the LFNST index is selected. The output vector can then be derived by multiplying the LFNST kernel by the input vector.

[0325] In the case of M×2 (M×1) blocks, the output vector can be arranged in the row-first order of Figure 20 (b), while in the case of 2×M (1×M) blocks, the output vector can be arranged in the column-first order of Figure 20 (a).

[0326] The remaining part of the top-left M×2 (M×1) or 2×M (1×M) region, other than where the output vector is arranged, and the part of the partition block outside the top-left M×2 (M×1) or 2×M (1×M) region can all be zeroed out, or can be configured to keep, as they are, the transform coefficients reconstructed through residual coding and inverse quantization.

[0327] When constructing the input vector, as in process 3, the input data can be arranged according to the scan order of Figure 21, and, in order to keep the worst-case computational complexity at or below a certain level, the input vector can be constructed with a reduced number of input data (e.g., 8 instead of 16).

[0328] For example, when M=8, if 8 input data are used, the leftmost 16×8 matrix can be taken from the corresponding 16×16 matrix and multiplied to obtain 16 output data. Worst-case complexity control will be described later.
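Processes 1, 3 and 4 above can be sketched in Python as follows. This is a minimal illustration, not the codec's implementation. It assumes that block sizes are written width × height (so an M×2 block has 2 rows of M samples), that the inverse kernel is the transpose of the forward kernel, and it uses an identity matrix as a placeholder because the actual LFNST kernel values are not given in this passage. The function names are hypothetical.

```python
def to_input_vector(block):
    """Flatten a top-left M x 2 or 2 x M region into the LFNST input vector.

    `block` is a list of rows. Assumption: sizes are width x height, so an
    M x 2 block has 2 rows of M samples and a 2 x M block has M rows of 2.
    """
    rows, cols = len(block), len(block[0])
    if cols >= rows:
        # M x 2 (wide): row-first order, as in Figure 20 (b).
        return [block[r][c] for r in range(rows) for c in range(cols)]
    # 2 x M (tall): column-first order, as in Figure 20 (a).
    return [block[r][c] for c in range(cols) for r in range(rows)]


def forward_lfnst(vec, kernel, num_out):
    """Multiply the top `num_out` rows of the kernel by the input vector."""
    return [sum(k * v for k, v in zip(row, vec)) for row in kernel[:num_out]]


def inverse_lfnst(coeffs, kernel):
    """Multiply the transposed kernel, limited to len(coeffs) columns, by coeffs."""
    n = len(kernel[0])
    return [sum(kernel[i][j] * coeffs[i] for i in range(len(coeffs)))
            for j in range(n)]


M = 8
wide = [list(range(0, M)), list(range(M, 2 * M))]   # 8 x 2 block (2 rows)
tall = [list(col) for col in zip(*wide)]            # its transpose: 2 x 8 block
# Placeholder kernel (assumption): identity instead of a real LFNST kernel.
identity = [[1 if i == j else 0 for j in range(16)] for i in range(16)]

vec = to_input_vector(wide)
coeffs = forward_lfnst(vec, identity, num_out=8)    # reduced 8 x 16 forward
restored = inverse_lfnst(coeffs, identity)          # reduced 16 x 8 inverse
```

Reading the 2×M block column-first yields the same input vector as reading its transposed M×2 counterpart row-first, which is why one LFNST set can serve both shapes.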

[0329] The above embodiment shows the case where symmetry between M×2 (M×1) blocks and 2×M (1×M) blocks is used when applying LFNST. However, according to another example, a different LFNST set can be applied to each of the two block shapes.
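The symmetry-based mode selection of process 2 (Equation 11) can be sketched as follows. The function names are hypothetical. Passing the non-directional modes 0 and 1 through unchanged is an assumption of this sketch, since Equation 11 only covers directional modes.

```python
def symmetric_mode(x):
    """Equation 11: mode symmetric to x about mode 34.

    x is the WAIP-modified intra-prediction mode. Modes 0 and 1 are
    non-directional and fall outside the equation.
    """
    if x in (0, 1):
        raise ValueError("Equation 11 applies to directional modes only")
    if 2 <= x <= 66:
        return 68 - x
    return 66 - x  # x <= -1 or x >= 67


def mode_for_set_selection(pred_mode, width, height):
    """Mode used to look up the LFNST set for a narrow partition block.

    Assumption: sizes are width x height, so width < height means a
    2 x M (1 x M) block, which uses the symmetric mode.
    """
    if width < height and pred_mode not in (0, 1):
        return symmetric_mode(pred_mode)
    return pred_mode
```

For instance, a 2×8 block predicted in mode 66 is looked up as mode 2, the same mode an 8×2 block predicted in mode 2 would use.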

[0330] The following sections will describe various examples of mapping methods using intra-prediction mode and LFNST set configurations using ISP mode.

[0331] In ISP mode, the LFNST set configuration can differ from the existing LFNST set configuration. In other words, kernels different from the existing LFNST kernels can be applied, and a mapping table different from the one used between the intra-prediction mode index and the LFNST set in the current VVC standard can be applied. The mapping table used in the current VVC standard can be the same as the mapping table in Table 2.

[0332] In Table 2, the predModeIntra value represents the intra-prediction mode value modified to take WAIP into account, and the lfnstTrSetIdx value is the index value indicating a specific LFNST set. Each LFNST set consists of two LFNST kernels.

[0333] When applying the ISP prediction mode, if both the horizontal and vertical lengths of each partition block are equal to or greater than 4, the same kernel as the LFNST kernel used in the current VVC standard can be applied, and the mapping table can be applied as is. Alternatively, mapping tables and LFNST kernels different from those in the current VVC standard can be applied.

[0334] When applying the ISP prediction mode, if the horizontal or vertical length of each partition block is less than 4, a mapping table and LFNST kernels different from those in the current VVC standard can be applied. Tables 6 to 8 below show mapping tables between the intra-prediction mode value (modified to take WAIP into account) and the LFNST set, which can be applied to M×2 (M×1) blocks or 2×M (1×M) blocks.

[0335] [Table 6]

[0336]
predModeIntra                  lfnstTrSetIdx
predModeIntra < 0              1
0 <= predModeIntra <= 1        0
2 <= predModeIntra <= 12       1
13 <= predModeIntra <= 23      2
24 <= predModeIntra <= 34      3
35 <= predModeIntra <= 44      4
45 <= predModeIntra <= 55      5
56 <= predModeIntra <= 66      6
67 <= predModeIntra <= 80      6
81 <= predModeIntra <= 83      0

[0337] [Table 7]

[0338]
predModeIntra                  lfnstTrSetIdx
predModeIntra < 0              1
0 <= predModeIntra <= 1        0
2 <= predModeIntra <= 23       1
24 <= predModeIntra <= 44      2
45 <= predModeIntra <= 66      3
67 <= predModeIntra <= 80      3
81 <= predModeIntra <= 83      0

[0339] [Table 8]

[0340]
predModeIntra                  lfnstTrSetIdx
predModeIntra < 0              1
0 <= predModeIntra <= 1        0
2 <= predModeIntra <= 80       1
81 <= predModeIntra <= 83      0

[0341] The mapping table in Table 6 is configured with seven LFNST sets, the mapping table in Table 7 with four LFNST sets, and the mapping table in Table 8 with two LFNST sets. As another example, when only one LFNST set is configured, the lfnstTrSetIdx value can be fixed to 0 regardless of the predModeIntra value.
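The lookups of Tables 6 to 8 can be expressed as range tables, as in the following sketch. The table contents are taken from Tables 6 to 8. The Python names and the (low, high, set) representation are illustrative choices of this sketch.

```python
import math

# Each entry is (lowest mode, highest mode, lfnstTrSetIdx), both bounds inclusive.
TABLE_6 = [(-math.inf, -1, 1), (0, 1, 0), (2, 12, 1), (13, 23, 2), (24, 34, 3),
           (35, 44, 4), (45, 55, 5), (56, 66, 6), (67, 80, 6), (81, 83, 0)]
TABLE_7 = [(-math.inf, -1, 1), (0, 1, 0), (2, 23, 1), (24, 44, 2),
           (45, 66, 3), (67, 80, 3), (81, 83, 0)]
TABLE_8 = [(-math.inf, -1, 1), (0, 1, 0), (2, 80, 1), (81, 83, 0)]


def lfnst_set_index(pred_mode_intra, table):
    """Return lfnstTrSetIdx for a WAIP-modified intra-prediction mode."""
    for low, high, set_idx in table:
        if low <= pred_mode_intra <= high:
            return set_idx
    raise ValueError(f"mode {pred_mode_intra} is not covered by the table")


def num_sets(table):
    """Count the distinct LFNST sets a mapping table refers to."""
    return len({set_idx for _, _, set_idx in table})
```

Counting the distinct set indices gives seven, four and two sets for Tables 6, 7 and 8 respectively, in agreement with paragraph [0341].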

[0342] The following describes a method for maintaining the worst-case computational complexity when applying LFNST in ISP mode.

[0343] In ISP mode, when applying LFNST, the number of multiplications per sample (or per coefficient, per position) may be limited to a certain value or less. Depending on the size of the partition block, the number of multiplications per sample (or per coefficient, per position) can be kept to 8 or less by applying LFNST as follows.

[0344] 1. When both the horizontal and vertical lengths of the partition block are 4 or greater, the same worst-case computational complexity control method as used for LFNST in the current VVC standard can be applied.

[0345] In other words, when the partition block is 4×4, an 8×16 matrix obtained by sampling the top 8 rows of the 16×16 matrix can be applied instead of the 16×16 matrix in the forward direction, and a 16×8 matrix obtained by sampling the left 8 columns of the 16×16 matrix can be applied in the inverse direction. Furthermore, when the partition block is 8×8, in the forward direction, an 8×48 matrix obtained by sampling the top 8 rows of the 16×48 matrix can be applied instead of the 16×48 matrix, and in the inverse direction, a 48×8 matrix obtained by sampling the left 8 columns of the 48×16 matrix can be applied instead of the 48×16 matrix.

[0346] In the case of 4×N or N×4 (N>4) blocks, when performing the forward transform, the 16 coefficients generated by applying the 16×16 matrix only to the top-left 4×4 block can be placed in the top-left 4×4 region, and the other regions can be filled with values of 0. When performing the inverse transform, the 16 coefficients in the top-left 4×4 block are arranged in scan order to form the input vector, which is then multiplied by the 16×16 matrix to generate 16 output data. The generated output data can be placed in the top-left 4×4 region, and the remaining regions outside the top-left 4×4 region can be filled with values of 0.

[0347] In the case of 8×N or N×8 (N>8) blocks, when performing the forward transform, the 16 coefficients generated by applying the 16×48 matrix only to the ROI region within the top-left 8×8 block (the region remaining after excluding the bottom-right 4×4 block from the top-left 8×8 block) can be placed in the top-left 4×4 region, and all other regions can be filled with values of 0. When performing the inverse transform, the 16 coefficients located in the top-left 4×4 region are arranged in scan order to form the input vector, which can then be multiplied by the 48×16 matrix to generate 48 output data. The generated output data can be placed in the ROI region, and all other regions can be filled with values of 0.

[0348] 2. When the size of the partition block is N×2 or 2×N and LFNST is applied to the top left M×2 or 2×M region (M≤N), a matrix sampled according to the value of N can be applied.

[0349] With M=8, for partitions of N=8, i.e., 8×2 or 2×8 blocks, in the case of forward transformation, an 8×16 matrix obtained by sampling the top 8 rows of a 16×16 matrix can be applied instead of a 16×16 matrix, and in the case of inverse transformation, a 16×8 matrix obtained by sampling the left 8 columns of a 16×16 matrix can be applied instead of a 16×16 matrix.

[0350] When N is greater than 8, in the forward transformation, the 16×16 matrix applied to the top-left 8×2 or 2×8 block generates 16 output data points, which are then placed within that block, with the remaining areas filled with values of 0. In the inverse transformation, the 16 coefficients in the top-left 8×2 or 2×8 block are arranged in scan order to form the input vector, which is then multiplied by the 16×16 matrix to generate 16 output data points. These output data points can also be placed within the top-left 8×2 or 2×8 block, with all remaining areas filled with values of 0.

[0351] 3. When the size of the partition block is N×1 or 1×N and LFNST is applied to the top left M×1 or 1×M region (M≤N), a matrix sampled according to the value of N can be applied.

[0352] When M=16, for partitioned blocks of N=16, i.e., 16×1 or 1×16 blocks, in the case of forward transformation, an 8×16 matrix obtained by sampling the top 8 rows of the 16×16 matrix can be applied instead of a 16×16 matrix, and in the case of inverse transformation, a 16×8 matrix obtained by sampling the left 8 columns of the 16×16 matrix can be applied instead of a 16×16 matrix.

[0353] When N is greater than 16, in the forward transformation, the 16 output data generated by applying a 16×16 matrix to the top-left 16×1 or 1×16 block can be set within the top-left 16×1 or 1×16 block, and the remaining areas can be filled with values of 0. In the inverse transformation, the 16 coefficients located in the top-left 16×1 or 1×16 block can be set in scan order to form the input vector, and then multiplied by the 16×16 matrix to generate 16 output data. The generated output data can be set within the top-left 16×1 or 1×16 block, and all remaining areas can be filled with values of 0.
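The limit of 8 multiplications per sample in items 1 to 3 can be checked with a short calculation. The sketch below counts one multiplication per kernel entry and divides by the number of samples in the partition block. The listed block sizes are the smallest blocks of each case, chosen here for illustration, and the function name is hypothetical.

```python
def mults_per_sample(kernel_rows, kernel_cols, block_width, block_height):
    """Multiplications of one kernel application, averaged over the block."""
    return kernel_rows * kernel_cols / (block_width * block_height)


# (description, kernel rows, kernel cols, block width, block height)
CASES = [
    ("4x4 block, reduced 8x16 kernel", 8, 16, 4, 4),
    ("8x8 block, reduced 8x48 kernel", 8, 48, 8, 8),
    ("4x8 block, full 16x16 kernel", 16, 16, 4, 8),
    ("8x16 block, full 16x48 kernel", 16, 48, 8, 16),
    ("8x2 block, reduced 8x16 kernel", 8, 16, 8, 2),
    ("16x2 block, full 16x16 kernel", 16, 16, 16, 2),
    ("16x1 block, reduced 8x16 kernel", 8, 16, 16, 1),
    ("32x1 block, full 16x16 kernel", 16, 16, 32, 1),
]

for name, rows, cols, width, height in CASES:
    print(f"{name}: {mults_per_sample(rows, cols, width, height):g} mult/sample")
```

Every listed case stays at or below 8, whereas applying the full 16×16 kernel to a 4×4 block would cost 256 / 16 = 16 multiplications per sample, which is why the reduced matrices are used for the smallest blocks.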

[0354] As another example, the number of multiplications per sample (or per coefficient, per position) can be kept to 8 or less based on the ISP coding unit size rather than the ISP partition block size. When only one of the ISP partition blocks satisfies the conditions for applying LFNST, the worst-case complexity calculation for LFNST can be based on the corresponding coding unit size rather than the partition block size. For example, suppose the luma coding block of a coding unit (CU) is divided (or partitioned) into four partition blocks, each of size 4×4. If no non-zero transform coefficients exist for two of the four partition blocks, each of the remaining two partition blocks can be configured to have 16 transform coefficients generated in it (from the encoder's perspective) instead of 8.
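The coding-unit-level budget in this example can be illustrated as follows. The sketch assumes a 16-sample input vector per partition block and a choice between 8 and 16 output coefficients only. The helper name and its parameters are hypothetical.

```python
def coeffs_per_active_partition(num_partitions, num_active,
                                part_width=4, part_height=4,
                                limit_per_sample=8, num_inputs=16):
    """Pick 16 or 8 output coefficients per partition block with coefficients.

    The multiplication budget is computed over the whole coding unit
    (all partition blocks), not per partition block.
    """
    if num_active < 1:
        raise ValueError("at least one partition block must be active")
    budget = limit_per_sample * num_partitions * part_width * part_height
    # A kernel producing 16 coefficients costs 16 * num_inputs multiplications.
    cost_full = 16 * num_inputs * num_active
    return 16 if cost_full <= budget else 8
```

With four 4×4 partition blocks the budget is 8 × 64 = 512 multiplications. Two active partition blocks can each afford a full 16×16 kernel (2 × 256 = 512), whereas four active partition blocks must each fall back to the 8×16 kernel.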

[0355] The following section describes a method for signaling the LFNST index in ISP mode.

[0356] As described above, the LFNST index can have the values 0, 1, and 2, where 0 indicates that LFNST is not applied, and 1 and 2 respectively indicate the two LFNST kernel matrices included in the selected LFNST set. LFNST is applied based on the LFNST kernel matrix selected by the LFNST index. The method of signaling the LFNST index in the current VVC standard is described below.

[0357] 1. The LFNST index can be signaled once for each coding unit (CU), and in the case of a dual tree, the LFNST index can be signaled individually for each of the luma block and the chroma block.

[0358] 2. When the LFNST index is not signaled, it is inferred to be 0, which is the default value. The cases where the LFNST index value is inferred to be 0 are described below.

[0359] A. When the mode is one in which no transform is applied (e.g., transform skip, BDPCM, lossless coding, etc.).

[0360] B. When the primary transform is not DCT-2 (i.e., it is DST-7 or DCT-8), that is, when the horizontal or vertical transform is not DCT-2.

[0361] C. When the horizontal or vertical length of the luma block of the coding unit exceeds the maximum luma transform size available for the transform, LFNST cannot be applied; for example, when the maximum luma transform size is equal to 64 and the size of the luma block of the coding unit is equal to 128×16.

[0362] In the case of a dual tree, it is determined for each of the coding unit of the luma component and the coding unit of the chroma component whether the maximum luma transform size is exceeded. That is, for the luma block, it is checked whether the block exceeds the maximum luma transform size available for the transform, and for the chroma block, it is checked whether the horizontal/vertical length of the corresponding luma block for the color format exceeds the maximum luma transform size. For example, when the color format is 4:2:0, each of the horizontal/vertical lengths of the corresponding luma block is twice that of the chroma block, and the transform size of the corresponding luma block is twice that of the chroma block. As another example, when the color format is 4:4:4, the horizontal/vertical length and transform size of the corresponding luma block are the same as those of the chroma block.

[0363] A 64-length transformation or a 32-length transformation refers to a transformation applied to a horizontal or vertical length of 64 or 32, respectively. Furthermore, "transformation size" can refer to the corresponding length of 64 or 32.

[0364] In the case of a single tree, after checking whether the horizontal or vertical length of the luma block exceeds the maximum luma transform block size available for the transform, the LFNST index signaling can be skipped (or omitted) if the length exceeds that size.

[0365] D. The LFNST index can be signaled only if both the horizontal and vertical lengths of the coding unit are equal to 4 or greater.

[0366] In the case of a dual tree, the LFNST index can be signaled only if both the horizontal and vertical lengths of the corresponding component (i.e., the luma component or the chroma component) are equal to 4 or greater.

[0367] In the case of a single tree, the LFNST index can be signaled when both the horizontal and vertical lengths of the luma component are equal to 4 or greater.

[0368] E. When the last non-zero coefficient position is not the DC position (the top-left position in the block): if the block is a dual-tree luma block and the last non-zero coefficient position is not the DC position, the LFNST index is signaled. If the block is a dual-tree chroma block and at least one of the last non-zero coefficient positions of Cb and Cr is not the DC position, the corresponding LFNST index is signaled.

[0369] In the case of a single tree, if the last non-zero coefficient position of any of the luma component, the Cb component, and the Cr component is not the DC position, the LFNST index is signaled.

[0370] Here, when the coded block flag (CBF) value, which indicates the presence or absence of transform coefficients for a transform block, is equal to 0, the position of the last non-zero coefficient of that transform block is not checked when determining whether to signal the LFNST index. That is, when the corresponding CBF value is equal to 0, the transform is not applied to the block, so the position of the last non-zero coefficient can be ignored when checking the conditions for LFNST index signaling.

[0371] For example, 1) in the case of a dual tree and the luma component, if the corresponding CBF value is equal to 0, the LFNST index is not signaled; 2) in the case of a dual tree and the chroma component, if the CBF value of Cb is equal to 0 and the CBF value of Cr is equal to 1, only the position of the last non-zero coefficient of Cr is checked in order to signal the corresponding LFNST index; and 3) in the case of a single tree, the position of the last non-zero coefficient is checked only for those of the luma, Cb, and Cr components whose CBF value is 1.

[0372] F. When it is verified that transform coefficients exist in positions other than those where LFNST transform coefficients can exist, the LFNST index signaling can be skipped (or omitted). In the case of 4×4 and 8×8 transform blocks, according to the transform coefficient scan order of the VVC standard, LFNST transform coefficients can exist in 8 positions starting from the DC position, and all remaining positions can be filled with 0. Furthermore, in the case where the transform block is not a 4×4 or 8×8 transform block, according to the transform coefficient scan order of the VVC standard, LFNST transform coefficients can exist in 16 positions starting from the DC position, and all remaining positions can be filled with 0.

[0373] Therefore, after performing residual coding, the LFNST index signaling can be skipped (or omitted) when non-zero transform coefficients exist in regions that should only be filled with 0 values.
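Conditions A to F above can be combined into a single decision, as in the following sketch. It is a simplified reading of the conditions for illustration, not decoder source code. The `TransformBlock` type, the function names, and the `luma_scale` parameter (the factor that converts the checked component's size to the corresponding luma size, e.g. (2, 2) for chroma in 4:2:0) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class TransformBlock:
    cbf: int       # coded block flag: 1 if the block has coefficients
    last_pos: int  # scan index of the last non-zero coefficient (0 = DC)
    width: int
    height: int


def max_lfnst_positions(width, height):
    """Condition F: 8 positions for 4x4 and 8x8 blocks, 16 otherwise."""
    return 8 if (width, height) in ((4, 4), (8, 8)) else 16


def lfnst_index_is_signaled(blocks, width, height, *, transform_applied=True,
                            primary_is_dct2=True, max_luma_tb_size=64,
                            luma_scale=(1, 1)):
    """Decide whether the LFNST index is signaled (otherwise inferred as 0).

    `blocks` holds the transform blocks to check: luma only for a dual-tree
    luma block, Cb and Cr for a dual-tree chroma block, all three for a
    single tree. `width` and `height` are the sizes used for conditions C, D.
    """
    if not transform_applied or not primary_is_dct2:        # A, B
        return False
    if (width * luma_scale[0] > max_luma_tb_size
            or height * luma_scale[1] > max_luma_tb_size):  # C
        return False
    if width < 4 or height < 4:                             # D
        return False
    coded = [b for b in blocks if b.cbf]                    # CBF 0 is ignored
    if not any(b.last_pos > 0 for b in coded):              # E
        return False
    return all(b.last_pos < max_lfnst_positions(b.width, b.height)
               for b in coded)                              # F
```

Because the last non-zero position bounds all non-zero coefficients in scan order, comparing it with the 8 or 16 allowed positions is enough to detect a coefficient in the region that should be zero.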

[0374] Furthermore, the ISP mode can be applied only to luma blocks or to both luma and chroma blocks. As described above, when applying ISP prediction, prediction is performed after the corresponding coding unit is divided (or partitioned) into 2 or 4 partition blocks, and the transform can also be applied to each of the corresponding partition blocks. Therefore, even when determining the conditions for signaling the LFNST index by the coding unit, it should be considered that the LFNST can be applied to each of the corresponding partition blocks. Additionally, when the ISP prediction mode is applied only to a specific component (e.g., luma block), the LFNST index should be signaled based on the fact that the coding unit is divided into partition blocks only for the corresponding component. The LFNST index signaling method that can be used in ISP mode will be described below.

[0375] 1. The LFNST index can be signaled once for each coding unit (CU), and in the case of a dual tree, the LFNST index can be signaled individually for each of the luma block and the chroma block.

[0376] 2. When the LFNST index is not signaled, it is inferred to be 0, which is the default value. The cases where the LFNST index value is inferred to be 0 are described below.

[0377] A. When the mode is one in which no transform is applied (e.g., transform skip, BDPCM, lossless coding, etc.).

[0378] B. When the horizontal or vertical length of the luma block of the coding unit exceeds the maximum luma transform size available for the transform, LFNST cannot be applied; for example, when the maximum luma transform size is equal to 64 and the size of the luma block of the coding unit is equal to 128×16.

[0379] The decision to execute the LFNST index signaling can be based on the size of the partition block rather than the coding unit. That is, when the horizontal or vertical length of the partition block corresponding to the luma block exceeds the maximum luma transformation size available for transformation, the LFNST index signaling can be skipped (or omitted), and the LFNST index value can be inferred to be 0.

[0380] In the case of a dual tree, it is determined whether each of the coding unit or partition block of the luma component and the coding unit or partition block of the chroma component exceeds the maximum block transform size. That is, each of the horizontal and vertical lengths of the coding unit or partition block of the luma component is compared with the maximum luma transform size, and when at least one of them is greater than the maximum luma transform size, LFNST is not applied. Furthermore, for the coding unit or partition block of the chroma component, the horizontal/vertical length of the corresponding luma block for the color format is compared with the maximum luma transform size available for the maximum transform. For example, when the color format is 4:2:0, each of the horizontal/vertical lengths of the corresponding luma block is twice that of the chroma block, and the transform size of the corresponding luma block is twice that of the chroma block. As another example, when the color format is 4:4:4, the horizontal/vertical length and transform size of the corresponding luma block are the same as those of the chroma block.

[0381] In the case of a single tree, after checking whether the horizontal or vertical length of the luma block (coding unit or partition block) exceeds the maximum luma transform block size available for the transform, the LFNST index signaling can be skipped (or omitted) if the length exceeds that size.

[0382] C. If the LFNST included in the current VVC standard is applied, the LFNST index can be signaled only if both the horizontal and vertical lengths of the partition block are equal to 4 or greater.

[0383] If, in addition to the LFNST included in the current VVC standard, an LFNST for 2×M (1×M) or M×2 (M×1) blocks is applied, the LFNST index can be signaled only if the partition block size is equal to or greater than a 2×M (1×M) or M×2 (M×1) block. Here, a P×Q block being equal to or greater than an R×S block means that P≥R and Q≥S.

[0384] In summary, the LFNST index can be signaled only when the partition block size is equal to or greater than the minimum size to which LFNST can be applied. In the case of a dual tree, the LFNST index can be signaled only when the partition block size of the luma or chroma component is equal to or greater than the minimum size to which LFNST can be applied. In the case of a single tree, the LFNST index can be signaled only when the partition block size of the luma component is equal to or greater than the minimum size to which LFNST can be applied.

[0385] In this specification, when M×N blocks are equal to or greater than K×L blocks, this means that M is equal to or greater than K and N is equal to or greater than L. When M×N blocks are greater than K×L blocks, this means that M is equal to or greater than K and N is equal to or greater than L, and simultaneously M is greater than K or N is greater than L. When M×N blocks are less than or equal to K×L blocks, this means that M is less than or equal to K and N is less than or equal to L. Furthermore, when M×N blocks are less than K×L blocks, this means that M is less than or equal to K and N is less than or equal to L, and simultaneously M is less than K or N is less than L.

[0386] D. When the last non-zero coefficient position is not the DC position (the top-left position in the block): if the block is a dual-tree luma block and the last non-zero coefficient position of even one of the partition blocks is not the DC position, the corresponding LFNST index can be signaled. If the block is a dual-tree chroma block and even one of the last non-zero coefficient positions of all partition blocks of Cb and of all partition blocks of Cr is not the DC position, the corresponding LFNST index can be signaled (when the ISP mode is not applied to the chroma component, the number of partition blocks is equal to 1).

[0387] In the case of a single tree, if the last non-zero coefficient position of even one of the partition blocks of any of the luma component, the Cb component, and the Cr component is not the DC position, the LFNST index can be signaled.

[0388] Here, when the coded block flag (CBF) value, which indicates the presence or absence of transform coefficients for each partition block, is equal to 0, the position of the last non-zero coefficient of that partition block is not checked when determining whether to signal the LFNST index. That is, when the corresponding CBF value is equal to 0, the transform has not been applied to the block, so the position of the last non-zero coefficient of that partition block is not considered when checking the conditions for LFNST index signaling.

[0389] For example, 1) in the case of a dual tree and the luma component, a partition block whose CBF value is equal to 0 is excluded when determining whether to signal the LFNST index; 2) in the case of a dual tree and the chroma component, if for a partition block the CBF value of Cb is equal to 0 and the CBF value of Cr is equal to 1, only the position of the last non-zero coefficient of Cr is checked to determine whether to signal the corresponding LFNST index; and 3) in the case of a single tree, across all partition blocks of the luma, Cb, and Cr components, the position of the last non-zero coefficient is checked only for those with a CBF value of 1.

[0390] In ISP mode, the image information can be configured so that the position of the last non-zero coefficient is not checked, and the corresponding implementation will be described below.

[0391] i. In ISP mode, the check of the last non-zero coefficient position can be skipped for both luma and chroma blocks, and LFNST index signaling can be allowed. That is, even if the last non-zero coefficient position of every partition block is the DC position or the corresponding CBF value is 0, the corresponding LFNST index signaling can still be allowed.

[0392] ii. In ISP mode, the check of the last non-zero coefficient position can be skipped only for the luma block, while for the chroma block the check can be performed according to the method described above. For example, in the case of a dual tree and a luma block, the last non-zero coefficient position is not checked and LFNST index signaling can be allowed. In the case of a dual tree and a chroma block, whether the last non-zero coefficient position is the DC position is checked according to the method described above to determine whether to signal the corresponding LFNST index.

[0393] iii. In the case of ISP mode and a single tree, method i or method ii can be applied. That is, when method i is applied, the check of the last non-zero coefficient position can be skipped for both luma and chroma blocks, and LFNST index signaling can be allowed. Alternatively, when method ii is applied, the check can be skipped for the partition blocks of the luma component and performed according to the above method for the partition blocks of the chroma component (when ISP mode is not applied to the chroma component, the number of partition blocks can be regarded as 1) to determine whether to signal the corresponding LFNST index.

[0394] E. When it is verified that transform coefficients exist in a position other than where LFNST transform coefficients can exist, even for one of the partition blocks, the LFNST index signaling can be skipped (or omitted).

[0395] For example, in the case of 4×4 and 8×8 partition blocks, according to the transform coefficient scan order of the VVC standard, the LFNST transform coefficients can exist in 8 positions starting from the DC position, and all remaining positions can be filled with 0. Furthermore, when the partition block is equal to or greater than 4×4, and when the partition block is not a 4×4 or 8×8 partition block, according to the transform coefficient scan order of the VVC standard, the LFNST transform coefficients can exist in 16 positions starting from the DC position, and all remaining positions can be filled with 0.

[0396] Therefore, after performing residual coding, the LFNST index signaling can be skipped (or omitted) when non-zero transform coefficients exist in regions that should only be filled with 0 values.

[0397] If LFNST can be applied even for partition block sizes of 2×M (1×M) or M×2 (M×1), then the region where LFNST transform coefficients can be located can be specified as follows. The region outside the region where LFNST transform coefficients can be located can be filled with 0. Furthermore, when assuming LFNST has already been applied, if non-zero transform coefficients exist in the region that should be filled with 0, the LFNST index signaling can be skipped.

[0398] i. When LFNST can be applied to 2×M or M×2 blocks and M=8, only 8 LFNST transform coefficients can be generated for 2×8 or 8×2 partition blocks. When the transform coefficients are arranged in the scan order shown in Figure 20, the 8 transform coefficients are arranged in scan order starting from the DC position, and the remaining 8 positions can be filled with 0.

[0399] Sixteen LFNST transform coefficients can be generated for 2×N or N×2 (N>8) partition blocks. When the transform coefficients are arranged in the scan order shown in Figure 20, the 16 transform coefficients are arranged in scan order starting from the DC position, and the remaining region can be filled with 0. That is, in a 2×N or N×2 (N>8) partition block, the region other than the top-left 2×8 or 8×2 block can be filled with 0. Alternatively, 16 coefficients instead of 8 LFNST transform coefficients can be generated for a 2×8 or 8×2 partition block, in which case no region needs to be filled with 0. As mentioned above, when LFNST is applied, if a non-zero transform coefficient is detected in the region specified to be filled with 0 in even one partition block, the LFNST index signaling can be skipped, and the LFNST index can be inferred to be 0.

[0400] ii. When LFNST can be applied to 1×M or M×1 blocks, and when M=16, only 8 LFNST transform coefficients can be generated for 1×16 or 16×1 partition blocks. When the transform coefficients are arranged in a left-to-right or top-to-bottom scan order, 8 transform coefficients are arranged starting from the DC position in the corresponding scan order, and the remaining 8 positions can be filled with 0.

[0401] Sixteen LFNST transform coefficients can be generated for a 1×N or N×1 partition (N>16). When the transform coefficients are arranged in a left-to-right or top-to-bottom scan order, the 16 transform coefficients are arranged starting from the DC position in the corresponding scan order, and the remaining area can be filled with 0. That is, in a 1×N or N×1 (N>16) partition, the area excluding the top left 1×16 or 16×1 block can be filled with 0.

[0402] Instead of 8 LFNST transform coefficients, 16 coefficients can be generated for 1×16 or 16×1 partition blocks, and in this case, there is no region that needs to be filled with 0. As mentioned above, when LFNST is applied, if a non-zero transform coefficient is detected in a region specified to be filled with 0 in even one partition block, the LFNST index signaling can be skipped, and the LFNST index can be inferred to be 0.
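The coefficient counts given in items i and ii for narrow partition blocks can be summarized in a short sketch. The function names and the `use_16_for_shortest` switch (which selects the alternative of generating 16 coefficients for 2×8, 8×2, 1×16, or 16×1 blocks) are hypothetical; block sizes that the text does not cover are rejected rather than guessed.

```python
def narrow_partition_coeff_limit(width: int, height: int,
                                 use_16_for_shortest: bool = False) -> int:
    """LFNST coefficient count for 2xM / Mx2 and 1xM / Mx1 partition blocks."""
    short_side, long_side = min(width, height), max(width, height)
    if short_side == 2:
        threshold = 8      # 2x8 / 8x2 is the smallest case described
    elif short_side == 1:
        threshold = 16     # 1x16 / 16x1 is the smallest case described
    else:
        raise ValueError("only partitions with a side of 1 or 2 are covered")
    if long_side < threshold:
        raise ValueError("block size not covered by the description")
    if long_side == threshold and not use_16_for_shortest:
        return 8
    return 16


def zero_region_size(width: int, height: int,
                     use_16_for_shortest: bool = False) -> int:
    """Number of positions that should be filled with 0."""
    limit = narrow_partition_coeff_limit(width, height, use_16_for_shortest)
    return width * height - limit
```

With the alternative enabled, a 2×8 block holds 16 coefficients in 16 positions, so no zero-filled region remains, which matches the description above.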

[0403] Furthermore, in ISP mode, in the current VVC standard, DST-7 is applied instead of DCT-2 by independently (or separately) referencing the length conditions in the horizontal and vertical directions, without performing signaling for the MTS index. A transform core is determined based on whether the horizontal or vertical length is equal to or greater than 4 and less than or equal to 16. Therefore, in ISP mode, and when LFNST can be applied, the following transform combinations can be configured as described below.

[0404] 1. For cases where the LFNST index is 0 (including cases where the LFNST index is inferred to be 0), the conditions used to determine the first transform corresponding to the ISP mode included in the current VVC standard can be followed. That is, by independently (or separately) checking whether the length conditions in the horizontal and vertical directions are met (i.e., the condition that the length is equal to or greater than 4 and less than or equal to 16), if the length conditions are met, DST-7 instead of DCT-2 is applied to the first transform. And if the length conditions are not met, DCT-2 can be applied.

[0405] 2. For cases where the LFNST index is greater than 0, the following two configurations may be possible for the first (primary) transform.

[0406] A. DCT-2 can be applied to both the horizontal and vertical directions.

[0407] B. Conditions for determining the first transformation corresponding to the ISP mode included in the current VVC standard can be followed. That is, by independently (or separately) checking whether the length conditions in the horizontal and vertical directions are met (i.e., the condition that the length is equal to or greater than 4 and less than or equal to 16), if the length conditions are met, DST-7 is applied instead of DCT-2. And if the length conditions are not met, DCT-2 can be applied.
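The transform combinations listed above can be illustrated with a small sketch. It assumes that the horizontal kernel is chosen from the block width and the vertical kernel from the block height; the function name and the `dct2_when_lfnst` switch (True for configuration A, False for configuration B) are hypothetical.

```python
def isp_first_transform_kernels(width: int, height: int, lfnst_idx: int,
                                dct2_when_lfnst: bool = True):
    """Return (horizontal_kernel, vertical_kernel) for an ISP partition block."""

    def by_length(length: int) -> str:
        # Length condition from the text: 4 <= length <= 16 selects DST-7.
        return "DST-7" if 4 <= length <= 16 else "DCT-2"

    if lfnst_idx > 0 and dct2_when_lfnst:
        # Configuration A: DCT-2 in both directions when LFNST is used.
        return ("DCT-2", "DCT-2")
    # LFNST index 0 (signaled or inferred), or configuration B:
    # each direction is checked independently.
    return (by_length(width), by_length(height))
```

For example, a 4×32 partition block with an LFNST index of 0 would use DST-7 horizontally and DCT-2 vertically under this sketch.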

[0408] In ISP mode, image information can be configured to send LFNST indexes for each partition block, rather than for each coding unit. In this case, the LFNST index signaling method described above assumes that only one partition block exists within the unit through which the LFNST index is sent, and can determine whether to execute LFNST index signaling.

[0409] Furthermore, the following describes an implementation of an LFNST core that can be applied to 2×N or N×2 (N≥8) partition blocks when in ISP mode. This LFNST core can be applied to the top-left 2×8 or 8×2 region within a 2×N or N×2 (N≥8) partition block. And, the corresponding LFNST core can be used in the LFNST described in the above implementation.

[0410] Each of Tables 9 through 11 shows an example where the LFNST cores are configured as a total of 2 LFNST sets, where each LFNST set is configured as two LFNST core candidates (i.e., LFNST core matrices). The LFNST cores presented in Tables 9 through 11 are defined according to the syntax of the C/C++ programming language. In g_lfnst_2×8_8×2[2][2][16][16], which is an array storing LFNST core data, the first [2] indicates that the cores are configured as a total of 2 LFNST sets, the second [2] indicates that each LFNST set is configured as 2 LFNST core candidates, and [16][16] indicates that each LFNST core is configured as a 16×16 matrix. Each 16×16 matrix shown in Tables 9 through 11 represents a matrix used in the forward LFNST. That is, each row is one transform basis vector (a 1×16 vector), which is multiplied by the input data composed of the first transform coefficients.
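The array layout and the row-by-row multiplication described above can be sketched as follows. The actual core values of the tables are not reproduced here; `placeholder_kernel` builds a hypothetical permutation matrix purely so that the indexing [set][candidate][row][column] and the forward multiplication can be shown. All names are illustrative assumptions.

```python
NUM_SETS, NUM_CANDIDATES, SIZE = 2, 2, 16


def placeholder_kernel(shift: int):
    """Hypothetical stand-in for a 16x16 core: a cyclic-shift permutation."""
    return [[1 if col == (row + shift) % SIZE else 0 for col in range(SIZE)]
            for row in range(SIZE)]


# Indexed as lfnst_cores[set][candidate][row][column].
lfnst_cores = [[placeholder_kernel(2 * s + c) for c in range(NUM_CANDIDATES)]
               for s in range(NUM_SETS)]


def forward_lfnst(core, first_transform_coeffs):
    """Multiply each row (one 1x16 basis vector) by the 16 input coefficients."""
    if len(first_transform_coeffs) != SIZE:
        raise ValueError("expected 16 input coefficients")
    return [sum(w * x for w, x in zip(row, first_transform_coeffs))
            for row in core]
```

Each output coefficient is the inner product of one row of the selected core with the input vector, so the output has as many entries as the core has rows.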

[0411] [Table 9]

[0412]

[0413]

[0414] [Table 10]

[0415]

[0416]

[0417] [Table 11]

[0418]

[0419]

[0420] Each of Tables 12 to 14 shows an example where the LFNST core is configured as one LFNST set, where the LFNST set is configured as two LFNST core candidates (i.e., LFNST core matrices). The LFNST cores presented in Tables 12 to 14 are defined according to the syntax of the C/C++ programming language. In g_lfnst_2×8_8×2[1][2][16][16], which is an array storing LFNST core data, [1] indicates that the core is configured as one LFNST set, [2] indicates that each LFNST set is configured as two LFNST core candidates, and [16][16] indicates that each LFNST core is configured as a 16×16 matrix. Each 16×16 matrix shown in Tables 12 to 14 represents a matrix used in the forward LFNST. That is, each row is one transform basis vector (a 1×16 vector), which is multiplied by the input data composed of the first transform coefficients.

[0421] [Table 12]

[0422]

[0423] [Table 13]

[0424]

[0425] [Table 14]

[0426]

[0427] The following figures are provided to illustrate specific examples of this disclosure. Since the specific names of the devices or signals / messages / fields shown in the figures are for illustrative purposes only, the technical features of this disclosure are not limited to the specific names used in the following figures.

[0428] Figure 22 is a flowchart illustrating the operation of a video decoding device according to an embodiment of this document.

[0429] Each step disclosed in Figure 22 is based on some of the content described above with reference to Figures 4 to 22. Therefore, detailed descriptions that overlap with those given above with reference to Figures 3 to 21 will be omitted or simplified.

[0430] According to this embodiment, the decoding device 300 can receive residual information from the bit stream (S2210).

[0431] More specifically, the decoding device 300 can decode information about the quantized transform coefficients of the target block from the bitstream, and can derive the quantized transform coefficients of the current block based on the information about the quantized transform coefficients of the current block. The information about the quantized transform coefficients of the target block can be included in the Sequence Parameter Set (SPS) or the slice header, and may include at least one of the following: information about whether a reduced transform (RST) is applied, information about a reduced factor, information about the minimum transform size for applying the reduced transform, information about the maximum transform size for applying the reduced transform, the size of the inverse reduced transform, and information about the transform index indicating any one of the transform kernel matrices included in the transform set.

[0432] Additionally, the decoding device can receive information about the intra-prediction mode of the current block and information about whether ISP is applied to the current block. By receiving and parsing flag information indicating whether to apply ISP coding or the ISP mode, the decoding device can derive whether the current block is divided (or split or partitioned) into a predetermined number of sub-partition transform blocks. Here, the current block can be a coding block. Furthermore, the decoding device can derive the size and number of the sub-partition blocks by using flag information indicating the direction in which the current block will be divided (or partitioned).

[0433] The decoding device 300 can derive the residual information of the current block, that is, the transform coefficients, by performing dequantization on the quantized transform coefficients (S2220).

[0434] The derived transform coefficients can be arranged (or aligned) in units of 4×4 blocks according to the reverse diagonal scan order, and the transform coefficients within each 4×4 block can also be arranged according to the reverse diagonal scan order. In other words, the transform coefficients processed by dequantization can be arranged according to the reverse scan order applied in a video codec (e.g., VVC or HEVC).

[0435] The decoding device can derive modified transform coefficients by applying LFNST to the transform coefficients.

[0436] Unlike the first (primary) transform, which transforms the coefficients separately along the vertical or horizontal direction, LFNST is a non-separable transform that is applied without separating the coefficients along a specific direction. This non-separable transform can be a low-frequency non-separable transform that is applied only to the low-frequency region rather than to the entire block region.

[0437] LFNST index information is received as syntax information, and the syntax information can be received as a binary bin string containing 0s and 1s.

[0438] According to this embodiment, the syntax element of the LFNST index can indicate whether the inverse LFNST (inverse non-separable transform) is applied and which one of the transform kernel matrices included in the transform set is used. Furthermore, when the transform set includes two transform kernel matrices, the syntax element of the transform index can have three values.

[0439] In other words, according to the implementation, the syntax element value for the LFNST index may include: 0, which indicates that the inverse LFNST is not applied to the target block; 1, which indicates the first of the two transform kernel matrices; and 2, which indicates the second of the two transform kernel matrices.

[0440] Intra-prediction mode information and LFNST index information can be signaled at the coding unit level.

[0441] The decoding device can determine whether to parse the LFNST index based on the tree type and color format of the current block and whether the ISP is applied to the current block (S2230), and the decoding device can parse the LFNST index only when LFNST is applicable (S2240).

[0442] According to an embodiment, when the tree type of the current block is dual-tree chroma, the decoding device can parse the LFNST index when the height and width corresponding to the chroma component block of the current block are equal to 4 or greater.

[0443] Additionally, according to an embodiment, when the tree type of the current block is single-tree or dual-tree luma, the decoding device can parse the LFNST index when the height and width corresponding to the luma component block of the current block are equal to 4 or greater.

[0444] Furthermore, when ISP is applied to the current block, i.e., when the current block is partitioned into sub-partition transform blocks, the decoding device can determine whether LFNST is applicable based on the height and width of the sub-partition blocks. In this case, when the height and width of the sub-partition blocks are equal to 4 or greater, the decoding device can parse the LFNST index.

[0445] Additionally, when the tree type of the current block is dual-tree luma or single-tree, the decoding device can parse the LFNST index when the height and width of the sub-partition blocks of the luma component block of the current block are equal to 4 or greater.

[0446] For example, if the tree type of the current block is dual-tree chroma, ISP may not be applied, and in this case, the decoding device can parse the LFNST index when the height and width corresponding to the chroma component block of the current block are equal to 4 or greater.

[0447] Conversely, if the tree type of the current block is dual-tree luma or single-tree rather than dual-tree chroma, the decoding device can parse the LFNST index when, depending on whether ISP is applied to the current block, the height and width of the sub-partition blocks of the luma component block of the current block or the height and width of the current block are equal to 4 or greater.

[0448] According to an embodiment, the current block is a coding unit, and the decoding device can parse the LFNST index when the width and height of the coding unit are equal to or less than the maximum luma transform size available for transformation.
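The parsing condition described in the preceding paragraphs can be sketched as a single check. This is an illustration under stated assumptions: the coding unit size is given in luma samples, the chroma block size is obtained by dividing by the subsampling divisors of the color format, the sub-partition size is obtained by dividing the split dimension by the number of sub-partitions, and the default maximum luma transform size of 64 is only a placeholder. All names are hypothetical.

```python
from enum import Enum


class TreeType(Enum):
    SINGLE_TREE = "single-tree"
    DUAL_TREE_LUMA = "dual-tree luma"
    DUAL_TREE_CHROMA = "dual-tree chroma"


# Assumed (horizontal, vertical) chroma subsampling divisors per color format.
CHROMA_DIVISORS = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def should_parse_lfnst_index(tree_type, color_format, cu_width, cu_height,
                             isp_split=None, num_sub_partitions=1,
                             max_luma_transform_size=64):
    """Return True when the size conditions for parsing the LFNST index hold."""
    if tree_type is TreeType.DUAL_TREE_CHROMA:
        # ISP is not applied here; use the chroma component block size.
        div_w, div_h = CHROMA_DIVISORS[color_format]
        width, height = cu_width // div_w, cu_height // div_h
    elif isp_split == "vertical":
        width, height = cu_width // num_sub_partitions, cu_height
    elif isp_split == "horizontal":
        width, height = cu_width, cu_height // num_sub_partitions
    else:
        width, height = cu_width, cu_height
    size_ok = min(width, height) >= 4
    fits_transform = max(cu_width, cu_height) <= max_luma_transform_size
    return size_ok and fits_transform
```

Under these assumptions, an 8×8 coding unit in dual-tree chroma with a 4:2:0 color format corresponds to a 4×4 chroma block and passes the check, while a 4×8 coding unit corresponds to a 2×4 chroma block and does not.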

[0449] Subsequently, the decoding device can derive the modified transform coefficients from the transform coefficients based on the LFNST index and the LFNST matrix used for LFNST (S2250).

[0450] The decoding device can determine the LFNST set, which includes the LFNST matrix, based on the intra-prediction mode derived from the intra-prediction mode information, and select any one of the multiple LFNST matrices based on the LFNST set and the LFNST index.

[0451] In this scenario, the same LFNST set and the same LFNST index can be applied to sub-partition transform blocks partitioned from the current block. That is, because the same intra-prediction mode is applied to sub-partition transform blocks, the LFNST set determined based on the intra-prediction mode can be applied equally to all sub-partition transform blocks. Furthermore, because the LFNST index is signaled at the coding unit level, the same LFNST matrix can be applied to sub-partition transform blocks partitioned from the current block.

[0452] As described above, the transform set can be determined based on the intra-prediction mode of the transform block to be transformed, and the inverse LFNST can be performed based on any of the transform kernel matrices (i.e., LFNST matrices) included in the transform set indicated by the LFNST index. The matrix applied to the inverse LFNST can be called the inverse LFNST matrix or the LFNST matrix, and such a matrix can have any name as long as it has a transpose relation to the matrix used for the forward LFNST.

[0453] In one example, the inverse LFNST matrix can be a non-square matrix in which the number of columns is less than the number of rows.
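The relationship between the LFNST index, the shared LFNST set, and the transposed inverse matrix can be sketched as follows. The mapping from intra-prediction mode to LFNST set is not reproduced; the set is assumed to be already chosen. `TOY_SET` is a hypothetical 2×3 example (fewer rows than columns in the forward direction), not a real LFNST matrix.

```python
def select_forward_matrix(lfnst_set, lfnst_index):
    """Index 0 means LFNST is not applied; 1 and 2 pick the two candidates."""
    if lfnst_index == 0:
        return None
    return lfnst_set[lfnst_index - 1]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def inverse_lfnst(forward_matrix, coeffs):
    """Apply the transpose of the forward matrix to the input coefficients."""
    inverse_matrix = transpose(forward_matrix)  # more rows than columns
    return [sum(w * c for w, c in zip(row, coeffs)) for row in inverse_matrix]


# Hypothetical toy set with two candidate forward matrices (2 rows, 3 columns).
TOY_SET = [
    [[1, 0, 0], [0, 1, 0]],
    [[0, 0, 1], [0, 1, 0]],
]

# The same set and index are reused for every sub-partition transform block.
sub_partition_inputs = [[5, 7], [1, 2]]
matrix = select_forward_matrix(TOY_SET, 2)
outputs = [inverse_lfnst(matrix, coeffs) for coeffs in sub_partition_inputs]
```

Because the inverse matrix is the transpose of a forward matrix with fewer rows than columns, it has fewer columns than rows and produces more output values than it takes in.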

[0454] A predetermined number of transform coefficients can be derived as the output data of LFNST based on the size of the current block or sub-partition transform block. For example, as illustrated on the left of Figure 8, when the height and width of the current block or sub-partition transform block are 8 or greater, 48 transform coefficients can be derived, and as illustrated on the right of Figure 8, when the width and height of the sub-partition transform block are not both 8 or greater, that is, when the width or height of the sub-partition transform block is 4 or greater and less than 8, 16 transform coefficients can be derived.

[0455] As shown in Figure 8, the 48 transform coefficients can be arranged in the upper-left, upper-right, and lower-left 4×4 regions of the upper-left 8×8 region of the sub-partition transform block, and the 16 transform coefficients can be arranged in the upper-left 4×4 region of the sub-partition transform block.

[0456] The 48 transform coefficients and the 16 transform coefficients can be arranged in the vertical or horizontal direction according to the intra-prediction mode of the sub-partition transform block. For example, as shown in (a) of Figure 8, when the intra-prediction mode is in the horizontal direction with respect to the diagonal direction (mode 34 in Figure 10), that is, modes 2 to 34 in Figure 10, the transform coefficients can be arranged in the horizontal direction, that is, in row-major order, and as shown in (b) of Figure 8, when the intra-prediction mode is in the vertical direction with respect to the diagonal direction (modes 35 to 66 in Figure 10), the transform coefficients can be arranged in the vertical direction, that is, in column-major order.
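The placement of the 48 or 16 output coefficients can be illustrated with a short sketch. The exact visiting order within the regions is an assumption (rows of the region from top to bottom, left to right, or the transposed order), and treating every mode up to 34 as row-major, including any non-directional modes, is also an assumption. The function name is hypothetical.

```python
def place_inverse_lfnst_output(values, intra_mode):
    """Place 16 values in a 4x4 region, or 48 values in the upper-left,
    upper-right, and lower-left 4x4 regions of an 8x8 region."""
    if len(values) == 16:
        size = 4
        positions = [(r, c) for r in range(4) for c in range(4)]
    elif len(values) == 48:
        size = 8
        # The lower-right 4x4 region (r >= 4 and c >= 4) is left out.
        positions = [(r, c) for r in range(8) for c in range(8)
                     if r < 4 or c < 4]
    else:
        raise ValueError("expected 16 or 48 coefficients")
    row_major = intra_mode <= 34  # horizontal-direction modes
    region = [[0] * size for _ in range(size)]
    for value, (r, c) in zip(values, positions):
        if row_major:
            region[r][c] = value
        else:
            region[c][r] = value  # column-major: transposed placement
    return region
```

Since the covered region is symmetric about the diagonal, the column-major case can be written as a transposed placement of the same position list.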

[0457] The decoding device can derive the residual sample of the current block based on the inverse first transform of the modified transform coefficients (S2260).

[0458] Here, a general separable transform can be used as the inverse first transform, and the aforementioned MTS can also be used.

[0459] Subsequently, the decoding device 300 can generate a reconstructed sample based on the residual sample of the current block and the predicted sample of the current block (S2270).
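The reconstruction step can be illustrated as a sample-wise sum of prediction and residual. Clipping the result to the valid sample range for a given bit depth is an assumption added for illustration, as are the function name and the default bit depth.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add residual samples to prediction samples, clipped to the sample range."""
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value) for p, r in zip(pred_row, resid_row)]
            for pred_row, resid_row in zip(pred, resid)]
```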

[0460] The following figures are provided to illustrate specific examples of this disclosure. Since the specific names of the devices or signals / messages / fields shown in the figures are for illustrative purposes only, the technical features of this disclosure are not limited to the specific names used in the following figures.

[0461] Figure 23 is a flowchart illustrating the operation of a video encoding device according to an embodiment of this document.

[0462] Each step disclosed in Figure 23 is based on some of the content described above with reference to Figures 4 to 21. Therefore, detailed descriptions that overlap with those given above with reference to Figure 2 and Figures 4 to 21 will be omitted or simplified.

[0463] According to the implementation method, the encoding device 200 can first derive the prediction sample of the current block based on the intra-prediction mode applied to the current block.

[0464] When the ISP is applied to the current block, the encoding device can perform predictions for each sub-partition transform block.

[0465] The encoding device can determine whether to apply ISP coding or the ISP mode to the current block (i.e., the coding block), determine the direction in which the current block will be divided based on the determination result, and derive the size and number of sub-blocks.

[0466] For example, as shown in Figure 17, when the size (width × height) of the current block is 8×4, the current block can be vertically divided into two sub-blocks, and when the size (width × height) of the current block is 4×8, the current block can be horizontally divided into two sub-blocks. Alternatively, as shown in Figure 18, when the size (width × height) of the current block is greater than 4×8 or 8×4, that is, when the size of the current block is 1) 4×N or N×4 (N≥16) or 2) M×N (M≥8, N≥8), the current block can be divided into 4 sub-blocks in the horizontal or vertical direction.
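The division rule above can be sketched as a function that returns the number and size of the sub-blocks. The split direction is taken as an input here, on the assumption that it has already been decided; the function name is hypothetical, and block sizes not covered by the description are rejected.

```python
def isp_sub_partitions(width: int, height: int, vertical_split: bool):
    """Return (count, sub_width, sub_height) for an ISP-coded block."""
    if (width, height) in ((4, 8), (8, 4)):
        count = 2
    elif ((width == 4 and height >= 16) or (height == 4 and width >= 16)
          or (width >= 8 and height >= 8)):
        count = 4
    else:
        raise ValueError("block size not covered by the description")
    if vertical_split:
        # A vertical division cuts the width into `count` equal parts.
        return count, width // count, height
    return count, width, height // count
```

For instance, a vertically divided 8×4 block and a horizontally divided 4×8 block both yield two 4×4 sub-blocks.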

[0467] The same intra-prediction mode can be applied to the sub-partition transform blocks divided from the current block, and the encoding device can derive prediction samples for each sub-partition transform block. That is, the encoding device performs intra-prediction sequentially, for example horizontally or vertically (from left to right or from top to bottom), depending on how the sub-partition transform blocks are divided. For the leftmost or topmost sub-block, reconstructed pixels of already encoded blocks are referenced, as in conventional intra-prediction methods. Furthermore, for each side of a subsequent internal sub-partition transform block that is not adjacent to the previous sub-partition transform block, reconstructed pixels of already encoded adjacent blocks are referenced to derive the reference pixels adjacent to that side, as in conventional intra-prediction methods.

[0468] The encoding device 200 can derive the residual sample of the current block based on the predicted sample (S2310).

[0469] Furthermore, the encoding device 200 can derive the transform coefficients of the current block based on the first (primary) transform of the residual sample (S2320).

[0470] The first transform can be performed using multiple transform kernels, and in this case, the transform kernel can be selected based on the intra-prediction mode.

[0471] The encoding device 200 can determine whether to perform a secondary transform or a non-separable transform (more specifically, LFNST) on the transform coefficients of the current block, and can derive the modified transform coefficients by applying LFNST to the transform coefficients.

[0472] Unlike the first transform, which transforms the coefficients separately along the vertical or horizontal direction, LFNST is a non-separable transform that is applied without separating the coefficients along a specific direction. This non-separable transform can be a low-frequency non-separable transform that is applied only to the low-frequency region rather than to the entire target block (which is the transform target).

[0473] The encoding device can determine whether LFNST is applicable to the current block based on the tree type and color format of the current block and whether ISP is applied to the current block (S2330).

[0474] According to an embodiment, when the tree type of the current block is dual-tree chroma, the encoding device can determine that LFNST can be applied when the height and width corresponding to the chroma component block of the current block are equal to 4 or greater.

[0475] In addition, according to an embodiment, when the tree type of the current block is single-tree or dual-tree luma, the encoding device can determine that LFNST can be applied when the height and width corresponding to the luma component block of the current block are equal to 4 or greater.

[0476] Furthermore, when ISP is applied to the current block, i.e., when the current block is partitioned into sub-partition transform blocks, the encoding device can determine whether LFNST is applicable based on the height and width of the sub-partition blocks. In this case, when the height and width of the sub-partition blocks are equal to 4 or greater, the encoding device can determine that LFNST can be applied.

[0477] For example, if the tree type of the current block is dual-tree chroma, ISP may not be applied, and in this case, the encoding device can determine that LFNST can be applied when the height and width corresponding to the chroma component block of the current block are equal to 4 or greater.

[0478] Conversely, if the tree type of the current block is dual-tree luma or single-tree rather than dual-tree chroma, the encoding device can determine that LFNST can be applied when, depending on whether ISP is applied to the current block, the height and width of the sub-partition blocks of the luma component block of the current block or the height and width of the current block are equal to 4 or greater.

[0479] In addition, according to an embodiment, the current block is a coding unit, and when the width and height of the coding unit are equal to or less than the maximum luma transform size that can be used for transformation, the encoding device can determine that LFNST can be applied.

[0480] When it is determined that LFNST should be performed, the coding device 200 can derive the modified transform coefficients for the current block or sub-partition transform block based on the LFNST set mapped to the intra-prediction mode and the LFNST matrix included in the LFNST set (S2340).

[0481] The encoding device 200 can determine the LFNST set based on the mapping relationship according to the intra-prediction mode applied to the current block, and perform LFNST, i.e., the non-separable transform, based on one of the two LFNST matrices included in the LFNST set.

[0482] In this scenario, the same LFNST set and the same LFNST index can be applied to sub-partition transform blocks partitioned from the current block. That is, because the same intra-prediction mode is applied to sub-partition transform blocks, the LFNST set determined based on the intra-prediction mode can also be applied equally to all sub-partition transform blocks. Furthermore, because the LFNST index is encoded at the coding unit level, the same LFNST matrix can be applied to sub-partition transform blocks partitioned from the current block.

[0483] As described above, the transform set can be determined based on the intra-prediction mode of the transform block to be transformed. The matrix applied to LFNST has a transpose relationship with the matrix used for inverse LFNST.

[0484] In one example, the LFNST matrix can be a non-square matrix in which the number of rows is less than the number of columns.

[0485] The region containing the transform coefficients of the input data used for LFNST can be derived based on the size of the sub-partition transform block. For example, as shown on the left of Figure 8, when the height and width of the sub-partition transform block are equal to 8 or greater, this region can be the upper-left, upper-right, and lower-left 4×4 regions of the upper-left 8×8 region in the sub-partition transform block, and as shown on the right of Figure 8, when the height and width of the sub-partition transform block are not both equal to 8 or greater, the region can be the upper-left 4×4 region in the current block.

[0486] To perform multiplication using the LFNST matrix, the transform coefficients of the aforementioned region can be read along the vertical or horizontal direction based on the intra-prediction mode of the sub-partition transform block, thereby configuring a one-dimensional vector.

[0487] The 48 modified transform coefficients or 16 modified transform coefficients can be read along the vertical or horizontal direction according to the intra-prediction mode of the sub-partition transform block and arranged (or aligned) in one dimension. For example, as shown in (a) of Figure 8, when the intra-prediction mode is in the horizontal direction with respect to the diagonal direction (mode 34 in Figure 10), that is, modes 2 to 34 in Figure 10, the transform coefficients can be arranged (or aligned) along the horizontal direction (i.e., in row-major order), and as shown in (b) of Figure 8, when the intra-prediction mode is in the vertical direction with respect to the diagonal direction (modes 35 to 66 in Figure 10), the transform coefficients can be arranged (or aligned) along the vertical direction (i.e., in column-major order).
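On the encoder side, reading the input region into a one-dimensional vector can be sketched as follows. As before, the visiting order inside the region and the handling of modes up to 34 as row-major are assumptions, and the function name is hypothetical; `block` is a list of rows of first transform coefficients.

```python
def lfnst_input_vector(block, intra_mode):
    """Read the LFNST input region of `block` into a 1-D list."""
    height, width = len(block), len(block[0])
    if height >= 8 and width >= 8:
        # Upper-left, upper-right, and lower-left 4x4 regions of the
        # upper-left 8x8 region: 48 values.
        positions = [(r, c) for r in range(8) for c in range(8)
                     if r < 4 or c < 4]
    else:
        # Upper-left 4x4 region: 16 values.
        positions = [(r, c) for r in range(4) for c in range(4)]
    if intra_mode <= 34:
        return [block[r][c] for r, c in positions]   # row-major read
    return [block[c][r] for r, c in positions]       # column-major read
```

The resulting vector has 48 or 16 entries and can then be multiplied by the selected LFNST matrix.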

[0488] In one implementation, the encoding device may perform the following steps: determining whether the condition for applying LFNST is satisfied, generating and encoding the LFNST index based on this determination, selecting a transform kernel matrix, and, when the condition for applying LFNST is satisfied, applying LFNST to the residual samples based on the selected transform kernel matrix and/or a reduced factor. In this case, the size of the reduced transform kernel matrix may be determined based on the reduced factor.

[0489] The encoding device can derive the quantized transform coefficients by performing quantization based on the modified transform coefficients of the current block, and the encoding device can then encode the information about the quantized transform coefficients, and, where LFNST is applicable (i.e., where LFNST can be applied), encode the LFNST index indicating the LFNST matrix (S2350).

[0490] In other words, the encoding device can generate residual information that includes information about the quantized transform coefficients. The residual information can include the transform-related information/syntax elements mentioned above. The encoding device can encode the image/video information including the residual information and output the encoded image/video information as a bitstream.

[0491] More specifically, the encoding device 200 can generate information about the quantized transform coefficients and encode the generated information about the quantized transform coefficients.

[0492] The syntax elements of the LFNST index according to this embodiment can indicate whether (inverse) LFNST is applied and any LFNST matrix included in the LFNST set, and when the LFNST set includes two transformation kernel matrices, the syntax elements of the LFNST index can have three values.

[0493] According to the implementation method, when the partitioning tree structure of the current block is a dual-tree type, each of the luma block and chroma block can be encoded with an LFNST index.

[0494] According to the implementation, the syntax element values of the transform index can be derived as 0, 1, and 2, where 0 indicates that (inverse) LFNST is not applied to the current block, 1 indicates the first LFNST matrix among the LFNST matrices, and 2 indicates the second LFNST matrix among the LFNST matrices.

[0495] In this disclosure, at least one of quantization/dequantization and/or transform/inverse transform may be omitted. When quantization/dequantization is omitted, the quantized transform coefficients may be referred to as transform coefficients. When transform/inverse transform is omitted, the transform coefficients may be referred to as coefficients or residual coefficients, or, for the sake of consistency, may still be referred to as transform coefficients.

[0496] Furthermore, in this disclosure, quantized transform coefficients and transform coefficients can be referred to as transform coefficients and scaled transform coefficients, respectively. In this case, residual information can include information about the transform coefficients, and this information can be signaled via residual coding syntax. Transform coefficients can be derived based on residual information (or information about transform coefficients), and scaled transform coefficients can be derived through the inverse transform (scaling) of the transform coefficients. Residual samples can be derived based on the inverse transform (transform) of the scaled transform coefficients. These details can also be applied/expressed in other parts of this disclosure.

[0497] In the above embodiments, the method is explained based on a flowchart using a series of steps or blocks. However, this disclosure is not limited to the order of the steps, and a step may be performed in a different order or sequence than described above, or a step may be performed concurrently with other steps. Furthermore, those skilled in the art will understand that the steps shown in the flowchart are not exclusive, and another step may be incorporated or one or more steps in the flowchart may be deleted without affecting the scope of this disclosure.

[0498] The methods described above according to this disclosure can be implemented in software form, and the encoding and / or decoding devices according to this disclosure can be included in devices for image processing such as televisions, computers, smartphones, set-top boxes, and display devices.

[0499] When the embodiments of this disclosure are implemented by software, the above methods can be implemented as modules (steps, functions, etc.) for performing the above functions. These modules can be stored in memory and can be executed by a processor. The memory can be internal or external to the processor and can be connected to the processor in various well-known ways. The processor may include application-specific integrated circuits (ASICs), other chipsets, logic circuits, and / or data processing devices. The memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media, and / or other storage devices. That is, the embodiments described in this disclosure can be implemented and executed on a processor, microprocessor, controller, or chip. For example, the functional units shown in each figure can be implemented and executed on a computer, processor, microprocessor, controller, or chip.

[0500] Furthermore, the decoding and encoding devices using this disclosure can include multimedia broadcast transceivers, mobile communication terminals, home theater video devices, digital cinema video devices, surveillance cameras, video chat devices, real-time communication devices (such as video communication), mobile streaming devices, storage media, cameras, video-on-demand (VoD) service providers, over-the-top (OTT) video devices, internet streaming service providers, three-dimensional (3D) video devices, video telephony devices, and medical video devices, and can be used to process video signals or data signals. For example, over-the-top (OTT) video devices can include game consoles, Blu-ray players, internet access TVs, home theater systems, smartphones, tablet PCs, digital video recorders (DVRs), etc.

[0501] Furthermore, the processing methods of this disclosure can be produced in the form of a computer-executable program and can be stored in a computer-readable storage medium. Multimedia data having the data structure according to this disclosure can also be stored in a computer-readable storage medium. Computer-readable storage media include various storage devices and distributed storage devices for storing computer-readable data. Computer-readable storage media can include, for example, Blu-ray discs (BD), Universal Serial Bus (USB), ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, computer-readable storage media include media implemented in the form of a carrier wave (e.g., transmission over the Internet). Furthermore, bitstreams generated by encoding methods can be stored in a computer-readable storage medium or transmitted via wired or wireless communication networks. Additionally, embodiments of this disclosure can be implemented as computer program products by program code, and the program code can be executed on a computer according to embodiments of this disclosure. The program code can be stored on a computer-readable medium.

[0502] The claims disclosed herein can be combined in various ways. For example, the technical features of the method claims can be combined to be implemented or performed in a device, and the technical features of the device claims can be combined to be implemented or performed in a method. Furthermore, the technical features of the method claims and the device claims can be combined to be implemented or performed in a device, and the technical features of the method claims and the device claims can be combined to be implemented or performed in a method.

Claims

1. A decoding device for image decoding, the decoding device comprising:
a memory; and
at least one processor connected to the memory and configured to:
receive a bitstream including residual information;
derive transform coefficients of a current block based on the residual information;
derive modified transform coefficients by applying a low-frequency non-separable transform (LFNST) to the transform coefficients; and
derive residual samples of the current block based on an inverse primary transform of the modified transform coefficients,
wherein the at least one processor is further configured to determine whether to parse an LFNST index based on whether a width and a height of the current block satisfy a condition for applying the LFNST,
wherein whether the condition for applying the LFNST is satisfied is determined based on a tree type of the current block, a color format of the current block, and whether intra sub-partitions (ISP) is applied to the current block,
wherein, when the ISP is applied, the LFNST index is parsed based on a width and a height of a sub-partition block being 4 or greater,
wherein the current block is a coding unit, and
wherein the LFNST index is parsed based on the width and the height of the coding unit being equal to or less than a maximum luma transform size available for transform.

2. An encoding device for image encoding, the encoding device comprising:
a memory; and
at least one processor connected to the memory and configured to:
derive prediction samples for a current block;
derive residual samples of the current block based on the prediction samples;
derive transform coefficients of the current block based on a primary transform of the residual samples;
derive modified transform coefficients from the transform coefficients by applying a low-frequency non-separable transform (LFNST); and
encode residual information about the modified transform coefficients and an LFNST index related to an LFNST matrix applied in the LFNST,
wherein the LFNST index is encoded based on whether a width and a height of the current block satisfy a condition for applying the LFNST,
wherein whether the condition for applying the LFNST is satisfied is determined based on a tree type of the current block, a color format of the current block, and whether intra sub-partitions (ISP) is applied to the current block,
wherein, when the ISP is applied, the LFNST index is encoded based on a width and a height of a sub-partition block being 4 or greater,
wherein the current block is a coding unit, and
wherein the LFNST index is parsed based on the width and the height of the coding unit being equal to or less than a maximum luma transform size available for transform.

3. An apparatus for transmitting image data, the apparatus comprising:
at least one processor configured to obtain a bitstream of an image, wherein the bitstream is generated based on: deriving prediction samples of a current block, deriving residual samples of the current block based on the prediction samples, deriving transform coefficients of the current block based on a primary transform of the residual samples, deriving modified transform coefficients from the transform coefficients by applying a low-frequency non-separable transform (LFNST), and encoding residual information about the modified transform coefficients and an LFNST index related to an LFNST matrix applied in the LFNST; and
a transmitter configured to transmit the data comprising the bitstream,
wherein the LFNST index is encoded based on whether a width and a height of the current block satisfy a condition for applying the LFNST,
wherein whether the condition for applying the LFNST is satisfied is determined based on a tree type of the current block, a color format of the current block, and whether intra sub-partitions (ISP) is applied to the current block,
wherein, when the ISP is applied, the LFNST index is encoded based on a width and a height of a sub-partition block being 4 or greater,
wherein the current block is a coding unit, and
wherein the LFNST index is parsed based on the width and the height of the coding unit being equal to or less than a maximum luma transform size available for transform.
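
For illustration only, the size-related conditions recited in claim 1 for parsing the LFNST index can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `CodingUnit`, `lfnst_block_size`, and `should_parse_lfnst_index`, the string labels for tree type and ISP split direction, and the default maximum luma transform size of 64 are all hypothetical. It further assumes that a chroma dual-tree block is measured in chroma samples according to the color format, and that an ISP-coded block is measured per sub-partition. Other conditions a real decoder would check are omitted.

```python
from dataclasses import dataclass

# Horizontal and vertical chroma subsampling factors per color format.
CHROMA_SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


@dataclass
class CodingUnit:
    width: int                    # coding unit width in luma samples
    height: int                   # coding unit height in luma samples
    tree_type: str                # "SINGLE_TREE", "DUAL_TREE_LUMA" or "DUAL_TREE_CHROMA" (assumed labels)
    color_format: str             # key of CHROMA_SUBSAMPLING
    isp_split: str = "NONE"       # "NONE", "HOR" or "VER" (assumed labels)
    num_sub_partitions: int = 1   # number of ISP sub-partitions


def lfnst_block_size(cu):
    """Width and height of the block the LFNST would operate on (assumed derivation)."""
    if cu.tree_type == "DUAL_TREE_CHROMA":
        sub_w, sub_h = CHROMA_SUBSAMPLING[cu.color_format]
        return cu.width // sub_w, cu.height // sub_h
    if cu.isp_split == "VER":
        return cu.width // cu.num_sub_partitions, cu.height
    if cu.isp_split == "HOR":
        return cu.width, cu.height // cu.num_sub_partitions
    return cu.width, cu.height


def should_parse_lfnst_index(cu, max_luma_transform_size=64):
    """Size checks only: block (or sub-partition) at least 4x4, and the
    coding unit no larger than the maximum luma transform size."""
    w, h = lfnst_block_size(cu)
    large_enough = min(w, h) >= 4
    fits_transform = max(cu.width, cu.height) <= max_luma_transform_size
    return large_enough and fits_transform


# A 16x16 coding unit split vertically into four 4x16 sub-partitions passes the check.
print(should_parse_lfnst_index(CodingUnit(16, 16, "SINGLE_TREE", "4:2:0", "VER", 4)))
# A 16x8 coding unit split horizontally into four 16x2 sub-partitions does not.
print(should_parse_lfnst_index(CodingUnit(16, 8, "SINGLE_TREE", "4:2:0", "HOR", 4)))
```

Keeping the size derivation in its own function separates the two questions the claim asks: what size the LFNST would see, and whether that size, together with the coding unit size, permits parsing the index.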