Image encoding/decoding method and apparatus, and recording medium having bitstream stored therein
By employing a non-separable primary transform and a dimension-reduced transform kernel, combined with determination of the transform kernel based on coding parameters, the disclosure addresses insufficient compression efficiency for high-resolution, high-quality images and aims to achieve more efficient image encoding and decoding.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- LG ELECTRONICS INC
- Filing Date
- 2024-12-02
- Publication Date
- 2026-06-23

Figure CN122270918A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to image encoding/decoding methods and apparatus, and to recording media storing bitstreams.
Background Art
[0002] Recently, the demand for high-resolution and high-quality images, such as HD (high-definition) and UHD (ultra-high-definition) images, has been increasing in various application fields, and therefore, efficient image compression technology is being discussed.
[0003] Various image compression techniques exist, such as inter-frame prediction, which predicts pixel values in the current image from images before or after the current image; intra-frame prediction, which predicts pixel values in the current image using pixel information within the current image; and entropy coding, which assigns short codewords to frequently occurring values and long codewords to infrequently occurring values. These techniques allow image data to be compressed effectively for transmission or storage.
Summary of the Invention
[0004] Technical Problem
[0005] This disclosure aims to provide a method and apparatus for performing a transform using a non-separable primary transform.
[0006] This disclosure aims to provide a method and apparatus for performing a transform using a dimension-reduced, non-separable primary transform kernel.
[0007] This disclosure aims to provide a method and apparatus for determining and/or signaling a non-separable transform kernel based on coding parameters.
[0008] Technical solution
[0009] The image decoding method and apparatus of this disclosure can derive transform coefficients of the current block from a bitstream, derive residual samples of the current block by inverse-transforming those transform coefficients, and reconstruct the current block based on the residual samples. Here, the current block may be a block coded by inter-frame prediction, and the transform set used for the inverse transform may be selected based on a predetermined intra-frame prediction mode for the current block.
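Selecting a transform set from an intra prediction mode resembles how the low-frequency non-separable transform (LFNST) in VVC picks one of four transform sets. The sketch below assumes the VVC LFNST mode-to-set table as a stand-in; the mapping this disclosure actually uses for an inter-coded block, whose intra mode might be derived for example by DIMD, is not given here and may differ. The function name `select_transform_set` is hypothetical.

```python
def select_transform_set(pred_mode_intra: int) -> int:
    """Return a transform-set index for an intra prediction mode.

    Assumption: mirrors the VVC LFNST set mapping (0..66 regular modes,
    -14..-1 and 67..80 wide-angle modes, 81..83 cross-component modes).
    Illustrative stand-in only, not the mapping defined by this disclosure.
    """
    if pred_mode_intra < 0:
        return 1          # wide-angle modes below mode 2
    if pred_mode_intra <= 1:
        return 0          # planar (0) and DC (1)
    if pred_mode_intra <= 12:
        return 1
    if pred_mode_intra <= 23:
        return 2
    if pred_mode_intra <= 44:
        return 3
    if pred_mode_intra <= 55:
        return 2
    if pred_mode_intra <= 80:
        return 1          # includes wide-angle modes above mode 66
    if pred_mode_intra <= 83:
        return 0          # cross-component (CCLM) modes
    raise ValueError(f"unsupported intra mode {pred_mode_intra}")
```

Under this assumed table, a derived mode of 50 (vertical) selects set 2, while planar or DC selects set 0.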
[0010] In the image decoding method and apparatus according to the present disclosure, the inverse transformation can be performed based on at least one of a separable transformation or a non-separable transformation.
[0011] In the image decoding method and apparatus according to the present disclosure, when a sub-block transform is applied to the current block and a non-separable transform is also applied to the current block, the transform type of the separable transform for the current block may be determined as DCT-2.
[0012] In the image decoding method and apparatus according to the present disclosure, when a sub-block transform is applied to the current block and a non-separable transform is also applied to the current block, the transform type of the separable transform for the current block may be determined as one of the combinations of DST-7 and DCT-8.
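The two alternatives above can be contrasted in a small sketch. As a stand-in for "one of the combinations of DST-7 and DCT-8", it assumes the position-dependent rule VVC uses for sub-block transform (SBT): the sub-block at the first position gets DCT-8 in the direction that crosses the split and DST-7 in the other, the second position gets DST-7 in both directions, and any dimension larger than 32 falls back to DCT-2. The function name, the `TrType` enum, and the `dct2_with_nonseparable` switch are hypothetical; the switch only selects between the embodiments of [0011] and [0012].

```python
from enum import Enum
from typing import Tuple


class TrType(Enum):
    DCT2 = 0
    DST7 = 1
    DCT8 = 2


def sbt_separable_types(horizontal_split: bool, pos_flag: bool,
                        tb_width: int, tb_height: int,
                        nonseparable_applied: bool,
                        dct2_with_nonseparable: bool = True) -> Tuple[TrType, TrType]:
    """Return (horizontal, vertical) separable transform types for an SBT sub-block.

    Hypothetical helper: dct2_with_nonseparable=True models the [0011] embodiment,
    False models [0012]; the DST-7/DCT-8 mapping is VVC's SBT rule (assumption).
    """
    if nonseparable_applied and dct2_with_nonseparable:
        return TrType.DCT2, TrType.DCT2
    if not horizontal_split:          # vertical split (SBT-V)
        tr_hor = TrType.DST7 if pos_flag else TrType.DCT8
        tr_ver = TrType.DST7
    else:                             # horizontal split (SBT-H)
        tr_hor = TrType.DST7
        tr_ver = TrType.DST7 if pos_flag else TrType.DCT8
    # Dimensions above 32 samples fall back to DCT-2, as in VVC.
    if tb_width > 32:
        tr_hor = TrType.DCT2
    if tb_height > 32:
        tr_ver = TrType.DCT2
    return tr_hor, tr_ver
```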
[0013] In the image decoding method and apparatus according to the present disclosure, the intra-frame prediction mode may be derived from the prediction block of the current block by applying decoder-side intra mode derivation (DIMD).
[0014] In the image decoding method and apparatus according to the present disclosure, when a sub-block transform is applied to the current block, whether a non-separable transform is applied to the current block may be determined based on whether the size of the first sub-block of the current block is one for which a non-separable transform is allowed.
[0015] In the image decoding method and apparatus according to the present disclosure, the intra-frame prediction mode may be derived from the first sub-block of the current block by applying decoder-side intra mode derivation (DIMD).
[0016] In the image decoding method and apparatus according to the present disclosure, the transform set may include a plurality of transform kernel candidates, and an index indicating one of the plurality of transform kernel candidates may be signaled.
[0017] In the image decoding method and apparatus according to the present disclosure, whether the index is signaled may depend on whether the current block satisfies a predetermined zero-out condition.
[0018] In the image decoding method and apparatus according to the present disclosure, when the intra block copy (IBC) mode is applied to the current block, the non-separable transform may be applied to the current block.
[0019] In the image decoding method and apparatus according to the present disclosure, the transform set for the non-separable transform may be determined based on whether an inter-frame prediction mode that performs prediction in units of sub-blocks is applied to the current block.
[0020] The image encoding method and apparatus of this disclosure can derive residual samples of the current block, derive transform coefficients of the current block by transforming those residual samples, and encode the transform coefficients of the current block. Here, the current block may be a block coded by inter-frame prediction, and the transform set used for the transform may be selected based on a predetermined intra-frame prediction mode for the current block.
[0021] According to the present disclosure, a computer-readable digital storage medium is provided that stores encoded video/image information causing a decoding apparatus to perform the image decoding method according to the present disclosure.
[0022] According to the present disclosure, a computer-readable digital storage medium is provided that stores video/image information generated by the image encoding method.
[0023] According to the present disclosure, a method and apparatus for transmitting video/image information generated by the image encoding method are provided.
[0024] Beneficial effects
[0025] This disclosure can improve transform performance by using a non-separable primary transform as the primary transform.
[0026] This disclosure can improve transform performance by performing the transform with a dimension-reduced, non-separable primary transform kernel.
[0027] This disclosure can improve coding efficiency by effectively determining and/or signaling a non-separable transform kernel based on coding parameters.
Description of Drawings
[0028] Figure 1 shows a video/image coding system according to this disclosure.
[0029] Figure 2 is a schematic block diagram of an encoding apparatus to which embodiments of the present disclosure are applicable and which performs encoding of video/image signals.
[0030] Figure 3 is a schematic block diagram of a decoding apparatus to which embodiments of the present disclosure are applicable and which performs decoding of video/image signals.
[0031] Figure 4 shows an image decoding method performed by a decoding apparatus (300) according to an embodiment of the present disclosure.
[0032] Figure 5 shows, by way of example, intra-frame prediction modes and their prediction directions according to this disclosure.
[0033] Figure 6 shows a flowchart of a method for deriving a DIMD mode according to the present disclosure.
[0034] Figure 7 shows a filter for deriving a DIMD mode according to this disclosure.
[0035] Figure 8 shows a schematic configuration of a decoding apparatus (300) for performing an image decoding method according to the present disclosure.
[0036] Figure 9 shows an image encoding method performed by an encoding apparatus (200) according to an embodiment of the present disclosure.
[0037] Figure 10 shows a schematic configuration of an encoding apparatus (200) for performing an image encoding method according to the present disclosure.
[0038] Figure 11 shows an example of a content streaming system to which embodiments of this disclosure can be applied.
Detailed Description
[0039] Because this disclosure can be modified in various ways and has several embodiments, specific embodiments will be illustrated in the accompanying drawings and described in detail in the detailed description. However, this disclosure is not intended to be limited to the specific embodiments and should be understood to include all variations, equivalents, and substitutions included within the spirit and scope of this disclosure. In describing each drawing, similar reference numerals are used for similar components.
[0040] Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. These terms are used only to distinguish one component from another. For example, without departing from the scope of this disclosure, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of a plurality of related listed items.
[0041] When a component is described as "connected" or "linked" to another component, it should be understood that it can be directly connected or linked to another component, but there may also be another component in between. On the other hand, when a component is described as "directly connected" or "directly linked" to another component, it should be understood that there is no other component in between.
[0042] The terminology used in this application is for describing particular embodiments only and is not intended to limit this disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, it should be understood that terms such as “comprising” or “having” are intended to designate the presence of features, numbers, steps, operations, components, portions, or combinations thereof described in the specification, but do not preclude the possibility of the presence or addition of one or more other features, numbers, steps, operations, components, portions, or combinations thereof.
[0043] This disclosure relates to video/image coding. For example, the methods/embodiments disclosed herein can be applied to methods disclosed in the Versatile Video Coding (VVC) standard. The methods/embodiments disclosed herein can also be applied to methods disclosed in the Essential Video Coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the second-generation Audio Video coding Standard (AVS2), or next-generation video/image coding standards (e.g., H.267 or H.268).
[0044] This specification presents various embodiments of video/image coding, and unless otherwise stated, these embodiments may be combined with one another.
[0045] Here, video can refer to a collection of images over time. A picture (image) generally refers to a unit representing one image at a specific time instance, and a slice/tile is a unit that forms part of a picture in coding. A slice/tile can include at least one coding tree unit (CTU). A picture can consist of at least one slice/tile. A tile is a rectangular region of CTUs within a particular tile column and a particular tile row of a picture. A tile column is a rectangular region of CTUs whose height equals the height of the picture and whose width is specified by syntax elements in the picture parameter set. A tile row is a rectangular region of CTUs whose height is specified by syntax elements in the picture parameter set and whose width equals the width of the picture. CTUs within a tile are ordered consecutively in CTU raster scan, and tiles within a picture are ordered consecutively in tile raster scan. A slice can include an integer number of complete tiles, or an integer number of consecutive complete CTU rows within a tile of a picture, that can be exclusively contained in a single NAL unit. Meanwhile, a picture can be divided into at least two sub-pictures. A sub-picture can be a rectangular region of at least one slice within a picture.
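The CTU ordering rule above (raster scan inside each tile, tiles in raster order within the picture) can be sketched as follows. The tile column widths and row heights are passed in directly in CTU units, standing in for the values a picture parameter set would carry; the function name `ctu_tile_scan` is hypothetical.

```python
from typing import List


def ctu_tile_scan(col_widths: List[int], row_heights: List[int]) -> List[int]:
    """Return picture raster-scan CTU addresses listed in tile-scan order.

    col_widths / row_heights: tile column widths and tile row heights in CTUs
    (illustrative inputs standing in for PPS tile syntax).
    """
    pic_width = sum(col_widths)
    order: List[int] = []
    y0 = 0
    for h in row_heights:                      # tiles in raster order: tile row by tile row
        x0 = 0
        for w in col_widths:
            for y in range(y0, y0 + h):        # CTUs in raster order inside the tile
                for x in range(x0, x0 + w):
                    order.append(y * pic_width + x)
            x0 += w
        y0 += h
    return order
```

For a picture four CTUs wide and two high split into two 2×2 tiles, `ctu_tile_scan([2, 2], [2])` visits CTUs 0, 1, 4, 5 of the left tile before CTUs 2, 3, 6, 7 of the right tile.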
[0046] A pixel or pel can refer to the smallest unit that makes up a picture (or image). "Sample" can also be used as the term corresponding to a pixel. A sample typically represents a pixel or pixel value, and can represent only a pixel/pixel value of the luminance component or only a pixel/pixel value of the chrominance component.
[0047] A unit can represent a basic unit of image processing. A unit may include at least one of a specific region of an image and information related to that region. A unit may include one luminance block and two chrominance (e.g., Cb, Cr) blocks. In some cases, unit may be used interchangeably with terms such as block or region. In general, an MxN block may include a set (or array) of samples (or sample arrays) or transform coefficients consisting of M columns and N rows.
[0048] Here, "A or B" can refer to "A only", "B only", or "both A and B". In other words, "A or B" can be interpreted as "A and / or B". For example, "A, B or C" can refer to "A only", "B only", "C only", or "any combination of A, B and C".
[0049] The forward slash (/) or comma used in this document can mean "and/or". For example, "A/B" can mean "A and/or B". Therefore, "A/B" can mean "A only", "B only", or "both A and B". For example, "A, B, C" can mean "A, B, or C".
[0050] Here, "at least one of A and B" can refer to "only A", "only B" or "both A and B". Furthermore, expressions such as "at least one of A or B" or "at least one of A and / or B" can be interpreted in the same way as "at least one of A and B".
[0051] Additionally, here, "at least one of A, B, and C" can refer to "A only", "B only", "C only" or "any combination of A, B, and C". Furthermore, "at least one of A, B, or C" or "at least one of A, B, and / or C" can refer to "at least one of A, B, and C".
[0052] Additionally, the parentheses used in this document can refer to "for example". Specifically, when the indication is "prediction (intra-frame prediction)", "intra-frame prediction" can be cited as an example of "prediction". In other words, "prediction" here is not limited to "intra-frame prediction", and "intra-frame prediction" can be cited as an example of "prediction". Furthermore, even when the indication is "prediction (i.e., intra-frame prediction)", "intra-frame prediction" can be cited as an example of "prediction".
[0053] Here, a technical feature described individually in a single figure can be implemented individually or simultaneously.
[0054] Figure 1 shows a video/image coding system according to this disclosure.
[0055] Referring to Figure 1, a video/image coding system may include a first device (source device) and a second device (receiving device).
[0056] A source device can transmit encoded video / image information or data to a receiving device in the form of a file or stream via digital storage media or a network. The source device may include a video source, an encoding device, and a transmitting unit. The receiving device may include a receiving unit, a decoding device, and a renderer. The encoding device may be referred to as a video / image encoding device, and the decoding device may be referred to as a video / image decoding device. A transmitter may be included in the encoding device. A receiver may be included in the decoding device. The renderer may include a display unit, and the display unit may consist of a separate device or external components.
[0057] A video source can acquire video / images through the process of capturing, compositing, or generating video / images. A video source can include devices for capturing video / images and devices for generating video / images. Devices for capturing video / images can include at least one camera, video / image archives containing previously captured video / images, etc. Devices for generating video / images can include computers, tablets, smartphones, etc., and can generate video / images (electronically). For example, virtual video / images can be generated by computers, etc., and in this case, the process of capturing video / images can be replaced by the process of generating related data.
[0058] An encoding device can encode input video / images. The encoding device can perform a series of processes such as prediction, transformation, and quantization for compression and encoding efficiency. The encoded data (encoded video / image information) can be output as a bitstream.
[0059] The transmitting unit can send encoded video / image information or data, output in bitstream form, to the receiving unit of the receiving device via digital storage media or a network, either as a file or through streaming. Digital storage media can include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmitting unit can include elements for generating media files according to a predetermined file format and may include elements for transmission over broadcast / communication networks. The receiving unit can receive / extract the bitstream and send it to a decoding device.
[0060] Decoding devices can decode video / images by performing a series of processes, such as dequantization, inverse transform, and prediction, that correspond to the operations of encoding devices.
[0061] The renderer can render decoded video / images. The rendered video / images can be displayed through a display unit.
[0062] Figure 2 is a schematic block diagram of an encoding apparatus to which embodiments of the present disclosure are applicable and which performs encoding of video/image signals.
[0063] Referring to Figure 2, the encoding device 200 may consist of an image segmenter 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270. The predictor 220 may include an inter-frame predictor 221 and an intra-frame predictor 222. The residual processor 230 may include a transformer 232, a quantizer 233, a dequantizer 234, and an inverse transformer 235. The residual processor 230 may further include a subtractor 231. The adder 250 may be referred to as a reconstructor or a reconstructed block generator. According to an embodiment, the image segmenter 210, predictor 220, residual processor 230, entropy encoder 240, adder 250, and filter 260 may be configured by at least one hardware component (e.g., an encoder chipset or processor). Additionally, the memory 270 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium. The hardware component may further include the memory 270 as an internal/external component.
[0064] Image segmenter 210 can partition an input image (or picture, frame) input to encoding device 200 into at least one processing unit. As an example, a processing unit can be referred to as a coding unit (CU). In this case, the coding unit can be recursively partitioned from a coding tree unit (CTU) or a largest coding unit (LCU) according to a quad-tree binary-tree ternary-tree (QTBTTT) structure.
[0065] For example, one coding unit can be partitioned into a plurality of coding units of deeper depth based on a quadtree structure, a binary tree structure, and/or a ternary tree structure. In this case, for example, the quadtree structure can be applied first and the binary tree structure and/or ternary tree structure applied later. Alternatively, the binary tree structure can be applied first. The coding process according to this specification can be performed based on the final coding unit that is no longer partitioned. In this case, based on coding efficiency according to image characteristics, the largest coding unit can be used directly as the final coding unit, or, if necessary, the coding unit can be recursively partitioned into coding units of deeper depth and the coding unit of optimal size used as the final coding unit. Here, the coding process can include processes such as prediction, transform, and reconstruction, as described later.
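The recursive partitioning described above can be sketched as a tree walk in which each block is either kept as a final coding unit or split by quad-tree, binary-tree, or 1:2:1 ternary-tree splitting. The `decide` callback is a hypothetical stand-in for the encoder's rate-distortion decision (or, in a decoder, for the parsed split flags). Constraints a real codec imposes, such as minimum block sizes or VVC's rule that quad-tree splitting cannot follow a binary or ternary split, are left to that callback; all names here are illustrative.

```python
from typing import Callable, List, Optional, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height) in luma samples


def split_block(b: Block, mode: str) -> List[Block]:
    """Child blocks for one split; modes are QT, BT_H, BT_V, TT_H, TT_V."""
    x, y, w, h = b
    if mode == "QT":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_H":   # horizontal binary split: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":   # vertical binary split: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_H":   # horizontal ternary split in a 1:2:1 ratio
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_V":   # vertical ternary split in a 1:2:1 ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode {mode!r}")


def partition(b: Block, decide: Callable[[Block], Optional[str]]) -> List[Block]:
    """Recursively split b; blocks for which decide() returns None are final CUs."""
    mode = decide(b)
    if mode is None:
        return [b]
    leaves: List[Block] = []
    for child in split_block(b, mode):
        leaves.extend(partition(child, decide))
    return leaves
```

For example, a callback that quad-splits a 128×128 CTU and then ternary-splits only the top-left 64×64 quadrant vertically yields six CUs: 16×64, 32×64 and 16×64 blocks followed by three 64×64 blocks.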
[0066] As another example, the processing unit may further include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may each be split or partitioned from the final coding unit described above. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and/or a unit for deriving a residual signal from transform coefficients.
[0067] In some cases, unit can be used interchangeably with terms such as block or region. Generally, an MxN block can represent a set of samples or transform coefficients consisting of M columns and N rows. A sample typically represents a pixel or pixel value, and can represent only a pixel/pixel value of the luminance component or only a pixel/pixel value of the chrominance component. Sample can be used as the term corresponding to a pixel or pel of one picture (or image).
[0068] The encoding device 200 can subtract the prediction signal (prediction block, prediction sample array) output from the inter-frame predictor 221 or the intra-frame predictor 222 from the input image signal (original block, original sample array) to generate a residual signal (residual block, residual sample array), and the generated residual signal is sent to the transformer 232. In this case, the unit in the encoding device 200 that subtracts the prediction signal (prediction block, prediction sample array) from the input image signal (original block, original sample array) can be called the subtractor 231.
[0069] Predictor 220 can perform prediction on the block to be processed (hereinafter, the current block) and generate a predicted block including prediction samples for the current block. Predictor 220 can determine whether intra-frame prediction or inter-frame prediction is applied on a current block or CU basis. As described later in the description of each prediction mode, predictor 220 can generate various information about the prediction, such as prediction mode information, and send it to entropy encoder 240. The information about the prediction can be encoded in entropy encoder 240 and output as a bitstream.
[0070] Intra-frame predictor 222 can predict the current block by referring to samples within the current image. Depending on the prediction mode, the referenced samples can be located in the neighborhood of the current block or apart from it. In intra-frame prediction, the prediction modes can include at least one non-directional mode and a plurality of directional modes. The non-directional modes can include at least one of a DC mode or a planar mode. Depending on how finely the prediction directions are divided, the directional modes can include 33 or 65 directional modes. However, this is just an example, and more or fewer directional modes can be used depending on the configuration. Intra-frame predictor 222 can also determine the prediction mode applied to the current block by using the prediction mode applied to a neighboring block.
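As an example of a non-directional mode, DC prediction fills the block with the average of the neighboring reference samples. The sketch assumes the VVC rule, in which a square block averages both the top and left reference samples while a non-square block averages only the longer side so that the division remains a shift. Block dimensions are assumed to be powers of two, reference-sample availability and padding are ignored, and the function name `dc_predict` is hypothetical.

```python
from typing import List


def dc_predict(top: List[int], left: List[int], width: int, height: int) -> List[List[int]]:
    """DC intra prediction with VVC-style rounding; width and height must be powers of two."""
    log2_w = width.bit_length() - 1
    log2_h = height.bit_length() - 1
    if width == height:
        dc = (sum(top[:width]) + sum(left[:height]) + width) >> (log2_w + 1)
    elif width > height:
        dc = (sum(top[:width]) + (width >> 1)) >> log2_w     # only the longer (top) side
    else:
        dc = (sum(left[:height]) + (height >> 1)) >> log2_h  # only the longer (left) side
    return [[dc] * width for _ in range(height)]
```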
[0071] Inter-frame predictor 221 can derive a prediction block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference image. In this case, to reduce the amount of motion information transmitted in the inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis based on the correlation of motion information between the neighboring blocks and the current block. Motion information may include a motion vector and a reference image index. Motion information may further include inter-frame prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). For inter-frame prediction, neighboring blocks may include spatially neighboring blocks in the current image and temporally neighboring blocks in a reference image. The reference image including the reference block and the reference image including the temporally neighboring block may be the same or different. The temporally neighboring block may be referred to as a collocated reference block, collocated CU (colCU), etc., and the reference image including the temporally neighboring block may be referred to as a collocated image (colPic). For example, inter-frame predictor 221 can construct a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and/or reference image index of the current block. Inter-frame prediction can be performed based on various prediction modes; for example, in skip mode and merge mode, inter-frame predictor 221 can use the motion information of a neighboring block as the motion information of the current block. In skip mode, unlike merge mode, the residual signal may not be transmitted.
For motion vector prediction (MVP) mode, the motion vectors of neighboring blocks are used as motion vector predictors, and the motion vector difference is signaled to indicate the motion vector of the current block.
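The difference between merge/skip mode and MVP mode lies in what is done with the selected candidate: merge copies it, while MVP adds the signaled motion vector difference. The sketch below assumes a simplified candidate list built from already-derived neighboring motion vectors with duplicate pruning and zero-vector padding; real candidate derivation (specific spatial positions, temporal and history-based candidates, reference-index handling) is more involved. All names are hypothetical.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # (horizontal, vertical), e.g. in quarter-sample units


def build_candidates(neighbor_mvs: List[Optional[MV]], list_size: int) -> List[MV]:
    """Collect available, non-duplicate neighbor MVs, then pad with zero MVs."""
    cands: List[MV] = []
    for mv in neighbor_mvs:
        if mv is not None and mv not in cands:   # skip unavailable or duplicate entries
            cands.append(mv)
        if len(cands) == list_size:
            break
    while len(cands) < list_size:
        cands.append((0, 0))
    return cands


def merge_motion(cands: List[MV], merge_idx: int) -> MV:
    # Merge/skip: the current block inherits the candidate's motion unchanged.
    return cands[merge_idx]


def mvp_motion(cands: List[MV], mvp_idx: int, mvd: MV) -> MV:
    # MVP: motion vector = predictor + signaled motion vector difference.
    px, py = cands[mvp_idx]
    return (px + mvd[0], py + mvd[1])
```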
[0072] Predictor 220 can generate a prediction signal based on various prediction methods described later. For example, the predictor can apply not only intra-frame prediction or inter-frame prediction to predict a block, but also both intra-frame prediction and inter-frame prediction at the same time. This can be referred to as combined inter and intra prediction (CIIP) mode. The predictor can also predict a block based on an intra block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for coding game content and similar image/video content, such as in screen content coding (SCC). IBC basically performs prediction within the current image, but it can be performed similarly to inter-frame prediction in that it derives a reference block within the current image. In other words, IBC can use at least one of the inter-frame prediction techniques described herein. Palette mode can be considered an example of intra-frame coding or intra-frame prediction. When palette mode is applied, sample values within the image can be signaled based on information about the palette table and palette index. The prediction signal generated by predictor 220 can be used to generate a reconstructed signal or a residual signal.
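One way to combine the two predictions in CIIP mode is a weighted average. The sketch assumes the VVC rule, in which the intra weight w is 3, 2, or 1 depending on whether both, one, or neither of the left and above neighboring blocks are intra-coded, and each output sample is ((4 - w) * P_inter + w * P_intra + 2) >> 2. The function names are hypothetical, and this disclosure does not itself specify these weights.

```python
from typing import List


def ciip_intra_weight(left_is_intra: bool, above_is_intra: bool) -> int:
    """Intra weight for CIIP, following the VVC neighbor-based rule (assumption)."""
    n_intra = int(left_is_intra) + int(above_is_intra)
    return {2: 3, 1: 2, 0: 1}[n_intra]


def ciip_blend(p_inter: List[List[int]], p_intra: List[List[int]], w: int) -> List[List[int]]:
    """Blend inter and intra predictions with weights (4 - w) and w, rounding by +2 >> 2."""
    return [[((4 - w) * a + w * b + 2) >> 2 for a, b in zip(row_inter, row_intra)]
            for row_inter, row_intra in zip(p_inter, p_intra)]
```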
[0073] Transformer 232 can generate transform coefficients by applying a transform technique to the residual signal. For example, the transform technique may include at least one of the discrete cosine transform (DCT), discrete sine transform (DST), Karhunen-Loève transform (KLT), graph-based transform (GBT), or conditionally non-linear transform (CNT). Here, GBT refers to the transform obtained from a graph when the relationship information between pixels is expressed as a graph. CNT refers to the transform obtained based on a prediction signal generated using all previously reconstructed pixels. Furthermore, the transform process can be applied to square pixel blocks of the same size or to non-square blocks of variable size.
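Because this disclosure centers on non-separable transforms, it may help to see how a separable 2-D transform relates to a non-separable one. The sketch below uses an orthonormal floating-point DCT-II (real codecs use scaled integer approximations) and shows that applying it separably, Y = C X C^T, equals multiplying the row-major flattened block by the Kronecker product of C with itself, a single N²×N² non-separable kernel. A trained non-separable kernel would generally not factor this way. Keeping only R < N² rows of a kernel gives a dimension-reduced kernel that outputs R coefficients; which rows to keep here is an arbitrary illustrative choice, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II: row i holds the i-th basis function."""
    return [[(math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n)) for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def separable_forward(x: Matrix, c: Matrix) -> Matrix:
    # Y = C X C^T: 1-D transform of every column, then of every row.
    return matmul(matmul(c, x), transpose(c))


def kron(a: Matrix, b: Matrix) -> Matrix:
    # (A kron B)[i*p + j][k*q + l] = A[i][k] * B[j][l]
    return [[a[i][k] * b[j][l] for k in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for j in range(len(b))]


def nonseparable_forward(t: Matrix, x_vec: List[float]) -> List[float]:
    # y = T x on the flattened block; T is N^2 x N^2, or R x N^2 when reduced.
    return [sum(tk * xk for tk, xk in zip(row, x_vec)) for row in t]


def nonseparable_inverse(t: Matrix, y_vec: List[float]) -> List[float]:
    # x = T^T y: exact for a full orthonormal T; for a reduced T it projects
    # the block onto the kept basis vectors.
    return [sum(t[r][k] * y_vec[r] for r in range(len(y_vec))) for k in range(len(t[0]))]
```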
[0074] Quantizer 233 can quantize the transform coefficients and send them to entropy encoder 240, which can encode the quantized signal (information about the quantized transform coefficients) and output it as a bitstream. The information about the quantized transform coefficients can be referred to as residual information. Quantizer 233 can rearrange the quantized transform coefficients in block form into a 1D vector form based on the coefficient scan order, and can generate information about the quantized transform coefficients based on the 1D vector form of the quantized transform coefficients.
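The quantization and scan steps above can be sketched as follows, under several assumptions. The step size follows the HEVC/VVC-style relation in which the step doubles every 6 QP and equals 1 at QP 4. Quantization is plain scalar rounding, whereas encoders commonly use a smaller rounding offset or rate-distortion optimized quantization. The scan is a single up-right diagonal pass over the whole block, whereas HEVC and VVC apply it within and across 4×4 coefficient groups. All names are hypothetical.

```python
from typing import List, Tuple


def q_step(qp: int) -> float:
    # Step size doubles every 6 QP and equals 1.0 at QP 4 (HEVC/VVC-style assumption).
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: List[List[float]], qp: int, rounding: float = 0.5) -> List[List[int]]:
    """Scalar quantization: sign(c) * floor(|c| / step + rounding)."""
    step = q_step(qp)
    return [[(1 if c >= 0 else -1) * int(abs(c) / step + rounding) for c in row]
            for row in coeffs]


def diagonal_scan(width: int, height: int) -> List[Tuple[int, int]]:
    """Up-right diagonal order: each anti-diagonal from bottom-left to top-right."""
    order = []
    for d in range(width + height - 1):
        for y in range(min(d, height - 1), max(0, d - width + 1) - 1, -1):
            order.append((d - y, y))          # (x, y)
    return order


def to_1d(levels: List[List[int]]) -> List[int]:
    """Rearrange a 2-D block of levels into a 1-D vector in diagonal-scan order."""
    h, w = len(levels), len(levels[0])
    return [levels[y][x] for x, y in diagonal_scan(w, h)]
```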
[0075] The entropy encoder 240 can perform various encoding methods, such as exponential-Golomb coding, context-adaptive variable-length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). In addition to the quantized transform coefficients, the entropy encoder 240 can encode information necessary for video/image reconstruction (e.g., values of syntax elements), either together with the coefficients or separately.
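Exponential-Golomb coding, the simplest of the methods listed, gives a non-negative integer a codeword made of leading zeros followed by the binary form of value + 1, so small (frequent) values receive short codes. In H.264/AVC, HEVC, and VVC, syntax elements described as ue(v) and se(v), such as many parameter-set fields, use this code. The sketch covers order-0 codes only, represents bits as a string for readability, and uses hypothetical function names.

```python
from typing import Tuple


def ue_encode(value: int) -> str:
    """Order-0 Exp-Golomb codeword for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    code = value + 1
    return "0" * (code.bit_length() - 1) + format(code, "b")


def se_encode(value: int) -> str:
    """Signed mapping used by se(v): k > 0 -> 2k - 1, k <= 0 -> -2k."""
    return ue_encode(2 * value - 1 if value > 0 else -2 * value)


def ue_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one codeword starting at pos; return (value, next position)."""
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    start = pos + leading_zeros
    code = int(bits[start:start + leading_zeros + 1], 2)
    return code - 1, start + leading_zeros + 1
```

For instance, the values 0, 1, 2, 3 map to the codewords 1, 010, 011, 00100.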
[0076] Encoded information (e.g., encoded video/image information) can be transmitted or stored in bitstream form in units of network abstraction layer (NAL) units. The video/image information may further include information about various parameter sets such as an adaptation parameter set (APS), picture parameter set (PPS), sequence parameter set (SPS), or video parameter set (VPS). The video/image information may also include general constraint information. In this document, information and/or syntax elements transmitted/signaled from the encoding device to the decoding device can be included in the video/image information. The video/image information can be encoded through the encoding process described above and included in the bitstream. The bitstream can be transmitted over a network or stored in a digital storage medium. Here, the network may include a broadcast network and/or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmission unit (not shown) that transmits the signal output from the entropy encoder 240 and/or a storage unit (not shown) that stores it can be configured as internal/external elements of the encoding device 200, or the transmission unit may be included in the entropy encoder 240.
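As an example of NAL-unit-level packaging, VVC (H.266) begins every NAL unit with a two-byte header: forbidden_zero_bit (1 bit), nuh_reserved_zero_bit (1 bit), nuh_layer_id (6 bits), nal_unit_type (5 bits), and nuh_temporal_id_plus1 (3 bits). The sketch parses that header and labels the parameter-set types mentioned above using the VVC type codes (VPS 14, SPS 15, PPS 16, prefix/suffix APS 17/18). Other standards use different header layouts, and the function name is hypothetical.

```python
from typing import Dict, Union

# VVC nal_unit_type values for the parameter sets discussed above.
PARAMETER_SET_TYPES = {14: "VPS", 15: "SPS", 16: "PPS", 17: "PREFIX_APS", 18: "SUFFIX_APS"}


def parse_vvc_nal_header(header: bytes) -> Dict[str, Union[int, str]]:
    """Split the first two bytes of a VVC NAL unit into its header fields."""
    if len(header) < 2:
        raise ValueError("a VVC NAL unit header is two bytes")
    b0, b1 = header[0], header[1]
    if b0 >> 7:
        raise ValueError("forbidden_zero_bit must be 0")
    nal_unit_type = b1 >> 3
    return {
        "nuh_layer_id": b0 & 0x3F,
        "nal_unit_type": nal_unit_type,
        "temporal_id": (b1 & 0x07) - 1,   # nuh_temporal_id_plus1 - 1
        "kind": PARAMETER_SET_TYPES.get(nal_unit_type, "other"),
    }
```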
[0077] The quantized transform coefficients output from quantizer 233 can be used to generate the prediction signal. For example, the residual signal (residual block or residual samples) can be reconstructed by applying dequantization and inverse transform to the quantized transform coefficients through dequantizer 234 and inverse transformer 235. Adder 250 can add the reconstructed residual signal to the prediction signal output from inter-frame predictor 221 or intra-frame predictor 222 to generate a reconstructed signal (reconstructed image, reconstructed block, reconstructed sample array). When there is no residual for the block to be processed, such as when skip mode is applied, the prediction block can be used as the reconstructed block. Adder 250 can be referred to as a reconstructor or reconstructed block generator. The generated reconstructed signal can be used for intra-frame prediction of the next block to be processed within the current image and, after filtering as described later, for inter-frame prediction of the next image. Meanwhile, luma mapping with chroma scaling (LMCS) can be applied during image encoding and/or reconstruction.
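The adder's role reduces to sample-wise addition followed by clipping to the valid range for the bit depth, with the prediction used directly when there is no residual (e.g., skip mode). A minimal sketch, assuming a 10-bit default bit depth and ignoring LMCS; the function name `reconstruct` is hypothetical.

```python
from typing import List, Optional


def reconstruct(pred: List[List[int]], resid: Optional[List[List[int]]],
                bit_depth: int = 10) -> List[List[int]]:
    """Reconstructed block = clip(prediction + residual) to [0, 2^bit_depth - 1]."""
    if resid is None:                     # no residual: prediction is the reconstruction
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, resid)]
```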
[0078] Filter 260 can improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, filter 260 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image and store it in memory 270, specifically in the DPB of memory 270. The filtering methods can include deblocking filtering, sample adaptive offset (SAO), adaptive loop filtering, bilateral filtering, etc. Filter 260 can generate various information about the filtering and send it to entropy encoder 240, where it can be encoded and output as a bitstream.
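To illustrate one of these filters, the band-offset variant of sample adaptive offset (SAO) in HEVC and VVC divides the sample range into 32 equal bands and adds a signaled offset to samples that fall in four consecutive bands starting at a signaled band position (wrapping around after band 31). The sketch assumes that rule with illustrative offset values; edge offset, per-CTU parameter signaling, and chroma handling are omitted, and the function name is hypothetical.

```python
from typing import List


def sao_band_offset(samples: List[List[int]], band_position: int,
                    offsets: List[int], bit_depth: int = 8) -> List[List[int]]:
    """Apply SAO band offset: 4 offsets for bands band_position .. band_position + 3 (mod 32)."""
    shift = bit_depth - 5                 # 2^5 = 32 equal-width bands
    max_val = (1 << bit_depth) - 1
    band_to_offset = {(band_position + k) & 31: offsets[k] for k in range(4)}
    return [[min(max(s + band_to_offset.get(s >> shift, 0), 0), max_val) for s in row]
            for row in samples]
```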
[0079] The modified reconstructed image sent to memory 270 can be used as a reference image in inter-frame predictor 221. When inter-frame prediction is applied in this way, prediction mismatch between the encoding device 200 and the decoding device can be avoided, and encoding efficiency can also be improved.
[0080] The DPB of memory 270 can store modified reconstructed images for use as reference images in inter-frame predictor 221. Memory 270 can store motion information of blocks from which motion information in the current image is derived (or encoded) and / or motion information of blocks in the pre-reconstructed image. The stored motion information can be sent to inter-frame predictor 221 to be used as motion information for spatially or temporally neighboring blocks. Memory 270 can store reconstructed samples of reconstructed blocks in the current image and send them to intra-frame predictor 222.
[0081] Figure 3 is a schematic block diagram of a decoding apparatus that can be applied to embodiments of the present disclosure and that performs decoding of video / image signals.
[0082] Referring to Figure 3, the decoding device 300 can be configured to include an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The predictor 330 may include an inter-frame predictor 332 and an intra-frame predictor 331. The residual processor 320 may include a dequantizer 321 and an inverse transformer 322.
[0083] According to an embodiment, the entropy decoder 310, residual processor 320, predictor 330, adder 340, and filter 350 described above can be configured by a single hardware component (e.g., a decoder chipset or processor). Additionally, the memory 360 may include a decoded picture buffer (DPB) and can be configured by a digital storage medium. The hardware component may further include the memory 360 as an internal / external component.
[0084] When a bitstream containing video / image information is input, the decoding device 300 can reconstruct an image in a manner corresponding to the process by which the video / image information was processed in the encoding apparatus of Figure 2. For example, the decoding device 300 can derive units / blocks based on block partitioning information obtained from the bitstream. The decoding device 300 can perform decoding using the processing units applied in the encoding apparatus. Therefore, the processing unit for decoding can be a coding unit, and the coding unit can be partitioned from a coding tree unit or a largest coding unit according to a quadtree structure, binary tree structure, and / or ternary tree structure. At least one transform unit can be derived from the coding unit. Furthermore, the reconstructed image signal decoded and output by the decoding device 300 can be played back by a playback device.
[0085] The decoding device 300 can receive the signal output from the encoding device of Figure 2 in the form of a bitstream, and the received signal can be decoded by the entropy decoder 310. For example, the entropy decoder 310 can parse the bitstream to derive information (e.g., video / image information) necessary for image reconstruction (or picture reconstruction). The video / image information may further include information about various parameter sets such as adaptive parameter sets (APS), picture parameter sets (PPS), sequence parameter sets (SPS), or video parameter sets (VPS). In addition, the video / image information may further include general constraint information. The decoding device can further decode the picture based on the information about the parameter sets and / or the general constraint information. The signaled / received information and / or syntax elements described later herein can be obtained from the bitstream through the decoding process. For example, the entropy decoder 310 can decode the information in the bitstream based on coding methods such as exponential Golomb coding, CAVLC, or CABAC, and output the values of the syntax elements necessary for image reconstruction and the quantized values of the residual transform coefficients. More specifically, the CABAC entropy decoding method can receive the bins corresponding to each syntax element from the bitstream; determine a context model using information about the syntax element to be decoded, decoding information of the block to be decoded and its neighboring blocks, or information about the symbols / bins decoded in a previous step; predict the occurrence probability of each bin based on the determined context model and perform arithmetic decoding of the bin; and generate a symbol corresponding to the value of each syntax element.
In this case, after determining the context model, the CABAC entropy decoding method can update the context model using information about the decoded symbol / bin, so that the updated model is used for the next symbol / bin. Among the information decoded by the entropy decoder 310, information about prediction is provided to the predictors (inter-frame predictor 332 and intra-frame predictor 331), and the residual values on which entropy decoding has been performed in the entropy decoder 310, i.e., the quantized transform coefficients and related parameter information, can be input to the residual processor 320. The residual processor 320 can derive residual signals (residual blocks, residual samples, residual sample arrays). In addition, information about filtering among the information decoded by the entropy decoder 310 can be provided to the filter 350. Meanwhile, a receiving unit (not shown) that receives the signal output from the encoding device can be further configured as an internal / external element of the decoding device 300, or the receiving unit can be a component of the entropy decoder 310.
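The context-model update described above can be illustrated with a minimal sketch. It assumes a single 15-bit probability estimate for "bin equals 1" that moves toward each decoded bin with an exponential-decay window; the class name, the 15-bit precision, and the shift value are illustrative assumptions, and real CABAC engines use standard-specific state representations and update rules.

```python
class ContextModel:
    """Simplified adaptive binary context model (illustrative only, not a standard's CABAC)."""

    PROB_BITS = 15
    ONE = 1 << PROB_BITS  # fixed-point representation of probability 1.0

    def __init__(self, shift: int = 4) -> None:
        self.shift = shift          # adaptation rate: a smaller shift adapts faster
        self.p_one = self.ONE // 2  # initial estimate: P(bin == 1) = 0.5

    def probability_of_one(self) -> float:
        """Current estimate used to drive arithmetic decoding of the next bin."""
        return self.p_one / self.ONE

    def update(self, bin_value: int) -> None:
        """After a bin is decoded, move the estimate toward the observed value."""
        target = self.ONE if bin_value else 0
        self.p_one += (target - self.p_one) >> self.shift


ctx = ContextModel()
for decoded_bin in [1, 1, 1, 0, 1, 1]:
    ctx.update(decoded_bin)
print(ctx.probability_of_one())
```

A run of mostly 1-valued bins pushes the estimate above 0.5, so the arithmetic decoder assigns the more probable value a larger interval for the next bin of the same context.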
[0086] Furthermore, the decoding device according to this specification can be referred to as a video / image / picture decoding device, and the decoding device can be divided into an information decoder (video / image / picture information decoder) and a sample decoder (video / image / picture sample decoder). The information decoder may include an entropy decoder 310, and the sample decoder may include at least one of a dequantizer 321, an inverse transformer 322, an adder 340, a filter 350, a memory 360, an inter-frame predictor 332, and an intra-frame predictor 331.
[0087] Dequantizer 321 can dequantize the quantized transform coefficients and output the transform coefficients. Dequantizer 321 can rearrange the quantized transform coefficients into a two-dimensional block form. In this case, the rearrangement can be performed based on the coefficient scan order performed in the encoding device. Dequantizer 321 can obtain the transform coefficients by performing dequantization on the quantized transform coefficients using quantization parameters (e.g., quantization step size information).
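The rearrangement and dequantization steps can be sketched as follows. The sketch assumes a floating-point step size that doubles every 6 QP values and equals 1.0 at QP 4 (the relation commonly used to describe H.264/HEVC/VVC quantization); actual decoders use integer scaling tables and shifts. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]  # (x, y) = (column, row)


def qstep_from_qp(qp: int) -> float:
    """Assumed step size: doubles every 6 QP steps, 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def raster_scan(width: int, height: int) -> List[Position]:
    """Row-by-row scan order, used here only as an example coefficient scan."""
    return [(x, y) for y in range(height) for x in range(width)]


def dequantize_block(levels: Sequence[int], scan: Sequence[Position],
                     width: int, height: int, qp: int) -> List[List[float]]:
    """Place the 1D quantized levels into a 2D block along `scan`, then scale them."""
    step = qstep_from_qp(qp)
    block = [[0.0] * width for _ in range(height)]
    for level, (x, y) in zip(levels, scan):
        block[y][x] = level * step
    return block


print(dequantize_block([3, -1, 0, 2], raster_scan(2, 2), 2, 2, qp=10))
```

The same scan order that the encoding device used to serialize the coefficients must be used here; otherwise coefficients land in the wrong frequency positions.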
[0088] The inverse transformer 322 can perform an inverse transform on the transform coefficients to obtain the residual signal (residual block, residual sample array).
[0089] Predictor 330 can perform prediction on the current block and generate a prediction block including prediction samples for the current block. Predictor 330 can determine whether intra-frame prediction or inter-frame prediction is applied to the current block based on the prediction information output from entropy decoder 310, and can determine a specific intra-frame / inter-frame prediction mode.
[0090] Predictor 330 can generate prediction signals based on various prediction methods described later. For example, predictor 330 can not only apply intra-frame prediction or inter-frame prediction to predict a block, but can also apply intra-frame prediction and inter-frame prediction simultaneously. This can be referred to as a combined intra-frame and inter-frame prediction (CIIP) mode. Alternatively, the predictor can predict a block based on an intra block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for coding content images / videos such as games, for example in screen content coding (SCC). IBC basically performs prediction within the current frame, but it can be performed similarly to inter-frame prediction in that it derives a reference block within the current frame. In other words, IBC can use at least one of the inter-frame prediction techniques described herein. Palette mode can be considered an example of intra-frame coding or intra-frame prediction. When the palette mode is applied, information about the palette table and palette index can be included in the video / image information and signaled.
[0091] Intra-predictor 331 can predict the current block by referencing samples within the current image. Depending on the prediction mode, the referenced samples can be located near the current block or at a certain distance away from the current block. In intra-prediction, the prediction mode can include at least one non-directional mode and multiple directional modes. Intra-predictor 331 can determine the prediction mode applied to the current block by using prediction modes applied to neighboring blocks.
[0092] Inter-frame predictor 332 can derive a prediction block for the current block based on a reference block (reference sample array) specified by motion vectors on a reference image. In this case, to reduce the amount of motion information transmitted in the inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis based on the correlation between motion information of neighboring blocks and the current block. Motion information may include motion vectors and reference image indices. Motion information may further include inter-frame prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). For inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in the reference image. For example, inter-frame predictor 332 can configure a motion information candidate list based on neighboring blocks and derive the motion vector and / or reference image index of the current block based on received candidate selection information. Inter-frame prediction can be performed based on various prediction modes, and information about the prediction may include information indicating the inter-frame prediction mode used for the current block.
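The candidate-list mechanism in this paragraph can be illustrated with a deliberately simplified sketch: available, non-duplicate neighbor motion vectors are collected, the list is padded with zero vectors, and the signaled candidate index selects the predictor to which an optional motion vector difference is added. The list size, padding rule, and function names are assumptions for illustration, not the merge or AMVP list construction of any particular standard.

```python
from typing import List, Optional, Sequence, Tuple

MotionVector = Tuple[int, int]


def build_candidate_list(neighbor_mvs: Sequence[Optional[MotionVector]],
                         max_candidates: int = 2) -> List[MotionVector]:
    """Collect available, non-duplicate neighbor MVs; pad with zero MVs."""
    candidates: List[MotionVector] = []
    for mv in neighbor_mvs:
        if mv is None or mv in candidates:  # unavailable neighbor or duplicate
            continue
        candidates.append(mv)
        if len(candidates) == max_candidates:
            break
    while len(candidates) < max_candidates:
        candidates.append((0, 0))
    return candidates


def derive_motion_vector(neighbor_mvs: Sequence[Optional[MotionVector]],
                         candidate_idx: int,
                         mvd: MotionVector = (0, 0),
                         max_candidates: int = 2) -> MotionVector:
    """Pick the signaled candidate and add the (optional) motion vector difference."""
    mvp = build_candidate_list(neighbor_mvs, max_candidates)[candidate_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


neighbors = [None, (4, -2), (4, -2), (1, 1)]  # e.g., left unavailable, two identical MVs
print(derive_motion_vector(neighbors, candidate_idx=1, mvd=(2, 3)))
```

Because encoder and decoder build the same list from the same neighbors, only the short candidate index (and possibly a difference) has to be transmitted instead of a full motion vector.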
[0093] Adder 340 can add the obtained residual signal to the prediction signal (prediction block, prediction sample array) output from the predictor (including inter-frame predictor 332 and / or intra-frame predictor 331) to generate a reconstruction signal (reconstructed image, reconstruction block, reconstruction sample array). When there is no residual for the block to be processed, such as when a skip mode is applied, the prediction block can be used as the reconstruction block.
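The adder's operation can be sketched as a sample-wise sum of prediction and residual. Clipping the result to the valid sample range for a given bit depth is an assumption added here (it is common codec practice but is not stated in the paragraph above), and the function name is hypothetical.

```python
from typing import List, Optional

Samples = List[List[int]]


def reconstruct(pred: Samples, resid: Optional[Samples] = None, bit_depth: int = 8) -> Samples:
    """Add the residual to the prediction and clip to [0, 2^bit_depth - 1]."""
    if resid is None:
        # No residual (e.g., skip mode): the prediction block is the reconstructed block.
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, resid)]


print(reconstruct([[250, 10], [128, 64]], [[10, -20], [0, 1]]))
```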
[0094] Adder 340 can be referred to as a reconstructor or reconstructed block generator. The generated reconstructed signal can be used for intra-frame prediction of the next block to be processed in the current image, can be output after the filtering described later, or can be used for inter-frame prediction of the next image. Meanwhile, luma mapping with chroma scaling (LMCS) can be applied during image decoding.
[0095] Filter 350 can improve subjective / objective image quality by applying filtering to the reconstructed signal. For example, filter 350 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image, and send the modified reconstructed image to memory 360, specifically the DPB of memory 360. The filtering methods can include deblocking filtering, sample adaptive offset, adaptive loop filtering, bilateral filtering, etc.
[0096] The (modified) reconstructed image stored in the DPB of memory 360 can be used as a reference image in inter-frame predictor 332. Memory 360 can store motion information of the block from which motion information in the current image is derived (or decoded) and / or motion information of blocks in an already reconstructed image. The stored motion information can be sent to inter-frame predictor 332 to be used as motion information of spatially or temporally neighboring blocks. Memory 360 can store reconstructed samples of reconstructed blocks in the current image and send them to intra-frame predictor 331.
[0097] Here, the embodiments described in the encoding device 200’s filter 260, inter-frame predictor 221 and intra-frame predictor 222 can also be applied equally or correspondingly to the decoding device 300’s filter 350, inter-frame predictor 332 and intra-frame predictor 331, respectively.
[0098] Figure 4 illustrates an image decoding method performed by the decoding apparatus 300 according to an embodiment of the present disclosure.
[0099] Referring to Figure 4, the transform coefficients of the current block can be derived from the bitstream (S400). That is, the bitstream can include residual information of the current block, and the transform coefficients of the current block can be derived by decoding the residual information.
[0100] Referring to Figure 4, the residual samples of the current block can be derived by performing at least one of dequantization or inverse transform on the transform coefficients of the current block (S410).
[0101] When Adaptive Multiple Transform Selection (MTS) is applied, the inverse transform can be performed based on at least one of DCT-2, DST-7, or DCT-8. Here, DCT-2, DST-7, DCT-8, etc., can be referred to as transform type, transform kernel, or transform core.
[0102] In this disclosure, the inverse transform may refer to a separable transform. However, it is not limited thereto; the inverse transform may refer to a non-separable transform, or to a concept that includes both separable and non-separable transforms. Furthermore, the inverse transform in this disclosure refers to the primary transform, but is not limited thereto and can also be applied, in the same or a similar form, to the secondary transform.
[0103] For example, as an inverse transform method, only DCT-2 and the non-separable transform can be used; the non-separable transform can be used in addition to at least one of DCT-2, DST-7, or DCT-8; or the non-separable transform can replace one or more of the DCT-2, DST-7, and DCT-8 transform kernels.
[0104] As a more specific embodiment, when (DCT-2, DCT-2), (DST-7, DST-7), (DCT-8, DST-7), (DST-7, DCT-8), and (DCT-8, DCT-8) are available as transform kernel candidates for separable transforms, non-separable transforms can replace, or be added to, one or more of these five transform kernel candidates. Here, the notation (transform 1, transform 2) indicates that transform 1 is applied in the horizontal direction and transform 2 is applied in the vertical direction. When non-separable transforms replace part of the transform kernel candidates, the candidates other than (DCT-2, DCT-2) and (DST-7, DST-7) can be replaced with non-separable transforms. However, the transform kernel candidates described above are merely examples; other types of DCT and / or DST may be included, and transform skip may also be included as a transform kernel candidate.
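The (horizontal, vertical) notation for separable kernel pairs can be made concrete with a floating-point sketch. The DCT-2, DST-7, and DCT-8 basis formulas below are the orthonormal forms commonly used to describe these kernels; practical codecs use scaled integer approximations. The dictionary `KERNELS` and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def dct2(n: int) -> Matrix:
    """Orthonormal DCT-2 basis; row k is the k-th basis function."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
             for j in range(n)] for k in range(n)]


def dst7(n: int) -> Matrix:
    """Orthonormal DST-7 basis."""
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for k in range(n)]


def dct8(n: int) -> Matrix:
    """Orthonormal DCT-8 basis."""
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.cos(math.pi * (2 * k + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for k in range(n)]


KERNELS: Dict[str, Callable[[int], Matrix]] = {"DCT-2": dct2, "DST-7": dst7, "DCT-8": dct8}


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def forward_separable(block: Matrix, pair: Tuple[str, str]) -> Matrix:
    """pair = (horizontal, vertical), following the notation in the text."""
    h_name, v_name = pair
    height, width = len(block), len(block[0])
    h_mat, v_mat = KERNELS[h_name](width), KERNELS[v_name](height)
    # Vertical transform on columns (left multiply), horizontal transform on rows (right multiply).
    return matmul(matmul(v_mat, block), transpose(h_mat))


def inverse_separable(coeffs: Matrix, pair: Tuple[str, str]) -> Matrix:
    h_name, v_name = pair
    height, width = len(coeffs), len(coeffs[0])
    h_mat, v_mat = KERNELS[h_name](width), KERNELS[v_name](height)
    # The kernels are orthonormal, so the inverse uses their transposes.
    return matmul(matmul(transpose(v_mat), coeffs), h_mat)


residual = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
coeffs = forward_separable(residual, ("DST-7", "DCT-8"))
print(inverse_separable(coeffs, ("DST-7", "DCT-8")))
```

The two one-dimensional transforms are applied independently, which is exactly the property that a non-separable transform gives up in exchange for capturing two-dimensional correlations.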
[0105] A non-separable transform can refer to a transform or inverse transform based on a non-separable transform matrix. That is, unlike a separable transform, which applies the horizontal and vertical transforms independently of each other, a non-separable transform can perform the horizontal and vertical transforms simultaneously.
[0106] For example, when performing an inseparable transformation on a 4x4 block, the input data X to the inseparable transformation is shown in Equation 1 below.
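[0107] [Formula 1]
$$X=\begin{bmatrix}X_{00}&X_{01}&X_{02}&X_{03}\\X_{10}&X_{11}&X_{12}&X_{13}\\X_{20}&X_{21}&X_{22}&X_{23}\\X_{30}&X_{31}&X_{32}&X_{33}\end{bmatrix}$$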
[0108] When the input data X is expressed in vector form, the vector X' can be expressed as follows.
[0109] [Equation 2]
$$X'=\begin{bmatrix}X_{00}&X_{01}&X_{02}&X_{03}&X_{10}&X_{11}&\cdots&X_{33}\end{bmatrix}^{T}$$
[0110] In this case, the inseparable transformation can be performed as shown in Equation 3 below.
[0111] [Formula 3]
$$F=T\cdot X'$$
[0112] In Equation 3, F represents the transform coefficient vector, T represents the 16x16 non-separable transform matrix, and · represents the product of a matrix and a vector.
[0113] The 16x1 transform coefficient vector F can be derived using Equation 3, and F can be reorganized into a 4x4 block according to a predetermined scan order. The scan order can be a horizontal scan, vertical scan, diagonal scan, z-scan, raster scan, or another predefined scan.
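Equations 1 to 3 and the scan-based reorganization can be sketched as follows. The matrix T is a placeholder argument because the trained kernels are not given in this text (an identity matrix is used in the usage example). Row-major vectorization and the up-right diagonal scan (one of the listed options, in the form commonly used in HEVC/VVC) are assumptions; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = List[List[float]]


def vectorize_row_major(block: Block) -> List[float]:
    """Equation 2: stack the rows of the 4x4 input X into the 16x1 vector X'."""
    return [value for row in block for value in row]


def non_separable_forward(block: Block, t_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Equation 3: F = T . X' for a square non-separable kernel T (16x16 for 4x4 input)."""
    x_vec = vectorize_row_major(block)
    if len(t_matrix) != len(x_vec) or any(len(row) != len(x_vec) for row in t_matrix):
        raise ValueError("T must be square and match the vectorized block size")
    return [sum(t * x for t, x in zip(row, x_vec)) for row in t_matrix]


def up_right_diagonal_scan(size: int) -> List[Tuple[int, int]]:
    """(x, y) positions, anti-diagonal by anti-diagonal, each from bottom-left to top-right."""
    order = []
    for d in range(2 * size - 1):
        for y in range(min(d, size - 1), -1, -1):
            x = d - y
            if x < size:
                order.append((x, y))
    return order


def coefficients_to_block(f_vec: Sequence[float], scan: Sequence[Tuple[int, int]],
                          size: int) -> Block:
    """Place the k-th coefficient of F at the k-th scan position."""
    block = [[0.0] * size for _ in range(size)]
    for value, (x, y) in zip(f_vec, scan):
        block[y][x] = value
    return block


identity = [[1.0 if i == j else 0.0 for j in range(16)] for i in range(16)]
x_block = [[float(4 * r + c) for c in range(4)] for r in range(4)]
f_vec = non_separable_forward(x_block, identity)
print(coefficients_to_block(f_vec, up_right_diagonal_scan(4), 4))
```

A single 16x16 matrix-vector product touches every input sample for every output coefficient, which is why the non-separable kernel is more expensive in computation and storage than a pair of 4x4 separable kernels.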
[0114] The non-separable transform set and / or the transform kernel used for the non-separable transform can be configured differently based on at least one of the following: the prediction mode (e.g., intra-frame mode, inter-frame mode, etc.); the width, height, or number of pixels of the current block; the position of a sub-block within the current block; an explicitly signaled syntax element; statistical characteristics of neighboring samples; whether an auxiliary transform is used; or the quantization parameter (QP).
[0115] Specifically, for intra-frame modes, the predefined intra-frame prediction modes can be grouped so as to correspond to n non-separable transform sets, and each non-separable transform set can include k transform kernel candidates. Here, n and k can be arbitrary constants, determined according to the same rules (conditions) defined in the encoding and decoding devices.
[0116] The number of inseparable transform sets and / or the number of transform kernel candidates included in each inseparable transform set can be configured differently depending on the width and / or height of the current block. For example, for a 4x4 block, n1 inseparable transform sets and k1 transform kernel candidates can be configured. For a 4x8 block, n2 inseparable transform sets and k2 transform kernel candidates can be configured. Furthermore, the number of inseparable transform sets and the number of transform kernel candidates included in each inseparable transform set can be configured differently depending on the product of the width and height of the current block. For example, when the product of the width and height of the current block is equal to or greater than 256, n3 inseparable transform sets and k3 transform kernel candidates can be configured, and otherwise, n4 inseparable transform sets and k4 transform kernel candidates can be configured. In other words, because the degree of variation in the statistical properties of the residual signal varies depending on the block size, the number of inseparable transform sets and transform kernel candidates can be configured differently to reflect this.
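The size-dependent configuration can be sketched as a small selector. Because the text leaves n1 to n4 and k1 to k4 unspecified, the usage example stores symbolic labels rather than numbers. Combining the per-shape rule and the area rule into one fallback chain is an assumption made for illustration (the text presents them as alternatives), and the function name is hypothetical.

```python
from typing import Dict, Tuple, TypeVar

Config = TypeVar("Config")


def select_nst_config(width: int, height: int,
                      by_shape: Dict[Tuple[int, int], Config],
                      large: Config, small: Config,
                      area_threshold: int = 256) -> Config:
    """Pick a (number of sets, candidates per set) configuration for a block."""
    # Rule 1: an explicit entry for this exact block shape (e.g., 4x4 -> (n1, k1)).
    if (width, height) in by_shape:
        return by_shape[(width, height)]
    # Rule 2: fall back to the width * height threshold (>= 256 -> (n3, k3)).
    return large if width * height >= area_threshold else small


shapes = {(4, 4): ("n1", "k1"), (4, 8): ("n2", "k2")}
print(select_nst_config(16, 16, shapes, large=("n3", "k3"), small=("n4", "k4")))
```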
[0117] When the current block is divided into multiple sub-blocks, the statistical characteristics of the residual signal may differ for each sub-block, and therefore the number of inseparable transform sets and transform kernel candidates can be configured differently. For example, when a 4x8 or 8x4 block is divided into two 4x4 sub-blocks and an inseparable transform is applied to each sub-block, n5 inseparable transform sets and k5 transform kernel candidates can be configured for the top-left 4x4 sub-block, and n6 inseparable transform sets and k6 transform kernel candidates can be configured for the other 4x4 sub-blocks.
[0118] The number of non-separable transform sets and transform kernel candidates can also be configured differently based on an explicitly signaled syntax element. As the syntax element, information indicating one of multiple non-separable transform configurations can be used. For example, when three non-separable transform configurations are supported (i.e., n7 non-separable transform sets and k7 transform kernel candidates, n8 non-separable transform sets and k8 transform kernel candidates, and n9 non-separable transform sets and k9 transform kernel candidates), the syntax element can have a value of 0, 1, or 2, and the non-separable transform configuration applied to the current block can be determined based on the signaled value of the syntax element.
[0119] The number of non-separable transform sets and transform kernel candidates can be configured differently depending on whether an auxiliary transform is applied and / or which auxiliary transform is applied. For example, when no auxiliary transform is applied, a non-separable transform configuration including n10 non-separable transform sets and k10 transform kernel candidates can be applied. When the auxiliary transform is applied, a non-separable transform configuration including n11 non-separable transform sets and k11 transform kernel candidates can be applied.
[0120] Different non-separable transform configurations can be applied based on the quantization parameter (QP) and / or the range to which the QP value belongs. For example, when the QP value is small, a non-separable transform configuration including n12 non-separable transform sets and k12 transform kernel candidates can be applied. On the other hand, when the QP value is large, a non-separable transform configuration including n13 non-separable transform sets and k13 transform kernel candidates can be applied. When the QP value is less than or equal to a threshold (e.g., 32), the QP value is classified as small; otherwise, it is classified as large. Alternatively, the range of QP values can be divided into three or more ranges, and a different non-separable transform configuration can be applied to each range.
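The QP-range rule can be sketched with a sorted list of thresholds, which covers both the two-range case (threshold 32, from the text) and the three-or-more-range variant. The thresholds 22 and 37 in the test are hypothetical values chosen only to exercise three ranges, the configurations are symbolic labels, and the function name is an assumption.

```python
import bisect
from typing import Sequence, TypeVar

Config = TypeVar("Config")


def select_nst_config_by_qp(qp: int, thresholds: Sequence[int],
                            configs: Sequence[Config]) -> Config:
    """Map a QP value to one of len(thresholds) + 1 configurations.

    `thresholds` must be ascending. A QP <= thresholds[i] (and above thresholds[i - 1])
    falls into range i, matching the "QP <= 32 counts as small" convention.
    """
    if len(configs) != len(thresholds) + 1:
        raise ValueError("need exactly one configuration per QP range")
    return configs[bisect.bisect_left(thresholds, qp)]


two_ranges = [("n12", "k12"), ("n13", "k13")]
print(select_nst_config_by_qp(32, [32], two_ranges))
print(select_nst_config_by_qp(33, [32], two_ranges))
```

`bisect_left` counts how many thresholds are strictly below the QP, so a QP equal to a threshold stays in the lower range.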
[0121] For relatively large blocks, instead of using an inseparable transformation corresponding to the block's width and height, the block can be divided into multiple sub-blocks, and inseparable transformations corresponding to the width and height of the sub-blocks can be used. For example, when performing an inseparable transformation on a 4x8 block, it can be divided into two 4x4 sub-blocks, and the inseparable transformation based on the 4x4 block can be applied to each of the 4x4 sub-blocks. Alternatively, an 8x16 block can be divided into two 8x8 sub-blocks, and inseparable transformations based on the 8x8 block can be used.
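The sub-block split can be sketched as below, assuming blocks are stored as lists of rows and that "4x8" means width 4 by height 8. The function name is hypothetical, and each returned sub-block would then be transformed with the kernel for its own size.

```python
from typing import List

Block = List[List[int]]


def split_into_subblocks(block: Block, sub_w: int, sub_h: int) -> List[Block]:
    """Split a block into sub_w x sub_h sub-blocks in raster order."""
    height, width = len(block), len(block[0])
    if width % sub_w or height % sub_h:
        raise ValueError("block size must be a multiple of the sub-block size")
    subblocks = []
    for y0 in range(0, height, sub_h):
        for x0 in range(0, width, sub_w):
            subblocks.append([row[x0:x0 + sub_w] for row in block[y0:y0 + sub_h]])
    return subblocks


tall = [[4 * r + c for c in range(4)] for r in range(8)]  # width 4, height 8
parts = split_into_subblocks(tall, 4, 4)                   # two 4x4 sub-blocks
print(len(parts), parts[1][0])
```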
[0122] The non-separable transform set can be determined based on the intra-prediction mode of the current block and a mapping table. The mapping table defines the mapping between the predefined intra-prediction modes and the non-separable transform sets. The predefined intra-prediction modes can include two non-directional modes and 65 directional modes. In general, a non-separable transform has a larger transform kernel than a separable transform, which means that the transform process has higher computational complexity and that more memory is required to store the transform kernels. On the other hand, whereas a separable transform can only take into account statistical properties along the horizontal and / or vertical directions, a non-separable transform can take into account statistical properties in two-dimensional space, including both the horizontal and vertical directions, and can thus provide better compression efficiency. Because the statistical properties and diversity of the residuals vary with the direction of the intra-prediction mode, there may be intra-prediction modes for which a non-separable transform is essential, and others whose residual properties can be sufficiently captured by a separable transform alone. Therefore, by predefining in the encoding and decoding devices which transform is used for each intra-prediction mode, a transform process with optimized complexity and memory requirements can be designed. The non-directional modes may include the planar mode (mode 0) and the DC mode (mode 1), and the directional modes may include intra-prediction modes 2 to 66. However, this is merely an example, and this disclosure can also be applied when the predefined intra-prediction modes are numbered differently.
[0123] When wide-angle intra prediction (WAIP) is applied, the predefined intra-prediction modes can be extended to further include intra-prediction modes -14 to -1 and 67 to 80.
[0124] Figure 5 exemplarily illustrates the intra-prediction modes and their prediction directions according to this disclosure. Referring to Figure 5, modes -14 to -1 and 2 to 33 are symmetric to modes 35 to 80, in terms of prediction direction, about mode 34. For example, modes 10 and 58 are symmetric about the direction corresponding to mode 34, and mode -1 is symmetric to mode 67. Therefore, for a vertical mode that is symmetric to a horizontal mode about mode 34, the input data can be transposed and the transform of the corresponding horizontal mode can be used. Transposing the input data means that the rows and columns of the MxN input data of the two-dimensional block are swapped to form NxM data.
[0125] For example, when using 4x4 blocks, the 16 data points forming the 4x4 blocks can be arranged appropriately to form a 16x1 1D vector for the inseparable transformation. In this case, the 1D vector can be formed in row-major or column-major order. The residual samples resulting from the inseparable transformation can be arranged in the above order to form a 2D block.
[0126] For modes -14 to -1 and 2 to 33, the data forming the 16x1 input vector is arranged in row-major order; for modes 35 to 80, the input vector can be formed in column-major order.
[0127] Mode 34 is neither a horizontal mode nor a vertical mode, but in this disclosure it is classified as a horizontal mode. That is, for modes -14 to -1 and 2 to 33 (and, by this classification, mode 34), the input data arrangement used for horizontal modes, i.e., row-major order, is used, and for the vertical modes that are symmetric to them about mode 34, the input data can be transposed and used.
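The square-block rule can be sketched as a mode-folding helper: modes -14 to 34 keep their own mode and row-major input, while modes 35 to 80 are mapped to their mirror (68 - P, or 66 - Q for wide-angle modes) and read in column-major (transposed) order. Treating planar (0) and DC (1) like the horizontal class is an assumption, since the text does not address them; the function names are hypothetical.

```python
from typing import List, Tuple


def mirror_mode(mode: int) -> int:
    """Reflect a directional intra mode about mode 34 (P <-> 68-P, Q <-> 66-Q)."""
    if 2 <= mode <= 66:
        return 68 - mode
    if -14 <= mode <= -1 or 67 <= mode <= 80:
        return 66 - mode
    raise ValueError(f"mode {mode} has no directional mirror")


def square_block_params(mode: int) -> Tuple[int, str]:
    """Return (mode used for the transform-set lookup, read order of the input)."""
    # Modes -14..34 (horizontal class, including mode 34) read in row-major order.
    # Planar (0) and DC (1) fall in this range too, which is an assumption here.
    if -14 <= mode <= 34:
        return mode, "row-major"
    # Modes 35..80 reuse the mirrored horizontal mode with transposed (column-major) input.
    return mirror_mode(mode), "column-major"


def vectorize(block: List[List[float]], order: str) -> List[float]:
    """Form the 1D input vector in row-major or column-major order."""
    if order == "row-major":
        return [v for row in block for v in row]
    return [block[y][x] for x in range(len(block[0])) for y in range(len(block))]


print(square_block_params(58), square_block_params(10))
```

With this folding, the mapping table and kernels only need entries for the horizontal half of the modes, which is the memory saving the symmetry is meant to provide.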
[0128] For non-square blocks, the symmetry available in square blocks cannot be used (i.e., the symmetry between mode P and mode (68-P) for 2<=P<=33, or between mode Q and mode (66-Q) for -14<=Q<=-1, in an NxN block). Therefore, instead of relying solely on the symmetry of the intra-prediction modes, the symmetry between block shapes that are transposes of each other, i.e., between KxL blocks and LxK blocks, can be used. Specifically, a KxL block predicted with mode P is symmetric to an LxK block predicted with mode (68-P). Likewise, a KxL block predicted with mode Q is symmetric to an LxK block predicted with mode (66-Q).
[0129] Since a KxL block with mode 2 and an LxK block with mode 66 can be considered symmetric to each other, the same transform kernel can be applied to both. If a mapping table from intra-prediction modes to non-separable transform sets is defined for KxL blocks, then, to apply a non-separable transform to an LxK block, the non-separable transform set can be derived from the mapping table of the KxL block using mode (68-P) instead of the mode P applied to the LxK block. Likewise, the non-separable transform set can be derived from the mapping table of the KxL block using mode (66-Q) instead of the mode Q applied to the LxK block.
[0130] For example, to apply an inseparable transformation to an LxK block, the set of inseparable transformations can be selected based on mode 2 instead of mode 66. Furthermore, for a KxL block, the input data can be read in a predetermined order (e.g., row-major or column-major) to form a 1D vector, and then the corresponding inseparable transformation can be applied. For an LxK block, the input data can be read in transposed order to form a 1D vector, and then the corresponding inseparable transformation can be applied. That is, when a KxL block is read in row-major order, an LxK block can be read in column-major order. Conversely, when a KxL block is read in column-major order, an LxK block can be read in row-major order.
[0131] Furthermore, when mode 34 is applied to a KxL block, the set of inseparable transformations can be determined based on mode 34, and the input data can be read in a predetermined order to form a 1D vector and perform the corresponding inseparable transformations. When mode 34 is applied to an LxK block, the set of inseparable transformations can be determined based on mode 34, but the input data can be read in a transposed order to form a 1D vector and perform the corresponding inseparable transformations.
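The non-square rule of paragraphs [0128] to [0131] can be sketched as follows: the reference KxL shape uses its mode and read order directly, while the transposed LxK shape looks up the KxL table with the mirrored mode (mode 34 maps to itself) and reads its input in transposed order. Taking blocks with width greater than height as the KxL reference shape is an assumption (one of the options mentioned in the next paragraph), as is leaving planar and DC unchanged; the function names are hypothetical.

```python
from typing import Tuple


def mirror_mode(mode: int) -> int:
    """P <-> 68-P for 2..66 (mode 34 maps to itself); Q <-> 66-Q for -14..-1 and 67..80."""
    if 2 <= mode <= 66:
        return 68 - mode
    if -14 <= mode <= -1 or 67 <= mode <= 80:
        return 66 - mode
    raise ValueError(f"mode {mode} has no directional mirror")


def non_square_block_params(width: int, height: int, mode: int,
                            base_order: str = "row-major") -> Tuple[int, str]:
    """Return (mode for the KxL mapping table, read order) for a non-square block.

    Assumption: blocks with width > height are the KxL reference shape.
    """
    if width == height:
        raise ValueError("square blocks use the square-block rule")
    transposed = "column-major" if base_order == "row-major" else "row-major"
    if width > height:
        return mode, base_order  # reference shape: use the mode and read order as-is
    # Transposed (LxK) shape: mirrored mode in the KxL table, transposed read order.
    # Planar/DC are kept unchanged here (an assumption; the text does not cover them).
    lookup_mode = mode if mode in (0, 1) else mirror_mode(mode)
    return lookup_mode, transposed


print(non_square_block_params(4, 8, 66))  # LxK block with mode 66
print(non_square_block_params(8, 4, 66))  # KxL block with mode 66
```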
[0132] In this disclosure, the methods for determining the non-separable transform set and for forming the input data are described based on KxL blocks. However, non-separable transforms can also be performed on LxK blocks by leveraging the aforementioned symmetry with KxL blocks. Alternatively, the reference blocks can be restricted to blocks whose width is greater than their height. Alternatively, use of the symmetry can be disabled for non-square blocks. In this case, non-square blocks can use a different number of non-separable transform sets and / or transform kernel candidates than square blocks, and a mapping table different from that of square blocks can be used to select the non-separable transform set.
[0133] An example of a mapping table used to select sets of inseparable transforms is as follows: [Table 1]
[0134] Table 1 shows an example of assigning a non-separable transform set to each intra-prediction mode when five non-separable transform sets are available. The value of predModeIntra indicates the value of the intra-prediction mode considering WAIP, and TrSetIdx is the index indicating a specific non-separable transform set. In Table 1, it can be confirmed that the same non-separable transform set is applied to modes located in the symmetrical direction according to the intra-prediction mode. Table 1 is merely an example using five non-separable transform sets and does not limit the total number of non-separable transform sets used for non-separable transforms.
[0135] Alternatively, as shown in Table 2, the non-separable transform may not be applied to the WAIP modes, in consideration of compression performance.
[0136] [Table 2]
[0137] Alternatively, as shown in Table 3, instead of configuring separate non-separable transform sets for the WAIP modes, the WAIP modes can share the non-separable transform sets of the adjacent intra-prediction modes.
[0138] [Table 3]
[0139] A non-separable transform set can include multiple transform kernel candidates, one of which is selectively used. For this purpose, an index signaled in the bitstream can be used. Alternatively, one of the multiple transform kernel candidates can be determined implicitly based on context information of the current block. Here, the context information can refer to the size of the current block or whether a non-separable transform is applied to neighboring blocks. The size of the current block can be defined as the width, the height, the maximum or minimum of the width and height, the sum of the width and height, or the product of the width and height.
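The explicit and implicit selection paths can be sketched side by side. The size measures follow the list in the paragraph above; the implicit rule itself (first candidate for small blocks, last candidate otherwise, with a threshold of 64) is purely hypothetical, since the text does not define one. The function names are also assumptions.

```python
from typing import Optional, Sequence, TypeVar

Kernel = TypeVar("Kernel")


def block_size_measure(width: int, height: int, kind: str = "product") -> int:
    """One of the block-size definitions listed in the text."""
    measures = {
        "width": width,
        "height": height,
        "max": max(width, height),
        "min": min(width, height),
        "sum": width + height,
        "product": width * height,
    }
    return measures[kind]


def select_kernel(candidates: Sequence[Kernel], signaled_index: Optional[int] = None,
                  width: int = 0, height: int = 0, kind: str = "product",
                  threshold: int = 64) -> Kernel:
    """Explicit selection if an index was signaled; otherwise an implicit rule."""
    if signaled_index is not None:
        return candidates[signaled_index]
    # Hypothetical implicit rule: small blocks use the first candidate, larger ones the last.
    size = block_size_measure(width, height, kind)
    return candidates[0] if size <= threshold else candidates[-1]


print(select_kernel(["K0", "K1"], signaled_index=1))
print(select_kernel(["K0", "K1"], width=16, height=16))
```

The implicit path costs no bits but requires that encoder and decoder evaluate the same rule on the same context information.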
[0140] The method for determining the transform kernel for the inverse transform of the current block will be described in detail below.
[0141] Example 1
[0142] As described above, inverse transformations can be divided into separable transformations and non-separable transformations. A separable transformation refers to performing transformations in the horizontal and vertical directions on a two-dimensional block, while a non-separable transformation refers to performing a single transformation on samples constituting the whole or part of the two-dimensional block. When expressing a separable transformation, it can be expressed as a pair of horizontal and vertical transformations, and in this disclosure, it is expressed as (horizontal transformation, vertical transformation).
[0143] Multiple transformation sets can be defined for the inverse transform of the current block. Each transformation set can include one or more transform kernel candidates.
[0144] For example, one of (DST-7, DST-7), (DCT-8, DST-7), (DST-7, DCT-8), or (DCT-8, DCT-8) can be applied as a separable transform, and these four transform kernel candidates can be considered as one transform set. Furthermore, (DCT-2, DCT-2) can be considered as a transform set. Transform skip, in which no transform is applied, can also be considered as a transform set, and (DCT-2, DCT-2) together with transform skip can also be considered as one transform set. In this disclosure, a transform kernel can refer to a single transform (e.g., DCT-2, DST-7) or to a pair of transforms (e.g., (DCT-2, DCT-2)).
[0145] As another example of a transform set, the aforementioned non-separable transform set can exist. In this disclosure, a non-separable transform applied as the primary transform is referred to as the non-separable primary transform (NSPT). For the NSPT, multiple non-separable transform sets can be configured, and each non-separable transform set can include one or more transform kernels as transform kernel candidates. In the case of the NSPT, one of the multiple non-separable transform sets is selected based on the intra-prediction mode, and the multiple non-separable transform sets used for the NSPT can be represented as a list of NSPT sets. This has been described above, and its detailed description is omitted here.
[0146] A group of one or more transform sets that can be used for the current block can be configured from multiple predefined transform sets. Such a group can be configured within a predetermined region unit to which the current block belongs, and is hereinafter referred to as a set. Here, the predetermined region unit can be at least one of a picture, a slice, a coding tree unit row (CTU row), or a coding tree unit (CTU).
[0147] For example, the transform set consisting of (DCT-2, DCT-2) is called S1, and the transform set consisting of (DST-7, DST-7), (DCT-8, DST-7), (DST-7, DCT-8), and (DCT-8, DCT-8) is called S2. Furthermore, the above list of NSPT sets can include N non-separable transform sets, which are called S3,1, S3,2, ..., S3,N, respectively. Here, N can be 35, but is not limited to this.
[0148] When S3,13 is selected as the non-separable transform set of the NSPT based on the intra-prediction mode of the current block, the transform kernel applicable to the current block can be one belonging to S1, S2, or S3,13. In this case, the set available for the current block can be represented as {S1, S2, S3,13}.
[0149] As described above, since the set according to this disclosure is a group of one or more transform sets available for the current block, the set can be configured differently based on the context of the current block. Here, the context can include at least one of the shape, size, or intra-prediction mode of the current block. If a total of K contexts are defined, K sets can be configured, and each set can be represented as Ci (i = 1, 2, ..., K). For example, when the NSPT is applied to blocks of sizes 4x4, 8x8, 16x16, and 32x32, one of a total of 35 non-separable transform sets is selected based on the intra-prediction mode, and different transform kernels are applied for each block size, a total of 4 x 35 = 140 contexts can be defined.
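The context count in this example can be sketched as follows. The four block sizes and the 35 NSPT sets come from the text; the flat context index and the string labels (S1, S2, and S3,j tagged with a block size) are hypothetical and only show that one set Ci is configured per context.

```python
NSPT_SIZES = [(4, 4), (8, 8), (16, 16), (32, 32)]  # block sizes from the example
NUM_NSPT_SETS = 35                                   # NSPT sets selectable by intra mode


def context_index(width, height, nspt_set_idx):
    """Hypothetical flat index: one context per (block size, NSPT set) pair."""
    return NSPT_SIZES.index((width, height)) * NUM_NSPT_SETS + nspt_set_idx


def build_sets():
    """Configure one set Ci per context; each set lists the transform sets it offers."""
    sets = {}
    for width, height in NSPT_SIZES:
        for j in range(NUM_NSPT_SETS):
            # S1 = (DCT-2, DCT-2), S2 = the four DST-7/DCT-8 pairs,
            # S3,(j+1) = NSPT set j, kernels kept distinct per block size.
            sets[context_index(width, height, j)] = ("S1", "S2", f"S3,{j + 1}@{width}x{height}")
    return sets


sets = build_sets()
print(len(sets))                        # 140
print(sets[context_index(8, 8, 12)])    # ('S1', 'S2', 'S3,13@8x8')
```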
[0150] A set can be configured based on the context of the current block. In this case, a process of selecting one of the multiple transform sets belonging to the set and a process of selecting one of the multiple transform kernel candidates belonging to the selected transform set can be performed. Each selection can be performed implicitly based on the context of the current block, or explicitly based on a signaled index. The two selection processes can also be performed separately. For example, an index for selecting a transform set can be signaled first, and one of the transform sets belonging to the set can be selected based on that index. Then, an index indicating one of the transform kernel candidates belonging to the selected transform set can be signaled, and a transform kernel candidate can be selected from the transform set based on the signaled index. The transform kernel of the current block can be determined based on the selected transform kernel candidate. Alternatively, the transform set can be selected from the set implicitly based on the context of the current block, and the transform kernel candidate can be selected from the selected transform set based on a signaled index. Alternatively, the transform set can be selected from the set based on a signaled index, and the transform kernel candidate can be selected implicitly from the selected transform set based on the context of the current block. Alternatively, both the transform set and the transform kernel candidate can be selected implicitly based on the context of the current block.
Of course, when the number of transform sets belonging to the set is 1, an index for selecting the transform set does not need to be signaled. Similarly, when the number of transform kernel candidates belonging to the selected transform set is 1, an index indicating the transform kernel candidate does not need to be signaled. Alternatively, an index indicating one of all the transform kernel candidates belonging to the current set can be signaled, in which case the process of selecting a transform set from the set can be omitted. In this case, priority can be considered when merging the candidates of all transform sets belonging to the set into one list. For example, when a binarization that assigns shorter bin strings to smaller index values (such as a truncated unary code) is used, it may be advantageous to assign smaller indices to the transform kernel candidates that contribute more to coding performance. When reordering all transform kernel candidates belonging to a set according to priority, a different order can be applied to each set. Alternatively, instead of reordering all transform kernel candidates belonging to a set, only some of them can be selectively reordered.
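The merged-list alternative can be sketched as follows, assuming truncated unary binarization (a run of 1s terminated by a 0, with the terminator dropped for the largest index). The candidate names (e.g., "NSPT-k0") and the priority table are hypothetical; candidates without a priority entry keep their original relative order, which corresponds to reordering only some of them.

```python
def truncated_unary(value, max_value):
    """`value` ones followed by a terminating zero (omitted when value == max_value)."""
    return "1" * value + ("0" if value < max_value else "")


def merged_candidate_list(transform_sets, priority):
    """Merge all candidates of a set; higher-priority candidates get smaller indices.

    Candidates missing from `priority` sort after the listed ones and keep their
    original order, because Python's sort is stable.
    """
    merged = [cand for tset in transform_sets for cand in tset]
    return sorted(merged, key=lambda cand: priority.get(cand, len(priority)))


# Hypothetical set made of three transform sets.
current_set = [
    ["DCT2/DCT2"],
    ["DST7/DST7", "DCT8/DST7", "DST7/DCT8", "DCT8/DCT8"],
    ["NSPT-k0", "NSPT-k1"],
]
priority = {"NSPT-k0": 0, "DCT2/DCT2": 1}   # hypothetical: most useful candidates first
candidates = merged_candidate_list(current_set, priority)
codes = {c: truncated_unary(i, len(candidates) - 1) for i, c in enumerate(candidates)}
print(codes["NSPT-k0"], codes["DCT2/DCT2"], codes["NSPT-k1"])   # 0 10 111111
```

A different `priority` table per set gives the per-set reordering mentioned above.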
[0151] Example 2
[0152] The transform kernel for the inverse transform of the current block can be determined based on MTS (Multiple Transform Selection).
[0153] The MTS according to this disclosure can use at least one of DST-7, DCT-8, DCT-5, DST-4, DST-1, or IDT (identity transform) as a transform kernel. Furthermore, the MTS according to this disclosure may further include the DCT-2 transform kernel.
[0154] In this disclosure, multiple MTS sets can be defined for MTS. One of the multiple MTS sets can be determined based on the current block size and / or intra-prediction mode. For example, when determining an MTS set, 16 transform block sizes can be considered, and for directional modes, the shape of the transform blocks and the symmetry between the intra-prediction modes can be considered. For WAIP (Wide Angle Intra-Prediction) modes (i.e., -1 to -14 (or -15), 67 to 80 (or 81)), the MTS set corresponding to mode 2 can be applied to modes -1 to -14 (or -15), and the MTS set corresponding to mode 66 can be applied to modes 67 to 80 (or 81). A separate MTS set can be assigned to MIP (Matrix-Based Intra-Prediction) modes.
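A minimal sketch of the WAIP handling above: an extended wide-angle mode is mapped onto the regular mode whose MTS set it reuses. The function name is illustrative; because the text allows the ranges to end at -15 and 81 instead of -14 and 80, the sketch tests only the sign and the upper bound of the regular range.

```python
def mts_mode_for_waip(mode):
    """Map a wide-angle mode to the regular directional mode whose MTS set it reuses."""
    if mode < 0:      # WAIP modes -1 .. -14 (or -15) reuse the set of mode 2
        return 2
    if mode > 66:     # WAIP modes 67 .. 80 (or 81) reuse the set of mode 66
        return 66
    return mode       # planar (0), DC (1) and modes 2..66 are unchanged


print([mts_mode_for_waip(m) for m in (-15, -1, 0, 34, 67, 81)])   # [2, 2, 0, 34, 66, 66]
```

MIP modes, which the text assigns separate MTS sets, are outside this sketch.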
[0155] For example, an MTS set can be assigned / defined based on the transform block size and intra-prediction mode, as shown in Table 4 below.
[0156] [Table 4]
[0157] Table 4 shows the assignment of MTS sets based on 16 transform block sizes and the intra-prediction mode. There are 80 predefined MTS sets, and the index indicating one of them can have a value from 0 to 79, as shown in Table 4.
[0158] [Table 5]
[0159] Table 5 shows the transform kernel candidates included in each MTS set described in Table 4. Each MTS set can consist of six transform kernel candidates. The transform kernel candidate index has a value from 0 to 5 and can indicate one of the six transform kernel candidates. Here, each transform kernel candidate can be a combination of horizontal and vertical transform kernels for separable transforms, and 25 transform kernel candidates with indices from 0 to 24 can be defined.
[0160] [Table 6]
[0161] Table 6 provides examples of the 25 transform kernel candidates described in Table 5. Specifically, the horizontal and vertical transforms of the transform kernel candidates are expressed as (horizontal transform, vertical transform). For each transform kernel candidate index, the horizontal / vertical transform when the intra-prediction mode is less than 35 can be the opposite of the horizontal / vertical transform when the intra-prediction mode is greater than or equal to 35. When the value of the intra-prediction mode is greater than or equal to 35, a mode symmetrical to mode 34 can be derived, and the MTS set can be selected from Table 4 based on this mode. Additionally, the symmetry of the block shape can be considered. When the original transform block has a size of WxH, it can be symmetrically treated as having a size of HxW, and the MTS set can be selected from Table 4. Here, the value of the intra-prediction mode can be the value of the modified intra-prediction mode. That is, as the mode value of WAIP, for values from -14 (or -15) to -1, it is modified to mode 2; for values from 67 to 80 (or 81), it is modified to mode 66; and for the remaining modes, the value of the original intra-prediction mode can be set to the value of the modified intra-prediction mode. In this case, since the extended mode of WAIP is also configured symmetrically with respect to mode 34, the symmetry with respect to mode 34 can be used for all directional modes except for planar mode and DC mode.
[0162] For example, when a 16x32 block is predicted using mode 54, mode 14 (= 68 - 54) can be derived as the mode symmetric to mode 54, and the block size can be treated as 32x16. In this case, the MTS set with index 72 can be selected, as defined in Table 4.
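The symmetry rules of paragraphs [0161] and [0162] can be sketched as follows. The sketch assumes the mode has already been modified for WAIP (mapped to 2 or 66), applies the mode symmetry and the block-shape symmetry together for modes 35 and above, and returns a flag telling the caller to swap the (horizontal, vertical) kernels of the selected candidate. Function names are illustrative, and the Table 4 lookup itself is not reproduced.

```python
def mts_table_key(mode, width, height):
    """Return (lookup_mode, lookup_width, lookup_height, swap_hv) for the MTS-set table.

    `mode` is the modified intra mode (WAIP modes already mapped to 2 or 66).
    Modes >= 35 are mirrored about diagonal mode 34 (mode -> 68 - mode), the
    block is treated as its transpose, and the kernel pair is swapped.
    """
    if mode >= 35:
        return 68 - mode, height, width, True
    return mode, width, height, False


def oriented_pair(pair, swap_hv):
    """Apply the (horizontal, vertical) swap to a candidate kernel pair."""
    hor, ver = pair
    return (ver, hor) if swap_hv else (hor, ver)


# Example from the text: a 16x32 block predicted with mode 54.
print(mts_table_key(54, 16, 32))   # (14, 32, 16, True)
```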
[0163] When applying MIP mode, the MTS set assigned to the MIP mode can be selected based on the size of the current block without considering the symmetry of the block shape. Alternatively, when applying MIP mode, the MTS set assigned to the MIP mode can be selected based on the size of the symmetrical block that takes into account the symmetry of the block shape. For example, when applying MIP mode to an 8x16 block, the 8x16 block can be considered as its symmetrical 16x8 block, and an MTS set with index 49 can be selected, as defined in Table 4. Alternatively, when applying MIP mode, the intra-prediction mode can be considered as a planar mode. In this case, the MTS set assigned to the MIP mode can be selected based on the size of the current block without considering the symmetry of the block shape. Alternatively, the MTS set assigned to the MIP mode can be selected based on the size of the symmetrical block that takes into account the symmetry of the block shape.
[0164] For MIP mode, a flag can be used to indicate whether MIP mode is applied in transposed mode. When MIP mode is applied to an MxN current block and the flag indicates that transposed mode is applied, intra-prediction mode can be treated as planar mode, and the MxN current block can be treated as an NxM block. That is, from Table 4, the MTS set corresponding to NxM block size and planar mode can be selected. As described in Table 6, when the value of intra-prediction mode is greater than or equal to 35, the horizontal and vertical transforms are swapped, but because the intra-prediction mode of the current block is treated as planar mode, the horizontal and vertical transforms of the transform kernel candidate can remain unchanged. Alternatively, when MIP mode is applied to an MxN current block and the flag indicates that transposed mode is applied, intra-prediction mode can be treated as planar mode, and the MxN current block can be treated as an NxM block. That is, from Table 4, the MTS set corresponding to NxM block size and MIP mode can be selected.
[0165] In Table 5, a transform kernel candidate selected by the transform kernel candidate index can be set as the transform kernel of the current block. Alternatively, based on the size of the current block, at least one of the horizontal or vertical transforms of the selected transform kernel candidate can be changed to another transform kernel. For example, when the transform kernel candidate index is 3 and both the width and height of the current block are less than or equal to 16, at least one of the horizontal or vertical transforms of the transform kernel candidate corresponding to transform kernel candidate index 3 can be changed to another transform kernel. In this case, the horizontal and vertical transforms can be changed independently of each other. When the difference (or the absolute value of the difference) between the value of the intra-prediction mode and the value of the horizontal mode of the current block is less than or equal to a predetermined threshold, the vertical transform of the selected transform kernel candidate can be changed to IDT (identity transform). When the difference (or the absolute value of the difference) between the value of the intra-prediction mode and the value of the vertical mode of the current block is less than or equal to a predetermined threshold, the horizontal transform of the selected transform kernel candidate can be changed to IDT (identity transform). Here, the threshold can be determined based on the width and height of the current block, as shown in Table 7 below.
[0166] [Table 7]
[0167] Table 7 is used to change the horizontal and / or vertical transforms of the transform kernel candidates selected by the transform kernel candidate index to another transform kernel, and a threshold is defined based on the size of the transform block.
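A sketch of the IDT substitution described in [0165] and [0167]. The thresholds of Table 7 are not reproduced, so `threshold` is a caller-supplied parameter; the horizontal and vertical mode numbers 18 and 50 are assumed from the 67-mode scheme used in this disclosure (modes 18 and 50 are described as symmetric about mode 34). Following the example in the text, only candidate index 3 with width and height of at most 16 is adjusted.

```python
HOR_MODE, VER_MODE = 18, 50   # assumed horizontal / vertical mode numbers


def adjust_candidate(pair, cand_idx, width, height, mode, threshold):
    """Optionally replace one kernel of a (horizontal, vertical) pair with IDT.

    `threshold` stands in for the block-size-dependent value of Table 7.
    """
    if cand_idx != 3 or width > 16 or height > 16:
        return pair
    hor, ver = pair
    if abs(mode - HOR_MODE) <= threshold:   # near-horizontal prediction
        ver = "IDT"
    if abs(mode - VER_MODE) <= threshold:   # near-vertical prediction
        hor = "IDT"
    return hor, ver


print(adjust_candidate(("DST7", "DCT8"), 3, 8, 8, 19, 2))   # ('DST7', 'IDT')
```

The two checks are independent, matching the statement that the horizontal and vertical transforms can be changed independently of each other.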
[0168] The six transform kernel candidates that make up an MTS set can be distinguished by transform kernel candidate indices from 0 to 5, as defined in Table 5. The transform kernel candidate index can be signaled in the bitstream. A flag indicating whether the MTS set is available / applied (MTS enable flag or MTS flag) can be signaled, and the transform kernel candidate index can be signaled when the flag indicates that the MTS set is available / applied. The MTS flag can consist of one bin, and one or more contexts used for CABAC-based entropy coding (hereinafter referred to as CABAC contexts) can be assigned to that bin. For example, different CABAC contexts can be assigned to the non-MIP mode and the MIP mode, respectively.
[0169] Based on the context of the current block, the number of transform kernel candidates available for the current block can be set differently. For example, as the context of the current block, the sum of the absolute values of all or some of the transform coefficients in the current block can be considered. The sum of the absolute values of the transform coefficients is called AbsSum. When AbsSum is less than or equal to T1, only one transform kernel candidate corresponding to transform kernel candidate index 0 is available. When AbsSum is greater than T1 and less than or equal to T2, transform kernel candidates corresponding to transform kernel candidate indices 0 to 3 may be available. When AbsSum is greater than T2, six transform kernel candidates corresponding to transform kernel candidate indices 0 to 5 may be available. Here, T1 can be 6 and T2 can be 32, but this is just an example.
[0170] When AbsSum is less than or equal to T1, since there is only one transform kernel candidate available for the current block, the transform kernel candidate corresponding to transform kernel candidate index 0 can be set as the transform kernel for the current block without signaling the transform kernel candidate index. When AbsSum is greater than T1 and less than or equal to T2, since four transform kernel candidates are available, one of the four transform kernel candidates can be selected based on the transform kernel candidate index with two bins. That is, transform kernel candidate indices 0 to 3 can be signaled as 00, 01, 10, and 11, respectively. For these two bins, the MSB (most significant bit) can be signaled first, and the LSB (least significant bit) can be signaled later. Different CABAC contexts can be assigned to each bin. For example, CABAC contexts other than the one assigned to the MTS flag can be assigned to each bin for both bins. Alternatively, bypass coding can be applied if no CABAC context is assigned to either bin. When AbsSum is greater than T2, the transform kernel candidate index has values from 0 to 5, therefore the transform kernel candidate index cannot be expressed using only two bins. In this case, the transform kernel candidate index can be expressed by assigning two or more bins, such as truncated binary encoding. For each bin assigned by the truncated binary encoding method, a CABAC context can be assigned, or bypass encoding can be applied if no CABAC context is assigned. Alternatively, a CABAC context can be assigned to some bins among multiple bins (e.g., the first bin, or the first and second bins), and bypass encoding can be applied to the remaining bins.
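The AbsSum-dependent signaling can be sketched as a binarization routine. T1 = 6 and T2 = 32 are the example values from the text; the two-bin case is written MSB first, and the six-candidate case uses truncated binary coding, which for six symbols gives 2-bit codewords to indices 0 and 1 and 3-bit codewords to indices 2 to 5. Whether each bin is context-coded or bypass-coded is not modeled.

```python
T1, T2 = 6, 32   # example thresholds from the text


def num_candidates(abs_sum):
    """Number of MTS kernel candidates available for the given AbsSum."""
    if abs_sum <= T1:
        return 1
    if abs_sum <= T2:
        return 4
    return 6


def truncated_binary(value, n):
    """Truncated binary code of `value` in [0, n)."""
    k = n.bit_length() - 1          # floor(log2(n))
    u = (1 << (k + 1)) - n          # how many values get the shorter k-bit code
    if value < u:
        return format(value, f"0{k}b")
    return format(value + u, f"0{k + 1}b")


def binarize_candidate_index(cand_idx, abs_sum):
    n = num_candidates(abs_sum)
    if not 0 <= cand_idx < n:
        raise ValueError("candidate index not available for this AbsSum")
    if n == 1:
        return ""                       # nothing signaled; index 0 is inferred
    if n == 4:
        return format(cand_idx, "02b")  # two bins, MSB first
    return truncated_binary(cand_idx, n)


print([binarize_candidate_index(i, 40) for i in range(6)])
# ['00', '01', '100', '101', '110', '111']
```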
[0171] Example 3
[0172] The transform kernel of the current block can be determined based on a transform set that includes one or more transform kernel candidates. The transform kernel of the current block can be derived as one of one or more transform kernel candidates belonging to the transform set.
[0173] The process of determining the transform kernel of the current block may include at least one of the following: 1) determining the transform set of the current block, or 2) selecting a transform kernel candidate from the transform set of the current block. The process of determining the transform set may be a process of selecting one of a plurality of predefined transform sets that are identically defined in the encoding apparatus and the decoding apparatus. Alternatively, it may be a process of configuring one or more transform sets available for the current block from such a plurality of predefined transform sets and selecting one of the configured transform sets. Alternatively, it may be a process of configuring a transform set from the transform kernel candidates available for the current block, among a plurality of transform kernel candidates that are identically predefined in the encoding apparatus and the decoding apparatus.
[0174] When the transform set of the current block includes multiple transform kernel candidates, a process can be performed to select one of the multiple transform kernel candidates for the current block. However, when the transform set of the current block includes only one transform kernel candidate (i.e., when the number of transform kernel candidates available for the current block is 1), the transform kernel of the current block can be set as the corresponding transform kernel candidate.
[0175] The transform set according to this disclosure may refer to the (non-separable) transform set in Example 1 above, or to the MTS set in Example 2. Alternatively, the transform set may be defined separately from the (non-separable) transform set in Example 1 or the MTS set in Example 2. In this case, the transform set may include one or more specific transform kernels as transform kernel candidates. A specific transform kernel may be defined as a pair consisting of a transform kernel for the horizontal transform and a transform kernel for the vertical transform, or as a single transform kernel applied to both the horizontal and vertical transforms.
[0176] In embodiments of this disclosure, the process of applying a non-separable transform as the primary transform (NSPT) is described in detail. The NSPT can be applied to the entire transform block or to a portion of it. For the forward NSPT, residual samples in the region where the NSPT is applied can be used as a 1D vector input to the NSPT. In other words, residual samples in all or part of a single transform block (referred to in this disclosure as the region of interest (ROI)) can be collected into a 1D vector and configured as the input. Applying the forward NSPT then yields the primary transform coefficients. Conversely, applying the backward NSPT to the primary transform coefficients yields a 1D vector output. Residual samples for the ROI can be obtained by arranging the element values of this output vector at defined positions within the 2D transform block.
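The ROI handling above can be sketched as a forward/backward round trip. A trained NSPT kernel is not given in the text, so an orthonormal Sylvester-Hadamard matrix stands in for it (any orthonormal MN x MN matrix behaves the same way here); the ROI is assumed to be the whole 4x4 block read in row-major order, so the kernel is 16 x 16.

```python
import math


def hadamard(n):
    """Orthonormal Sylvester-Hadamard matrix (n a power of two); stand-in NSPT kernel."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def forward_nspt(block, kernel):
    """Collect ROI samples into a 1-D vector (row-major) and multiply by the kernel."""
    x = [v for row in block for v in row]
    return [sum(g * xv for g, xv in zip(row, x)) for row in kernel]


def backward_nspt(coeffs, kernel, width, height):
    """Multiply by the transposed kernel and place the outputs back in the 2-D block."""
    x = [sum(kernel[i][j] * coeffs[i] for i in range(len(coeffs)))
         for j in range(width * height)]
    return [x[r * width:(r + 1) * width] for r in range(height)]


residual = [[4, -2, 0, 1], [3, 1, -1, 0], [2, 0, 0, -1], [1, 1, 0, 0]]
G = hadamard(16)                                   # MN x MN with M = N = 4
coeffs = forward_nspt(residual, G)
restored = backward_nspt(coeffs, G, 4, 4)
assert all(abs(a - b) < 1e-9 for ra, rb in zip(residual, restored) for a, b in zip(ra, rb))
```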
[0177] For an inseparable transform kernel used in NSPT, the matrix dimension can be determined based on the size of the ROI. In this disclosure, the transform kernel can be referred to as a transform type or transform matrix, and an inseparable transform kernel used in NSPT can be referred to as an NSPT kernel. For example, when the current block is an MxN transform block, the ROI is the region covering the entire MxN transform block, and a square NSPT is applied, the dimension of the corresponding transform matrix can be MN x MN. For example, when the ROI is the region covering the entire 8x8 transform block, the dimension of the NSPT kernel can be 64 x 64.
[0178] According to embodiments of this disclosure, when NSPT is applied to residuals generated by intra-frame prediction, the NSPT kernel can be adaptively determined based on the intra-frame prediction mode. Since the statistical properties of the residual block may vary depending on the intra-frame prediction mode, compression efficiency can be improved by adaptively determining the NSPT kernel based on the intra-frame prediction mode.
[0179] An NSPT kernel applied to one intra-prediction mode can be configured to be shared with at least one other intra-prediction mode. As described above, the non-separable transform set can be determined based on the intra-prediction mode of the current block and a mapping table. The mapping table can define the mapping relationship between the predefined intra-prediction modes and the non-separable transform sets. The predefined intra-prediction modes can include two non-directional modes and 65 directional modes.
[0180] As an example, intra-prediction modes can be grouped into intra-prediction mode groups. One NSPT kernel can be assigned to an intra-prediction mode group, or multiple NSPT kernels can be assigned to an intra-prediction mode group. In other words, an inseparable transform set (NSPT set) including at least one NSPT kernel can be assigned to an intra-prediction mode group. The inseparable transform set can be mapped to an intra-prediction mode, and one of the N NSPT kernels included in the inseparable transform set can be selected.
[0181] As an example, an intra-prediction mode group may include adjacent prediction modes (e.g., modes 17, 18, and 19). Additionally, an intra-prediction mode group may include modes with symmetry. For example, in Figure 5 described above, directional modes can be symmetric about the diagonal mode (i.e., intra-prediction mode 34). In this case, two symmetric modes can be configured as a group (or a pair). For example, modes 18 and 50 can be included in the same group because they are symmetric about mode 34. However, for modes with symmetry, the 2D input block can be transposed before applying the forward NSPT kernel, and the 1D input vector can then be configured. For example, when the intra-prediction mode is less than or equal to 34, the 1D input vector can be derived by reading the input block in row-major order without transposing it. When the intra-prediction mode is greater than 34, the 1D input vector can be configured either by first transposing the 2D input block and then reading it in row-major order, or by keeping the 2D input block as is and reading it in column-major order.
[0182] Table 8 below illustrates the mapping table used for assigning NSPT sets based on intra-prediction modes. Referring to Table 8, a total of 35 NSPT sets, numbered 0 to 34, can be defined. The extended WAIP modes (i.e., modes -14 to -1 and modes 67 to 80 in Figure 5) can be assigned the NSPT set assigned to the nearest regular directional mode. In other words, NSPT set 2 can be assigned to the extended WAIP modes.
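A sketch of the symmetric-mode handling for the regular modes 0 to 66, assuming mode 34 is the diagonal: a mode above 34 reuses the kernel of its mirror mode 68 - mode (e.g., 50 reuses 18), and its 1-D input is read in column-major order, which equals reading the transposed block in row-major order. Function names are illustrative; WAIP modes are not handled here.

```python
def kernel_mode(mode):
    """Mode whose NSPT kernel is used: modes above 34 share the kernel of their mirror."""
    return 68 - mode if mode > 34 else mode


def input_vector(block, mode):
    """1-D forward-NSPT input: row-major for modes <= 34, column-major above 34."""
    if mode <= 34:
        return [v for row in block for v in row]
    return [block[r][c] for c in range(len(block[0])) for r in range(len(block))]


def transpose(block):
    return [list(col) for col in zip(*block)]


block = [[1, 2], [3, 4], [5, 6]]
print(kernel_mode(50))                    # 18
print(input_vector(block, 50))            # [1, 3, 5, 2, 4, 6]
assert input_vector(block, 50) == input_vector(transpose(block), 18)
```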
[0183] [Table 8]
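A sketch of the NSPT set lookup for extended WAIP modes. Table 8 itself is not reproduced, so `HYPOTHETICAL_TABLE` below is a placeholder that merely respects the mode-34 symmetry and the range of 35 sets (0 to 34); only the rule that WAIP modes take the set of the nearest regular directional mode (mode 2 or mode 66) comes from the text.

```python
# Placeholder for Table 8 (not reproduced): set = mode for modes 0..34 and the
# mirrored mode 68 - mode for modes 35..66. The actual table may differ.
HYPOTHETICAL_TABLE = [m if m <= 34 else 68 - m for m in range(67)]


def nspt_set_index(mode, table=HYPOTHETICAL_TABLE):
    """NSPT set for an intra mode; WAIP modes reuse the nearest regular directional mode."""
    if mode < 0:        # WAIP modes -14 .. -1 -> nearest regular mode 2
        mode = 2
    elif mode > 66:     # WAIP modes 67 .. 80 -> nearest regular mode 66
        mode = 66
    return table[mode]


print(nspt_set_index(-7), nspt_set_index(75))   # 2 2
```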
[0184] An NSPT set may include at least one NSPT kernel (or kernel candidate). In other words, an NSPT set may include N NSPT kernel candidates, where N may be set to a value greater than or equal to 1, such as 1, 2, 3, or 4. The kernel applied to the current block, among the NSPT kernels included in the NSPT set, can be signaled using an index. In this disclosure, this index may be referred to as the NSPT index. For example, the NSPT index may have a value of 0, 1, 2, ..., N-1.
[0185] Additionally, as an example, when the number of NSPT kernel candidates is 1, the NSPT index value can be fixed at 0. In this case, the NSPT index can be inferred without being signaled. Furthermore, a flag indicating whether to apply the NSPT can be signaled separately from the NSPT index. In this disclosure, this flag may be referred to as the NSPT flag.
[0186] The NSPT can be applied when the NSPT flag value is 1 and omitted when the NSPT flag value is 0. When the NSPT flag is not signaled, its value can be inferred as 0. For example, when the NSPT flag value is 1, the NSPT index can be signaled, and one of the N kernel candidates included in the NSPT set selected by the intra-prediction mode can be specified based on the signaled NSPT index.
[0187] In embodiments, the entropy encoding method for NSPT indices can be defined in various ways by considering the number (N) of NSPT kernels included in the NSPT set. For example, as a method for mapping values from 0 to N-1 to bin strings (i.e., a binarization method), truncated unary binarization, truncated binarization, and fixed-length binarization methods can be used.
[0188] For example, when the number of kernel candidates N in the configured NSPT set is 2, one of the two candidates can be specified using a single bin, where 0 can indicate the first candidate and 1 the second candidate. When N is 3 and truncated unary binarization is applied, the candidates can be specified using up to two bins. For example, the first, second, and third candidates can be binarized as 0, 10, and 11, respectively, and signaled. The binarized bins can be encoded using context coding or bypass coding.
[0189] This disclosure describes a reduced primary transform (RPT) method that uses a dimension-reduced transform kernel as the primary transform. As described above, when applying the forward NSPT, the samples of a 2D residual block can be arranged (or rearranged) into a 1D vector in row-major (or column-major) order. The transform matrix used for the NSPT can then be multiplied by the rearranged vector. When the 2D residual block is M x N (M is the width and N is the height), the length of the rearranged 1D vector is M·N. In other words, the 2D residual block can also be represented as an M·N x 1 column vector. In this disclosure, for convenience, M·N is written as MN. In this case, the dimension of the corresponding transform matrix is MN x MN. In summary, the forward NSPT is performed by multiplying the MN x 1 vector on the left by the MN x MN transform matrix to obtain an MN x 1 transform coefficient vector.
[0190] When applying the RPT, r transform coefficients can be obtained by multiplying by an r x MN matrix instead of the MN x MN forward NSPT transform matrix described above. Here, r is the number of rows of the transform matrix, and MN is the number of columns. According to embodiments of this disclosure, r can be set to be less than or equal to MN. Specifically, the existing forward NSPT transform matrix consists of MN rows, each of which is a 1 x MN row vector, namely a transform basis vector of the NSPT transform matrix. Each transform coefficient is obtained by multiplying one transform basis vector by the MN x 1 sample column vector.
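The NSPT signaling of [0185] to [0188] can be sketched as follows. The flag and index layout follows the text; the choice between truncated unary and fixed-length binarization is a parameter, and whether each bin is context-coded or bypass-coded is not modeled. Names are illustrative.

```python
def binarize_nspt(apply_nspt, nspt_index, n, method="truncated_unary"):
    """Bin string for the NSPT flag followed, when needed, by the NSPT index.

    `n` is the number of kernel candidates in the NSPT set chosen by the intra mode.
    """
    if not apply_nspt:
        return "0"                          # NSPT flag = 0, no index
    bins = "1"                              # NSPT flag = 1
    if n == 1:
        return bins                         # index inferred as 0
    if not 0 <= nspt_index < n:
        raise ValueError("NSPT index out of range")
    if method == "truncated_unary":
        bins += "1" * nspt_index + ("0" if nspt_index < n - 1 else "")
    elif method == "fixed_length":
        bins += format(nspt_index, f"0{(n - 1).bit_length()}b")
    else:
        raise ValueError(method)
    return bins


print([binarize_nspt(True, i, 3) for i in range(3)])   # ['10', '110', '111']
```

Stripping the leading flag bin gives the index codes 0, 10, and 11 listed above for N = 3.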
[0191] Because the existing forward NSPT transformation matrix consists of MN row vectors, applying the forward NSPT transformation yields MN transformation coefficients (i.e., MN x 1 transformation coefficient column vectors). Meanwhile, for the forward RPT, the transformation matrix can consist of r transformation basis vectors instead of MN. Therefore, when applying the forward RPT transformation, r transformation coefficients (i.e., r x 1 transformation coefficient column vectors) can be obtained instead of MN.
[0192] The RPT kernel can be configured by selecting r transform basis vectors, which are part of the transform basis vectors used to configure the MN x MN forward NSPT kernel. In this disclosure, the transform kernel can be referred to as a transform type or transform matrix, and the inseparable transform kernel used for NSPT can be referred to as the RPT kernel. In other words, when selecting r 1x MN row vectors from the MN x MN forward NSPT kernel, it may be advantageous from a coding performance perspective to select the most important transform basis vectors. Specifically, in terms of energy compression through transforms, more energy can be concentrated on the first-appearing transform coefficients by multiplying by the forward NSPT transform matrix. In other words, the transform basis vectors positioned at the top of the forward NSPT transform matrix can generate transform coefficients with greater energy. With this in mind, an rx MN forward RPT kernel can be configured (or derived) by taking r from the top of the forward NSPT kernel.
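A sketch of deriving a forward RPT kernel by keeping the first r rows of a forward NSPT kernel. Because trained NSPT kernels are not given, an 8-point orthonormal DCT-II, whose rows are ordered from low to high frequency, stands in for the 8 x 8 kernel of a 4x2 (M = 4, N = 2) ROI; with a smooth residual, almost all of the energy lands in the first coefficients.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II; rows ordered from low to high frequency (stand-in kernel)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def rpt_kernel(nspt_kernel, r):
    """r x MN forward RPT kernel: the top r basis vectors of the MN x MN NSPT kernel."""
    if not 1 <= r <= len(nspt_kernel):
        raise ValueError("r must satisfy 1 <= r <= MN")
    return [list(row) for row in nspt_kernel[:r]]


def forward(kernel, x):
    return [sum(g * v for g, v in zip(row, x)) for row in kernel]


residual = [[10, 11, 12, 13], [14, 15, 16, 17]]   # M = 4, N = 2, so MN = 8
x = [v for row in residual for v in row]
G_r = rpt_kernel(dct2_matrix(8), 3)               # 3 x 8 instead of 8 x 8
y = forward(G_r, x)
kept = sum(c * c for c in y) / sum(v * v for v in x)
print(len(y), round(kept, 4))                     # 3 0.9997
```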
[0193] The RPT according to this disclosure only takes a portion (i.e., r) of the transform coefficients obtained by applying the existing NSPT, and therefore the energy of the original signal may be partially lost. In other words, distortion between the original signal and the natural signal may occur through the corresponding process. However, since only r transform coefficients are generated by applying RPT instead of MN, the number of bits required to encode the corresponding transform coefficients can be reduced. Therefore, for signals where a large amount of energy is concentrated on a few transform coefficients (e.g., image residual signals), the gain obtained by reducing the number of signal bits can be significantly large, thereby improving coding performance.
[0194] The backward NSPT is a transformation matrix, and can be the transpose of the aforementioned forward NSPT kernel. In this case, the input data can be the transform coefficient signal, rather than a sample signal such as a residual signal. Specifically, when the forward NSPT transformation matrix is G and the sample signal is x rearranged into a 1D vector, the transform coefficient vector obtained by multiplying the corresponding transformation matrix by the left side can be expressed as shown in Equation 4.
[0195] [Formula 4]
$$ y = G\,x $$
[0196] Referring to Equation 4, x and y can be MN x 1 column vectors. G can be in the form of an MN x MN matrix. The backward NSPT process can be expressed using the same variables as in Equation 5 below.
[0197] [Formula 5]
$$ x = G^{T} y $$
[0198] In Equation 5, $G^{T}$ denotes the transpose of G. The forward and backward RPT operations according to this disclosure can also be expressed by these two equations. However, when RPT is applied, y is an r x 1 column vector instead of an MN x 1 column vector, and G is an r x MN matrix instead of an MN x MN matrix. In other words, even when RPT is applied instead of NSPT, the dimension of the sample signal (e.g., the image residual signal) does not change: the backward RPT reconstructs all MN samples from only r transform coefficients. Thus, by encoding only r transform coefficients (fewer than MN), the original MN samples can be reconstructed, up to the distortion described in [0193], which improves coding performance.
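A minimal Python sketch of Equations 4 and 5 with a reduced r x MN kernel follows. A trained NSPT kernel is not available here, so an orthonormal DCT-II matrix is used purely as a stand-in; `dct2_matrix`, `forward_rpt`, and `backward_rpt` are hypothetical names. The sketch shows that y has r entries while the backward output has MN entries, that a signal lying in the span of the retained basis vectors is recovered exactly, and that energy along a discarded basis vector is lost.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dct2_matrix(size: int) -> Matrix:
    """Orthonormal DCT-II matrix, used only as a stand-in for a trained NSPT kernel."""
    rows = []
    for k in range(size):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * size)) for n in range(size)])
    return rows


def forward_rpt(kernel: Matrix, x: Vector) -> Vector:
    """Equation 4 with an r x MN kernel: y = G x, giving r coefficients."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in kernel]


def backward_rpt(kernel: Matrix, y: Vector) -> Vector:
    """Equation 5 with an r x MN kernel: x_hat = G^T y, giving MN samples."""
    mn = len(kernel[0])
    return [sum(kernel[i][j] * y[i] for i in range(len(kernel))) for j in range(mn)]


G = dct2_matrix(16)   # MN = 16 (a 4x4 block, flattened)
kernel = G[:4]        # r = 4: keep the top 4 basis vectors
x = [3 * G[0][n] - 2 * G[1][n] + G[3][n] for n in range(16)]
y = forward_rpt(kernel, x)        # 4 coefficients
x_hat = backward_rpt(kernel, y)   # 16 reconstructed samples
```

Because the stand-in rows are orthonormal, x_hat equals x here; a vector such as G[10] would map to (near-)zero coefficients, which is the energy loss discussed in [0193].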
[0199] Embodiments of this disclosure propose an RPT structure that defines the value of r by considering the statistical properties of the residual block, and derives a residual block of the original transform block size from a reduced-size representation determined by that r value. When an additional transform (i.e., a secondary transform) is applied to the primary transform coefficients, the quantized non-zero coefficients tend to be concentrated in a relatively low-frequency region, because quantization is applied to the primary transform coefficients. A secondary transform can therefore model the statistics of the primary transform coefficients relatively simply by setting the r value for a given low-frequency region. The RPT according to this disclosure is fundamentally different from the secondary transform: it defines the r value by considering the statistical properties of the samples within the residual block, whose distribution differs considerably from that of the primary transform coefficients. Various embodiments for determining the RPT kernel as a reduced-dimension transform matrix, that is, for determining or defining the value of r in the RPT, are described below.
[0200] In embodiments of this disclosure, the value of r in the RPT can be determined by considering the worst-case complexity allowed by the transform system. As an example, the worst-case complexity can be measured as the number of multiplications per sample. For an M x N block, applying the RPT in either direction (forward or backward) requires MN · r multiplications. Since the 2D block consists of MN samples, the number of multiplications per sample is (MN · r) / MN = r. Therefore, r can be set to be less than or equal to the maximum number of multiplications allowed per sample. For example, when the maximum number of multiplications per sample is set to 16 for a 16x16 block, r can be determined to be less than or equal to 16; that is, the forward RPT kernel can be set to 16 x 256.
[0201] In another embodiment, memory usage can be used as a measure of worst-case complexity. For example, an allowed memory size per kernel can be set. When each kernel coefficient (in this disclosure, each element of a transform kernel is called a kernel coefficient) requires p bytes and memory usage is limited to at most q bytes per kernel, an r x MN kernel occupies r · MN · p bytes, so r can be set to be less than or equal to q / (p · MN). For example, for the forward RPT kernel of a 16x16 block with p = 1 byte and memory usage limited to at most 8 KB per kernel (q = 8 KB = 2^13 bytes), r can be set to be less than or equal to 2^13 / 256 = 32.
[0202] As another example, memory usage and / or the number of multiplications per sample can be considered as measures of worst-case complexity. For instance, when the maximum number of multiplications per sample for a 16x16 block is set to 16 and memory usage is limited to at most 8 KB per kernel (with 1-byte kernel coefficients), r can be set to be less than or equal to 16, the smaller of the two bounds (16 and 32).
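The two worst-case bounds in [0200] to [0202] can be combined in a small helper. This is a sketch under the assumption that the admissible r is simply the minimum of the per-sample multiplication bound (r ≤ max_mults) and the memory bound (r ≤ q / (p · MN)); the function and parameter names are hypothetical.

```python
from typing import Optional


def max_rpt_r(m: int, n: int,
              max_mults_per_sample: Optional[int] = None,
              bytes_per_coeff: int = 1,
              max_kernel_bytes: Optional[int] = None) -> int:
    """Largest r allowed for an M x N block under the given worst-case limits."""
    mn = m * n
    bound = mn  # r can never exceed MN
    if max_mults_per_sample is not None:
        # MN * r multiplications spread over MN samples -> r multiplications per sample.
        bound = min(bound, max_mults_per_sample)
    if max_kernel_bytes is not None:
        # An r x MN kernel with p-byte coefficients occupies r * MN * p bytes.
        bound = min(bound, max_kernel_bytes // (bytes_per_coeff * mn))
    return bound
```

For a 16x16 block, a 16-multiplication limit gives r ≤ 16, an 8 KB limit alone gives r ≤ 32, and applying both gives r ≤ 16, matching the examples above.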
[0203] Furthermore, in this embodiment, the r value used to configure the RPT kernel can be determined based on specific information, that is, based on predefined coding parameters. For example, the r value can be determined based on the block size; in other words, the RPT kernel can be variably determined based on the block size. Here, the block can be at least one of a coding block, a transform block, and a prediction block. As another example, the r value can be determined based on prediction information, which can include information indicating inter-frame or intra-frame prediction, intra-frame prediction mode information, and the like. As a further example, the r value can be determined based on signaled information (the values of syntax elements); for instance, the r value can be variably determined based on the quantization parameter value. Additionally, to reduce complexity, a predefined fixed value can be used as the r value, and this predefined fixed value can be determined based on signaled information.
[0204] When the sample signal is multiplied by the r x MN RPT kernel, r transform coefficients are obtained. These r transform coefficients can be arranged according to a predefined scan order (e.g., forward / backward zigzag, horizontal, vertical, or diagonal scan order, a scan order specified based on the intra-frame prediction mode, etc.). When the transform coefficients obtained by the forward RPT are arranged according to such a scan order (a scan order in units of coefficient groups (CGs) can also be applied) and r is less than MN, the r transform coefficients do not completely fill the M x N block, leaving blank spaces. As an embodiment of this disclosure, the values of these blank spaces can be predicted by considering the characteristics of the residual signal in the following manner.
[0205] – The blank space can be filled with the values of the available neighboring pixels.
[0206] – The values of the blank space can be filled based on the values of available neighboring pixels and the intra-prediction mode. For example, the values of the blank space can be predicted by performing intra-prediction based on the values of available neighboring pixels and the intra-prediction mode.
– The blank space can be filled with a predefined fixed value (e.g., 0).
– The blank space can be filled from available neighboring pixels by using a predetermined intra-frame prediction mode (e.g., planar mode).
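The arrangement of r coefficients and the zero-fill option can be sketched as follows. The scan is an assumption modeled on an up-right diagonal scan applied first across 4x4 CGs and then within each CG; the disclosure permits other scan orders, and all names here are hypothetical. Blocks are indexed as block[y][x], and dimensions are given as width x height, both multiples of 4.

```python
from typing import List, Tuple


def diag_scan(width: int, height: int) -> List[Tuple[int, int]]:
    """Up-right diagonal scan of (x, y) positions; each anti-diagonal runs bottom-left to top-right."""
    order = []
    for d in range(width + height - 1):
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order


def cg_scan(width: int, height: int, cg: int = 4) -> List[Tuple[int, int]]:
    """Coefficient positions visited CG by CG, diagonal order across and within CGs."""
    order = []
    for cx, cy in diag_scan(width // cg, height // cg):
        for x, y in diag_scan(cg, cg):
            order.append((cx * cg + x, cy * cg + y))
    return order


def place_rpt_coeffs(coeffs: List[int], width: int, height: int) -> List[List[int]]:
    """Write the r coefficients in scan order; every remaining (blank) position stays 0."""
    block = [[0] * width for _ in range(height)]  # zero-fill is the default
    for value, (x, y) in zip(coeffs, cg_scan(width, height)):
        block[y][x] = value
    return block
```

With r = 16 and an 8x8 block, all 16 coefficients land in the top-left 4x4 CG and the other three CGs remain zero.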
[0209] In this disclosure, filling the blank space with 0 in the above example can be referred to as a zero-out process. When the blank space is filled with 0, the following embodiments can be applied. When a non-zero transform coefficient is detected (or parsed) in the blank-space region while the decoding device parses the transform coefficients, it can be considered (or inferred) that RPT has not been applied. In other words, when a non-zero transform coefficient exists in the predefined region corresponding to the blank space, it can be considered that RPT has not been applied. In this case, signaling (or parsing) of a flag indicating whether RPT is applied and / or an index specifying one of multiple RPT kernel candidates can be omitted. As an example, when a non-zero transform coefficient exists in the predefined region corresponding to the blank space, a predefined variable can be updated, and it can be inferred from the updated variable that RPT has not been applied.
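A minimal decoder-side sketch of this zero-out check follows. It assumes the parsed coefficients are available as a 2D array indexed block[y][x] and that the blank-space region is everything outside a kept top-left rectangle (as in the 8x8 example of [0211]); the helper names are hypothetical.

```python
from typing import List


def zero_out_mask(width: int, height: int, keep_w: int, keep_h: int) -> List[List[bool]]:
    """True marks the blank (zero-out) region outside the keep_w x keep_h top-left area."""
    return [[x >= keep_w or y >= keep_h for x in range(width)] for y in range(height)]


def nonzero_in_zero_out(block: List[List[int]], mask: List[List[bool]]) -> bool:
    """True if any parsed coefficient inside the zero-out region is non-zero.

    A True result means RPT is inferred not to be applied, so the RPT flag / index is not parsed.
    """
    return any(v != 0 and m
               for row_v, row_m in zip(block, mask)
               for v, m in zip(row_v, row_m))
```

For an 8x8 block with r = 16, the mask keeps the top-left 4x4 area; a single non-zero coefficient at x = 5, y = 0 is enough to infer that RPT was not used.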
[0210] In embodiments of this disclosure, whether RPT is applied can be determined based on the size and / or shape of the block. Furthermore, the RPT kernel can be variably determined based on the size and / or shape of the block. Because the value of r may differ depending on the size and / or shape of the block (i.e., for each M x N block), the blank space may also differ accordingly. Therefore, the region checked for non-zero transform coefficients can be defined differently for each block size and / or shape. In other words, the zero-out region can be variably determined.
[0211] For example, when a 16x64 matrix is applied as the forward RPT matrix for an 8x8 block, r is 16. In this case, when a CG is a 4x4 sub-block, only the top-left 4x4 sub-block is filled with RPT transform coefficients, and the remaining three 4x4 sub-blocks (i.e., the top-right, bottom-left, and bottom-right sub-blocks) are filled with 0. If non-zero transform coefficients are detected in those three remaining 4x4 sub-blocks during decoding, it can be considered that RPT has not been applied. Furthermore, as mentioned above, the flag indicating whether RPT is applied or the index specifying one of the multiple RPT kernel candidates need not be signaled.
[0212] As another example, when a 32x128 matrix is applied as the forward RPT matrix for a 16x8 block (i.e., r is 32) and a CG is a 4x4 sub-block, only two CGs in scan order are filled with RPT transform coefficients, for example the top-left 4x4 sub-block and the 4x4 sub-block directly below it. The region filled with 0s as blank space is then the remainder of the block excluding those two 4x4 sub-blocks. Because the RPT kernel can be variably determined according to the block size and / or shape, the blank spaces can be determined differently for 8x8 and 16x8 blocks, as described above.
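The CG split in the two examples above can be reproduced with a short sketch. It assumes 4x4 CGs, width x height block dimensions, and an up-right diagonal CG scan in which the CG below the top-left CG precedes the CG to its right; under those assumptions the 16x8 block with r = 32 fills the top-left CG and the CG directly below it. Names are hypothetical.

```python
import math
from typing import List, Tuple

CG = Tuple[int, int]  # (x, y) index of a coefficient group


def diag_scan(width: int, height: int) -> List[CG]:
    """Up-right diagonal scan; each anti-diagonal runs from bottom-left to top-right."""
    order = []
    for d in range(width + height - 1):
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order


def split_cgs(width: int, height: int, r: int, cg: int = 4) -> Tuple[List[CG], List[CG]]:
    """Return (CGs receiving the r RPT coefficients, zero-out CGs) in CG scan order."""
    order = diag_scan(width // cg, height // cg)
    used = math.ceil(r / (cg * cg))  # number of CGs needed to hold r coefficients
    return order[:used], order[used:]
```

The zero-out CG list returned here is exactly the region that would be checked for non-zero coefficients.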
[0213] As an example, when r is a multiple of the CG size and the transform coefficients are scanned in units of CGs, if a non-zero transform coefficient is detected in a CG belonging to the blank space, signaling of the RPT-related flag and / or index can be omitted. In other words, the transform coefficients within a CG are scanned in the order specified for that CG, and after moving to the next CG in the CG scan order, the transform coefficients within that CG are scanned in the same way. In existing image compression techniques, a flag indicating whether a CG contains non-zero transform coefficients is signaled first for each CG, so whether RPT is applied can be determined from those flags alone, which reduces signaling overhead and the associated implementation complexity.
[0214] As mentioned above, when RPT is used, if a non-zero transform coefficient is detected in the zero-filled blank-space region, RPT can be inferred not to be applied, and signaling of RPT-related information can be omitted. However, when no non-zero transform coefficient is detected in the blank space, whether RPT is applied cannot be determined from the coefficients alone, so the flag indicating whether RPT is applied can be parsed after the relevant transform coefficients to finally determine whether RPT is applied.
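The decision flow of [0213] and [0214] can be sketched as below, assuming the per-CG coded flags have already been parsed and that `parse_rpt_flag` is a hypothetical callback standing in for reading the RPT flag from the bitstream. The callback is invoked only when the zero-out CGs carry no coefficients.

```python
from typing import Callable, Dict, Iterable, Tuple

CG = Tuple[int, int]


def decide_rpt(cg_coded: Dict[CG, bool],
               zero_out_cgs: Iterable[CG],
               parse_rpt_flag: Callable[[], bool]) -> bool:
    """Return True if RPT is applied to the current block."""
    # A coded CG inside the zero-out region is impossible under RPT:
    # infer "RPT not applied" and skip parsing the flag entirely.
    if any(cg_coded.get(cg, False) for cg in zero_out_cgs):
        return False
    # Otherwise the coefficients are inconclusive, so the flag is parsed
    # after the transform coefficients.
    return parse_rpt_flag()
```

For the 8x8 case, zero_out_cgs would be the three CGs other than the top-left one.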
[0215] As an example, a forward secondary (auxiliary) transform can additionally be applied to the transform coefficients generated by applying the RPT, or to the region of the M x N block in which those coefficients are located. In this disclosure, from the perspective of the forward secondary transform, that region or a portion of it can be referred to as the ROI. In the backward direction, the backward secondary transform is applied first, followed by the backward RPT. Specifically, the region (or a portion of the region) in which the r transform coefficients generated by the forward RPT are arranged can be set as the ROI to which the forward secondary transform is applied. For example, when a 16x64 forward RPT transform matrix is applied to an 8x8 region, the 16 generated transform coefficients can be located in the top-left 4x4 sub-block, and that sub-block can be set as the ROI to which the forward secondary transform is applied.
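The ordering of the forward and backward stages can be sketched as follows for an 8x8 block with a 16x64 RPT and a 16x16 secondary transform on the ROI. Both kernels are orthonormal DCT-II matrices used only as placeholders for trained kernels, and all names are hypothetical; the point is that the backward path applies the secondary transform first and the RPT last.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dct2_matrix(size: int) -> Matrix:
    """Orthonormal DCT-II matrix (placeholder for a trained kernel)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / size)
             * math.cos(math.pi * (2 * n + 1) * k / (2 * size)) for n in range(size)]
            for k in range(size)]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def forward_rpt_then_secondary(x: Vector, rpt: Matrix, sec: Matrix) -> Vector:
    roi = matvec(rpt, x)        # 16 RPT coefficients, i.e. the top-left 4x4 ROI
    return matvec(sec, roi)     # forward secondary transform on the ROI


def backward_secondary_then_rpt(z: Vector, rpt: Matrix, sec: Matrix) -> Vector:
    roi = matvec(transpose(sec), z)     # backward secondary transform first
    return matvec(transpose(rpt), roi)  # then backward RPT back to 64 samples


rpt = dct2_matrix(64)[:16]  # 16 x 64 forward RPT placeholder
sec = dct2_matrix(16)       # 16 x 16 secondary transform placeholder
```

A residual lying in the span of the 16 retained RPT rows survives the round trip exactly with these orthonormal placeholders.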
[0216] Furthermore, the coefficient values of the RPT kernel can be adjusted to account for integer or fixed-point arithmetic. In other words, rather than performing the theoretical orthogonal or non-orthogonal transform (here, orthogonal and non-orthogonal transforms refer to transforms in which each transform basis vector has a norm of 1), the RPT kernel can be configured for a practical encoding / decoding system by appropriately scaling its kernel coefficients so that the transform is performed with integer (or fixed-point) arithmetic. When RPT is applied, the kernel can be scaled by the same scaling factor that is used for separable transforms in existing image compression techniques. In this case, separable or non-separable transforms (including RPT) can be performed while the other processes besides the transform (e.g., quantization and dequantization) are kept unchanged.
[0217] Integer coefficients of the RPT kernel can be obtained by multiplying the transform basis vectors by the scaling value described above. As an example, this may include applying an operation such as rounding, flooring, or ceiling to each scaled kernel coefficient. The integerized RPT kernel obtained in this way can be defined and used in the transform / inverse transform process. Once the scaled integer kernel coefficients are obtained through operations such as rounding, flooring, or ceiling, the maximum and minimum values over all kernel coefficients can be determined, which gives the number of bits sufficient to represent all kernel coefficients. For example, when the maximum value is less than or equal to 127 and the minimum value is greater than or equal to -128, all integer kernel coefficients can be represented using 8 bits (e.g., in two's complement).
[0218] In general terms, when the maximum value is less than or equal to $2^{N-1}-1$ and the minimum value is greater than or equal to $-2^{N-1}$, all integer kernel coefficients can be represented using N bits. When the maximum value is greater than $2^{N-1}-1$ or the minimum value is less than $-2^{N-1}$, not all integer kernel coefficients can be represented using N bits. In this case, 1) all kernel coefficients can be additionally multiplied by a scaling value so that they fall within the N-bit range, or 2) the number of bits used to represent the kernel coefficients can be increased (i.e., to N+1 bits or more). When all kernel coefficients are multiplied by $2^{-p}$ ($p \ge 1$) so that they can be expressed using N bits, this can be compensated by subsequently multiplying by $2^{p}$, which allows the scaled kernel to be integrated into the existing encoding / decoding process. As an example, the multiplication by $2^{p}$ can be achieved by performing an additional left shift by p bits or by reducing the right shift applied during quantization or dequantization by p bits.
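The integerization and bit-range checks of [0217] and [0218] can be sketched in Python. The scale value, the round-half-up choice for rounding, and the helper names are assumptions; `shrink_to_bits` implements option 1) with a rounded right shift by p and leaves the compensating left shift by p to the caller.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def integerize(basis: Matrix, scale: float, mode: str = "round") -> IntMatrix:
    """Scale each kernel coefficient and convert it to an integer."""
    ops = {
        "round": lambda v: math.floor(v + 0.5),  # round half up
        "floor": math.floor,
        "ceil": math.ceil,
    }
    op = ops[mode]
    return [[int(op(c * scale)) for c in row] for row in basis]


def fits_in_bits(kernel: IntMatrix, n_bits: int) -> bool:
    """True if every coefficient lies in the two's-complement range [-2^(N-1), 2^(N-1) - 1]."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    values = [c for row in kernel for c in row]
    return lo <= min(values) and max(values) <= hi


def min_bits(kernel: IntMatrix) -> int:
    """Smallest N for which all integer kernel coefficients fit in N bits."""
    n = 1
    while not fits_in_bits(kernel, n):
        n += 1
    return n


def shrink_to_bits(kernel: IntMatrix, n_bits: int) -> Tuple[IntMatrix, int]:
    """Multiply by 2^-p (rounded right shift) with the smallest p that fits n_bits.

    Returns the reduced kernel and p; the caller compensates with a left shift by p
    (or by reducing a later right shift by p).
    """
    p, reduced = 0, kernel
    while not fits_in_bits(reduced, n_bits):
        p += 1
        half = 1 << (p - 1)
        reduced = [[(c + half) >> p for c in row] for row in kernel]
    return reduced, p
```

For example, the kernel [[200, -300]] needs p = 2 to fit in 8 bits and becomes [[50, -75]].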
[0219] All kernel coefficients can be expressed in 8 bits, 9 bits, 10 bits, etc. using the methods described above. Of course, the scaling value of the kernel coefficients can be set differently for each block size or kernel, and the number of bits used to express the kernel coefficients can also be set differently.
[0220] The aforementioned NSPT can be applied based on at least one of the current block size, tree type, or component type. As an example, whether to apply NSPT can be determined based on at least one of the current block size, tree type, or component type. The NSPT index can be signaled based on at least one of the current block size, tree type, or component type, and an NSPT set or NSPT kernel can be derived based on at least one of them.
[0221] The transform block sizes predefined as allowed in the decoding device can be broadly divided into two groups. One group (hereinafter referred to as the first group) is the set of block sizes to which NSPT is applicable. The first group can consist of any one of the allowed transform block sizes, or of two or more of them. An NSPT-applicable block size can be defined as a block size for which at least one of the width and height is less than or equal to a predetermined threshold. Alternatively, it can be defined as a block size for which the product of the width and height is less than or equal to a predetermined threshold, or as a block size for which the maximum of the width and height is less than or equal to a predetermined threshold. The threshold can be an integer such as 4, 8, 16, 32, 64, 128, or larger.
[0222] The other group (hereinafter referred to as the second group) is the set of block sizes to which NSPT is not applied. The aforementioned separable primary transform can be applied to block sizes belonging to the second group. Additionally, a non-separable secondary transform can be applied to all or some of the block sizes belonging to the second group.
[0223] For example, when the current block size belongs to the first group, a backward NSPT can be applied to the (dequantized) transform coefficients of the current block. When the current block size belongs to the second group, a backward separable primary transform can be applied to the (dequantized) transform coefficients of the current block. Alternatively, when the current block size belongs to the second group, a backward non-separable secondary transform (e.g., the low-frequency non-separable transform, LFNST) can be applied to the (dequantized) transform coefficients of the current block, and a backward separable primary transform (e.g., DCT-2) can then be applied to the resulting transform coefficients.
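The grouping and inverse-transform dispatch described in [0221] to [0223] can be sketched as follows. The three threshold rules mirror the alternatives above, while the default rule, the default threshold of 16, and the string labels for each stage are assumptions chosen only for illustration.

```python
from typing import List


def nspt_applicable(width: int, height: int, rule: str = "max", threshold: int = 16) -> bool:
    """First-group membership under one of the three alternative rules."""
    if rule == "min":    # at least one of width and height <= threshold
        return min(width, height) <= threshold
    if rule == "area":   # width * height <= threshold
        return width * height <= threshold
    if rule == "max":    # max(width, height) <= threshold
        return max(width, height) <= threshold
    raise ValueError(f"unknown rule: {rule}")


def inverse_transform_path(width: int, height: int, use_lfnst: bool,
                           rule: str = "max", threshold: int = 16) -> List[str]:
    """Ordered backward-transform stages applied to the dequantized coefficients."""
    if nspt_applicable(width, height, rule, threshold):
        return ["nspt"]                          # first group: backward NSPT only
    path = ["lfnst"] if use_lfnst else []        # optional backward non-separable secondary transform
    path.append("separable_primary")             # e.g., backward DCT-2
    return path
```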
[0224] As an example, the first group, as the set of block sizes to which NSPT is applicable, can be defined as any one of the following alternative sets:
– {4x4, 4x8, 8x4, 8x8}
– {4x8, 8x4, 8x8}
– {4x8, 8x4}
– {4x4, 4x8, 4x16, 8x4, 8x8, 16x4}
– {4x8, 4x16, 8x4, 8x8, 16x4}
– {4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16}
– {4x4, 4x8, 8x4, 8x8, 8x16, 16x8}
– {4x8, 8x4, 8x8, 8x16, 16x8}
– {4x8, 8x4, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32}
– {4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16}
– {4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16}
– {4x8, 8x4, 8x16, 16x8, 16x16, 16x32, 32x16}
– {4x4, 4x8, 4x16, 8x4, 16x4}
– {4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x4, 16x8}
– {4x8, 4x16, 8x4, 8x8, 8x16, 16x4, 16x8}
– {4x4, 4x8, 4x16, 8x4, 8x16, 16x4, 16x8}
– {4x8, 4x16, 8x4, 8x8, 8x16, 16x4, 16x8, 16x16}
– {4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x4, 16x8, 16x16}
– {4x4, 4x8, 4x16, 8x4, 8x16, 16x4, 16x8, 16x16}
– {4x8, 4x16, 8x4, 8x16, 16x4, 16x8, 16x16}
– {4x4, 4x8, 4x16, 4x32, 8x4, 8x16, 8x32, 16x4, 16x8, 32x4, 32x8}
– {4x8, 4x16, 4x32, 8x4, 8x16, 8x32, 16x4, 16x8, 32x4, 32x8}
– {4x4, 4x8, 4x16, 4x32, 8x4, 8x8, 8x16, 8x32, 16x4, 16x8, 16x16, 32x4, 32x8}
– {4x4, 4x8, 4x16, 4x32, 8x4, 8x8, 8x16, 8x32, 16x4, 16x8, 16x16, 16x32, 32x4, 32x8, 32x16}
– {4x8, 4x16, 4x32, 8x4, 8x16, 8x32, 16x4, 16x8, 16x32, 32x4, 32x8, 32x16}
– {4x4, 4x8, 4x16, 4x32, 8x4, 8x8, 8x16, 8x32, 16x4, 16x8, 16x16, 16x32, 32x4, 32x8, 32x16, 32x32}
[0225] An NSPT matrix (or NSPT kernel) of a predetermined dimension can be applied to each block size belonging to the first group. Here, the NSPT matrix is expressed as a backward transform matrix of dimension PxQ, that is, a matrix with P rows and Q columns.
[0226] As examples, a 16x16 NSPT matrix can be applied to a 4x4 block. A 32x20 NSPT matrix can be applied to at least one of a 4x8 or 8x4 block. A 64x24 NSPT matrix can be applied to at least one of a 4x16 or 16x4 block. A 64x32 NSPT matrix can be applied to an 8x8 block. A 128x40 NSPT matrix can be applied to at least one of an 8x16 or 16x8 block. A 256x44 NSPT matrix can be applied to a 16x16 block. A 128x36, 128x38, or 128x40 NSPT matrix can be applied to a 4x32 or 32x4 block. A 256x48 NSPT matrix can be applied to an 8x32 or 32x8 block. A 512x52 or 512x54 NSPT matrix can be applied to a 16x32 or 32x16 block.
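The dimensions listed in [0226] can be collected into a lookup table. The sketch below assumes blocks are keyed as (width, height) and treats the first listed Q for each size as a default; it also checks that P equals the number of samples in the block and that Q never exceeds P. The names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (width, height) -> (P, allowed Q values) for the backward P x Q NSPT matrix.
NSPT_DIMS: Dict[Tuple[int, int], Tuple[int, Tuple[int, ...]]] = {
    (4, 4): (16, (16,)),
    (4, 8): (32, (20,)), (8, 4): (32, (20,)),
    (4, 16): (64, (24,)), (16, 4): (64, (24,)),
    (8, 8): (64, (32,)),
    (8, 16): (128, (40,)), (16, 8): (128, (40,)),
    (16, 16): (256, (44,)),
    (4, 32): (128, (36, 38, 40)), (32, 4): (128, (36, 38, 40)),
    (8, 32): (256, (48,)), (32, 8): (256, (48,)),
    (16, 32): (512, (52, 54)), (32, 16): (512, (52, 54)),
}


def backward_nspt_shape(width: int, height: int, q: Optional[int] = None) -> Tuple[int, int]:
    """Return (P, Q) of the backward NSPT matrix for a first-group block size."""
    if (width, height) not in NSPT_DIMS:
        raise KeyError(f"{width}x{height} is not a first-group size in this sketch")
    p, allowed = NSPT_DIMS[(width, height)]
    if q is None:
        q = allowed[0]
    if q not in allowed:
        raise ValueError(f"Q={q} is not listed for {width}x{height}")
    return p, q


# Sanity checks: P is the sample count and Q never exceeds P.
for (w, h), (p, qs) in NSPT_DIMS.items():
    assert p == w * h and all(q <= p for q in qs)
```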
[0227] For at least one of 4x32 blocks or 32x4 blocks, a 128x36, 128x38, 128x40, or 128x(36-(4·n)) NSPT matrix can be applied. Alternatively, for at least one of 4x32 blocks or 32x4 blocks, a 128x(36-(4·n)) NSPT matrix can be applied instead of a 128x36, 128x38, or 128x40 NSPT matrix. Here, · represents multiplication, and n can be an integer greater than or equal to 0.
[0228] As an example, as an NSPT matrix for at least one of 4x32 blocks or 32x4 blocks, at least one of 128x36 matrices, 128x32 matrices, 128x28 matrices, 128x24 matrices, 128x20 matrices, 128x16 matrices, 128x12 matrices, 128x8 matrices or 128x4 matrices can be used.
[0229] Additionally, for at least one of 8x32 blocks or 32x8 blocks, a 256x48 or 256x(48-(4·m)) NSPT matrix can be applied. Alternatively, for at least one of 8x32 blocks or 32x8 blocks, a 256x(48-(4·m)) NSPT matrix can be applied instead of a 256x48 NSPT matrix. Here, · represents multiplication, and m can be an integer greater than or equal to 0.
[0230] As an example, as an NSPT matrix for at least one of 8x32 blocks or 32x8 blocks, at least one of the following matrices can be used: 256x48 matrix, 256x44 matrix, 256x40 matrix, 256x36 matrix, 256x32 matrix, 256x28 matrix, 256x24 matrix, 256x20 matrix, 256x16 matrix, 256x12 matrix, 256x8 matrix, or 256x4 matrix.
[0231] The structures of the NSPT matrices for 4x32 and 32x4 blocks and the structures of the NSPT matrices for 8x32 and 32x8 blocks can be configured by combining the matrices described above. That is, the combination of these structures can be defined as a combination of one or more 128x(36-(4·n)) NSPT matrices and one or more 256x(48-(4·m)) NSPT matrices. Here, n can be one or more integers in the range of 0 to 8, and m can be one or more integers in the range of 0 to 11.
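The candidate column counts from [0227] to [0231] can be enumerated directly. This sketch pairs one 128xQ1 matrix with one 256xQ2 matrix per combination, which covers, for example, the pairings listed in [0232] to [0249]; the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Shape = Tuple[int, int]


def q_candidates_4x32() -> List[int]:
    """Q = 36 - 4*n for n = 0..8 (128 x Q backward matrices for 4x32 / 32x4 blocks)."""
    return [36 - 4 * n for n in range(9)]


def q_candidates_8x32() -> List[int]:
    """Q = 48 - 4*m for m = 0..11 (256 x Q backward matrices for 8x32 / 32x8 blocks)."""
    return [48 - 4 * m for m in range(12)]


def matrix_combinations() -> List[Tuple[Shape, Shape]]:
    """Every pairing of one 4x32/32x4 matrix shape with one 8x32/32x8 matrix shape."""
    return [((128, q1), (256, q2))
            for q1, q2 in product(q_candidates_4x32(), q_candidates_8x32())]
```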
[0232] As an example, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks can be a 128x24 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks can be a 256x36 matrix.
[0233] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x24 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x32 matrix.
[0234] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x24 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x28 matrix.
[0235] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x24 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x24 matrix.
[0236] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x24 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x20 matrix.
[0237] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x24 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x16 matrix.
[0238] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x20 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x36 matrix.
[0239] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x20 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x32 matrix.
[0240] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x20 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x28 matrix.
[0241] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x20 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x24 matrix.
[0242] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x20 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x20 matrix.
[0243] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x20 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x16 matrix.
[0244] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x16 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x36 matrix.
[0245] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x16 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x32 matrix.
[0246] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x16 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x28 matrix.
[0247] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x16 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x24 matrix.
[0248] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x16 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x20 matrix.
[0249] Alternatively, the NSPT matrix used for at least one of 4x32 blocks or 32x4 blocks may be a 128x16 matrix, and the NSPT matrix used for at least one of 8x32 blocks or 32x8 blocks may be a 256x16 matrix.
[0250] Because the PxQ matrix is a backward NSPT matrix, a Px1 output vector can be obtained by applying the PxQ matrix to a Qx1 input vector (i.e., (PxQ matrix) x (Qx1 input vector)). Here, the Qx1 input vector can correspond to the (dequantized) transform coefficients of the current block to which NSPT is applied. In this case, Q is the number of transform coefficients to which NSPT is applied and can be less than or equal to the product of the width and height of the current block. The value of Q can be variably determined based on the size of the current block, which belongs to the first group of block sizes described above. Alternatively, the same value of Q can be used for all block sizes belonging to the first group. The Px1 output vector can correspond to the residual signal (i.e., the decoded residual samples). The value of P can be equal to the product of the width and height of the current block.
[0251] Conversely, the forward NSPT matrix can be expressed as a QxP matrix, which is the transpose of the PxQ matrix. A Qx1 output vector can be obtained by applying the QxP matrix to a Px1 input vector (i.e., (QxP matrix) x (Px1 input vector)). Here, the Px1 input vector can correspond to the residual samples of the current block to which NSPT is applied, and the value of P can be equal to the product of the width and height of the current block. The Qx1 output vector can correspond to the transform coefficients of the current block derived by NSPT. In this case, Q is the number of transform coefficients output by NSPT and can be less than or equal to the product of the width and height of the current block. Likewise, the value of Q can be variably determined based on the size of the current block belonging to the first group of block sizes described above, or the same value of Q can be used for all block sizes belonging to the first group.
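The vector shapes in [0250] and [0251] can be checked with a small sketch. It assumes the residual block is vectorized in raster order (row by row) and uses a selection matrix (the first Q rows of an identity matrix) as a trivially orthonormal placeholder for a trained 20x32 forward kernel of a 4x8 block; all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def flatten(block: Matrix) -> List[float]:
    """Raster-scan an H x W block (row by row) into a P x 1 vector, P = W * H."""
    return [v for row in block for v in row]


def unflatten(vec: List[float], width: int, height: int) -> Matrix:
    return [vec[y * width:(y + 1) * width] for y in range(height)]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def forward_nspt(block: Matrix, fwd: Matrix) -> List[float]:
    """(Q x P forward matrix) x (P x 1 residual vector) -> Q x 1 coefficient vector."""
    if len(fwd[0]) != len(block) * len(block[0]):
        raise ValueError("forward matrix must have P = W * H columns")
    return matvec(fwd, flatten(block))


def backward_nspt(coeffs: List[float], bwd: Matrix, width: int, height: int) -> Matrix:
    """(P x Q backward matrix) x (Q x 1 coefficient vector) -> P x 1 residual, as H x W."""
    if len(bwd) != width * height or len(bwd[0]) != len(coeffs):
        raise ValueError("backward matrix must be P x Q")
    return unflatten(matvec(bwd, coeffs), width, height)


W, H, Q = 4, 8, 20
fwd = [[1.0 if j == i else 0.0 for j in range(W * H)] for i in range(Q)]  # Q x P placeholder
bwd = transpose(fwd)                                                      # P x Q backward matrix
```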
[0252] As in the examples above, NSPT can be applied to non-square MxN and NxM blocks. For example, NSPT can be applied to 4x8 and 8x4 blocks, to 4x16 and 16x4 blocks, to 8x16 and 16x8 blocks, or to 16x32 and 32x16 blocks.
[0253] Applying NSPT to specific block sizes belonging to the first group allows the transform to be performed more precisely, which can improve coding performance. When forward LFNST is applied, the primary transform coefficients outside the region to which LFNST is applied (i.e., the region of interest, ROI) can be zeroed out. In addition, LFNST can consist of a small number of transform basis vectors. As a result, performance degradation may occur when a separable primary transform such as DCT-2 and a non-separable secondary transform such as LFNST are applied to such block sizes instead of NSPT. When NSPT is applied instead of LFNST in such cases, the zero-out process can be omitted, and coding performance can be improved compared to applying LFNST. Furthermore, the way in which NSPT is applied can be expected to provide an additional performance improvement. NSPT or LFNST can be applied using the symmetry described below. For LFNST, symmetry is exploited by transposing only the ROI of the input block, whereas for NSPT, symmetry is exploited by transposing the entire block. Therefore, NSPT can use a more precise symmetry when its kernels are trained and applied, which can be expected to improve performance.
[0254] Furthermore, when LFNST is applied instead of NSPT to an 8x8 block, a 32x64 transform matrix can be used in place of a 16x64 transform matrix for the forward transform, where the 16x64 matrix corresponds to the first 16 rows of the 32x64 matrix. Applying LFNST with a 16x64 transform matrix to an 8x8 block requires 16 multiplications per sample (16 x 64 / 64), whereas a 32x64 transform matrix requires 32 multiplications per sample. Despite this higher cost, using the 32x64 transform matrix can be expected to improve coding performance.
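The complexity figures above follow from counting one multiplication per matrix entry. The sketch below reproduces them and shows the 16x64 matrix taken as the first 16 rows of the 32x64 matrix. It uses dummy row labels instead of real LFNST coefficients, and the helper names are hypothetical.

```python
def multiplications_per_sample(rows: int, cols: int, num_samples: int) -> float:
    """One multiply per matrix entry, spread over the samples of the block."""
    return rows * cols / num_samples

def first_rows(matrix, n):
    """Keep the first n basis vectors (rows) of a forward transform matrix."""
    return [list(row) for row in matrix[:n]]

# 8x8 block: 64 input samples.
print(multiplications_per_sample(16, 64, 64))  # 16.0
print(multiplications_per_sample(32, 64, 64))  # 32.0

# Dummy 32x64 matrix whose entries record their row number.
big = [[r] * 64 for r in range(32)]
small = first_rows(big, 16)
print(len(small), len(small[0]), small[-1][0])  # 16 64 15
```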
[0255] When the current block's tree type is single tree, NSPT can be applied to the luma component of the current block but not to the chroma component. When the current block's tree type is dual tree, NSPT can be applied to both the luma and chroma components of the current block.
[0256] Alternatively, regardless of the tree type of the current block, NSPT can be applied to the luma component of the current block and not applied to the chroma component. Alternatively, regardless of the tree type of the current block, NSPT can be applied to both the luma and chroma components of the current block.
[0257] As an example, when the current block's tree type is single tree, NSPT is allowed for both the luma and chroma components, and the size of the current block belongs to the first group, a single NSPT index can be signaled and shared by the luma and chroma components of the current block. Here, the NSPT index can specify one of the transform kernel candidates used for NSPT. When both the luma and chroma blocks of the current block belong to the first group, the transform kernel candidate selected by the same NSPT index can be applied to both components. When the current block's tree type is single tree and NSPT is applied only to the luma component, LFNST can be omitted and a separable transform can be applied to the chroma component of the current block. Alternatively, in that case, LFNST can be applied to the chroma component of the current block.
[0258] In a single-tree configuration, the luma and chroma components can be highly correlated. In this case, applying NSPT only to the luma component, or applying the transform kernel candidate selected by a single NSPT index to both the luma and chroma components, can reduce unnecessary signaling and improve compression efficiency. In a dual-tree configuration, on the other hand, the luma and chroma components have independent partitioning and coding structures, so signaling a separate NSPT index for each component can reflect the characteristics of each component and improve compression efficiency.
[0259] The NSPT kernel can be derived based on at least one of the symmetry between intra-prediction modes or the symmetry between block shapes. As an example, the NSPT kernel can be derived as the NSPT kernel corresponding to at least one of a mode symmetric to the intra-prediction mode of the current block or a block shape symmetric to the block shape of the current block. Alternatively, the NSPT kernel can be derived from an NSPT set including one or more NSPT kernel candidates, where the NSPT set can be derived as the NSPT set corresponding to at least one of a mode symmetric to the intra-prediction mode of the current block or a block shape symmetric to the block shape of the current block. Any one of the NSPT kernel candidates belonging to the NSPT set can be set as the NSPT kernel of the current block, and an NSPT index specifying that candidate can be used for this purpose. The NSPT index can be signaled in the bitstream or derived based on the aforementioned symmetries.
[0260] Symmetry may exist between at least two of the intra-prediction modes predefined in the decoding apparatus. Hereinafter, for ease of description, symmetry about the top-left diagonal mode (i.e., mode 34) is described. Referring to Figure 5, symmetry exists among the directional modes. All modes except the planar mode (mode 0) and the DC mode (mode 1) have a prediction direction. Modes 2 through 66 (denoted [2, 66]) can be referred to as regular directional modes, and modes -14 through -1 (denoted [-14, -1]) and modes 67 through 80 (denoted [67, 80]) can be referred to as wide-angle modes. The wide-angle modes can also include at least one mode with a value less than -14 or greater than 80. In Figure 5, all modes except modes 0 and 1 are symmetric about mode 34. Specifically, within [2, 66], mode x and mode (68 - x) are symmetric, and between [-14, -1] and [67, 80], mode x and mode (66 - x) are symmetric. The same symmetry holds between modes [N, -1] and [67, 66 - N], where N can be an integer less than or equal to -14.
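The mode pairing above can be written as a small mapping function. This is a sketch using the mode numbering of Figure 5, and the function name is hypothetical. Applying the mapping twice returns the original mode.

```python
def symmetric_mode(x: int) -> int:
    """Return the directional mode symmetric to mode x about diagonal mode 34.

    Regular directional modes [2, 66] pair as x <-> 68 - x; wide-angle modes
    pair as x <-> 66 - x, i.e. [N, -1] <-> [67, 66 - N].
    """
    if x in (0, 1):
        raise ValueError("planar (0) and DC (1) have no prediction direction")
    if 2 <= x <= 66:
        return 68 - x
    return 66 - x  # wide-angle mode (x <= -1 or x >= 67)

for mode in (2, 18, 34, 50, -1, -14, 67, 80):
    print(mode, "->", symmetric_mode(mode))
```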
[0261] Furthermore, regarding symmetry between block shapes, an MxN block and an NxM block can be defined as symmetric blocks, where M and N can be equal or different. Alternatively, an M1xN1 block and an M2xN2 block can be defined as symmetric blocks when the aspect ratio (M1 / N1) of the M1xN1 block equals the ratio (N2 / M2) of the M2xN2 block.
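The aspect-ratio condition above can be checked exactly with rational arithmetic. This sketch treats MxN as width x height, and the function name is hypothetical.

```python
from fractions import Fraction

def shapes_symmetric(m1: int, n1: int, m2: int, n2: int) -> bool:
    """M1xN1 and M2xN2 (width x height) are symmetric when M1 / N1 == N2 / M2."""
    return Fraction(m1, n1) == Fraction(n2, m2)

print(shapes_symmetric(4, 8, 8, 4))   # the plain MxN / NxM case
print(shapes_symmetric(4, 8, 16, 8))  # 1:2 versus 2:1 at a different scale
print(shapes_symmetric(4, 8, 4, 8))   # same orientation, not symmetric
```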
[0262] Within a square block, mutually symmetric modes can share at least one of the NSPT set, the NSPT index, or the NSPT kernel. In other words, at least one of the NSPT set, NSPT index, or NSPT kernel used for one of the symmetric modes can equally be applied to the other.
[0263] As an example, symmetric modes can share a single NSPT kernel. For one of the symmetric modes, the NSPT kernel can be applied directly to the input data, and for the other, the same NSPT kernel can be applied after a transpose operation is applied to the input data. Specifically, when mode x belongs to [2, 33], a 1D vector can be formed from the MxM input block in column-priority order, and the NSPT kernel can be applied to that 1D vector. Forming the 1D vector in column-priority order means reading the MxM input block column by column to obtain M columns and concatenating them in order. For mode (68 - x), which is symmetric to mode x, a 1D vector can instead be formed in row-priority order, and the same NSPT kernel can be applied to it. Forming the 1D vector in row-priority order means reading the MxM input block row by row to obtain M rows and concatenating them in order. Likewise, when mode x belongs to [N, -1] (N ≤ -14), a 1D vector can be formed in row-priority order for the symmetric mode (66 - x), and the same NSPT kernel as for mode x can be applied to it. Either column priority or row priority can be applied to modes 0 and 1, and likewise to mode 34. Alternatively, row priority can be applied to intra-prediction modes in [2, 33] and column priority to their symmetric modes, and row priority can be applied to intra-prediction modes in [N, -1] and column priority to their symmetric modes.
[0264] For non-square blocks, symmetry between block shapes can be considered in addition to symmetry between intra-prediction modes. A non-square block with width M and height N can be considered symmetric to a non-square block with width N and height M. For example, for modes in [2, 66], mode x of an MxN block can be symmetric to mode (68 - x) of an NxM block. Similarly, when mode x of an MxN block belongs to [N, -1] (N ≤ -14), it can be symmetric to mode (66 - x) of an NxM block.
[0265] The 1D vector can be formed from the input data block as described above. That is, when column priority is applied to mode x, row priority can be applied to the mode symmetric to it, and when row priority is applied to mode x, column priority can be applied to the mode symmetric to it. Specifically, when column priority is applied to mode x, the MxN input block can be read column by column to obtain M columns, each of length N, which are concatenated in order to form the 1D vector. For the mode symmetric to mode x, the MxN input block can be read row by row to obtain N rows, each of length M, which are concatenated in order to form the 1D vector. Conversely, when row priority is applied to mode x, the MxN input block can be read row by row to obtain N rows, each of length M, which are concatenated in order to form the 1D vector. For the mode symmetric to mode x, the MxN input block can be read column by column to obtain M columns, each of length N, which are concatenated in order to form the 1D vector.
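The column-priority and row-priority scans of [0263] to [0265] can be sketched as follows. Blocks are represented as lists of rows, so a block of width M and height N has N rows of length M. The sketch also checks the property that makes kernel sharing work: reading a block in row priority gives the same 1D vector as transposing it and reading it in column priority. The function names are hypothetical.

```python
def column_priority(block):
    """Read the block column by column (left to right) and concatenate the columns."""
    height, width = len(block), len(block[0])
    return [block[r][c] for c in range(width) for r in range(height)]

def row_priority(block):
    """Read the block row by row (top to bottom) and concatenate the rows."""
    return [sample for row in block for sample in row]

def transpose(block):
    """Swap rows and columns: a width-M, height-N block becomes width-N, height-M."""
    return [list(col) for col in zip(*block)]

mxn = [[1, 2, 3],
       [4, 5, 6]]  # width M = 3, height N = 2
print(column_priority(mxn))  # M = 3 columns of length N = 2
print(row_priority(mxn))     # N = 2 rows of length M = 3
print(row_priority(transpose(mxn)) == column_priority(mxn))
```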
[0266] When the current block is an MxN block with mode x and the aforementioned symmetry is applied to the current block, the NSPT set and / or NSPT kernel of the current block can be determined based on at least one of an intra-prediction mode symmetric to mode x or an NxM block size symmetric to the MxN block size. Here, the NSPT kernel can be set to the NSPT kernel for the NxM block, rather than the NSPT kernel for the MxN block. In other words, when the symmetry is applied to the current block, the NSPT set and / or NSPT kernel of the block with symmetry to the current block can be used in the same manner. As described above, 1D vectors can be configured from the input data block according to a predetermined priority, which can correspond to the input of the NSPT kernel.
[0267] Additionally, there may be a restriction that symmetry is only used when the value of the intra-prediction mode of the current block is greater than 34. In other words, when the value of the intra-prediction mode of the current block is greater than 34, the transpose operation can be applied when configuring 1D vectors from the input data block, and an NSPT set or NSPT kernel corresponding to the block shape and / or mode with symmetry for the current block can be used. Specifically, symmetry may not be applied to the current block when the intra-prediction mode of the current block belongs to modes [N, -1] and [2, 34]. On the other hand, symmetry may be applied to the current block when the intra-prediction mode of the current block belongs to modes [35, 66] and [67, 66 -N]. Here, N can be an integer less than or equal to -14.
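The restriction in [0267], combined with the shape symmetry of [0266], can be expressed as a lookup-key function. This sketch assumes that kernels are stored only for modes up to 34 (including wide-angle modes below 0) and are indexed by block shape. The returned flag indicates that the input block must be transposed before scanning. All names are hypothetical.

```python
from typing import NamedTuple

class KernelKey(NamedTuple):
    width: int       # width of the block shape whose kernel is used
    height: int      # height of that block shape
    mode: int        # intra-prediction mode whose kernel is used
    transpose: bool  # True when the input block must be transposed first

def nspt_kernel_key(width: int, height: int, mode: int) -> KernelKey:
    """Use symmetry only when the intra-prediction mode is greater than 34."""
    if 35 <= mode <= 66:
        return KernelKey(height, width, 68 - mode, True)
    if mode >= 67:  # wide-angle modes [67, 66 - N]
        return KernelKey(height, width, 66 - mode, True)
    # Modes at or below 34 ([N, -1], planar, DC, [2, 34]): the block's own kernel.
    return KernelKey(width, height, mode, False)

print(nspt_kernel_key(8, 4, 50))  # reuses the 4x8 kernel of mode 18
print(nspt_kernel_key(4, 8, 20))  # no symmetry applied
```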
[0268] The symmetry-based NSPT set or NSPT kernel can be adaptively derived based on the size of the current block. As an example, for 4x4 blocks and 8x8 blocks, the NSPT set or NSPT kernel can be derived based on symmetry; and for 4x8 blocks and 8x4 blocks, the NSPT set or NSPT kernel can be derived without symmetry.
[0269] The number of available NSPT sets may vary depending on whether symmetry is used. For example, the number of available NSPT sets can be 35 when symmetry is used, and 67 when symmetry is not used.
[0270] Table 9 below illustrates an example of determining the NSPT set using symmetry and shows the mapping between the NSPT set and the intra-frame prediction mode when the number of available NSPT sets is 35.
[0271] [Table 9]
[0272] Referring to Table 9, when the value X of the intra-prediction mode of the current block is less than 0, the NSPT set of the current block can be determined as the NSPT set with NSPT set index 2 among the 35 NSPT sets. When X is greater than or equal to 0 and less than or equal to 34, the NSPT set of the current block can be determined as the NSPT set with NSPT set index X. When X is greater than or equal to 35 and less than or equal to 66, the NSPT set of the current block can be determined as the NSPT set with NSPT set index (68 - X); that is, it is the same as the NSPT set of the symmetric mode (68 - X). Similarly, when X is greater than 66, the NSPT set of the current block can be determined as the NSPT set with NSPT set index 2, which is the same as the NSPT set of the symmetric mode (66 - X).
[0273] Table 10 below provides an example of determining the NSPT set without using symmetry, and shows the mapping between the NSPT set and the intra-prediction mode when the number of available NSPT sets is 67.
[0274] [Table 10]
[0275] Referring to Table 10, when the value (X) of the intra-prediction mode for the current block is less than 0, the NSPT set for the current block can be determined as the NSPT set with NSPT set index 2 among the 67 NSPT sets. When the value (X) of the intra-prediction mode for the current block is greater than or equal to 0 and less than or equal to 66, the NSPT set for the current block can be determined as the NSPT set with NSPT set index X among the 67 NSPT sets. Similarly, when the value (X) of the intra-prediction mode for the current block is greater than 66, the NSPT set for the current block can be determined as the NSPT set with NSPT set index 66 among the 67 NSPT sets.
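Tables 9 and 10 are not reproduced in this text, but the prose in [0272] and [0275] fully determines both mappings. The sketch below follows that prose and adds the block-size example of [0268]. Sizes not listed in [0268] return False in `uses_symmetry`, which is an assumption of the sketch. The names are hypothetical.

```python
def uses_symmetry(width: int, height: int) -> bool:
    """Example of [0268]: symmetry for 4x4 and 8x8 blocks, not for 4x8 and 8x4."""
    return (width, height) in {(4, 4), (8, 8)}

def num_nspt_sets(symmetric: bool) -> int:
    """[0269]: 35 available NSPT sets with symmetry, 67 without."""
    return 35 if symmetric else 67

def nspt_set_index(mode: int, symmetric: bool) -> int:
    """Map intra-prediction mode X to an NSPT set index (Table 9 or Table 10)."""
    if symmetric:                       # Table 9: 35 NSPT sets
        if mode < 0 or mode > 66:
            return 2
        return mode if mode <= 34 else 68 - mode
    if mode < 0:                        # Table 10: 67 NSPT sets
        return 2
    return min(mode, 66)

sym = uses_symmetry(4, 4)
print(num_nspt_sets(sym), nspt_set_index(50, sym))
```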
[0276] Exploiting symmetry can reduce the memory required to store the transform kernels while maintaining the performance gain of the transform. For example, using 35 NSPT sets instead of 67 reduces the number of NSPT sets to be stored by nearly half (35/67 ≈ 52%), which correspondingly reduces the memory required to store the NSPT kernels when each set holds the same number of kernels.
[0277] The number of available NSPT sets and/or the number of NSPT kernel candidates belonging to an NSPT set can vary depending on the block size. For example, the number of available NSPT sets can be 35 for a 4x4 block, 19 for 4x8 and 8x4 blocks, and 10 for an 8x8 block. An NSPT set for a 4x4 block can consist of three NSPT kernel candidates, an NSPT set for 4x8 and 8x4 blocks can consist of three or two NSPT kernel candidates, and an NSPT set for an 8x8 block can consist of one NSPT kernel candidate.
[0278] The size of the transform kernel can increase with the block size. Therefore, the number of available NSPT sets and / or the number of NSPT kernel candidates belonging to the NSPT sets can be reduced to save memory space required to store the transform kernels. Furthermore, as the block size increases, the characteristics of the residual signal within the corresponding block tend to become more generalized. Therefore, reducing the number of available NSPT sets and / or the number of NSPT kernel candidates belonging to the NSPT sets can help maintain compression efficiency while reducing implementation complexity by reflecting these statistical characteristics.
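To illustrate [0277] and [0278], the sketch below tabulates the example counts and computes an upper bound on the number of stored kernel coefficients. It assumes that each kernel is a full PxP matrix (Q = P, the maximum permitted in [0251]). For 4x8 and 8x4 blocks, it counts three candidates, the larger of the two options given. Kernels with Q < P would need less storage.

```python
# Example configuration from [0277]: (number of NSPT sets, kernel candidates per set).
NSPT_CONFIG = {
    (4, 4): (35, 3),
    (4, 8): (19, 3),  # [0277] allows three or two candidates; three is the upper bound
    (8, 4): (19, 3),
    (8, 8): (10, 1),
}

def max_kernel_coefficients(width: int, height: int) -> int:
    """Upper bound on stored coefficients, assuming each kernel is a full PxP matrix."""
    num_sets, candidates = NSPT_CONFIG[(width, height)]
    p = width * height
    return num_sets * candidates * p * p

for size in [(4, 4), (4, 8), (8, 8)]:
    print(size, max_kernel_coefficients(*size))
```

The size of each kernel grows with the square of the block's sample count (256, 1024, and 4096 entries). This growth is why the number of sets and candidates shrinks as blocks get larger.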
[0279] Example 4
[0280] A non-separable transform can be applied to a current block to which a subblock transform (SBT) is applied. Here, the non-separable transform can include at least one of the aforementioned LFNST or NSPT.
[0281] The SBT according to this disclosure can be applied to blocks coded by inter-frame prediction (hereinafter referred to as inter-blocks). SBT refers to a method for partitioning a block into two sub-blocks and applying the transform to the residual data of only one of the two sub-blocks. In this case, the residual data of the other sub-block can be set to 0.
[0282] A block can be divided into two sub-blocks horizontally or vertically, at a ratio of 1:1, 1:3, or 3:1. Information indicating which of the two sub-blocks the (inverse) transform is applied to can be encoded and signaled in the bitstream. For example, when a block is divided horizontally, the signaled information can specify which of the top and bottom sub-blocks carries the residual data to be encoded/decoded. When a block is divided vertically, the signaled information can specify which of the left and right sub-blocks carries the residual data. Hereinafter, for ease of description, the sub-block whose residual data is decoded is referred to as the first sub-block, and the sub-block whose residual data is set to 0 is referred to as the second sub-block.
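The partitioning in [0282] can be sketched as follows. The ratio is given in quarters of the split dimension, so (2, 2), (1, 3), and (3, 1) stand for 1:1, 1:3, and 3:1. The parameter `pos` (0 = top/left, 1 = bottom/right) stands in for the signaled position information. Both encodings and all names are assumptions for illustration.

```python
from typing import NamedTuple, Tuple

class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int

def sbt_split(width: int, height: int, horizontal: bool,
              ratio: Tuple[int, int], pos: int) -> Tuple[Rect, Rect]:
    """Return (first_sub_block, second_sub_block); only the first carries residual data."""
    if ratio not in {(2, 2), (1, 3), (3, 1)}:
        raise ValueError("ratio must be 1:1, 1:3 or 3:1 (in quarters)")
    if pos not in (0, 1):
        raise ValueError("pos must be 0 (top/left) or 1 (bottom/right)")
    if horizontal:  # top and bottom sub-blocks
        top_h = height * ratio[0] // 4
        parts = (Rect(0, 0, width, top_h), Rect(0, top_h, width, height - top_h))
    else:           # left and right sub-blocks
        left_w = width * ratio[0] // 4
        parts = (Rect(0, 0, left_w, height), Rect(left_w, 0, width - left_w, height))
    return parts[pos], parts[1 - pos]

first, second = sbt_split(16, 8, horizontal=True, ratio=(1, 3), pos=0)
print(first, second)  # the second sub-block's residual is treated as all zeros
```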
[0283] When SBT is applied to the current block, the transform types (trTypeHor, trTypeVer) for SBT can be determined based on the size of the first sub-block within the current block. Here, trTypeHor represents the transform type applied to the first sub-block in the horizontal direction, and trTypeVer represents the transform type applied in the vertical direction. For example, if either the width or the height of the first sub-block is greater than or equal to 64, the DCT-2 pair (i.e., the transform type (DCT-2, DCT-2)) can be applied to the first sub-block. Otherwise, one of the combinations of DST-7 and DCT-8 can be adaptively applied to the first sub-block. Here, the combinations of DST-7 and DCT-8 can include at least one of (DST-7, DST-7), (DST-7, DCT-8), (DCT-8, DST-7), or (DCT-8, DCT-8). One of these combinations can be selected based on information indicating the split direction (horizontal or vertical) and/or the position of the first sub-block. When SBT is applied to the current block, one of the combinations of DST-7 and DCT-8 can be applied only to the luma component of the current block, and the DCT-2 pair can be applied to the chroma component.
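The selection rule above can be sketched as follows. The 64-sample threshold and the DCT-2 pair for chroma come from the text. The text does not specify which DST-7/DCT-8 combination corresponds to each split direction and position, so this sketch borrows the SBT mapping used in VVC (H.266) as an assumption. The names are hypothetical.

```python
DCT2, DST7, DCT8 = "DCT-2", "DST-7", "DCT-8"

def sbt_transform_types(sub_w: int, sub_h: int, horizontal_split: bool,
                        pos: int, is_luma: bool = True):
    """Return (trTypeHor, trTypeVer) for the first sub-block."""
    if not is_luma or sub_w >= 64 or sub_h >= 64:
        return DCT2, DCT2
    # Assumed VVC-style mapping; pos 0 = top/left, pos 1 = bottom/right.
    if horizontal_split:
        return (DST7, DCT8) if pos == 0 else (DST7, DST7)
    return (DCT8, DST7) if pos == 0 else (DST7, DST7)

print(sbt_transform_types(8, 16, horizontal_split=False, pos=0))
```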
[0284] When SBT is applied to the current block, a non-separable transform can be applied to the first sub-block within the current block.
[0285] As an example, SBT can be applied to the current block, and the non-separable LFNST can be applied to the first sub-block of the current block. In this case, the transform type for the inverse primary transform of the first sub-block can be determined to be the DCT-2 pair. For example, when the LFNST index of the current block (or the first sub-block) is greater than 0, signaling of the MTS index of the current block (or the first sub-block) can be omitted and the MTS index can be inferred to be 0, which can indicate the DCT-2 pair. On the encoder side, when it is determined that the LFNST secondary transform is applied to the first sub-block, the DCT-2-pair primary transform can be applied to the first sub-block, followed by the LFNST secondary transform. This can hold even when the first sub-block has a size that allows the combinations of DST-7 and DCT-8 described above: once the LFNST secondary transform is selected for the first sub-block, the DCT-2-pair primary transform can be applied, followed by the LFNST secondary transform.
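The parsing dependency in [0285] amounts to a conditional inference. In the sketch below, `read_mts_idx` is a hypothetical callback that stands in for the entropy decoder and is invoked only when the LFNST index is 0. The encoder-side helper and its `rd_choice` argument are likewise hypothetical.

```python
from typing import Callable, Tuple

DCT2_PAIR = ("DCT-2", "DCT-2")

def derive_mts_idx(lfnst_idx: int, read_mts_idx: Callable[[], int]) -> int:
    """Infer mts_idx = 0 (DCT-2 pair) when LFNST is used; otherwise parse it."""
    if lfnst_idx > 0:
        return 0  # MTS index is not signaled
    return read_mts_idx()

def primary_pair_for_encoder(lfnst_selected: bool,
                             rd_choice: Tuple[str, str]) -> Tuple[str, str]:
    """Encoder side: force the DCT-2 pair when the LFNST secondary transform is selected."""
    return DCT2_PAIR if lfnst_selected else rd_choice
```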
[0286] Alternatively, when SBT is applied to the current block and the transform type for the inverse primary transform of the first sub-block is the DCT-2 pair, the non-separable LFNST can be allowed for the first sub-block, and the LFNST index can be signaled for the current block (or the first sub-block). On the encoder side, when the DCT-2 pair is determined as the transform type for the primary transform of the first sub-block, the non-separable LFNST can be allowed for the first sub-block, and the LFNST index for the current block (or the first sub-block) can be encoded and signaled in the bitstream.
[0287] Alternatively, SBT can be applied to the current block and the non-separable LFNST can be applied to the first sub-block of the current block, with the transform type for the inverse primary transform of the first sub-block determined as one of the combinations of DST-7 and DCT-8 described above. For example, when the LFNST index of the current block (or the first sub-block) is greater than 0, the MTS index can be signaled for the current block (or the first sub-block), where the MTS index can indicate one of the combinations of DST-7 and DCT-8. On the encoder side, when it is determined that the LFNST secondary transform is applied to the first sub-block, the primary transform based on one of the combinations of DST-7 and DCT-8 can be applied to the first sub-block, followed by the LFNST secondary transform.
[0288] Alternatively, when SBT is applied to the current block and the transform type for the inverse primary transform of the first sub-block is one of the combinations of DST-7 and DCT-8, the non-separable LFNST can be allowed for the first sub-block, and the LFNST index can be signaled for the current block (or the first sub-block). On the encoder side, when one of the combinations of DST-7 and DCT-8 is determined as the transform type for the primary transform of the first sub-block, the non-separable LFNST can be allowed for the first sub-block, and the LFNST index for the current block (or the first sub-block) can be encoded and signaled in the bitstream.
[0289] Alternatively, SBT can be applied to the current block and the non-separable LFNST can be applied to the first sub-block of the current block, with the transform type for the inverse primary transform of the first sub-block determined as one of a plurality of transform type candidates. Here, the transform type candidates can include at least one of the DCT-2 pair or a combination of DST-7 and DCT-8. For example, when the LFNST index of the current block (or the first sub-block) is greater than 0, the MTS index can be signaled for the current block (or the first sub-block), where the MTS index can indicate one of the transform type candidates. On the encoder side, when it is determined that the LFNST secondary transform is applied to the first sub-block, the primary transform based on one of the transform type candidates can be applied to the first sub-block, followed by the LFNST secondary transform.
[0290] Alternatively, when SBT is applied to the current block and the transform type for the inverse primary transform of the first sub-block is one of the plurality of transform type candidates, the non-separable LFNST can be allowed for the first sub-block, and the LFNST index can be signaled for the current block (or the first sub-block). On the encoder side, when one of the transform type candidates is determined as the transform type for the primary transform of the first sub-block, the non-separable LFNST can be allowed for the first sub-block, and the LFNST index for the current block (or the first sub-block) can be encoded and signaled in the bitstream.
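The alternatives in [0286], [0288], and [0290] differ only in which primary transform types permit LFNST for the first sub-block. The sketch below expresses them as selectable policies. The policy names and the `candidates` parameter are hypothetical.

```python
from typing import FrozenSet, Tuple

Pair = Tuple[str, str]
DCT2_PAIR: Pair = ("DCT-2", "DCT-2")
DST7_DCT8 = {"DST-7", "DCT-8"}

def lfnst_allowed(primary: Pair, policy: str,
                  candidates: FrozenSet[Pair] = frozenset()) -> bool:
    """Return True when the LFNST index would be signaled for the first sub-block."""
    if policy == "dct2_pair":   # [0286]: only with the DCT-2 pair
        return primary == DCT2_PAIR
    if policy == "dst7_dct8":   # [0288]: only with a DST-7/DCT-8 combination
        return set(primary) <= DST7_DCT8
    if policy == "candidates":  # [0290]: with any of the transform type candidates
        return primary in candidates
    raise ValueError(f"unknown policy: {policy}")
```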
[0291] When SBT is applied to an inter block, a transform set for the non-separable transform must be selected in order to apply the non-separable transform to that block. Here, the transform set can be called the LFNST set or the NSPT set. For an intra block, the transform set can be selected based on the intra-prediction mode of the block using a predetermined mapping table (or mapping rule), and the transform set applied to each intra-prediction mode can be predefined in the encoding and decoding apparatuses. However, since inter blocks have no intra-prediction mode, an intra-prediction mode for selecting the transform set can be derived, and the transform set can then be selected based on the derived intra-prediction mode using the same or a similar method as for intra blocks. For inter blocks, the same non-separable transform kernels as for intra blocks, or the same mapping table (or mapping rule), can be applied.
[0292] The following section describes a method for deriving an intra-prediction mode for selecting the transform set of inter-frame blocks. This intra-prediction mode for selecting the transform set can also be referred to as a Virtual Intra-Prediction Mode (VIPM).
[0293] An intra-prediction mode can be derived for the current block using the decoder-side intra mode derivation (DIMD) method. Here, the current block can be a block to which at least one of inter prediction or SBT is applied. When SBT is applied to the current block, the region to which the (inverse) transform is applied can correspond to a portion of the entire block. Here, the entire block can be the coding block of the luma component (when SBT is applied to the luma component) or the coding block of the chroma component (when SBT is applied to the chroma component), and the portion can be one of the two sub-blocks of the entire block (i.e., the first sub-block).
[0294] The virtual intra-prediction mode for the current block (or the first sub-block) can be derived by applying the DIMD method to the prediction block of the first sub-block within the entire block, rather than to the prediction block of the entire block. Alternatively, the virtual intra-prediction mode can be derived by applying the DIMD method to the prediction block of the entire block, rather than only to the prediction block of the first sub-block. In this case, because more samples are used to derive the virtual intra-prediction mode, its accuracy can increase, thereby improving coding performance. In the following, for ease of description, the DIMD method is described as applied to the current block (i.e., the entire block). However, this is not limiting, and "current block" can equally be read as "first sub-block".
[0295] According to the DIMD method, horizontal and/or vertical gradient values can be calculated by applying predetermined filters to at least one of the neighboring region of the current block or the prediction block, and a specific intra-prediction mode can be derived based on the calculated gradient values. The transform set to be applied to the current block can be selected based on that intra-prediction mode. The method for deriving an intra-prediction mode based on the DIMD method is described in detail below with reference to Figure 6.
[0296] Referring to Figure 6, an initial sample position can be set, and the cumulative gradient values for all intra-prediction modes can be initialized (S600). The gradient value at the current sample position can be calculated (S610). The intra-prediction mode to which the calculated gradient value is to be accumulated can be selected (S620). The gradient value calculated in S610 can be added to the cumulative gradient value of the selected intra-prediction mode (S630). Steps S610 to S630 can be repeated for all or some of the sample positions in the current block until no next sample position remains. When no next sample position remains, the intra-prediction mode with the largest cumulative gradient value can be selected (S640).
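The S600 to S640 loop of Figure 6 is essentially a weighted histogram followed by an argmax. The sketch keeps the gradient computation and the mode selection abstract. `gradient_at` and `mode_for` are hypothetical callbacks that stand in for steps S610 and S620, because Formulas 6 to 8 are not reproduced here. This sketch resolves ties in favor of the mode that was accumulated first.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

def dimd_mode(positions: Iterable[Hashable],
              gradient_at: Callable[[Hashable], Tuple[int, object]],
              mode_for: Callable[[object], int]) -> Optional[int]:
    """Accumulate gradient amplitudes per intra mode and return the mode with the largest sum."""
    histogram: Dict[int, int] = {}               # S600: every cumulative value starts at 0
    for pos in positions:
        amplitude, direction = gradient_at(pos)  # S610
        if amplitude == 0:
            continue                             # nothing to accumulate at this position
        mode = mode_for(direction)               # S620
        histogram[mode] = histogram.get(mode, 0) + amplitude  # S630
    if not histogram:
        return None
    return max(histogram, key=histogram.get)     # S640
```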
[0297] The process of applying the DIMD method to the neighboring region of the current block is described in detail below. For ease of description, the neighboring region is assumed to be a block with width W and height H. However, as mentioned above, the DIMD method can also be applied to the prediction block of the current block, in which case "neighboring region" in the following text can be read as "prediction block".
[0298] A filter of size PxQ can be applied inside a WxH block. The PxQ filter can be a 2D filter or a 1D filter. For samples adjacent to the block boundary, however, the filter window may extend beyond the block. The filter can therefore be applied only to the interior region of the WxH block that excludes samples adjacent to the block boundary; specifically, only to sample positions in the (W-P+1)x(H-Q+1) interior region. For example, when the filter size is 3x3, the 3x3 2D filter can be applied only to sample positions in the (W-2)x(H-2) region obtained by removing a one-sample-wide border from the WxH block. In this case, the 2D filter can be applied to the 3x3 region centered on the sample at that position (hereinafter referred to as the base sample).
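The interior region above can be enumerated directly. This sketch assumes odd filter dimensions, so that the base sample is the filter's center. It treats P as the horizontal size and Q as the vertical size, matching the (W-P+1)x(H-Q+1) expression. The function name is hypothetical.

```python
def interior_positions(width: int, height: int, p: int, q: int):
    """Base-sample positions (x, y) at which a P-wide, Q-tall filter stays inside a WxH block."""
    if p % 2 == 0 or q % 2 == 0:
        raise ValueError("this sketch assumes odd filter dimensions")
    rx, ry = p // 2, q // 2
    return [(x, y) for y in range(ry, height - ry) for x in range(rx, width - rx)]

pts = interior_positions(8, 4, 3, 3)
print(len(pts), pts[0], pts[-1])  # (W-P+1)x(H-Q+1) = 6x2 positions
```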
[0299] A base sample and at least one neighboring sample adjacent to the base sample can be input into a 2D filter. Here, a neighboring sample can include a sample that is adjacent to at least one of the following: top, left, bottom, right, upper left, upper right, lower left, or lower right of the base sample.
[0300] The gradient values for a specific intra-prediction mode can be derived by applying a filter to each sample location belonging to the interior region. The derived gradient values can be added to a pre-derived gradient value for that specific intra-prediction mode, and through this process, a cumulative gradient value can be derived for that specific intra-prediction mode. When the filter is applied to all sample locations belonging to the interior region, cumulative gradient values can be derived for all intra-prediction modes (or directional modes excluding non-directional modes). The intra-prediction mode with the largest cumulative gradient value can be selected, and this can be set to the intra-prediction mode derived based on the DIMD method (hereinafter referred to as the DIMD mode).
[0301] The two 3x3 filters shown in Figure 7 can be used to derive the DIMD mode; they are referred to as filterY and filterX, respectively. When the position at which filterY and filterX are applied (i.e., the position of the base sample) is E, the positions of the samples involved in applying the two filters can be denoted A, B, C, D, E, F, G, H, and I. Denoting the values obtained by applying filterY and filterX as iDy and iDx, respectively, iDy and iDx can be calculated as follows.
[0302] [Formula 6]
[0303] When both iDy and iDx are 0, the subsequent processes (i.e., obtaining the gradient value, selecting a specific intra-prediction mode, and adding the gradient value to the cumulative gradient value of that mode) can be omitted, and the process can proceed to the next sample position at which the filter is to be applied.
[0304] The function "abs" returns the absolute value of its input. The value of iAmp can be calculated as follows. This can correspond to step S610 of Figure 6.
[0305] [Formula 7]
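Figure 7 and Formulas 6 and 7 are not reproduced in this text, so the following C sketch rests on stated assumptions: filterY and filterX are taken to be ECM-style 3x3 Sobel kernels, and iAmp is taken to be |iDx| + |iDy|. The sign convention is chosen so that iDx measures vertical variation, matching the reading of iDx = 0 in [0308]. The names dimd_gradient and DimdGradient are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result type for one base position E. */
typedef struct {
    int iDx;   /* assumed: vertical variation (bottom row minus top row)   */
    int iDy;   /* assumed: horizontal variation (left col minus right col) */
    int iAmp;  /* assumed Formula 7: abs(iDx) + abs(iDy)                    */
} DimdGradient;

/* Applies the assumed filterY / filterX at interior position (x, y) of a
 * row-major sample array. Neighbours are named in raster order:
 *   A B C
 *   D E F
 *   G H I
 */
static DimdGradient dimd_gradient(const int *buf, int stride, int x, int y)
{
    assert(x >= 1 && y >= 1);              /* E must have all 8 neighbours */
    const int *e = buf + y * stride + x;   /* pointer to base sample E */
    int a = e[-stride - 1], b = e[-stride], c = e[-stride + 1];
    int d = e[-1],                          f = e[1];
    int g = e[stride - 1],  h = e[stride],  i = e[stride + 1];

    DimdGradient gr;
    gr.iDy = (a + 2 * d + g) - (c + 2 * f + i);   /* assumed filterY */
    gr.iDx = (g + 2 * h + i) - (a + 2 * b + c);   /* assumed filterX */
    gr.iAmp = abs(gr.iDx) + abs(gr.iDy);
    return gr;
}
```

For a block whose rows are 0, 10, 20 (values change only vertically), this sketch yields iDy = 0 and iDx = 80, which [0308] would map to the vertical mode.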
[0306] When at least one of iDx and iDy is 0, iAngUneven, the value of the specific intra-prediction mode to be selected, can be determined as shown in Equation 8 below. This case can correspond to step S620 of Figure 6.
[0307] [Formula 8]
[0308] In Equation 8, VER_IDX and HOR_IDX can represent the vertical mode and the horizontal mode, respectively. For example, VER_IDX and HOR_IDX can correspond to modes 50 and 18 of Figure 5, respectively. A value of 0 for iDx can mean that the variation in the vertical direction is zero or negligible. In other words, the samples can be regarded as having the same value along the vertical direction, so prediction in the vertical direction is expected to perform well. Conversely, a value of 0 for iDy (with iDx not equal to 0) can mean that the variation in the horizontal direction is zero or negligible. In other words, the samples can be regarded as having the same value along the horizontal direction, so prediction in the horizontal direction is expected to perform well.
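Equation 8 itself is not reproduced here; the following C sketch encodes only the rule stated in [0308], using the mode numbers of Figure 5 (vertical = 50, horizontal = 18). The function name dimd_pure_mode is hypothetical.

```c
#include <assert.h>

/* Mode indices of the 67-mode scheme in Figure 5. */
enum { HOR_IDX = 18, VER_IDX = 50 };

/* Used only when at least one of iDx, iDy is 0. The case where both are 0
 * never reaches this point, because [0303] skips such positions. */
static int dimd_pure_mode(int iDx, int iDy)
{
    assert(iDx == 0 || iDy == 0);
    assert(!(iDx == 0 && iDy == 0));
    /* iDx == 0: no vertical variation, so vertical prediction fits;
     * otherwise iDy == 0: no horizontal variation, so horizontal fits. */
    return iDx == 0 ? VER_IDX : HOR_IDX;
}
```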
[0309] When neither iDx nor iDy is 0, iAngUneven, the value of the specific intra-prediction mode to be selected, can be determined as follows. This case can also correspond to step S620 of Figure 6.
[0310] First, the intra-prediction modes (in particular, the directional modes) can be classified into the following four groups.
[0311] The first group (region 0) can consist of a specific number of horizontally oriented modes. As an example, the first group can consist of modes whose values are less than or equal to that of the horizontal mode. In Figure 5, the horizontally oriented modes are modes 2 through 34 (mode 34 may be excluded from the horizontally oriented modes), and the horizontal mode is mode 18. When the specific number is N, the first group can consist of modes {18, 18-1, 18-2, ..., 18-(N-1)}. When N is 17, the first group consists of modes {18, 17, 16, ..., 2}.
[0312] The second group (region 1) can consist of a specific number of horizontally oriented modes. As an example, the second group can consist of modes whose values are greater than or equal to that of the horizontal mode. When the specific number is N, the second group can consist of modes {18, 18+1, 18+2, ..., 18+(N-1)}. When N is 17, the second group consists of modes {18, 19, 20, ..., 34}.
[0313] The third group (region 2) can consist of a specific number of vertically oriented modes. As an example, the third group can consist of modes whose values are less than or equal to that of the vertical mode. In Figure 5, the vertically oriented modes are modes 34 through 66 (mode 34 may be excluded from the vertically oriented modes), and the vertical mode is mode 50. When the specific number is N, the third group can consist of modes {50, 50-1, 50-2, ..., 50-(N-1)}. When N is 17, the third group consists of modes {50, 49, 48, ..., 34}.
[0314] The fourth group (region 3) can consist of a specific number of vertically oriented modes. As an example, the fourth group can consist of modes whose values are greater than or equal to that of the vertical mode. When the specific number is N, the fourth group can consist of modes {50, 50+1, 50+2, ..., 50+(N-1)}. When N is 17, the fourth group consists of modes {50, 51, 52, ..., 66}.
[0315] When groups for intra-frame prediction modes are defined as described above, identifiers indicating specific groups can be calculated as shown in Table 11 below. The identifiers corresponding to groups one through four are 0, 1, 2, and 3, respectively.
[0316] [Table 11]
[0317] In Table 11, absx and absy denote the absolute values of iDx and iDy, and gtY indicates whether the change in the vertical direction is greater than the change in the horizontal direction. That is, gtY is derived as 1 when absx is greater than absy, and as 0 otherwise. When gtY is 1, the change in the vertical direction is greater than that in the horizontal direction; when gtY is 0, the change in the vertical direction is less than or equal to that in the horizontal direction. A large change in the vertical direction can mean a low likelihood of good prediction in the vertical direction. In this case, region 0 or region 1 can be selected; that is, mapXgrY1[signy][signx] can be selected as the region identifier. A large change in the horizontal direction can mean a low likelihood of good prediction in the horizontal direction. In this case, region 2 or region 3 can be selected; that is, mapXgrY0[signy][signx] can be selected as the region identifier.
[0318] Additionally, in the horizontal direction a positive change points to the right and a negative change points to the left; in the vertical direction a positive change points downward and a negative change points upward. When iDx and iDy have the same sign, region 1 or region 2 can be selected; when they have different signs, region 0 or region 3 can be selected.
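Table 11 is not reproduced here, but its behavior can be reconstructed from [0317] and [0318]. The following C sketch assumes that a sign flag is 1 for a negative value and that the tables are indexed [signy][signx]; the table contents below are inferred from the text (equal signs give region 1 or 2, different signs give region 0 or 3), and dimd_region is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Region identifiers 0..3 correspond to groups one through four.
 * Contents inferred from [0317]-[0318]; index order [signy][signx],
 * with sign flag 1 meaning "negative" (an assumption). */
static const int mapXgrY1[2][2] = { { 1, 0 }, { 0, 1 } }; /* gtY == 1 */
static const int mapXgrY0[2][2] = { { 2, 3 }, { 3, 2 } }; /* gtY == 0 */

static int dimd_region(int iDx, int iDy)
{
    assert(iDx != 0 && iDy != 0);           /* pure cases use Equation 8 */
    int signx = iDx < 0 ? 1 : 0;
    int signy = iDy < 0 ? 1 : 0;
    int gtY = abs(iDx) > abs(iDy) ? 1 : 0;  /* vertical change dominates */
    return gtY ? mapXgrY1[signy][signx] : mapXgrY0[signy][signx];
}
```

With vertical change dominant and equal signs the sketch returns region 1; with horizontal change dominant and opposite signs it returns region 3.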
[0319] Next, the ratio value, scaled and converted to an integer, can be obtained as shown in Table 12 below.
[0320] [Table 12]
[0321] In Table 12, (1 << 16) means 1 shifted left by 16 bits, i.e., 65536 = 2^16. The `intFunc` function can be used to convert `fRatioScaled`, a fractional value, into an integer. For this conversion, rounding to nearest, rounding up, or rounding down can be applied. Alternatively, a cast to `int` in C / C++ can be used to perform the conversion. In Table 12, a division (` / `) is needed to obtain the ratio between `absy` and `absx`. However, when implementing a codec (especially in hardware), division is costly or difficult to implement, so approximating it with a combination of simpler integer operations is often advantageous. Thus, the formula used to obtain the ratio can be approximated by the formulas in Table 13 below.
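As an illustration of Table 12 only, the following C sketch assumes that the smaller gradient magnitude is divided by the larger one (absy / absx when gtY is 1), so that the scaled ratio lies in (0, 65536], and that intFunc truncates like a C cast. The integer variant still divides, so it is not the division-free approximation of Table 13, which is not reproduced here. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Table 12 style: floating-point ratio scaled by 1 << 16, then truncated. */
static int dimd_ratio_float(int absx, int absy)
{
    assert(absx > 0 && absy > 0);
    int gtY = absx > absy;
    float fRatio = gtY ? (float)absy / (float)absx : (float)absx / (float)absy;
    float fRatioScaled = fRatio * (float)(1 << 16);
    return (int)fRatioScaled;   /* intFunc assumed to truncate */
}

/* Integer-only reference with the same truncation; uses 64-bit arithmetic
 * so the shifted numerator cannot overflow. Still uses a division. */
static int dimd_ratio_int(int absx, int absy)
{
    assert(absx > 0 && absy > 0);
    int gtY = absx > absy;
    int64_t num = gtY ? absy : absx;
    int64_t den = gtY ? absx : absy;
    return (int)((num << 16) / den);
}
```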
[0322] [Table 13]
[0323] The integerized ratio value can be determined using the process shown in Table 13. The position in the "angTable" to which this ratio value is closest can be determined, as shown in Table 14 below.
[0324] [Table 14]
[0325] According to Table 14, the angTable can consist of 17 entries. When each of the aforementioned groups consists of 17 modes, each of the intra-prediction modes constituting each group can correspond to an entry. In Table 14, the entry of the angTable that is closest to the ratio value is found, and the value of idx is derived based on the index of the angTable corresponding to that entry.
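The contents of Table 14 are not reproduced here. The following C sketch assumes a 17-entry angTable built from the VVC intraPredAngle magnitudes for modes 18 to 34 (0, 1, 2, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 23, 26, 29, 32), scaled by 65536 / 32 = 2048 so that 65536 corresponds to the 45-degree diagonal. It searches linearly for the closest entry; the tie-break toward the lower index is a choice made here, not taken from Table 14.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed angTable: VVC intraPredAngle values for modes 18..34, times 2048. */
static const int angTable[17] = {
    0, 2048, 4096, 6144, 8192, 12288, 16384, 20480, 24576,
    28672, 32768, 36864, 40960, 47104, 53248, 59392, 65536
};

/* Returns idx of the angTable entry closest to the integerized ratio.
 * Ties keep the lower index. */
static int dimd_closest_idx(int ratio)
{
    assert(ratio >= 0);
    int best = 0;
    for (int k = 1; k < 17; k++) {
        if (abs(angTable[k] - ratio) < abs(angTable[best] - ratio))
            best = k;
    }
    return best;
}
```

For example, a scaled ratio of 21845 (absy / absx = 1/3) lies closest to 20480, so idx is 7.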
[0326] [Table 15]
[0327] Table 15 is described using C / C++ syntax. The value of the intra-prediction mode to which the gradient value of the current sample position (the position within the current block at which the filter is applied) is to be added can be assigned to iAngUneven. The previously derived region is an identifier indicating one of the four groups. offsets[region] can represent the starting intra-prediction mode value of the group indicated by region, and dirs[region] can represent whether the mode value increases or decreases within that group. idx can represent the position within angTable. Therefore, the value of iAngUneven can be determined by the equation shown in Table 15.
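Table 15 is not reproduced here, but the group definitions in [0311] to [0314] (with N = 17) fix the values that offsets[] and dirs[] would need to hold. The following C sketch applies iAngUneven = offsets[region] + dirs[region] * idx under that reading; dimd_ang_uneven is a hypothetical name.

```c
#include <assert.h>

/* Values implied by the four groups of [0311]-[0314]:
 *   region 0: 18, 17, ..., 2     region 1: 18, 19, ..., 34
 *   region 2: 50, 49, ..., 34    region 3: 50, 51, ..., 66 */
static const int offsets[4] = { 18, 18, 50, 50 };  /* starting mode     */
static const int dirs[4]    = { -1,  1, -1,  1 };  /* step per idx     */

static int dimd_ang_uneven(int region, int idx)
{
    assert(region >= 0 && region < 4);
    assert(idx >= 0 && idx < 17);                  /* angTable position */
    return offsets[region] + dirs[region] * idx;
}
```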
[0328] The gradient value can be added to the cumulative gradient value of the selected intra-prediction mode as follows.
[0329] [Formula 9]
[0330] In Equation 9, the "piHistogram" array stores the cumulative gradient values for all intra-prediction modes. The array is initialized to 0, and gradients are then calculated while iterating over the sample positions within the current block at which the filter is applied. At each position, the gradient value calculated for that position is added to the cumulative gradient value of the selected intra-prediction mode. As described above, the value of the intra-prediction mode selected for the current sample position can be stored in iAngUneven, and the gradient value calculated for that position can be stored in iAmp. The cumulative gradient value for the intra-prediction mode indicated by iAngUneven is stored in piHistogram[iAngUneven]. Equation 9 can correspond to step S630 of Figure 6.
[0331] When the loop has completed for all sample positions within the interior region of the block to which the filter is applied, the cumulative gradient values for all intra-prediction modes are stored in the "piHistogram" array. One or more modes with the largest cumulative gradient values can be selected, and a selected mode can be referred to as a DIMD mode. This can correspond to step S640 of Figure 6.
[0332] [Table 16]
[0333] Table 16 is described using C / C++ syntax. ENUM_LUMA_MODE can represent the number of intra-prediction modes available as DIMD modes; for example, ENUM_LUMA_MODE can be 67. According to Table 16, the value of the intra-prediction mode with the largest cumulative gradient value is assigned to the "firstMode" variable. Therefore, the intra-prediction mode indicated by the final value of "firstMode" can be set as the DIMD mode for the WxH block.
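The accumulation of Equation 9 and the maximum search of Table 16 can be sketched in C as follows. To keep the example self-contained, the selected modes (iAngUneven) and amplitudes (iAmp) of each filtered position are passed in as arrays rather than computed from samples. NUM_LUMA_MODE stands in for ENUM_LUMA_MODE with the example value 67; the function name and the tie-break toward the lower mode are assumptions.

```c
#include <assert.h>
#include <string.h>

#define NUM_LUMA_MODE 67   /* ENUM_LUMA_MODE, 67 in the example of [0333] */

/* modes[n] and amps[n] are iAngUneven and iAmp of the n-th filtered
 * position. Returns firstMode; ties keep the lower mode index. */
static int dimd_select_mode(const int *modes, const int *amps, int count,
                            int piHistogram[NUM_LUMA_MODE])
{
    memset(piHistogram, 0, NUM_LUMA_MODE * sizeof(int));  /* start at 0 */
    for (int n = 0; n < count; n++) {
        assert(modes[n] >= 0 && modes[n] < NUM_LUMA_MODE);
        piHistogram[modes[n]] += amps[n];                 /* Equation 9 */
    }
    int firstMode = 0;
    for (int m = 1; m < NUM_LUMA_MODE; m++) {
        if (piHistogram[m] > piHistogram[firstMode])
            firstMode = m;                                /* running maximum */
    }
    return firstMode;
}
```

In the usage below, mode 50 accumulates 10 + 20 = 30 and beats mode 18 with 25, so it would be selected as the DIMD mode.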
[0334] As described above, when SBT is applied to the current block, the current block can be divided into two sub-blocks, and the (inverse) transform can be applied only to the first sub-block; the residual data of the second sub-block can be set to 0. Based on the size of the first sub-block, it can be determined whether LFNST or NSPT is applied to the current block (or the first sub-block), or whether neither LFNST nor NSPT is applied.
[0335] For example, the sizes at which NSPT is allowed instead of LFNST can be predefined in the encoding and decoding devices. When the size of the first sub-block within the current block corresponds to such a size, NSPT can be applied to the first sub-block and LFNST can be omitted. Assume that the sizes at which NSPT is allowed instead of LFNST include 4x8. If an 8x8 coding block is split vertically into two 4x8 sub-blocks, NSPT can be applied, and LFNST omitted, for the 4x8 block corresponding to the first sub-block.
[0336] The sizes in which LFNST and NSPT are not allowed can be predefined in the encoding and decoding devices. When the size of the first sub-block generated by partitioning the current block corresponds to the size in which LFNST and NSPT are not allowed, LFNST and NSPT may not be applied to the first sub-block.
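The size-based decision in [0334] to [0336] can be sketched in C as below. The predefined size lists are hypothetical: only the 4x8 NSPT entry comes from the example in [0335], the "width or height below 4" rule is borrowed from the assumption in [0337], and falling back to LFNST for every other size is a further assumption.

```c
#include <assert.h>

typedef enum { NST_NONE, NST_LFNST, NST_NSPT } NstType;

/* Hypothetical predefined size rules shared by encoder and decoder. */
static int nspt_instead_of_lfnst(int w, int h) { return w == 4 && h == 8; }
static int nst_not_allowed(int w, int h)       { return w < 4 || h < 4; }

/* Chooses the non-separable transform for the first SBT sub-block;
 * the second sub-block carries zero residual and is not transformed. */
static NstType sbt_first_subblock_nst(int subW, int subH)
{
    assert(subW > 0 && subH > 0);
    if (nst_not_allowed(subW, subH))
        return NST_NONE;                 /* neither LFNST nor NSPT */
    if (nspt_instead_of_lfnst(subW, subH))
        return NST_NSPT;                 /* NSPT applied, LFNST omitted */
    return NST_LFNST;                    /* assumed default */
}
```

For the example in [0335], an 8x8 block split vertically gives a 4x8 first sub-block, for which the sketch returns NST_NSPT.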
[0337] When the current block is encoded using a single-tree structure, the current block can include a luma block and a chroma block, and the chroma block can include a Cb block and a Cr block. SBT can be applied to each of the luma and chroma blocks of the current block. In this case, the non-separable transform can be allowed for the luma block and disallowed for the chroma block. Alternatively, the non-separable transform can be allowed for both the luma and chroma blocks. Even then, the non-separable transform may not be applied to a component block if the size of the first sub-block within that component block is one at which the non-separable transform is not allowed. For example, when the color format is 4:2:0, the luma block size can be M x N and the chroma block size (M / 2) x (N / 2). Assume that the non-separable transform is allowed only when both the width and height of the transform block are greater than or equal to 4. When the luma block is 8x8 and is partitioned into two 8x4 sub-blocks by SBT, the non-separable transform can be applied to the 8x4 sub-block corresponding to the first sub-block. On the other hand, the corresponding chroma block is 4x4 and is partitioned into two 4x2 sub-blocks in the same way as the luma block. In this case, because the height of the chroma sub-blocks is less than 4, the non-separable transform is not applied to the 4x2 sub-block corresponding to the first sub-block of the chroma block.
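The 4:2:0 example of [0337] can be checked with a short C sketch. It models only the half split used in that example and the stated minimum of 4 for width and height; the Size type and the helper names are hypothetical.

```c
#include <assert.h>

typedef struct { int w, h; } Size;

/* Half split as in the example: horizontal split halves the height,
 * vertical split halves the width. */
static Size sbt_half(Size blk, int splitHorizontally)
{
    assert(blk.w > 0 && blk.h > 0);
    Size s = blk;
    if (splitHorizontally) s.h /= 2; else s.w /= 2;
    return s;
}

/* Rule assumed in [0337]: width and height must both be at least 4. */
static int nst_allowed(Size s) { return s.w >= 4 && s.h >= 4; }

/* 4:2:0 colour format: chroma is (M/2) x (N/2) for an M x N luma block. */
static Size chroma420(Size luma)
{
    Size c = { luma.w / 2, luma.h / 2 };
    return c;
}
```

Splitting an 8x8 luma block horizontally gives an 8x4 luma sub-block (allowed), while the 4x4 chroma block splits into 4x2 sub-blocks (not allowed), which reproduces the conclusion of [0337].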
[0338] The selected transform set can include multiple transform kernel candidates, and an index indicating one of these candidates can be signaled. Depending on the type of non-separable transform, this index can be referred to as an NSPT index or an LFNST index, or more generally as a transform index.
[0339] The index can be signaled depending on whether a predetermined zeroing condition is met. On the encoder side, for a non-separable transform to be applied to the current block (or the first sub-block), the current block (or the first sub-block) must meet the zeroing condition. For SBT, since the second sub-block contains no non-zero transform coefficients, the zeroing condition can be checked for the first sub-block only.
[0340] The zeroing condition is described in detail below. When the forward non-separable transform is applied to an MxN block, R transform coefficients are generated. For LFNST, LFNST can be applied to the region of interest (ROI) within the MxN block. The R transform coefficients are generated by multiplying the input data by an RxS matrix serving as the forward transform matrix (where R ≤ S ≤ M). An RxS matrix refers to a transform matrix with an input length of S and an output length of R. Here, the input length represents the number of transform coefficients (or residual samples) input to the non-separable transform, and the output length represents the number of transform coefficients output from it. The R transform coefficients are placed in the MxN block sequentially according to a predetermined scan order; for example, they can be arranged starting from the DC position according to a diagonal, horizontal, or vertical scan order. The region within the MxN block where the R transform coefficients are not placed (hereinafter, the remaining region) can be filled with 0; that is, transform coefficients belonging to the remaining region can be set to 0. Thus, when the forward non-separable transform is applied to the MxN block, all transform coefficients in the remaining region are required to be 0. Accordingly, for the decoder, the zeroing condition is met when no non-zero transform coefficient exists in the remaining region of the MxN block, and is not met when at least one non-zero transform coefficient exists there.
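A decoder-side check of the zeroing condition can be sketched in C as follows, assuming an up-right diagonal scan that starts at the DC position (one of the scan orders named above) and a row-major coefficient layout. The function names and the 64x64 size cap are hypothetical.

```c
#include <assert.h>

#define MAX_COEFS (64 * 64)   /* hypothetical upper bound on block size */

/* Up-right diagonal scan of a w x h block starting at DC (0, 0): each
 * anti-diagonal is walked from bottom-left to top-right. */
static void diag_scan(int w, int h, int *scanX, int *scanY)
{
    int n = 0;
    for (int d = 0; d <= w + h - 2; d++) {
        for (int y = d; y >= 0; y--) {
            int x = d - y;
            if (x < w && y < h) {
                scanX[n] = x;
                scanY[n] = y;
                n++;
            }
        }
    }
}

/* Returns 1 when every coefficient at scan position R or later (the
 * remaining region) is zero, i.e. the zeroing condition is met. */
static int zeroing_condition_met(const int *coef, int w, int h, int R)
{
    int sx[MAX_COEFS], sy[MAX_COEFS];
    assert(w > 0 && h > 0 && w * h <= MAX_COEFS);
    assert(R >= 0 && R <= w * h);
    diag_scan(w, h, sx, sy);
    for (int n = R; n < w * h; n++) {
        if (coef[sy[n] * w + sx[n]] != 0)
            return 0;   /* non-zero coefficient in the remaining region */
    }
    return 1;
}
```

For a 4x4 block with R = 8, a non-zero coefficient at (x = 0, y = 3) is scan position 6 and keeps the condition satisfied, while one at (x = 2, y = 1) is scan position 8 and violates it.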
[0341] When encoding the current block using a single-tree structure, the current block can include multiple color component blocks. These multiple color component blocks can include luma blocks and chroma blocks. An inseparable transform can be applied to a specific component, and it may not be applied to other components. In such cases, it can be configured to check whether a zeroing condition is met only for the specific component to which the inseparable transform is applied (e.g., the luma component). When the zeroing condition is met for the specific component to which the inseparable transform is applied, an index for the inseparable transform can be signaled.
[0342] Alternatively, even when the non-separable transform is applied only to specific components, the zeroing condition can be checked for all components. In that case, the zeroing condition for a component to which the non-separable transform is not applied can be the zeroing condition that would hold if the transform kernel of the non-separable transform were applied to that color component block. Here, the transform kernel can be determined based on the size of the transform block (or first sub-block) corresponding to that color component block. When the zeroing condition is met for all components, an index can be signaled for the current block (or first sub-block).
[0343] When SBT is applied to the current block, an index can be signaled when a non-zero transform coefficient exists at a sample position other than the DC position in at least one of the color component blocks of the current block. The DC position can refer to the top-left sample position within a color component block (or within the first sub-block of the color component block). Alternatively, an index can be signaled when a non-zero transform coefficient exists at a sample position other than the DC position in the color component block (or its first sub-block) of a specific component (e.g., the luma component) to which the non-separable transform is applied.
[0344] When SBT is applied to the current block, an index can be signaled if non-zero transform coefficients exist in at least one of the first sub-blocks of the color component blocks of the current block. Alternatively, an index can be signaled if non-zero transform coefficients exist in the first sub-block of the color component block of a specific component (e.g., the luma component) to which the non-separable transform is applied.
[0345] The intra block copy (IBC) mode can be applied to intra slices (e.g., I slices) and inter slices (e.g., P and B slices). A non-separable transform can be applied to blocks to which the IBC mode is applied. Here, the non-separable transform can include at least one of LFNST or NSPT.
[0346] The IBC mode is functionally similar to inter-frame prediction, but differs in that the reference block belongs to an already reconstructed region within the current picture to which the current block belongs. For example, the reference block can be specified based on a block vector rather than a motion vector. When the block vector does not have integer-pixel resolution, the reference block can be specified through an interpolation process. Because, like inter-frame prediction, it derives the prediction block most similar to the current block, the method of deriving a virtual intra-prediction mode (VIPM) based on the DIMD method described above and selecting the transform set based on the VIPM can be applied in the same way when the IBC mode is used.
[0347] When the IBC mode is applied to the current block, the inseparable transform can be applied to the luma component of the current block, but not to the chroma component. Alternatively, the IBC mode can also be applied to the chroma component. In this case, the inseparable transform can also be applied to the chroma component.
[0348] In a single-tree structure, the IBC mode can be applied to both the luma and chroma components. Here, the block vector for the luma component can be signaled, and the block vector for the chroma component can be derived from it, for example by taking the color format into account.
[0349] When the IBC mode is applied to the current block and the current block is encoded in a single-tree structure, the inseparable transform can be applied to the luma component, but not to the chroma component. Alternatively, when the IBC mode is applied to the current block and the current block is encoded in a single-tree structure, the inseparable transform can be applied to both the luma and chroma components.
[0350] When encoding the current block using a single-tree structure, an inseparable transform can be applied to both the luma and chroma components based on an index. For example, one of several transform kernel candidates belonging to a transform set can be selected using an index. In this case, for the luma component, a transform set applicable to the size of the luma block can be selected, and for the chroma component, a transform set applicable to the size of the chroma block can be selected.
[0351] In a dual-tree structure, the IBC mode can be applied to the luma component and not to the chroma component; likewise, the non-separable transform can be applied to the luma component and not to the chroma component. For example, the IBC mode can be applied to blocks whose tree type is dual-tree luma and not to blocks whose tree type is dual-tree chroma. Alternatively, the IBC mode can also be applied to the chroma component in a dual-tree structure, in which case the non-separable transform can also be applied to the chroma component.
[0352] The method of signaling the index for the non-separable transform in IBC mode is described in detail below.
[0353] Depending on the tree type of the current block, the current block can include one or more color component blocks. For example, when the tree type of the current block is single tree, the current block can include one luma block and two chroma blocks, namely a Cb block and a Cr block. When the tree type of the current block is dual-tree luma, the current block can include one luma block. When the tree type of the current block is dual-tree chroma, the current block can include two chroma blocks. The current block can be a coding unit or a coding block, and each color component block belonging to the current block can be a coding block or a transform block.
[0354] The current block or each color component block can include one or more transform blocks. For example, when the intra sub-partitions (ISP) mode is applied to the current block, the current block can be divided into multiple transform blocks. Alternatively, when the ISP mode is applied to a specific color component block, that color component block can be divided into multiple transform blocks.
[0355] An index can be signaled for the current block if, in every color component block belonging to the current block, a non-zero transform coefficient exists at a sample position other than the DC position (e.g., the top-left sample position of the corresponding color component block). The index can be omitted for the current block if, among the color component blocks belonging to the current block, there is at least one in which non-zero transform coefficients exist only at the DC position.
[0356] Alternatively, an index can be signaled for the current block when a non-zero transform coefficient exists at a sample position other than the DC position in any of the color component blocks belonging to the current block.
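The two DC-position policies of [0355] and [0356] can be contrasted in a short C sketch. It assumes a row-major coefficient array whose index 0 is the DC position; the function names are hypothetical. Under the "all" policy, a block without any coefficient outside DC (DC-only or all-zero) suppresses the index, which is the literal reading of the first sentence of [0355].

```c
#include <assert.h>
#include <stdbool.h>

/* True if a non-zero coefficient lies outside the DC (top-left) position. */
static bool has_non_dc_coef(const int *coef, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int n = 1; n < w * h; n++)   /* index 0 is the DC position */
        if (coef[n] != 0)
            return true;
    return false;
}

/* [0355]: signal only if every colour component block has a non-DC coefficient. */
static bool signal_index_all(const bool *hasNonDc, int numComp)
{
    for (int c = 0; c < numComp; c++)
        if (!hasNonDc[c])
            return false;
    return true;
}

/* [0356]: signal if any colour component block has a non-DC coefficient. */
static bool signal_index_any(const bool *hasNonDc, int numComp)
{
    for (int c = 0; c < numComp; c++)
        if (hasNonDc[c])
            return true;
    return false;
}
```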
[0357] The inseparable transformation may be permitted / applied only to any one of the color component blocks (e.g., the luma component block), and may not be permitted / applied to the other color component blocks. Alternatively, the inseparable transformation may be permitted / applied to two or more of the color component blocks.
[0358] When the current block is coded using inter-frame prediction, the current block can have a single-tree structure. An index can be signaled for such a current block when a non-zero transform coefficient exists at a sample position other than the DC position in at least one of its color component blocks.
[0359] When the tree type of the current block is single tree, the index can be signaled for the current block when at least one non-zero transform coefficient exists in the luma block of the current block (e.g., when the value of the coded block flag (CBF) of the luma block is not 0). In the single-tree case, when the non-separable transform is restricted to the luma block, whether to signal the index can be determined by checking only the CBF of the luma block. When the index is not signaled, its value can be inferred to be 0. An index value of 0 can indicate that the non-separable transform is not applied.
[0360] When the tree type of the current block is single tree, an index can be signaled for the current block when at least one non-zero transform coefficient exists in any of its color component blocks. In the single-tree case, the non-separable transform can be applied to all color component blocks constituting a coding unit, or only to some of them (e.g., the luma or chroma blocks). Regardless of which color component block the non-separable transform is applied to, an index can be signaled for the current block when at least one non-zero transform coefficient exists in any color component block; for example, when the CBF value of at least one of the color component blocks of the current block is 1.
[0361] When the current block is coded using inter-frame prediction, the current block can have a dual-tree structure. When the tree type of the current block is dual-tree chroma (DUAL_TREE_CHROMA), an index can be signaled for the current block only if at least one non-zero transform coefficient exists in an assigned color component block.
[0362] Here, the assigned color component block is predefined identically in the encoding and decoding devices and can be either the Cb block or the Cr block. Alternatively, information indicating the assigned color component block can be signaled separately. Alternatively, the assigned color component block can be derived from the neighboring context. For example, when the joint Cb-Cr mode is applied, the color component block whose transform coefficients are actually signaled can be derived as the assigned color component block.
[0363] Alternatively, when the tree type of the current block is DUAL_TREE_CHROMA, an index can be signaled for the current block if at least one non-zero transform coefficient exists in either the Cb block or the Cr block. For example, an index can be signaled for the current block containing the Cb and Cr blocks even when non-zero transform coefficients exist in only one of them.
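The CBF-based alternatives of [0359] to [0363] can be gathered into one C sketch, with each alternative selected by a hypothetical policy switch. The SignalPolicy struct, the component enumeration, and the treatment of dual-tree luma (checking the luma CBF) are assumptions made for illustration; when the function returns false, the index would be inferred as 0 per [0359].

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SINGLE_TREE, DUAL_TREE_LUMA, DUAL_TREE_CHROMA } TreeType;
enum { COMP_Y = 0, COMP_CB = 1, COMP_CR = 2 };

/* Hypothetical switches selecting among the alternatives in [0359]-[0363]. */
typedef struct {
    bool lumaOnlyNst;        /* single tree: NST restricted to luma ([0359])   */
    bool chromaUseAssigned;  /* dual-tree chroma: assigned block only ([0361]) */
    int  assignedChroma;     /* COMP_CB or COMP_CR                             */
} SignalPolicy;

/* cbf[c] is the coded block flag of colour component c (0 if absent).
 * Returns true if the index is signalled; otherwise it is inferred as 0. */
static bool index_is_signalled(TreeType tree, const int cbf[3], SignalPolicy p)
{
    switch (tree) {
    case SINGLE_TREE:
        if (p.lumaOnlyNst)
            return cbf[COMP_Y] != 0;                     /* [0359] */
        return cbf[COMP_Y] || cbf[COMP_CB] || cbf[COMP_CR]; /* [0360] */
    case DUAL_TREE_LUMA:
        return cbf[COMP_Y] != 0;                         /* assumption */
    case DUAL_TREE_CHROMA:
        if (p.chromaUseAssigned) {
            assert(p.assignedChroma == COMP_CB || p.assignedChroma == COMP_CR);
            return cbf[p.assignedChroma] != 0;           /* [0361] */
        }
        return cbf[COMP_CB] || cbf[COMP_CR];             /* [0363] */
    }
    return false;
}
```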
[0364] The current block can be coded using inter-frame prediction and include multiple color component blocks. When none of the color component blocks belonging to the current block is coded with transform skip, the non-separable transform is allowed for the current block and can be applied to at least one of its color component blocks. When any one of the color component blocks belonging to the current block is coded with transform skip, the non-separable transform may not be allowed / applied to any of the color component blocks of the current block. For example, when the tree type of the current block is single tree and neither the luma nor the chroma blocks of the current block are coded with transform skip, the non-separable transform may be allowed / applied to at least one of the luma and chroma blocks. Conversely, when any one of the luma and chroma blocks of the current block is coded with transform skip, the non-separable transform may not be allowed / applied to the luma and chroma blocks of the current block.
[0365] When the current block is coded using inter-frame prediction, whether to apply transform skip or a non-separable transform can be determined separately for each of the color component blocks belonging to the current block. In this way, transform skip or a non-separable transform can be applied to each color component block. As an example, when the current block is coded using inter-frame prediction and its tree type is single tree, the current block can consist of a luma block, a Cb block, and a Cr block, each consisting of a transform block. In this case, whether to apply transform skip or a non-separable transform can be determined for each color component block. When a non-separable transform is applied to at least one color component block, an index can be signaled for the current block. On the other hand, when transform skip is applied to all color component blocks (e.g., when the transform skip flags of all color component blocks are 1), no index is signaled for the current block, and the non-separable transform is not applied to any color component block.
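The two transform-skip rules of [0364] and [0365] can be contrasted with a C sketch. The function names are hypothetical, and treating transform skip and the non-separable transform as mutually exclusive within one color component block is an assumption drawn from the per-block choice in [0365].

```c
#include <assert.h>
#include <stdbool.h>

/* [0364]: NST is allowed for the block only if no component uses transform skip. */
static bool nst_allowed_no_ts(const bool *tsFlag, int numComp)
{
    assert(numComp > 0);
    for (int c = 0; c < numComp; c++)
        if (tsFlag[c])
            return false;
    return true;
}

/* [0365]: per-component choice; the index is signalled if NST is applied
 * to at least one component. Assumes TS and NST exclude each other per block. */
static bool index_signalled_per_comp(const bool *tsFlag, const bool *nstApplied,
                                     int numComp)
{
    bool any = false;
    for (int c = 0; c < numComp; c++) {
        assert(!(tsFlag[c] && nstApplied[c]));
        any = any || nstApplied[c];
    }
    return any;
}
```

Under [0364], a single transform-skipped chroma block disables the non-separable transform for the whole block; under [0365], the same block can still carry an index if the luma block uses the non-separable transform.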
[0366] When the current block coded with inter-frame prediction meets a predetermined zeroing condition, an index can be signaled for the current block. For the encoder, the current block must meet the zeroing condition when the non-separable transform is applied to it. The zeroing condition is described in detail below. When the forward non-separable transform is applied to an MxN block, R transform coefficients can be generated. In the case of LFNST, LFNST can be applied to a region of interest (ROI) within the MxN block. The R transform coefficients can be generated by multiplying the input data by an RxS matrix (where R ≤ S ≤ M×N) serving as the forward transform matrix. That is, the RxS matrix can be interpreted as a transform matrix with an input length of S and an output length of R. Here, the input length indicates the number of transform coefficients (or residual samples) input to the non-separable transform, and the output length indicates the number of transform coefficients output from the non-separable transform. For example, the R transform coefficients can be arranged sequentially from the DC position according to a diagonal, horizontal, or vertical scan order. The region within the MxN block where the R transform coefficients are not arranged (hereinafter referred to as the remaining region) can be filled with 0; that is, the transform coefficients belonging to the remaining region can be set to 0. When the forward non-separable transform is applied to the current block, all transform coefficients belonging to the remaining region are therefore required to be 0. Accordingly, for the decoder, the case where no non-zero transform coefficient exists in the remaining region of the current block corresponds to the case where the zeroing condition is met. Conversely, the case where at least one non-zero transform coefficient exists in the remaining region of the current block corresponds to the case where the zeroing condition is not met.
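The decoder-side zeroing check in [0366] amounts to scanning the block and confirming that every position after the first R scan positions holds zero. The sketch below assumes a simple whole-block up-right diagonal order (one of the orders the text permits; real codecs may scan in sub-blocks), and the function names are illustrative only.

```python
from typing import Iterator, List, Tuple


def diagonal_scan(width: int, height: int) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) positions in an up-right diagonal order starting at DC (0, 0)."""
    for d in range(width + height - 1):
        # Walk each anti-diagonal from bottom-left to top-right.
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                yield x, y


def zeroing_condition_met(coeffs: List[List[int]], r: int) -> bool:
    """Return True if every coefficient outside the first r scan positions is zero.

    coeffs is indexed as coeffs[y][x]; r is the output length R of the
    forward non-separable transform.
    """
    height, width = len(coeffs), len(coeffs[0])
    for pos, (x, y) in enumerate(diagonal_scan(width, height)):
        if pos >= r and coeffs[y][x] != 0:
            return False  # non-zero value found in the remaining region
    return True
```

Because the check depends only on positions, the same routine works for any R and block shape once the scan order is fixed.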
[0367] When the tree type of the current block is single tree, the current block can consist of multiple color component blocks. In this case, the non-separable transform can be applied to one of the color component blocks and not to the others. Color component blocks to which the non-separable transform is not applied can nevertheless be required to meet a zeroing condition. For such a color component block, the zeroing condition can be the one that would apply if the transform kernel of the non-separable transform were applied to that block, where the transform kernel can be determined based on the size of the transform block corresponding to that color component block. For example, even when the non-separable transform is applied to the luma block of the current block but not to the Cb and Cr blocks, the index can be signaled for the current block when the zeroing condition is met for all color component blocks.
[0368] Alternatively, when the tree type of the current block is single tree, the current block can consist of multiple color component blocks, and the non-separable transform can be applied to one of them but not to the others. In this case, when the index for the non-separable transform is signaled, the zeroing condition need not be checked for the color component blocks to which the non-separable transform is not applied. For example, when the current block is coded with inter-frame prediction and its tree type is single tree, the non-separable transform can be applied only to the luma block of the current block. In this case, the index can be signaled for the current block when the zeroing condition is met for the luma block, regardless of whether the zeroing condition is met for the Cb and Cr blocks.
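The two alternatives in [0367] and [0368] differ only in which color component blocks are checked before the index is signaled. In this sketch, `ComponentBlock`, `nst_target`, and `index_signaled` are hypothetical names; for a block that is not a target of the non-separable transform, `zeroing_met` is assumed to have already been evaluated as if the size-based kernel were applied, as [0367] describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComponentBlock:
    name: str          # e.g. "Y", "Cb", "Cr"
    nst_target: bool   # True if the non-separable transform may be applied to it
    zeroing_met: bool  # zeroing condition result (hypothetically precomputed)


def index_signaled(blocks: List[ComponentBlock], check_all_components: bool) -> bool:
    """check_all_components=True follows [0367]; False follows [0368]."""
    if check_all_components:
        checked = blocks
    else:
        # [0368]: skip blocks to which the non-separable transform is not applied.
        checked = [b for b in blocks if b.nst_target]
    # Signal the index when every checked block satisfies the zeroing condition.
    return bool(checked) and all(b.zeroing_met for b in checked)
```

With luma as the only target and a Cb block that fails zeroing, the [0368] variant still signals the index while the [0367] variant does not.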
[0369] The index of a block coded with inter-frame prediction can be signaled in a different manner from the index of a block coded with intra-frame prediction.
[0370] As an example, the index of a block coded with intra-frame prediction can be represented using fixed-length binarization. For example, the index can have any value from 0 to 3. Here, an index of 0 indicates that no non-separable transform is applied, and indices 1 to 3 can each represent a transform kernel candidate for the non-separable transform. The binary codes 00, 01, 10, and 11 can be assigned to indices 0 to 3, respectively.
[0371] The index of a block coded with inter-frame prediction can be represented using truncated unary binarization. For example, the index can have any value from 0 to 3. Here, an index of 0 indicates that no non-separable transform is applied, and indices 1 to 3 can each represent a transform kernel candidate for the non-separable transform. The binary codes 0, 10, 110, and 111 can be assigned to indices 0 to 3, respectively.
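The two binarizations in [0370] and [0371] can be reproduced with a few lines. This is an illustrative sketch; the function names are hypothetical, and the maximum index of 3 follows the examples in the text.

```python
def fixed_length_bins(index: int, num_bits: int = 2) -> str:
    """Fixed-length binarization, as in the intra-coded example ([0370])."""
    if not 0 <= index < (1 << num_bits):
        raise ValueError("index out of range")
    return format(index, f"0{num_bits}b")


def truncated_unary_bins(index: int, c_max: int = 3) -> str:
    """Truncated unary binarization, as in the inter-coded example ([0371])."""
    if not 0 <= index <= c_max:
        raise ValueError("index out of range")
    # `index` ones, followed by a terminating zero unless index == c_max.
    return "1" * index + ("0" if index < c_max else "")


for idx in range(4):
    print(idx, fixed_length_bins(idx), truncated_unary_bins(idx))
```

The truncated unary code uses at most three bins, which is why three bins (and their contexts) are considered for the inter-coded index.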
[0372] The context (e.g., CABAC context) used for encoding / decoding the index can be determined based on at least one of the tree type or the size of the current block.
[0373] As an example, the index of a block coded with inter-frame prediction can be represented using truncated unary binarization. Here, it is assumed that the index has a value from 0 to 3, where an index of 0 indicates that no non-separable transform is applied and indices 1 to 3 each represent a transform kernel candidate for the non-separable transform. The binary codes 0, 10, 110, and 111 can be assigned to indices 0 to 3, respectively. Six contexts can be assigned to encode the three bins. For ease of description, these six contexts are referred to below as the first through sixth contexts.
[0374] The context for the first bin can be determined based on the tree type of the current block. For example, the first context can be assigned when the tree type of the current block is single tree, and otherwise (e.g., when the tree type is dual tree), the second context can be assigned. However, when the truncated-unary-binarized index is used only for blocks coded with inter-frame prediction, and such blocks are coded only as single trees, the second context (assigned for the dual-tree case) can be removed from the six contexts.
[0375] The context for the second bin can be determined based on the size of the current block. Here, the size of the current block can be defined as at least one of the width, the height, the minimum of width and height, the maximum of width and height, the product of width and height, or the sum of width and height. For example, the fifth context can be assigned if both the width and height of the current block are greater than or equal to a predetermined threshold size; otherwise, the third context can be assigned. Here, the threshold size can be 16, but it is not limited thereto and can also be 8, 32, or 64.
[0376] The context for the third bin can be determined based on the size of the current block. Here, the size of the current block can be defined as at least one of the width, the height, the minimum of width and height, the maximum of width and height, the product of width and height, or the sum of width and height. The size used for the third bin can be defined the same as or differently from the size used for the second bin. For example, the sixth context can be assigned if both the width and height of the current block are greater than or equal to a predetermined threshold size; otherwise, the fourth context can be assigned. Here, the threshold size is as described above.
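The context assignment of [0373] to [0376] can be summarized as a small lookup. This sketch assumes the "width and height greater than or equal to the threshold" size definition and a default threshold of 16, both taken from the examples; the function name and the `"single"`/`"dual"` strings are hypothetical.

```python
def index_bin_context(bin_idx: int, tree_type: str, width: int, height: int,
                      threshold: int = 16) -> int:
    """Return the context number (1..6) for one bin of the TU-binarized index."""
    if bin_idx == 0:
        # First bin: depends only on the tree type ([0374]).
        return 1 if tree_type == "single" else 2
    # Second and third bins: size test, here "width and height >= threshold".
    large = width >= threshold and height >= threshold
    if bin_idx == 1:
        return 5 if large else 3   # [0375]
    if bin_idx == 2:
        return 6 if large else 4   # [0376]
    raise ValueError("the TU-binarized index has at most three bins")
```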
[0377] As disclosed above, block sizes are categorized into two groups, and the context can be determined based on the group to which the current block belongs. Alternatively, block sizes can be categorized into three or more groups, and different contexts can be assigned to each group. For example, block sizes can be categorized into four groups. Here, the first group can include 4x4, 4x8, 8x4, and 8x8 blocks; the second group can include 4x16, 16x4, 8x16, 16x8, and 16x16 blocks; the third group can include 4x32, 32x4, 8x32, 32x8, 16x32, 32x16, and 32x32 blocks; and the fourth group can include blocks with a width or height greater than 32. For the four groups, different contexts can be assigned to each bin.
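The four size groups in [0377] can be derived from the longer block side alone, assuming power-of-two block dimensions of at least 4 (an assumption of this sketch; the function name is hypothetical).

```python
def size_group(width: int, height: int) -> int:
    """Map a block size to one of the four groups in [0377] (1-based)."""
    longest = max(width, height)
    if longest <= 8:
        return 1   # 4x4, 4x8, 8x4, 8x8
    if longest <= 16:
        return 2   # 4x16, 16x4, 8x16, 16x8, 16x16
    if longest <= 32:
        return 3   # 4x32, 32x4, 8x32, 32x8, 16x32, 32x16, 32x32
    return 4       # width or height greater than 32
```

A separate context per bin and per group could then be indexed by the pair (group, bin).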
[0378] When coding tools that perform prediction in sub-block units (e.g., Overlapped Block Motion Compensation (OBMC) or affine modes) are applied, sub-block boundaries can be significant. Specifically, because prediction is performed with different motion vectors for different sub-blocks, or because the data used for prediction differs between sub-blocks, discontinuities arise in the sample values at the boundaries between neighboring sub-blocks, which can make sub-block boundaries significant. In this case, a separate transform set or transform kernel can be defined. That is, a separate transform set can be defined for inter-frame prediction modes that perform prediction in sub-block units, and the transform set (or transform kernel) can be selected based on a virtual intra-frame prediction mode.
[0379] Depending on the motion vector resolution, the residual blocks of the current block can have different statistical properties. Therefore, the transform set for the non-separable transform can be determined based on the motion vector resolution associated with the current block. The transform kernel for the non-separable transform can be determined based on the motion vector resolution associated with the current block. A virtual intra-frame prediction mode for selecting the transform set (or transform kernel) can be derived based on the motion vector resolution associated with the current block.
[0380] Motion vector resolution can be derived from information signaled in the bitstream, or it can be derived implicitly by the encoding / decoding device. The interpolation filter used may vary depending on the motion vector resolution. For example, at 1/8-pixel resolution, motion vectors can take values (e.g., 3/8 pixel) that cannot be expressed at 1/4-pixel resolution, so in such cases an interpolation filter different from the one used at 1/4-pixel resolution may be required.
[0381] For each coding tool used for inter-frame prediction, a flag indicating whether the non-separable transform (LFNST or NSPT) can be applied together with that coding tool can be signaled. The flag can be signaled in the high-level syntax, which may include at least one of the Sequence Parameter Set (SPS), Picture Parameter Set (PPS), Picture Header (PH), or Slice Header (SH).
[0382] Additionally, a flag indicating whether the non-separable transform is applied together with the corresponding coding tool can be signaled at the block level (e.g., CTU, CU, or TU level). That is, whether a coding tool for inter-frame prediction is applied and whether the non-separable transform is applied can be signaled at the block level. For example, a flag indicating whether the coding tool for inter-frame prediction is applied and a flag indicating whether the non-separable transform is applied can be signaled separately. Alternatively, whether the coding tool for inter-frame prediction is applied and whether the non-separable transform is applied can be indicated together by a single flag.
[0383] Alternatively, for a specific coding tool used for inter-frame prediction, whether the non-separable transform can be applied (or is applied) can be determined based on whether certain conditions are met. For example, when certain conditions are met, whether the non-separable transform can be applied (or is applied) can be determined without signaling the aforementioned flag. These specific conditions are described later.
[0384] Alternatively, for a particular coding tool used for inter-frame prediction, at least one of the number of available transform sets or the number of transform kernel candidates belonging to a transform set may differ from that of other coding tools. Furthermore, considering the characteristics of the residual data generated by inter-frame prediction, at least one of the number of transform sets available for the current block or the number of transform kernel candidates belonging to a transform set can be determined differently. Specifically, when the residual data of a particular coding tool has a fixed or relatively simple pattern, the number of available transform sets or transform kernel candidates can be set small. This can reduce the cost of signaling the index and improve coding performance.
[0385] The properties of the residual data can be defined as the sum of the absolute values of the transform coefficients. The class to which this sum belongs can be derived, and the number of available transform sets or transform kernel candidates can be set according to that class. However, this is merely an example; the properties of the residual data can also be defined as the average or pattern of the absolute values of the transform coefficients, or as the differences between adjacent transform coefficients.
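A class-based selection as in [0385] could look like the sketch below. The thresholds (8, 32), the candidate counts (1, 2, 3), and the direction of the mapping (smaller coefficient magnitude, fewer candidates) are all hypothetical values chosen for illustration; the disclosure does not specify them.

```python
from bisect import bisect_right
from typing import Sequence


def residual_class(coeffs: Sequence[int], thresholds: Sequence[int] = (8, 32)) -> int:
    """Class index from the sum of absolute transform coefficients.

    Class 0: sum < 8, class 1: 8 <= sum < 32, class 2: sum >= 32 (hypothetical).
    """
    return bisect_right(thresholds, sum(abs(c) for c in coeffs))


def num_kernel_candidates(coeffs: Sequence[int],
                          candidates_per_class: Sequence[int] = (1, 2, 3),
                          thresholds: Sequence[int] = (8, 32)) -> int:
    """Hypothetical mapping from residual class to the number of kernel candidates."""
    return candidates_per_class[residual_class(coeffs, thresholds)]
```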
[0386] The following configurations can be applied to the non-separable transform when certain conditions are met.
[0387] (Configuration 1) Do not apply LFNST and NSPT.
[0388] (Configuration 2) Adjust the number of transform coefficients generated by applying the forward non-separable transform.
[0389] (Configuration 3) Adjust the number of available transform kernel candidates belonging to the transform set.
[0390] (Configuration 1) corresponds to the case where LFNST and NSPT are disabled because, when certain conditions are met, the benefit of evaluating the non-separable transform is negligible. Because LFNST and NSPT are not applied, the complexity of the encoding device can be reduced, and the energy required to perform the non-separable transform can be saved.
[0391] In (Configuration 2), the number of transform coefficients generated when certain conditions are met may be smaller than the number generated otherwise. Because fewer transform coefficients need to be signaled in this configuration, the bit rate can be reduced. Alternatively, in (Configuration 2), the number of transform coefficients generated when certain conditions are met can be greater than the number generated otherwise. This configuration allows the residual data (or master transform coefficient data) to be represented more precisely, thereby improving image quality.
[0392] In (Configuration 3), the number of available transform kernel candidates when certain conditions are met may be smaller than the number available otherwise. This configuration can reduce the cost of signaling the index for the non-separable transform. Alternatively, in (Configuration 3), the number of available transform kernel candidates when certain conditions are met may be greater than the number available otherwise. This configuration can use the additional transform kernel candidates to cover more diverse patterns of residual data (or master transform coefficient data), thereby improving image quality.
[0393] When a specific condition is met, at least one of (Configuration 1) to (Configuration 3) can be applied. Here, the specific condition may include at least one of the following Conditions 1 to 36.
[0394] [Condition 1] When applying Local Illumination Compensation (LIC)
[0395] [Condition 2] When selecting non-adjacent spatial candidates
[0396] [Condition 3] When applying (enhanced) TMVP (including SbTMVP)
[0397] [Condition 4] When template matching is applied
[0398] [Condition 5] When applying multi-pass decoder-side motion vector refinement
[0399] [Condition 6] When applying adaptive decoder-side motion vector refinement
[0400] [Condition 7] When OBMC is applied
[0401] [Condition 8] When applying template-matching-based OBMC
[0402] [Condition 9] When applying history-parameter-based affine model inheritance or non-adjacent affine candidates
[0403] [Condition 10] When applying sample-based BDOF
[0404] [Condition 11] When applying Multiple Hypothesis Prediction (MHP)
[0405] [Condition 12] When applying adaptive reordering of merge candidates with template matching (ARMC-TM)
[0406] [Condition 13] When applying ARMC based on MV candidate types
[0407] [Condition 14] When applying TM-based reordering for merge mode with motion vector difference (MMVD) and affine MMVD
[0408] [Condition 15] When selecting an affine candidate derived using regression-based affine candidate derivation.
[0409] [Condition 16] When applying the geometric partitioning mode (GPM) with merge motion vector differences (GPM-MMVD)
[0410] [Condition 17] When applying the geometric partitioning mode with template matching (GPM-TM)
[0411] [Condition 18] When applying GPM with inter-frame and intra-frame prediction
[0412] [Condition 19] When applying template-matching-based reordering of the GPM split modes
[0413] [Condition 20] When applying bidirectional prediction of GPM
[0414] [Condition 21] When applying the bilateral-matching AMVP-merge mode
[0415] [Condition 22] When IBC is applied
[0416] [Condition 23] When applying IBC with template matching (IBC-TM)
[0417] [Condition 24] When applying fractional pixel IBC
[0418] [Condition 25] When applying filtered IBC prediction
[0419] [Condition 26] When out-of-boundary (OOB) samples are detected while applying enhanced bi-directional motion compensation
[0420] [Condition 27] When applying reconstruction-reordered IBC (RR-IBC)
[0421] [Condition 28] When applying the IBC merge mode with block vector difference (IBC-MBVD)
[0422] [Condition 29] When applying combined intra block copy and intra prediction (CIIP)
[0423] [Condition 30] When applying IBC with geometric partitions
[0424] [Condition 31] When applying IBC BVP merge and bi-predictive IBC merge
[0425] [Condition 32] When using IBC MBVD list derivation
[0426] [Condition 33] When applying IBC with local illumination compensation
[0427] [Condition 34] When applying template-matching-based BCW index derivation for merge mode
[0428] [Condition 35] When applying decoder-side motion vector refinement (DMVR) to affine merge coded blocks
[0429] [Condition 36] When applying the convolutional cross-component model for inter prediction (InterCCCM)
[0430] The NSPT kernel can be configured with 8-bit precision, so that the coefficients within the NSPT kernel range from -128 to 127. When the precision is increased beyond 8 bits, the result of the matrix multiplication can be additionally shifted to the right by the number of added bits. For example, if the value obtained by matrix multiplication with an NSPT kernel of 8-bit precision is shifted to the right by S bits and stored in a buffer, then with kernel coefficients of N-bit precision the value can be shifted to the right by (S + (N - 8)) bits and stored in the buffer.
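The shift bookkeeping in [0430] can be checked numerically. In this sketch, the "N-bit kernel" is simply the 8-bit kernel scaled by 2^(N-8), an illustrative assumption (a real higher-precision kernel would carry extra fractional bits), so both paths must produce identical buffer values. The kernel values, S, and N are arbitrary example numbers, and the helper names are hypothetical.

```python
from typing import List


def nspt_multiply(kernel: List[List[int]], coeffs: List[int], shift: int) -> List[int]:
    """Multiply an integer kernel by a coefficient vector and right-shift each result."""
    return [sum(k * c for k, c in zip(row, coeffs)) >> shift for row in kernel]


def check_8bit(kernel: List[List[int]]) -> None:
    """Reject coefficients outside the signed 8-bit range [-128, 127]."""
    if any(not -128 <= k <= 127 for row in kernel for k in row):
        raise ValueError("8-bit NSPT kernel coefficients must lie in [-128, 127]")


def rescale_kernel(kernel: List[List[int]], n_bits: int) -> List[List[int]]:
    """Illustrative N-bit kernel: the 8-bit kernel scaled by 2**(n_bits - 8)."""
    return [[k << (n_bits - 8) for k in row] for row in kernel]


kernel8 = [[64, -37], [12, 127], [-128, 5]]
coeffs = [100, -45]
S, N = 7, 10
check_8bit(kernel8)
out8 = nspt_multiply(kernel8, coeffs, S)
outN = nspt_multiply(rescale_kernel(kernel8, N), coeffs, S + (N - 8))
print(out8 == outN)  # prints True
```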
[0431] Configuring the NSPT kernel with 8-bit precision prevents an excessive increase in internal precision in the encoder / decoder performing the transform, which reduces implementation complexity in terms of memory requirements and computational load while minimizing the loss of compression efficiency.
[0432] When the inverse NSPT is applied to a current block of size M x N, the size of the NSPT kernel (or NSPT matrix) can be expressed as MN x r. Here, MN can refer to the product of the width and height of the current block, which corresponds to the output length of the NSPT, i.e., the number of residual samples generated by the NSPT. Additionally, r can refer to the input length of the NSPT, i.e., the number of (dequantized) transform coefficients to which the NSPT is applied. r can be an integer greater than or equal to 0 and less than or equal to MN. Examples of MN x r NSPT matrices for each block size are given below.
[0433] The NSPT matrix used for 4x4 blocks can be composed of a 16x16 matrix. The NSPT matrix used for 4x8 and 8x4 blocks can be composed of 32x20, 32x16, 32x24, 32x28, or 32x32 matrices. The NSPT matrix used for 8x8 blocks can be composed of 64x16, 64x24, 64x32, 64x40, 64x48, 64x56, or 64x64 matrices. The NSPT matrix used for 4x16 and 16x4 blocks can be composed of 64x16, 64x24, 64x32, 64x40, 64x48, 64x56, or 64x64 matrices. The NSPT matrix used for 8x16 and 16x8 blocks can consist of a 128x96, 128x64, 128x48, or 128x32 matrix. The NSPT matrix used for 16x16 blocks can consist of a 256x128, 256x96, or 256x64 matrix. The NSPT matrix used for 16x32 and 32x16 blocks can consist of a 512x256 or 512x128 matrix. The NSPT matrix used for 32x32 blocks can consist of a 1024x512, 1024x256, or 1024x128 matrix.
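The inverse NSPT described in [0432] is a single matrix-vector product: an MN x r matrix maps r dequantized coefficients to MN residual samples, i.e., r multiplications per output sample. The sketch assumes the output samples are arranged in raster order and scaled down by a plain right shift; both are illustrative assumptions, as is the function name.

```python
from typing import List


def inverse_nspt(kernel: List[List[int]], coeffs: List[int],
                 width: int, height: int, shift: int) -> List[List[int]]:
    """Apply an (MN x r) inverse NSPT matrix to r dequantized coefficients.

    Returns a height x width block of residual samples in raster order.
    """
    mn = width * height
    r = len(coeffs)
    if len(kernel) != mn or any(len(row) != r for row in kernel):
        raise ValueError("kernel must have MN rows and r columns")
    # Each output sample costs r multiplications.
    samples = [sum(k * c for k, c in zip(row, coeffs)) >> shift for row in kernel]
    return [samples[y * width:(y + 1) * width] for y in range(height)]
```

For instance, a 16x16 matrix would be used for a 4x4 block with r = 16, and a 64x32 matrix for an 8x8 block with r = 32.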
[0434] Alternatively, a 16x16 matrix can be applied to 4xN blocks and Nx4 blocks. Here, N can be an integer greater than or equal to 4. A 64x16 matrix can be applied to 8x8 blocks. A 64x32 matrix can be applied to 8xN blocks and Nx8 blocks. Here, N can be an integer greater than or equal to 16. A 96x32 matrix can be applied to 16xN blocks and Nx16 blocks. Here, N can be an integer greater than or equal to 16.
[0435] Alternatively, the value of r in the MN x r NSPT matrix can be determined according to predetermined criteria. These criteria may be (1) that the sum of the computational cost of the primary transform and that of the secondary transform is less than or equal to a certain level, and (2) that the number of multiplications per sample required for the NSPT operation is less than or equal to a certain number.
[0436] Taking the inverse transform as the reference, when a separable master transform is performed via matrix multiplication on an MxN block, (M+N) multiplications are required per sample. Furthermore, when LFNST is applied to a specific region of interest (ROI) and the inverse LFNST matrix is a PxQ matrix, (P×Q)/(M×N) multiplications are required per sample. Here, a PxQ matrix refers to a matrix with P rows and Q columns.
[0437] When NSPT is applied to an MxN block instead of a DCT-2 transform (or another separable transform such as KLT) followed by LFNST, the value of r can be determined so that the number of multiplications per sample for NSPT is less than or equal to the number of multiplications per sample for DCT-2 and LFNST, as follows. Because the inverse NSPT uses an MN x r matrix, it requires MN×r multiplications in total, i.e., r multiplications per sample.
[0438] [Equation 10]
$$ r \le M + N + \frac{P \times Q}{M \times N} $$
[0439] When r is set to the maximum value satisfying Equation 10 above (i.e., r = M + N + (P×Q)/(M×N)), the value of r in the NSPT matrix for each block size can be set as follows. In Equation 10, when (P×Q)/(M×N) is not an integer, an integer value close to (P×Q)/(M×N) can be used. As an example, the floor operation can be applied to (P×Q)/(M×N), in which case r can be set to M+N+floor((P×Q)/(M×N)). Here, floor(x) refers to the largest integer not greater than x. Alternatively, rounding can be applied to (P×Q)/(M×N), in which case r can be set to M+N+round((P×Q)/(M×N)). Here, round(x) refers to the value obtained by rounding x to the nearest integer. Alternatively, the ceiling operation can be applied to (P×Q)/(M×N), in which case r can be set to M+N+ceil((P×Q)/(M×N)). Here, ceil(x) refers to the smallest integer greater than or equal to x. When the floor function is applied, the inequality in Equation 10 above is satisfied. However, when the rounding or ceiling function is applied, the inequality in Equation 10 above may not be satisfied.
[0440] For NSPT used with 4x4 blocks, the maximum value of r is 24. However, since r cannot exceed MN = 16 for a 4x4 block, the value of r can be set to 16.
[0441] For NSPT used for 4x8 and 8x4 blocks, the maximum value of r is 20. The value of r can be set to 20.
[0442] For NSPT used with 8x8 blocks, the maximum value of r is 32. The value of r can be set to 32.
[0443] For NSPT used with 4x16 blocks and 16x4 blocks, the maximum value of r is 24. The value of r can be set to 24.
[0444] For NSPT used with 8x16 and 16x8 blocks, the maximum value of r is 40. The value of r can be set to 40.
[0445] For NSPT used with 16x16 blocks, the maximum value of r is 44. The value of r can be set to 44.
[0446] For NSPT used with 16x32 and 32x16 blocks, the maximum value of r is 54. The value of r can be set to 54.
[0447] For NSPT used with 32x32 blocks, the maximum value of r is 67. The value of r can be set to 67.
[0448] For NSPT used with 4x32 and 32x4 blocks, the maximum value of r is 38. The value of r can be set to 38. Alternatively, the value of r can be set to 20.
[0449] For NSPT used in 8x32 and 32x8 blocks, the maximum value of r is 48. The value of r can be set to 48. Alternatively, the value of r can be set to 24.
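The per-size maxima listed in [0440] to [0449] can be reproduced with a short calculation of M + N + func((P×Q)/(M×N)). This sketch assumes the inverse LFNST matrix sizes P×Q listed in [0434] (16x16 for 4xN and Nx4, 64x16 for 8x8, 64x32 for 8xN and Nx8 with N ≥ 16, 96x32 for 16xN and Nx16 with N ≥ 16) and additionally assumes 96x32 for 32x32, which [0434] does not list. With these values every maximum above is matched. The function names are illustrative.

```python
import math
from fractions import Fraction
from typing import Callable, Tuple


def lfnst_dims(width: int, height: int) -> Tuple[int, int]:
    """Inverse LFNST matrix size (P, Q) assumed per block size."""
    shortest = min(width, height)
    if shortest == 4:
        return 16, 16
    if width == 8 and height == 8:
        return 64, 16
    if shortest == 8:
        return 64, 32
    return 96, 32  # 16xN, Nx16 and (assumed) 32x32


def max_r(width: int, height: int,
          func: Callable[[Fraction], int] = math.floor) -> Tuple[int, int]:
    """Return (bound from Equation 10, bound capped at MN)."""
    p, q = lfnst_dims(width, height)
    separable = width + height                 # DCT-2 multiplications per sample
    lfnst = Fraction(p * q, width * height)    # LFNST multiplications per sample
    bound = separable + func(lfnst)
    return bound, min(bound, width * height)   # r can never exceed MN


for size in [(4, 4), (4, 8), (8, 8), (4, 16), (8, 16), (16, 16),
             (16, 32), (32, 32), (4, 32), (8, 32)]:
    print(size, max_r(*size))
```

`func` can be swapped for `math.ceil` or `round` to mirror the alternatives in [0439]; for the sizes above the LFNST term is already an integer, so all three give the same result.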
[0450] In the NSPT matrix used for each of the above block sizes, there may be cases where the value of r is not a multiple of 4. For ease of implementation, it may be advantageous to set the value of r to a multiple of 4. For example, in parallel processing implemented through Single Instruction Multiple Data (SIMD) instructions, when processing the inner product of four transform basis vectors simultaneously (i.e., generating four transform coefficients simultaneously) during the application of forward NSPT, setting the value of r to a multiple of 4 may be advantageous.
[0451] As an example, for NSPTs used for 16x32 and 32x16 blocks, the value of r can be set to 52 or 56 instead of 54. For NSPTs used for 32x32 blocks, the value of r can be set to 64 or 68 instead of 67. For NSPTs used for 4x32 and 32x4 blocks, the value of r can be set to 36 or 40 instead of 38.
[0452] More generally, the value of r can be set to a multiple of K, where K is a positive integer. As an example, the value of r can be set to a multiple of K that satisfies the inequality in Equation 11 below.
[0453] [Equation 11]
$$ r \le M + N + \mathrm{func}\left(\frac{P \times Q}{M \times N}\right) $$
[0454] In Equation 11 above, func() can be the floor, rounding, or ceiling operation described above.
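The SIMD-friendly choices in [0451] and the multiple-of-K rule in [0452] to [0454] can be sketched as follows. Picking the largest multiple of K that satisfies Equation 11 is one possible choice made for this sketch (the text only requires some multiple of K), and the function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Callable, Tuple


def neighbouring_multiples(r: int, k: int = 4) -> Tuple[int, int]:
    """Nearest multiples of k at or below r and at or above r."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    lower = (r // k) * k
    upper = lower if lower == r else lower + k
    return lower, upper


def r_multiple_of_k(width: int, height: int, p: int, q: int, k: int,
                    func: Callable[[Fraction], int] = math.floor) -> int:
    """Largest multiple of k not exceeding width + height + func(P*Q / (width*height))."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    bound = width + height + func(Fraction(p * q, width * height))
    return (bound // k) * k
```

The upper neighbours (56, 68, 40) exceed the Equation 10 bound, so choosing them, as [0451] permits, trades a slightly higher per-sample cost for dimensions that divide evenly into four-lane SIMD operations.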
[0455] The values of r determined above according to the predetermined criteria do not take zeroing into account. When forward LFNST is applied, the master transform coefficients in the remaining region outside the region to which LFNST is applied are zeroed out, so the actual computational cost of applying DCT-2 and LFNST can be smaller than the cost computed above. Therefore, when zeroing is considered, r can be set to a value smaller than the value determined according to the predetermined criteria.
[0456] Because zeroing is not performed on the 4x4 block, the value of r can be set to a value less than or equal to 16.
[0457] For a 4x8 block, zeroing can be performed on the remaining region except for the top-left 4x4 block based on the forward transform, and a 16x16 matrix, which serves as the forward LFNST matrix, can be applied to the top-left 4x4 block. When performing such zeroing, the number of multiplications required per sample in the forward separable master transform is 8 (((4x4x8)+(4x8x4))/(4x8)=8), and the number of multiplications required per sample in the LFNST is 8 ((16x16)/32=8). Therefore, when the separable master transform and LFNST are replaced with NSPT, the value of r can be set to less than or equal to 16, which is the sum of the number of multiplications per sample in the separable master transform and the number of multiplications per sample in the LFNST. Because the same amount of computation is required when applying the inverse separable master transform and LFNST, the value of r can be set to less than or equal to 16.
[0458] For an 8x4 block, the remaining region except for the top-left 4x4 block can be zeroed out based on the forward transform, and a 16x16 matrix, which serves as the forward LFNST matrix, can be applied to the top-left 4x4 block. When performing such zeroing, the number of multiplications per sample required in the forward separable master transform is 6 (((4x8x4)+(4x4x4))/(8x4)=6), and the number of multiplications per sample required in the LFNST is 8 ((16x16)/32=8). Therefore, when the separable master transform and LFNST are replaced with NSPT, the value of r can be set to less than or equal to 14, which is the sum of the number of multiplications per sample in the separable master transform and the number of multiplications per sample in the LFNST. Because the same amount of computation is required when applying the inverse separable master transform and LFNST, the value of r can be set to less than or equal to 14.
[0459] For an 8x8 block, zeroing can be omitted from the separable master transformation, and in this case, the value of r can be set to a value less than or equal to 32.
[0460] For an 8x16 block, the remaining region except for the top-left 8x8 block can be zeroed out based on the forward transform, and a 64x32 matrix, which serves as the forward LFNST matrix, can be applied to the top-left 8x8 block. When performing such zeroing, the number of multiplications required per sample in the forward separable master transform is 16 (((8x8x16)+(8x16x8))/(8x16)=16), and the number of multiplications required per sample in the LFNST is 16 ((64x32)/128=16). Therefore, when the separable master transform and LFNST are replaced with NSPT, the value of r can be set to less than or equal to 32, which is the sum of the number of multiplications per sample in the separable master transform and the number of multiplications per sample in the LFNST. Because the same amount of computation is required when applying the inverse separable master transform and LFNST, the value of r can be set to less than or equal to 32.
[0461] For a 16x8 block, the remaining region except for the top-left 8x8 block can be zeroed out based on the forward transform, and a 64x32 matrix, which serves as the forward LFNST matrix, can be applied to the top-left 8x8 block. When performing such zeroing, the number of multiplications required per sample in the forward separable master transform is 12 (((8x16x8)+(8x8x8))/(16x8)=12), and the number of multiplications required per sample in the LFNST is 16 ((64x32)/128=16). Therefore, when the separable master transform and LFNST are replaced with NSPT, the value of r can be set to less than or equal to 28, which is the sum of the number of multiplications per sample in the separable master transform and the number of multiplications per sample in the LFNST. Because the same amount of computation is required when applying the inverse separable master transform and LFNST, the value of r can be set to less than or equal to 28.
[0462] For a 16x16 block, zeroing can be performed on the remaining region except for the top-left 12x12 block based on the forward transform, and a 96x32 matrix, which serves as the forward LFNST matrix, can be applied to the top-left 12x12 block. When performing such zeroing, the number of multiplications per sample required in the forward separable master transform is 21 (((12x16x16)+(12x16x12))/(16x16)=21), and the number of multiplications per sample required in the LFNST is 12 ((96x32)/256=12). Therefore, when the separable master transform and LFNST are replaced with NSPT, the value of r can be set to less than or equal to 33, which is the sum of the number of multiplications per sample in the separable master transform and the number of multiplications per sample in the LFNST. Since the same amount of computation is required when applying the inverse separable master transform and LFNST, the value of r can be set to less than or equal to 33.
[0463] For a 4x16 block, the remaining region except for the top-left 4x4 block can be zeroed out based on the forward transform, and a 16x16 matrix, which serves as the forward LFNST matrix, can be applied to the top-left 4x4 block. When performing such zeroing, the number of multiplications per sample required in the forward separable master transform is 8 (((4x4x16)+(4x16x4))/(4x16)=8), and the number of multiplications per sample required in the LFNST is 4 ((16x16)/64=4). Therefore, when the separable master transform and LFNST are replaced with NSPT, the value of r can be set to less than or equal to 12, which is the sum of the number of multiplications per sample in the separable master transform and the number of multiplications per sample in the LFNST. Because the same amount of computation is required when applying the inverse separable master transform and LFNST, the value of r can be set to less than or equal to 12.
[0464] For a 16x4 block, zeroing can be performed on the remaining region except for the top-left 4x4 block based on the forward transform, and a 16x16 matrix, which serves as the forward LFNST matrix, can be applied to the top-left 4x4 block. When performing such zeroing, the number of multiplications required per sample in the forward separable master transform is 5 (((4x16x4)+(4x4x4))/(16x4)=5), and the number of multiplications required per sample in the LFNST is 4 ((16x16)/64=4). Therefore, when the separable master transform and LFNST are replaced with NSPT, the value of r can be set to less than or equal to 9, which is the sum of the number of multiplications per sample in the separable master transform and the number of multiplications per sample in the LFNST. Because the same amount of computation is required when applying the inverse separable master transform and LFNST, the value of r can be set to less than or equal to 9.
[0465] For a 4x32 block, the forward transform can zero out the entire region except the top-left 4x4 block, and a 16x16 matrix, serving as the forward LFNST matrix, can be applied to the top-left 4x4 block. With such zeroing, the forward separable master transform requires 8 multiplications per sample (((4x4x32) + (4x32x4)) / (4x32) = 8), and the LFNST requires 2 multiplications per sample ((16x16) / 128 = 2). Therefore, when the separable master transform and LFNST are replaced with NSPT, r can be set to a value less than or equal to 10, which is the sum of the per-sample multiplication counts of the separable master transform and the LFNST. Because the backward separable master transform and backward LFNST require the same amount of computation, r can likewise be set to a value less than or equal to 10 in the backward direction.
[0466] For a 32x4 block, the forward transform can zero out the entire region except the top-left 4x4 block, and a 16x16 matrix, serving as the forward LFNST matrix, can be applied to the top-left 4x4 block. With such zeroing, the forward separable master transform requires 4.5 multiplications per sample (((4x32x4) + (4x4x4)) / (32x4) = 4.5), and the LFNST requires 2 multiplications per sample ((16x16) / 128 = 2). Therefore, when the separable master transform and LFNST are replaced with NSPT, r can be set to a value less than or equal to 6.5. Because the backward separable master transform and backward LFNST require the same amount of computation, r can likewise be set to a value less than or equal to 6.5 in the backward direction. Since r is a dimension of the matrix and must therefore be an integer, it can be set to 6 or 7 instead of 6.5.
[0467] For an 8x32 block, the forward transform can zero out the entire region except the top-left 8x8 block, and a 64x32 matrix, serving as the forward LFNST matrix, can be applied to the top-left 8x8 block. With such zeroing, the forward separable master transform requires 16 multiplications per sample (((8x8x32) + (8x32x8)) / (8x32) = 16), and the LFNST requires 8 multiplications per sample ((64x32) / 256 = 8). Therefore, when the separable master transform and LFNST are replaced with NSPT, r can be set to a value less than or equal to 24. Because the backward separable master transform and backward LFNST require the same amount of computation, r can likewise be set to a value less than or equal to 24 in the backward direction.
[0468] For a 32x8 block, the forward transform can zero out the entire region except the top-left 8x8 block, and a 64x32 matrix, serving as the forward LFNST matrix, can be applied to the top-left 8x8 block. With such zeroing, the forward separable master transform requires 10 multiplications per sample (((8x32x8) + (8x8x8)) / (32x8) = 10), and the LFNST requires 8 multiplications per sample ((64x32) / 256 = 8). Therefore, when the separable master transform and LFNST are replaced with NSPT, r can be set to a value less than or equal to 18. Because the backward separable master transform and backward LFNST require the same amount of computation, r can likewise be set to a value less than or equal to 18 in the backward direction.
[0469] For a 16x32 block, the forward transform can zero out the entire region except the top-left 12x12 block, and a 96x32 matrix, serving as the forward LFNST matrix, can be applied to the top-left 12x12 block. With such zeroing, the forward separable master transform requires 21 multiplications per sample (((12x16x32) + (12x32x12)) / (16x32) = 21), and the LFNST requires 6 multiplications per sample ((96x32) / 512 = 6). Therefore, when the separable master transform and LFNST are replaced with NSPT, r can be set to a value less than or equal to 27. Because the backward separable master transform and backward LFNST require the same amount of computation, r can likewise be set to a value less than or equal to 27 in the backward direction.
[0470] For a 32x16 block, the forward transform can zero out the entire region except the top-left 12x12 block, and a 96x32 matrix, serving as the forward LFNST matrix, can be applied to the top-left 12x12 block. With such zeroing, the forward separable master transform requires 13.5 multiplications per sample (((12x32x12) + (12x16x12)) / (32x16) = 13.5), and the LFNST requires 6 multiplications per sample ((96x32) / 512 = 6). Therefore, when the separable master transform and LFNST are replaced with NSPT, r can be set to a value less than or equal to 19.5. Because the backward separable master transform and backward LFNST require the same amount of computation, r can likewise be set to a value less than or equal to 19.5 in the backward direction. Since r is a dimension of the matrix and must therefore be an integer, it can be set to 19 or 20 instead of 19.5.
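The per-sample counts in [0462] to [0470] can be reproduced with a small calculation. The sketch below assumes a counting model inferred from the formulas above: the horizontal pass runs first over every row but computes only the retained outputs, and the vertical pass then runs only over the retained columns. The function names are illustrative. Under this model the 16x16, 4x16, 16x4, 4x32, 32x4, 8x32, 32x8, and 16x32 cases match the stated values exactly. For 32x16 it gives 16.5 rather than 13.5, because the expression in [0470] counts the horizontal pass over 12 rows instead of all 16.

```python
from fractions import Fraction

def separable_mults_per_sample(width, height, keep):
    """Per-sample multiplications of a forward separable transform with zero-out.

    Assumed counting model: the horizontal pass runs over all `height` rows and
    computes only `keep` outputs per row (`width` multiplications each); the
    vertical pass then runs over the `keep` retained columns and computes `keep`
    outputs per column (`height` multiplications each).
    """
    horizontal = height * keep * width
    vertical = keep * keep * height
    return Fraction(horizontal + vertical, width * height)

def lfnst_mults_per_sample(width, height, rows, cols):
    """Per-sample multiplications of an LFNST given as a rows x cols matrix."""
    return Fraction(rows * cols, width * height)

def r_upper_bound(width, height, keep, rows, cols):
    """Multiplication budget for NSPT: separable + LFNST per-sample counts."""
    return (separable_mults_per_sample(width, height, keep)
            + lfnst_mults_per_sample(width, height, rows, cols))

# (width, height, retained top-left size, LFNST matrix rows, LFNST matrix cols)
CASES = [
    (16, 16, 12, 96, 32), (4, 16, 4, 16, 16), (16, 4, 4, 16, 16),
    (4, 32, 4, 16, 16), (32, 4, 4, 16, 16), (8, 32, 8, 64, 32),
    (32, 8, 8, 64, 32), (16, 32, 12, 96, 32), (32, 16, 12, 96, 32),
]

if __name__ == "__main__":
    for w, h, keep, rows, cols in CASES:
        bound = r_upper_bound(w, h, keep, rows, cols)
        print(f"{w}x{h}: r <= {float(bound)}")
```

Exact fractions keep the half-integer budgets (6.5, 19.5) exact until the final integer choice of r.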
[0471] For the NSPT matrix used for each block size above, r may not be a multiple of K. In such cases, for ease of implementation, r can be set to a multiple of K. Letting r_prev denote the value of r preset by the above method, the value of r that is a multiple of K can be set as follows.
[0472] [Equation 12]
[0473] In Equation 12 above, func() can be a round-down, round-to-nearest, or round-up operation, as described above.
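Equation 12 itself is not reproduced in this text. A plausible reading, stated here only as an assumption, is r = K · func(r_prev / K), which always yields a multiple of K. The hypothetical helper below implements that reading for the three rounding choices.

```python
import math
from fractions import Fraction

# Rounding choices for func() in the assumed form r = K * func(r_prev / K).
ROUNDING = {
    "down": math.floor,
    "nearest": lambda x: math.floor(x + Fraction(1, 2)),  # ties round up
    "up": math.ceil,
}

def align_r_to_multiple(r_prev, k, mode="down"):
    """Hypothetical Equation 12: snap r_prev to a multiple of k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * ROUNDING[mode](Fraction(r_prev) / k)

if __name__ == "__main__":
    for mode in ROUNDING:
        print(mode, align_r_to_multiple(33, 16, mode))
```

With r_prev = 33 and K = 16, rounding down or to nearest gives 32 and rounding up gives 48. With round-down, a budget smaller than K (for example 6.5 with K = 16) snaps to 0.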
[0474] As mentioned above, the value of r in the NSPT matrix for an MxN block can be different from the value of r in the NSPT matrix for an NxM block. For example, the backward NSPT matrix for a 4x8 block can be a 32x16 matrix, and the backward NSPT matrix for an 8x4 block can be a 32x14 matrix. In such cases, the NSPT matrix can be determined by using the symmetry between the MxN and NxM blocks.
[0475] Assume the current block is an MxN block with pattern x. When the aforementioned symmetry is applied to the current block, instead of applying the NSPT matrix corresponding to either the MxN block size or pattern x, an NSPT matrix corresponding to at least one of the patterns symmetric to pattern x or the NxM block size symmetric to the MxN block size can be applied. In this case, the NSPT matrix corresponding to the NxM block size can be applied to the current block as is. Alternatively, the NSPT matrix corresponding to the NxM block size is applied, but for the value of r, the value of r in the NSPT matrix corresponding to the MxN block size can be used.
[0476] As an example, the backward NSPT matrix for a 4x8 block can be a 32x16 matrix (i.e., r is 16), and the backward NSPT matrix for an 8x4 block can be a 32x14 matrix (i.e., r is 14). When the current block is an 8x4 block with pattern x, an NSPT matrix for at least one of a pattern symmetric to pattern x or the 4x8 block symmetric to the 8x4 block can be applied to the current block. In this case, the 32x16 matrix used as the backward NSPT matrix for the 4x8 block can be used as is, or a 32x14 matrix, whose r equals that of the backward NSPT matrix for the 8x4 block, can be used. Here, the 32x14 matrix can be derived by taking the 14 leftmost columns of the 32x16 matrix. When the 32x14 matrix is applied to the current 8x4 block in this way, the aforementioned predetermined criteria are satisfied.
[0477] Conversely, when the current block is a 4x8 block with pattern x, an NSPT matrix for at least one of a pattern symmetric to pattern x or the 8x4 block symmetric to the 4x8 block can be applied to the current block. In such a case, the 32x14 matrix for the 8x4 block, rather than the 32x16 matrix for the 4x8 block, can be applied to the current block. This allows NSPT to be performed with fewer multiplications than are permitted for a 4x8 block.
[0478] For the NSPTs used for MxN and NxM blocks, let r1 and r2 be the values of r that satisfy the aforementioned predetermined conditions, respectively. The backward NSPT matrix used for both the MxN and NxM blocks can then be set to MN x max(r1, r2), where max(r1, r2) selects the larger of r1 and r2, i.e., a value greater than or equal to both.
[0479] As an example, the backward NSPT matrix for a 4x8 block can be a 32x16 matrix (i.e., r is 16), and the backward NSPT matrix for an 8x4 block can be a 32x14 matrix (i.e., r is 14). When the current block is a 4x8 block with pattern x, the NSPT matrix for at least one of a pattern symmetric to pattern x or the 8x4 block symmetric to the 4x8 block can be applied to the current block. In this case, a 32x16 matrix can be used as the backward NSPT matrix for the 8x4 block. If the backward NSPT matrix were not configured as MN x max(r1, r2), the 32x14 matrix would be used as the NSPT matrix for the 8x4 block. However, when the NSPT matrices for the 4x8 block and the 8x4 block are both configured as 32 x max(16, 14) matrices, the full 32x16 matrix can be applied.
[0480] Conversely, when the current block is an 8x4 block with pattern x, an NSPT matrix for at least one of a pattern symmetric to pattern x or the 4x8 block symmetric to the 8x4 block can be applied to the current block. In this case, the 32x16 matrix, which is the backward NSPT matrix for the 4x8 block, can be used, or a 32x14 matrix can be used. Here, the 32x14 matrix can be derived by taking the 14 leftmost columns of the 32x16 matrix.
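The matrix sharing of [0474] to [0480] amounts to storing one MN x max(r1, r2) backward matrix for an MxN/NxM pair and keeping only its r leftmost columns when a smaller r is needed. The sketch below represents matrices as Python lists of rows; the size-to-r table uses the 4x8/8x4 example values, and all names are illustrative.

```python
# Example r values from the 4x8 / 8x4 discussion; other sizes would need their own entries.
R_BY_SIZE = {(4, 8): 16, (8, 4): 14}

def shared_r(r1, r2):
    """Column count of the backward matrix shared by MxN and NxM blocks."""
    return max(r1, r2)

def leftmost_columns(matrix, r):
    """Keep the r leftmost columns of a backward NSPT matrix (list of rows)."""
    if any(len(row) < r for row in matrix):
        raise ValueError("matrix has fewer than r columns")
    return [row[:r] for row in matrix]

def backward_matrix_for_block(width, height, shared, use_own_r=True):
    """Select the backward NSPT matrix for a block from the shared matrix.

    use_own_r=True applies the block's own r (e.g. 32x14 for 8x4);
    use_own_r=False applies the full shared matrix (e.g. 32x16).
    """
    r = R_BY_SIZE[(width, height)] if use_own_r else len(shared[0])
    return leftmost_columns(shared, r)

if __name__ == "__main__":
    r = shared_r(R_BY_SIZE[(4, 8)], R_BY_SIZE[(8, 4)])
    shared = [[0.0] * r for _ in range(32)]  # placeholder 32 x max(16, 14) matrix
    m = backward_matrix_for_block(8, 4, shared)
    print(len(m), len(m[0]))  # 32 14
```

Because truncation only drops trailing columns, one stored matrix serves both block shapes and both choices of r.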
[0481] When the NSPT matrix is configured as described above, a transformation consisting of the maximum number of transform basis vectors can be applied while satisfying predetermined conditions, thereby maximizing coding performance.
[0482] In the above embodiments, r can be set to a multiple of 16. For example, for the backward NSPT used for 4x8 and 8x4 blocks, a 32x16 matrix can be applied instead of a 32x20 matrix. The transform coefficients of a transform block can be encoded in units of predetermined coefficient groups (CGs). Here, a CG can be defined as a set of 16 transform coefficients; for example, a CG can be a 4x4, 2x8, or 8x2 sub-block. A CG may contain no non-zero transform coefficients, in which case encoding of that CG can be skipped. When r is a multiple of 16, the r NSPT output coefficients occupy whole CGs in scan order and no CG is only partially filled, so setting r to a multiple of 16 has the advantage of reducing implementation complexity.
[0483] Transform coefficients can be derived by applying forward NSPT to the residual samples of an MxN block. In this case, due to zeroing, the number of derived transform coefficients may be less than or equal to MxN. In other words, the forward NSPT matrix can be defined as an r x (MxN) matrix, where r can refer to the output length of NSPT, i.e., the number of transform coefficients derived by NSPT, and MxN can refer to the input length of NSPT, i.e., the number of residual samples to which NSPT is applied.
[0484] The derived transform coefficients can be arranged in the MxN block according to a predetermined scan order, and the unfilled transform coefficient positions can be filled with 0 (i.e., zeroed out). Therefore, if, while scanning the transform coefficients, the decoding apparatus finds a non-zero transform coefficient in a region that would be filled with 0 had NSPT been applied (or, equivalently, if the scan position of the last significant coefficient in the MxN block is greater than or equal to r), it can infer that NSPT has not been applied to the MxN block, and the NSPT index need not be signaled. A scan position index of 0 can be assigned to the top-left coefficient (i.e., the DC coefficient) of the MxN block, and indices incremented by 1 can be assigned to the remaining coefficients of the MxN block in a predetermined order.
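The decoder-side inference in [0482] to [0484] can be illustrated with a short sketch. It uses a simple row-major scan purely as a stand-in for the predetermined scan order, represents blocks as lists of rows, and uses illustrative function names.

```python
def row_major_scan(width, height):
    """Stand-in scan order; scan position 0 is the top-left (DC) coefficient."""
    return [(x, y) for y in range(height) for x in range(width)]

def place_coefficients(coeffs, width, height, scan):
    """Place NSPT output coefficients along the scan; remaining positions stay 0."""
    block = [[0] * width for _ in range(height)]
    for (x, y), c in zip(scan, coeffs):
        block[y][x] = c
    return block

def last_significant_scan_pos(block, scan):
    """Scan position of the last non-zero coefficient, or -1 for an all-zero block."""
    last = -1
    for pos, (x, y) in enumerate(scan):
        if block[y][x] != 0:
            last = pos
    return last

def nspt_index_is_parsed(last_pos, r):
    """The NSPT index is parsed only if no non-zero coefficient lies at scan position >= r."""
    return last_pos < r

def cgs_occupied(r, cg_size=16):
    """CGs holding the r NSPT outputs, assuming a CG-by-CG scan: ceil(r / cg_size)."""
    return -(-r // cg_size)
```

For example, in a 4x8 block with r = 16, a non-zero coefficient at scan position 20 makes `nspt_index_is_parsed` return False, so the NSPT index is skipped. With r = 20 the outputs would spill 4 coefficients into a second CG, the partially filled case that a multiple of 16 avoids.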
[0485] One or more values of r can be defined for the block sizes to which NSPT applies. As an example, one or more values of r can be defined for each block size to which NSPT applies. Alternatively, one value of r can be defined for each block size to which NSPT applies, and the value of r for any one of the block sizes to which NSPT applies can be different from the value of r for another block size. Alternatively, one value of r can be defined for some of the block sizes to which NSPT applies, and at least two values of r can be defined for the remaining ones.
[0486] When multiple values of r are available, an index specifying one of them, or r itself, can be signaled. The index can be signaled in high-level syntax (HLS) such as the VPS, SPS, PPS, PH, or SH, or at the block level, such as for a CTU, CU, or TU. When r falls within a specific range, enough bits to cover that range can be allocated for signaling. For example, when r is in the range of 1 to 256, r can be signaled with a fixed length of 8 bits.
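As a sketch of the fixed-length option in [0486]: a range of 256 values needs 8 bits if r is coded as an offset from the lower bound. Coding r - 1 here is an assumption, since r = 256 written directly in binary would need 9 bits. Function names are illustrative.

```python
def fixed_length_bits(lo, hi):
    """Bits needed to code any value in [lo, hi] as a fixed-length offset from lo."""
    if hi < lo:
        raise ValueError("empty range")
    return max(1, (hi - lo).bit_length())

def encode_r(r, lo=1, hi=256):
    """Fixed-length binary string for r, coded as r - lo (assumed offset coding)."""
    if not lo <= r <= hi:
        raise ValueError("r out of range")
    return format(r - lo, f"0{fixed_length_bits(lo, hi)}b")

def decode_r(bits, lo=1):
    """Inverse of encode_r."""
    return int(bits, 2) + lo
```

For example, `encode_r(16)` yields `"00001111"`, and `decode_r` recovers 16 from it.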
[0487] The transform kernel of the current block can be determined based on any one of embodiments 1 to 4 described above. Alternatively, to the extent that the inventions according to embodiments 1 to 4 do not conflict with each other, the transform kernel of the current block can be determined based on a combination of at least two of embodiments 1 to 4.
[0488] A transform index for the inverse transform of the current block can be signaled. Here, the transform index can specify one of one or more transform kernels (or transform matrices) belonging to a transform set. Alternatively, the transform index can be an NSPT index specifying one of one or more NSPT kernels belonging to an NSPT set, or an LFNST index specifying one of one or more LFNST kernels belonging to an LFNST set.
[0489] Whether a transform index corresponds to an NSPT index can be determined based on whether the current block size is one of the block sizes in the first group mentioned above. Assume that the block sizes to which NSPT and LFNST apply are different. In this case, when the current block size belongs to the first group, the transform index signaled for the current block can correspond to the NSPT index, and the NSPT kernel can be determined from the NSPT set based on that transform index. On the other hand, when the current block size does not belong to the first group, the transform index signaled for the current block can correspond to the LFNST index, and the LFNST kernel can be determined from the LFNST set based on that transform index. The current block size not belonging to the first group may mean that it belongs to the second group mentioned above, or, alternatively, that it is a block size, among those in the second group, to which LFNST applies. In this way, the NSPT index and the LFNST index can be configured as a single integrated syntax element rather than as separate syntax elements.
[0490] As an example, assume that the block sizes for which NSPT applies in the first group are 4x4, 4x8, 8x4, and 8x8. For block sizes belonging to the first group, NSPT can be applied instead of LFNST. Specifically, NSPT can be applied instead of a combination of separable master transform (e.g., DCT-2, separable KLT) and LFNST. NSPT indices can be signaled for the four block sizes belonging to the first group, and LFNST indices can be signaled for the remaining block sizes (where LFNST is allowed).
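The size-dependent interpretation of a single transform index in [0489] and [0490] can be sketched as follows. The sketch uses the example first group {4x4, 4x8, 8x4, 8x8}, assumes that index 0 means no NSPT or LFNST is applied (as for the NSPT index in [0497]), and takes the NSPT and LFNST sets as plain lists of kernel labels; all names are hypothetical.

```python
# Example first group from [0490]: block sizes (width, height) that use NSPT.
FIRST_GROUP = {(4, 4), (4, 8), (8, 4), (8, 8)}

def interpret_transform_index(width, height, transform_idx, nspt_set, lfnst_set):
    """Map one signaled transform index to ("NSPT" | "LFNST" | "none", kernel)."""
    if transform_idx == 0:
        return ("none", None)
    if (width, height) in FIRST_GROUP:
        return ("NSPT", nspt_set[transform_idx - 1])
    return ("LFNST", lfnst_set[transform_idx - 1])

if __name__ == "__main__":
    nspt_set, lfnst_set = ["nspt_a", "nspt_b", "nspt_c"], ["lfnst_a", "lfnst_b"]
    print(interpret_transform_index(8, 4, 2, nspt_set, lfnst_set))    # ('NSPT', 'nspt_b')
    print(interpret_transform_index(16, 16, 1, nspt_set, lfnst_set))  # ('LFNST', 'lfnst_a')
```

The same parsed value selects from different kernel sets depending only on block size, which is what lets one syntax element serve both transforms.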
[0491] Signaling the NSPT and LFNST indices through a single integrated syntax element in this way can reduce the amount of encoded information. Furthermore, implementation complexity can be reduced by applying at least one of the same binarization, CABAC context, or initial value used for entropy coding to both the NSPT and LFNST indices.
[0492] Alternatively, NSPT and LFNST indices can be signaled separately as individual syntaxes. This may increase implementation complexity to some extent, but compression performance can be improved by performing optimized entropy encoding on each index.
[0493] When the number of LFNST kernel candidates belonging to the LFNST set is the same as the number of NSPT kernel candidates belonging to the NSPT set, the same binarization can be applied to both the LFNST and NSPT indices. The same CABAC context (or, CABAC context increment) can be assigned to the bins of both the LFNST and NSPT indices.
[0494] Different binarization and / or CABAC contexts can be used for the LFNST and NSPT indices. Different CABAC initialization values can be assigned to the LFNST and NSPT indices. As an example, either the LFNST or NSPT index can be binarized based on a fixed-length binarization, and the other can be binarized based on a truncated unary binarization. Different CABAC contexts and / or CABAC initialization values can be assigned even when the binarizations of the LFNST and NSPT indices are the same. Different binarization and / or CABAC contexts can be used for the LFNST and NSPT indices when the number of LFNST kernel candidates belonging to the LFNST set and the number of NSPT kernel candidates belonging to the NSPT set are different from each other.
[0495] The number of NSPT kernel candidates belonging to the NSPT set can be set differently for each block size. Alternatively, the block sizes belonging to the first group can be divided into multiple subgroups. In this case, the number of NSPT kernel candidates belonging to the NSPT set can be set differently for each subgroup. Here, at least one subgroup can include multiple different block sizes.
[0496] The binarization applied to the NSPT index can vary depending on the number of NSPT kernel candidates belonging to the NSPT set.
[0497] As an example, when the NSPT set for a specific block size contains three NSPT kernel candidates, the NSPT index can have any value from 0 to 3. An NSPT index of 0 can indicate that NSPT is not applied to the current block, and a non-zero NSPT index can indicate the corresponding one of the three NSPT kernel candidates. A first bin can be assigned to distinguish whether NSPT is applied: a bin value of 0 corresponds to an NSPT index of 0, and a bin value of 1 corresponds to an NSPT index of 1, 2, or 3. In this case, truncated unary binarization can be applied to distinguish the three NSPT kernel candidates; that is, up to two further bins can be assigned, distinguishing the three candidates as 0, 10, and 11.
[0498] When the NSPT set for a given block size contains two NSPT kernel candidates, the NSPT index can have any value from 0 to 2. A value of 0 indicates that NSPT is not applied to the current block, and a non-zero value indicates the corresponding one of the two NSPT kernel candidates. One bin can be assigned to distinguish whether NSPT is applied, and one further bin can distinguish between the two NSPT kernel candidates.
[0499] When the number of NSPT kernel candidates in the NSPT set for a specific block size is 1, the NSPT index can have a value of either 0 or 1. A value of 0 indicates that the NSPT is not applied to the current block. A value of 1 indicates that an NSPT kernel candidate is available. In this case, whether to apply the NSPT and the NSPT kernel candidate can be specified using only one bin.
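The bin strings described in [0497] to [0499] follow from one rule: a first bin signals whether NSPT is applied, followed by a truncated unary code of (index - 1) with cMax equal to the number of kernel candidates minus 1. This is a sketch of that reading; CABAC context modeling is not shown, and the function names are illustrative.

```python
def truncated_unary(value, c_max):
    """Truncated unary: `value` ones, then a terminating 0 unless value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    return "1" * value + ("0" if value < c_max else "")

def binarize_nspt_idx(nspt_idx, num_candidates):
    """First bin: NSPT applied or not; then a TU code selecting the kernel candidate."""
    if not 0 <= nspt_idx <= num_candidates:
        raise ValueError("index out of range")
    if nspt_idx == 0:
        return "0"
    return "1" + truncated_unary(nspt_idx - 1, num_candidates - 1)

def parse_nspt_idx(bins, num_candidates):
    """Inverse of binarize_nspt_idx; reads bins from an iterable of '0'/'1'."""
    it = iter(bins)
    if next(it) == "0":
        return 0
    value = 0
    while value < num_candidates - 1 and next(it) == "1":
        value += 1
    return value + 1

if __name__ == "__main__":
    for n in (3, 2, 1):
        print(n, [binarize_nspt_idx(i, n) for i in range(n + 1)])
```

For three candidates the kernel part after the leading 1 is 0, 10, or 11, matching [0497]; for one candidate the whole index is a single bin, matching [0499].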
[0500] The inverse transform of the current block can be a separable master transform and / or an LFNST-based inverse transform. In other words, backward LFNST can be applied to all or part of the (dequantized) transform coefficients of the current block, and then backward separable master transform can be applied to the transform coefficients derived by LFNST to derive residual samples. As an example, backward LFNST can be applied to the (dequantized) transform coefficients belonging to a portion of the current block. Here, a portion refers to the region to which forward LFNST is applied, and hereinafter, it is referred to as the region of interest (ROI). The transform coefficients derived by LFNST can be arranged in the ROI region according to a predetermined scan order. The predetermined scan order can be row-major or column-major. The backward separable master transform can be applied to the transform coefficients derived by LFNST and the transform coefficients belonging to the remaining regions within the current block excluding the ROI region. Alternatively, during the forward transform, when zeroing is performed on the remaining regions within the current block excluding the ROI region (i.e., when the transform coefficients in the remaining regions are set to 0), the backward separable master transform can be applied to the transform coefficients derived via LFNST. The size of the ROI region can be determined based on at least one of the width or height of the current block. Here, the size of the ROI region can refer to at least one of the width or height of the ROI region, or it can refer to the number of sample locations belonging to the ROI region.
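The decoding order in [0500] (backward LFNST on the ROI, then the backward separable master transform) can be sketched in pure Python. The sketch assumes a rectangular top-left ROI, row-major placement of the LFNST outputs, zeroed-out coefficients outside the ROI, an orthonormal forward LFNST kernel (so its transpose is the backward kernel), and DCT-II as the separable master transform. Names are illustrative, and integer arithmetic and clipping are not modeled.

```python
import math

def dct2_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows

def backward_lfnst(coeffs, forward_kernel):
    """Backward LFNST: multiply by the transpose of an r x (ROI size) forward kernel."""
    n = len(forward_kernel[0])
    return [sum(forward_kernel[j][i] * c for j, c in enumerate(coeffs)) for i in range(n)]

def inverse_transform(coeffs, forward_kernel, roi_w, roi_h, width, height):
    """Residual block (list of rows) from r coefficients that feed the backward LFNST."""
    if len(forward_kernel[0]) != roi_w * roi_h:
        raise ValueError("kernel width must equal the ROI size")
    block = [[0.0] * width for _ in range(height)]        # zeroed-out region stays 0
    for i, c in enumerate(backward_lfnst(coeffs, forward_kernel)):
        block[i // roi_w][i % roi_w] = c                  # row-major placement in the ROI
    cv, ch = dct2_matrix(height), dct2_matrix(width)
    # Backward separable transform: X = Cv^T * Y * Ch
    tmp = [[sum(cv[k][y] * block[k][x] for k in range(height)) for x in range(width)]
           for y in range(height)]
    return [[sum(tmp[y][k] * ch[k][x] for k in range(width)) for x in range(width)]
            for y in range(height)]
```

As a sanity check, an identity LFNST kernel with a DC-only input of 8 on an 8x8 block yields a flat residual of 1.0 in every sample, since each orthonormal DC basis value is 1/sqrt(8).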
[0501] Referring to Figure 4, the current block can be reconstructed based on the residual samples of the current block (S420).
[0502] The current block can be reconstructed based on the predicted block and the residual block.
[0503] The predicted block for the current block can be derived based on at least one of inter-frame prediction or intra-frame prediction. As an example, the current block can be divided into multiple partitions, and the predicted block for the current block can be generated based on the prediction for each partition.
[0504] The current block can be divided into multiple partitions based on one or more partition lines. The partition lines can include at least one of a vertical or horizontal line. Alternatively, when geometric partitioning is applied to the current block, the current block can be divided into two partitions using a predetermined partition line. A partition line used for geometric partitioning can be defined based on a predetermined partitioning direction (or partitioning angle) and a distance from the center of the current block. The current block can be a coding block that is not further partitioned by tree-based block partitioning.
[0505] Suppose the current block is divided into two partitions, namely, partition 1 and partition 2. In this case, the predicted block for the current block can be generated by a weighted sum of the first predicted block for partition 1 and the second predicted block for partition 2. Here, the first and second predicted blocks can be generated based on intra-frame prediction. Alternatively, the first and second predicted blocks can be generated based on inter-frame prediction. Alternatively, either the first or second predicted block can be generated based on intra-frame prediction, and the other can be generated based on inter-frame prediction.
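The weighted sum of two partition predictions in [0505] can be sketched per sample as below. The integer weight precision (weights from 0 to 8 with a 3-bit shift and a rounding offset, similar in spirit to GPM blending in VVC) and the weight map itself are assumptions; the text only states that a weighted sum is used.

```python
def blend_predictions(pred1, pred2, weights, shift=3):
    """Per-sample weighted sum of two partition predictions (lists of rows).

    weights[y][x] is the weight of pred1 in [0, 2**shift]; pred2 gets the rest.
    """
    total = 1 << shift
    offset = total >> 1                                   # rounding offset
    out = []
    for row1, row2, wrow in zip(pred1, pred2, weights):
        if any(not 0 <= w <= total for w in wrow):
            raise ValueError("weight out of range")
        out.append([(w * a + (total - w) * b + offset) >> shift
                    for a, b, w in zip(row1, row2, wrow)])
    return out
```

Samples far from the partition line would carry weight 0 or 8 (pure partition 2 or partition 1 prediction), while samples near the line receive intermediate weights.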
[0506] As an example, when the current block is divided into two partitions based on geometric partitioning, prediction blocks for each partition can be generated based on inter-frame prediction, and the prediction block for the current block can be generated based on a weighted sum of the generated prediction blocks.
[0507] Alternatively, when the current block is divided into two partitions based on geometric partitioning, prediction blocks for each partition can be generated based on intra-frame prediction, and the prediction block for the current block can be generated based on a weighted sum of the generated prediction blocks. Hereinafter, this is referred to as Spatial Geometric Partitioning Mode (SGPM).
[0508] When SGPM is applied to the current block, the partition type of the current block can be determined based on the partition type index, which specifies the partition direction and the position of the partition line. The partition type index can indicate any of the predefined partition type candidates. The intra-prediction mode for each partition within the current block can be derived based on the mode index, which indicates any of the multiple intra-prediction mode candidates. A mode index can be defined for each partition within the current block.
[0509] The partition type index and the mode index for each partition can be signaled via the bitstream. In this case, the partition type index can be expressed as `partition_mode_idx`, and the mode indices for the two partitions can be expressed as `intra_pred_mode0_idx` and `intra_pred_mode1_idx`, respectively. Alternatively, the partition type index and the mode index for the first partition can be signaled via the bitstream, and the mode index for the second partition can be derived based on the mode index for the first partition. At least one of the partition type index or the mode index can be signaled based on a flag (`cu_sgpm_flag`) indicating whether SGPM is applied to the current block. As an example, when `cu_sgpm_flag` is 1, the partition type index and mode index can be signaled, and when `cu_sgpm_flag` is 0, they are not signaled.
[0510] Alternatively, a candidate list for SGPM can be constructed. The candidate list includes multiple candidates, and each candidate may include a partition type index and two mode indices. As an example, multiple candidates belonging to the candidate list can be derived by combining predefined partition type candidates (e.g., 26 partition type candidates) with predetermined intra-prediction mode candidates (e.g., 3 intra-prediction mode candidates). The maximum number of candidates that can be included in the candidate list can be 16. Based on any one of the multiple candidates, a partition type index for the current block and a mode index for each partition can be derived. The candidate index indicating any one of the multiple candidates can be signaled via a bitstream.
[0511] The candidate list can be reordered based on a predefined template region. As an example, the SAD between the predicted sample and the reconstructed sample of the template region can be calculated. The SAD can be calculated for each of the multiple candidates belonging to the candidate list. The multiple candidates in the candidate list can be reordered in ascending order of SAD. The template region can include at least one of the upper neighbor region or the left neighbor region adjacent to the current block. The height of the upper neighbor region and the width of the left neighbor region can be fixed to a predefined length (e.g., 1).
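Candidate-list construction and template-based reordering in [0510] and [0511] can be sketched as below. The sketch assumes that each candidate pairs a partition type with an ordered pair of distinct modes from a shared IPM list, that all combinations are ranked by template cost before truncation to 16 entries, and that the caller supplies the template cost; predicting the template itself is outside the sketch, and all names are hypothetical.

```python
from itertools import permutations

MAX_CANDIDATES = 16

def build_sgpm_candidates(num_partition_types, ipm_list):
    """(partition type, mode of partition 0, mode of partition 1) combinations."""
    return [(p, m0, m1)
            for p in range(num_partition_types)
            for m0, m1 in permutations(ipm_list, 2)]

def sad(a, b):
    """Sum of absolute differences between two equal-length sample sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def reorder_candidates(candidates, template_cost, max_candidates=MAX_CANDIDATES):
    """Sort candidates by ascending template cost and keep the first max_candidates."""
    return sorted(candidates, key=template_cost)[:max_candidates]

if __name__ == "__main__":
    cands = build_sgpm_candidates(26, [0, 1, 50])  # 26 types x 6 mode pairs = 156
    recon = [10, 12, 14, 16]                       # toy reconstructed template samples

    def toy_cost(c):
        # Stand-in template "prediction"; a real codec would predict it per candidate.
        pred = [10 + c[0] + (c[1] % 3)] * len(recon)
        return sad(pred, recon)

    print(len(cands), reorder_candidates(cands, toy_cost)[:3])
```

Python's `sorted` is stable, so candidates with equal SAD keep their construction order, which gives a deterministic tie-break.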
[0512] Intra-prediction mode candidates can be constructed from the IPM list. An IPM list can be constructed for each partition within the current block. At least one intra-prediction mode candidate belonging to the IPM list of the first partition can be different from an intra-prediction mode candidate belonging to the IPM list of the second partition. Alternatively, an IPM list can be constructed for the current block, and partitions belonging to the current block can share an IPM list. Three or more intra-prediction mode candidates can be included in the IPM list.
[0513] SGPM can be applied when the size of the current block meets a predetermined condition. This condition may include at least one of the following conditions 1 to 5. Here, width and height may refer to the width and height of the current block, respectively.
[0514] (Condition 1) 4 <= width <= 64
[0515] (Condition 2) 4 <= height <= 64
[0516] (Condition 3) width < height × 8
[0517] (Condition 4) height < width × 8
[0518] (Condition 5) width × height >= 32
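The block-size gate for SGPM can be sketched as a predicate. It reads conditions 3 to 5 as width < height × 8, height < width × 8, and width × height >= 32, and, as an assumption, requires all five conditions to hold (the text allows any subset).

```python
def sgpm_size_allowed(width, height):
    """True if the block satisfies conditions 1-5 as read here (all required)."""
    return (4 <= width <= 64            # condition 1
            and 4 <= height <= 64       # condition 2
            and width < height * 8      # condition 3
            and height < width * 8      # condition 4
            and width * height >= 32)   # condition 5
```

Under this reading, 4x8 qualifies, 4x4 fails condition 5, and 64x4 fails condition 3.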
[0519] A flag can be defined to indicate whether blending is allowed between the prediction block of the first partition and the prediction block of the second partition of the current block.
[0520] Adaptive blending can be used in SGPM when the flag indicates that blending between the prediction blocks is allowed (e.g., when the flag is false). The blending depth for adaptive blending can be derived based on the size of the current block. As an example, when the smaller of the width and height of the current block is 4, the blending depth can be derived as τ/2; when it is 8, as τ; when it is 16, as 2τ; when it is 32, as 4τ; and when it is greater than 32, as 8τ. Here, τ can be any integer greater than 0.
[0521] On the other hand, when the flag indicates that blending between the prediction blocks is not allowed (e.g., when the flag is true), the blending depth can be derived as a default value (e.g., τ/4). This means that blending is not used when the geometric partition line is vertical or horizontal, and that the blending region becomes relatively narrow when the partition line is neither vertical nor horizontal (i.e., when it has another partitioning direction).
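The blending-depth rule in [0520] and [0521] maps the smaller block dimension to a multiple of τ. The sketch below returns exact fractions so that τ/2 and τ/4 stay exact; it takes a boolean `blending_allowed` instead of the raw flag to sidestep the flag's polarity, and the function name is illustrative.

```python
from fractions import Fraction

# min(width, height) -> blending depth as a multiple of tau (adaptive blending)
DEPTH_SCALE = {4: Fraction(1, 2), 8: Fraction(1), 16: Fraction(2), 32: Fraction(4)}

def sgpm_blending_depth(width, height, tau, blending_allowed):
    """Blending depth for SGPM; tau is a positive integer."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    if not blending_allowed:
        return Fraction(tau, 4)                 # default depth tau/4
    m = min(width, height)
    if m > 32:
        return Fraction(8 * tau)
    if m not in DEPTH_SCALE:
        raise ValueError("unsupported block dimension")
    return DEPTH_SCALE[m] * tau
```

For example, with τ = 4 an 8x16 block gets depth 4 and a 64x64 block gets depth 32, while the default (blending not allowed) is 1.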
[0522] Figure 8 shows a schematic configuration of a decoding apparatus (300) for performing an image decoding method according to the present disclosure.
[0523] Referring to Figure 8, the decoding apparatus (300) according to this disclosure may include a transform coefficient deriver (800), a residual sample deriver (810), and a reconstructed block generator (820). The transform coefficient deriver (800) may be configured in the entropy decoder (310) of Figure 3, the residual sample deriver (810) may be configured in the residual processor (320) of Figure 3, and the reconstructed block generator (820) may be configured in the adder (340) of Figure 3.
[0524] The transform coefficient deriver (800) can obtain residual information of the current block from the bitstream and decode it to derive the transform coefficients of the current block.
[0525] The residual sample deriver (810) can derive the residual samples of the current block by performing at least one of dequantization or inverse transform on the transform coefficients of the current block.
[0526] The residual sample deriver (810) can determine the transform kernel for the inverse transform of the current block using a predetermined transform kernel determination method, and derive the residual samples of the current block based on that kernel. This is the same as the derivation of the residual samples of the current block described with reference to Figure 4, so a detailed description is omitted here.
[0527] The reconstructed block generator (820) can reconstruct the current block based on the residual samples of the current block.
[0528] Figure 9 shows an image encoding method performed by an encoding apparatus (200) according to an embodiment of the present disclosure.
[0529] Referring to Figure 9, the residual samples of the current block can be derived (S900).
[0530] The residual samples of the current block can be derived by subtracting the predicted samples from the original samples of the current block. Here, the predicted samples can be derived based on inter-frame prediction or intra-frame prediction modes. The current block can be divided into multiple partitions, and a predicted block for the current block can be generated based on the predictions for each partition.
[0531] As described with reference to Figure 4, when the current block is divided into two partitions (i.e., a first partition and a second partition), a prediction block for the current block can be generated as a weighted sum of a first prediction block for the first partition and a second prediction block for the second partition. Here, each of the first and second prediction blocks can be generated based on intra-frame prediction or inter-frame prediction.
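The weighted sum in [0531] can be illustrated with a minimal sketch. It assumes, for illustration only, that the partition boundary is a straight line described by a normal angle and a signed offset from the block center (as in geometric partitioning), that the weight ramps linearly across a band of half-width `tau` around that line, and that floating-point weights are used instead of integer weight tables. The function names and parameterization are hypothetical.

```python
import math


def partition_weights(width, height, angle_deg, offset, tau):
    """Weight of the first prediction block at each sample (0.0 to 1.0).

    Samples farther than tau from the hypothetical partition line get weight
    0 or 1; samples inside the band get a linear ramp.
    """
    nx, ny = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    cx, cy = (width - 1) / 2, (height - 1) / 2
    weights = []
    for y in range(height):
        row = []
        for x in range(width):
            d = (x - cx) * nx + (y - cy) * ny - offset  # signed distance to the line
            row.append(min(max((d + tau) / (2 * tau), 0.0), 1.0))
        weights.append(row)
    return weights


def blend(pred1, pred2, weights):
    """Weighted sum of the two partition predictions, rounded to integers."""
    return [[int(w * a + (1 - w) * b + 0.5) for a, b, w in zip(r1, r2, rw)]
            for r1, r2, rw in zip(pred1, pred2, weights)]
```

With a vertical boundary (`angle_deg=0`) through the center of a 4x4 block, a narrow band gives a hard split between the two predictions, while a wider band produces a gradual transition.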
[0532] When the aforementioned SGPM is applied to the current block, the partition type of the current block and the intra-prediction mode of each partition can be determined. The partition type index, indicating the determined partition type, can be encoded in the bitstream. The mode index, indicating the determined intra-prediction mode among multiple intra-prediction mode candidates, can be encoded in the bitstream. The mode index can be encoded for each partition. The mode index for the second partition can be encoded based on the mode index for the first partition. At least one of the partition type index or mode index can be encoded based on a flag (cu_sgpm_flag) indicating whether SGPM is applied to the current block.
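The text states only that the second partition's mode index can be encoded based on the first. One plausible way to do this, shown here as an assumption rather than the disclosed method, is the trick VVC uses for the second GPM merge index: because the two partitions must use different modes, the first partition's value can be skipped, shrinking the range of the second index by one. The syntax element names other than `cu_sgpm_flag` are hypothetical.

```python
def encode_second_mode_index(idx0, idx1):
    """Code the second partition's mode index relative to the first.

    Assumes the two partitions must use different modes, so the value idx0
    never needs to be coded for the second index.
    """
    if idx0 == idx1:
        raise ValueError("partitions must use different intra-prediction modes")
    return idx1 - 1 if idx1 > idx0 else idx1


def decode_second_mode_index(idx0, coded_idx1):
    return coded_idx1 + 1 if coded_idx1 >= idx0 else coded_idx1


def sgpm_syntax(cu_sgpm_flag, partition_type_idx, idx0, idx1):
    """List the syntax elements written for one block (hypothetical names)."""
    elements = [("cu_sgpm_flag", int(cu_sgpm_flag))]
    if cu_sgpm_flag:  # indices are coded only when SGPM is applied
        elements.append(("partition_type_idx", partition_type_idx))
        elements.append(("mode_idx0", idx0))
        elements.append(("mode_idx1", encode_second_mode_index(idx0, idx1)))
    return elements
```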
[0533] As described with reference to Figure 4, a candidate list for SGPM can be constructed for the current block. Each candidate in the candidate list can include a partition type index and two mode indices. The partition type index of the current block and the mode index of each partition can be derived from any one of the candidates in the candidate list, and a candidate index indicating that candidate can be encoded in the bitstream. Furthermore, the candidate list can be reordered based on a predetermined template region.
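Reordering the SGPM candidate list by a template cost can be sketched as below. The sketch assumes the cost is the sum of absolute differences (SAD) between the reconstructed template samples and each candidate's prediction of the template, and that the list is truncated to a maximum size; `predict_template` is a hypothetical callback standing in for the template prediction described with reference to Figure 4.

```python
from typing import Callable, List, NamedTuple, Sequence


class SgpmCandidate(NamedTuple):
    partition_type_idx: int
    mode_idx0: int
    mode_idx1: int


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(abs(x - y) for x, y in zip(a, b))


def reorder_candidates(
    candidates: List[SgpmCandidate],
    template_rec: Sequence[int],
    predict_template: Callable[[SgpmCandidate], Sequence[int]],
    max_size: int,
) -> List[SgpmCandidate]:
    """Sort candidates by template cost; sorted() is stable, so ties keep list order."""
    ranked = sorted(candidates, key=lambda c: sad(template_rec, predict_template(c)))
    return ranked[:max_size]
```

Because the encoder and decoder can both compute the same costs from already reconstructed samples, only the candidate index into the reordered list needs to be signaled.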
[0534] As described with reference to Figure 4, the intra-prediction mode candidates can be constructed from an intra-prediction mode (IPM) list. SGPM can be applied when the size of the current block satisfies a predetermined condition.
[0535] Whether mixing between the prediction block of the first partition and the prediction block of the second partition is allowed for the current block can be determined. When mixing between the prediction blocks is allowed, adaptive mixing can be used in SGPM, and the mixing depth for adaptive mixing can be derived based on the size of the current block. When mixing is not allowed, the mixing depth can be set to a default value (e.g., τ/4). A flag indicating whether mixing between the prediction blocks of the first and second partitions is allowed can be encoded in the bitstream based on this determination.
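The mixing-depth decision in [0535] can be sketched as follows. Only the default of τ/4 comes from the text; the area thresholds and the depths chosen when adaptive mixing is allowed are hypothetical placeholders, used to show the structure of a size-dependent rule.

```python
def mixing_depth(tau, mixing_allowed, width, height):
    """Return the mixing depth for an SGPM block.

    The default of tau / 4 follows the text; the size thresholds applied when
    mixing is allowed are hypothetical.
    """
    if not mixing_allowed:
        return tau / 4
    area = width * height
    if area <= 64:        # hypothetical: small blocks use a narrow band
        return tau / 2
    if area <= 256:
        return tau
    return 2 * tau        # hypothetical: large blocks use a wider band
```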
[0536] Referring to Figure 9, the transform coefficients of the current block can be derived by performing at least one of transform or quantization on the residual samples of the current block (S910).
[0537] The transform according to this disclosure can be understood as the inverse of the inverse transform described with reference to Figure 4. The method used to determine the transform kernel is the same as described with reference to Figure 4, and a detailed description is omitted here.
[0538] For example, one or more transform sets can be defined / configured for the transforms of the current block, and each transform set can include one or more transform kernel candidates. In this case, one of multiple transform sets can be selected as the transform set for the current block. One of multiple transform kernel candidates belonging to the transform set of the current block can be selected. The selection can be performed implicitly based on the context of the current block. Alternatively, the best transform set and / or transform kernel candidate for the current block can be selected, and its index can be signaled.
[0539] Alternatively, the transform kernel for the current block can be determined based on an MTS set. One of multiple MTS sets can be selected based on at least one of the current block size or intra-prediction mode. The selected MTS set may include one or more transform kernel candidates. One of the one or more transform kernel candidates can be selected, and the transform kernel for the current block can be determined based on the selected transform kernel candidate. Transform kernel candidate selection can be performed using a transform kernel candidate index derived from the context of the current block. Alternatively, the best transform kernel candidate for the current block can be selected, and the transform kernel candidate index indicating the selected transform kernel candidate can be signaled.
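The set-then-candidate selection in [0538] and [0539] can be sketched as below. Everything concrete here is an assumption for illustration: the contents of the MTS sets, the classification of intra modes using VVC-style numbering (0 = planar, 1 = DC, 2 to 66 angular), and the rule that blocks larger than 32 on either side keep only DCT-2. The candidate index is either signaled or, when absent, derived implicitly.

```python
# Hypothetical MTS sets, each listing (horizontal, vertical) kernel pairs.
MTS_SETS = {
    "non_angular": [("DCT2", "DCT2"), ("DST7", "DST7"), ("DCT8", "DCT8")],
    "horizontal": [("DCT2", "DCT2"), ("DST7", "DCT8"), ("DST7", "DST7")],
    "vertical": [("DCT2", "DCT2"), ("DCT8", "DST7"), ("DST7", "DST7")],
}


def select_mts_set(width, height, intra_mode):
    """Pick an MTS set from the block size and intra-prediction mode."""
    if intra_mode in (0, 1):            # planar / DC
        key = "non_angular"
    elif intra_mode < 34:               # angular modes 2..33, coarsely "horizontal"
        key = "horizontal"
    else:                               # angular modes 34..66, coarsely "vertical"
        key = "vertical"
    candidates = MTS_SETS[key]
    if max(width, height) > 32:         # hypothetical size rule: DCT-2 only
        return candidates[:1]
    return candidates


def select_kernel(width, height, intra_mode, signaled_idx=None, implicit_idx=0):
    """Use the signaled candidate index if present, else the implicit one."""
    candidates = select_mts_set(width, height, intra_mode)
    idx = implicit_idx if signaled_idx is None else signaled_idx
    if not 0 <= idx < len(candidates):
        raise ValueError("candidate index out of range")
    return candidates[idx]
```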
[0540] Alternatively, the transform kernel for the current block can be determined based on a non-separable primary transform (NSPT) kernel. If the size of the current block belongs to a first group of block sizes to which NSPT is applicable, the forward NSPT can be applied to the current block; if the size of the current block belongs to a second group, the forward NSPT can be omitted. In the latter case, a forward separable primary transform (e.g., DCT-2) can be applied to the residual samples of the current block to derive the transform coefficients, and a forward LFNST can additionally be applied to all or part of the transform coefficients derived through the separable primary transform.
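The size-based choice between NSPT and the separable path in [0540] reduces to a small decision, sketched here. The set of block sizes standing in for the "first group" is a hypothetical example; the actual group is defined elsewhere in the disclosure.

```python
# Hypothetical "first group" of (width, height) block sizes for which NSPT is applicable.
NSPT_BLOCK_SIZES = {(4, 4), (4, 8), (8, 4), (8, 8), (4, 16), (16, 4), (8, 16), (16, 8)}


def forward_transform_stages(width, height, apply_lfnst):
    """Return the forward transform stages applied to the residual block."""
    if (width, height) in NSPT_BLOCK_SIZES:
        return ["NSPT"]            # first group: non-separable primary transform only
    stages = ["DCT-2"]             # second group: separable primary transform
    if apply_lfnst:
        stages.append("LFNST")     # optional secondary transform on (part of) the coefficients
    return stages
```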
[0541] Additionally, NSPT can be applied based on at least one of the tree type or the component type of the current block. The NSPT kernel (or NSPT matrix) can be determined by exploiting symmetry between intra-prediction modes or symmetry between block shapes. When NSPT is applied to an M x N current block, the NSPT kernel can be expressed as an r x MN matrix. Here, r is the output length of the NSPT, i.e., the number of transform coefficients it generates, and MN, the product of the width and height of the current block, is the input length of the NSPT, i.e., the number of residual samples to which it is applied. The size of the NSPT kernel can be determined as described with reference to Figure 4.
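The r x MN kernel described in [0541] acts as a plain matrix-vector product on the flattened residual block. The sketch below assumes a row-major scan, kernels stored as lists of r rows with MN entries each, and block-shape symmetry realized by transposing the residual so that, for example, a 4x2 block reuses the kernel stored for 2x4. These storage and symmetry conventions are assumptions, not the disclosed kernel design.

```python
def flatten_row_major(block):
    return [v for row in block for v in row]


def transpose(block):
    return [list(col) for col in zip(*block)]


def forward_nspt(residual, kernel):
    """Multiply an r x MN kernel by the MN residual samples -> r coefficients."""
    x = flatten_row_major(residual)
    if any(len(row) != len(x) for row in kernel):
        raise ValueError("kernel rows must have MN entries")
    return [sum(k * v for k, v in zip(row, x)) for row in kernel]


def forward_nspt_with_shape_symmetry(residual, kernels):
    """kernels maps (height, width) to a stored kernel; the transposed
    shape reuses it by transposing the residual block first."""
    shape = (len(residual), len(residual[0]))
    if shape in kernels:
        return forward_nspt(residual, kernels[shape])
    flipped = (shape[1], shape[0])
    if flipped in kernels:
        return forward_nspt(transpose(residual), kernels[flipped])
    raise KeyError(f"no NSPT kernel for block shape {shape}")
```

Since the output has r entries rather than MN, choosing r < MN both reduces the kernel storage and zeroes out the remaining coefficient positions.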
[0542] The LFNST and/or NSPT indices used for the transform can be encoded as a single unified syntax element, or the LFNST index and the NSPT index can be encoded separately and included in the bitstream. The binarization of the LFNST and NSPT indices, as well as the assignment of CABAC contexts and initial values, are the same as described with reference to Figure 4.
[0543] As described with reference to Figure 4, the non-separable transform can be applied to a current block coded using inter-frame prediction (or intra-frame prediction).
[0544] Additionally, the method of signaling transform indices described with reference to Figure 4 can be similarly applied to the encoding of transform indices.
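As an illustration of index binarization, the sketch below uses truncated unary binarization, which is how VVC binarizes `lfnst_idx` (truncated Rice with cMax = 2 and a Rice parameter of 0). Whether this disclosure uses the same scheme is not stated here, so treat it as an assumption; CABAC context modeling is not shown. The unified-index interpretation is likewise hypothetical: index 0 means the transform is off, and a nonzero index selects an NSPT or LFNST kernel candidate depending on whether the block size allows NSPT.

```python
def truncated_unary(value, c_max):
    """Binarize 0..c_max: `value` ones, then a terminating zero unless value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    return [1] * value + ([0] if value < c_max else [])


def parse_truncated_unary(read_bin, c_max):
    """Inverse binarization; read_bin() returns the next decoded bin."""
    value = 0
    while value < c_max and read_bin() == 1:
        value += 1
    return value


def interpret_unified_index(idx, nspt_applicable):
    """Hypothetical unified syntax: 0 = off, otherwise a kernel candidate of
    NSPT (if the block size allows NSPT) or of LFNST."""
    if idx == 0:
        return None
    return ("NSPT" if nspt_applicable else "LFNST", idx)
```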
[0545] Referring to Figure 9, a bitstream can be generated by encoding the transform coefficients of the current block (S920).
[0546] Residual information about the transform coefficients can be generated based on the transform coefficients of the current block, and a bitstream can be generated by encoding the residual information.
[0547] Figure 10 shows a schematic configuration of an encoding apparatus (200) for performing the image encoding method according to the present disclosure.
[0548] Referring to Figure 10, the encoding apparatus (200) according to this disclosure may include a residual sample deriver (1000), a transform coefficient deriver (1010), and a transform coefficient encoder (1020). The residual sample deriver (1000) and the transform coefficient deriver (1010) may be configured in the residual processor (230) of Figure 2, and the transform coefficient encoder (1020) may be configured in the entropy encoder (240) of Figure 2.
[0549] The residual sample deriver (1000) can derive the residual samples of the current block by subtracting the predicted samples from the original samples of the current block. Here, the predicted samples can be derived based on a predetermined intra-frame prediction mode.
[0550] The transform coefficient deriver (1010) can derive the transform coefficients of the current block by performing at least one of transform or quantization on the residual samples of the current block. The transform coefficient deriver (1010) can determine the transform kernel of the current block based on at least one of Embodiments 1 to 4 described above, and apply the transform kernel to the residual samples of the current block to derive the transform coefficients.
[0551] The transform coefficient encoder (1020) can encode the transform coefficients of the current block to generate a bitstream.
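The forward path of [0550] (transform followed by quantization) can be sketched as below, assuming the separable DCT-2 is the selected kernel and a plain uniform scalar quantizer with step `qstep` and a rounding offset of 0.5. Real encoders use integer transforms, scaling lists and rate-distortion optimized quantization, none of which are modeled here; the function names are hypothetical.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-2 basis; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def forward_dct2(block):
    # Separable 2-D forward transform for an H x W block: Y = C_H * X * C_W^T.
    ch, cw = dct2_matrix(len(block)), dct2_matrix(len(block[0]))
    return matmul(matmul(ch, block), transpose(cw))


def quantize(coeffs, qstep, rounding=0.5):
    # Hypothetical uniform scalar quantizer: |c| / qstep rounded, sign restored.
    return [[int(math.copysign(math.floor(abs(c) / qstep + rounding), c)) for c in row]
            for row in coeffs]


def derive_levels(residual, qstep):
    """Forward transform followed by quantization of the residual block."""
    return quantize(forward_dct2(residual), qstep)
```

A flat residual of 2 in a 4x4 block concentrates all of its energy in the DC coefficient (value 8), which quantizes to a single level of 2 with `qstep=4`.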
[0552] In the above embodiments, the method is described as a series of steps or blocks based on the flowchart. However, the corresponding embodiments are not limited to the order of the steps, and some steps may occur simultaneously or in a different order than the other steps described above. Furthermore, those skilled in the art will understand that the steps shown in the flowchart are not exclusive, and other steps may be included or one or more steps in the flowchart may be deleted without affecting the scope of the embodiments of this disclosure.
[0553] The methods described above according to embodiments of the present disclosure can be implemented in software, and the encoding and / or decoding apparatus according to the present disclosure can be included in devices performing image processing, such as TVs, computers, smartphones, set-top boxes, display devices, etc.
[0554] In this disclosure, when the embodiments are implemented as software, the above methods can be implemented as modules (processes, functions, etc.) performing the above functions. Modules can be stored in memory and executed by a processor. Memory can be located inside or outside the processor and can be connected to the processor by various well-known means. The processor may include application-specific integrated circuits (ASICs), other chipsets, logic circuits, and/or data processing devices. Memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media, and/or other storage devices. In other words, the embodiments described herein can be implemented on a processor, microprocessor, controller, or chip. For example, the functional units shown in each figure can be implemented on a computer, processor, microprocessor, controller, or chip. In this case, information for implementation (e.g., information about instructions) or algorithms can be stored in a digital storage medium.
[0555] Furthermore, the decoding and encoding devices using embodiments of this disclosure can be included in multimedia broadcasting transmitting and receiving devices, mobile communication terminals, home theater video devices, digital cinema video devices, surveillance cameras, video conferencing devices, real-time communication devices such as video communication, mobile streaming devices, storage media, cameras, devices for providing video-on-demand (VoD) services, over-the-top (OTT) devices, devices for providing internet streaming services, three-dimensional (3D) video devices, virtual reality (VR) devices, augmented reality (AR) devices, videophone video devices, transportation terminal devices (e.g., vehicle (including autonomous vehicles) terminals, aircraft terminals, ship terminals, etc.), and medical video devices, and can be used to process video signals or data signals. For example, over-the-top (OTT) devices can include game consoles, Blu-ray players, networked TVs, home theater systems, smartphones, tablets, digital video recorders (DVRs), etc.
[0556] Furthermore, the processing methods applying embodiments of this disclosure can be generated in the form of a computer-executable program and can be stored in a computer-readable recording medium. Multimedia data having data structures according to embodiments of this disclosure can also be stored in a computer-readable recording medium. Computer-readable recording media include all types of storage devices and distributed storage devices that store computer-readable data. Computer-readable recording media can include, for example, Blu-ray discs (BD), Universal Serial Bus (USB), ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical media storage devices. Additionally, computer-readable recording media include media implemented in carrier wave form (e.g., transmitted via the Internet). Furthermore, bitstreams generated by encoding methods can be stored in a computer-readable recording medium or transmitted via wired or wireless communication networks.
[0557] Furthermore, the embodiments of this disclosure can be implemented as a computer program product using program code, and the program code can be executed on a computer according to the embodiments of this disclosure. The program code can be stored on a computer-readable medium.
[0558] Figure 11 shows an example of a content streaming system to which embodiments of the present disclosure may be applied.
[0559] Referring to Figure 11, the content streaming system to which embodiments of this disclosure are applied may mainly include an encoding server, a streaming server, a web server, media storage, user equipment, and multimedia input devices.
[0560] An encoding server generates a bitstream by compressing content input from multimedia input devices such as smartphones, cameras, and camcorders into digital data, and then sends it to a streaming server. As another example, when multimedia input devices such as smartphones, cameras, and camcorders generate bitstreams directly, the encoding server can be omitted.
[0561] A bitstream can be generated by applying the encoding method or bitstream generation method of the embodiments of this disclosure, and the streaming server can temporarily store the bitstream during the sending or receiving of the bitstream.
[0562] The streaming server sends multimedia data to the user equipment via the web server based on a user request, and the web server acts as an intermediary that informs the user which services are available. When a user requests a desired service from the web server, the web server forwards the request to the streaming server, and the streaming server sends the multimedia data to the user. In this case, the content streaming system may include a separate control server, which controls the commands/responses between the devices in the content streaming system.
[0563] A streaming server can receive content from media storage and / or encoding servers. For example, when receiving content from an encoding server, the content can be received in real time. In this case, to provide a smooth streaming service, the streaming server can store bitstreams for specific time periods.
[0564] Examples of user equipment may include mobile phones, smartphones, laptop computers, digital broadcast terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, tablet PCs, tablet computers, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, and head-mounted displays (HMDs)), digital TVs, desktop computers, digital signage, etc.
[0565] In a content streaming system, each server can be operated as a distributed server, and in this case, data received from each server can be distributed and processed.
[0566] The claims set forth herein can be combined in various ways. For example, the technical features of the method claims of this disclosure can be combined and implemented as a device, and the technical features of the device claims of this disclosure can be combined and implemented as a method. Furthermore, the technical features of the method claims and the technical features of the device claims of this disclosure can be combined and implemented as a device, and the technical features of the method claims and the technical features of the device claims of this disclosure can be combined and implemented as a method.
Claims
1. A method comprising: deriving transform coefficients of a current block from a bitstream; deriving residual samples of the current block based on an inverse transform of the transform coefficients of the current block; and reconstructing the current block based on the residual samples of the current block, wherein the current block is a block coded based on intra-frame prediction, and a transform set for the inverse transform is selected based on a predetermined intra-prediction mode for the current block.
2. The method according to claim 1, wherein the inverse transform is performed based on at least one of a separable transform or a non-separable transform.
3. The method according to claim 2, wherein, when a sub-block transform is applied to the current block and the non-separable transform is applied to the current block, the transform type of the separable transform used for the current block is determined to be DCT-2.
4. The method according to claim 2, wherein, when a sub-block transform is applied to the current block and the non-separable transform is applied to the current block, the transform type of the separable transform used for the current block is determined to be one of the combinations of DST-7 and DCT-8.
5. The method according to claim 2, wherein the intra-prediction mode is derived by applying decoder-side intra mode derivation (DIMD) based on the prediction block of the current block.
6. The method according to claim 2, wherein, when a sub-block transform is applied to the current block, whether the non-separable transform is applied to the current block is determined based on whether a first sub-block of the current block has a size for which the non-separable transform is allowed.
7. The method according to claim 6, wherein the intra-prediction mode is derived by applying decoder-side intra mode derivation (DIMD) based on the first sub-block of the current block.
8. The method according to claim 2, wherein the transform set includes a plurality of transform kernel candidates, and an index indicating one of the plurality of transform kernel candidates is signaled.
9. The method according to claim 8, wherein the index is signaled based on whether the current block satisfies a predetermined zero-out condition.
10. The method according to claim 2, wherein, when an intra block copy (IBC) mode is applied to the current block, the non-separable transform is applied to the current block.
11. The method according to claim 2, wherein the transform set for the non-separable transform is determined based on whether an inter-frame prediction mode in which prediction is performed in units of sub-blocks is applied to the current block.
12. A method comprising: deriving residual samples of a current block; deriving transform coefficients of the current block based on a transform of the residual samples of the current block; and encoding the transform coefficients of the current block, wherein the current block is a block coded based on inter-frame prediction, and a transform set for the transform is selected based on a predetermined intra-prediction mode for the current block.
13. A computer-readable storage medium storing a bitstream generated by the method according to claim 12.
14. A method comprising: obtaining a bitstream for image information, wherein the bitstream is generated by deriving residual samples of a current block, deriving transform coefficients of the current block based on a transform of the residual samples of the current block, and encoding the transform coefficients of the current block; and transmitting data including the bitstream, wherein the current block is a block coded based on inter-frame prediction, and a transform set for the transform is selected based on a predetermined intra-prediction mode for the current block.