Image encoding / decoding method, image data transmission method, and storage medium
By using an image coding method based on a reduced secondary transform (RST), the problem of high transmission and storage costs for high-resolution images/videos is addressed, achieving more efficient image/video compression and coding.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING XIAOMI MOBILE SOFTWARE CO LTD
- Filing Date
- 2019-07-08
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies face increased costs due to the increased amount of information when transmitting and storing high-resolution, high-quality images/videos, and lack efficient image encoding methods to meet the needs of immersive media.
An image coding method based on a reduced secondary transform (RST) is adopted: modified transform coefficients are derived through an inverse RST and residual samples through an inverse primary transform, and coding efficiency is improved by selecting the transform kernel matrix from a transform set mapped to the intra-prediction mode.
By concentrating non-zero transform coefficients in low-frequency components through efficient transformation, the amount of residual processing data is reduced, thereby improving image/video compression efficiency and residual coding efficiency.
Figure CN117336470B_ABST
Description
[0001] This application is a divisional application of the original invention patent application No. 201980057941.X (International Application No.: PCT / KR2019 / 008377, Application Date: July 8, 2019, Invention Title: Transformation-Based Image Coding Method and Apparatus).
Technical Field
[0002] This disclosure generally relates to image coding techniques, and more specifically, to a transform-based image coding method and apparatus in an image coding system.
Background Technology
[0003] Today, the demand for high-resolution and high-quality images / videos, such as 4K, 8K, or even higher Ultra High Definition (UHD) images / videos, is constantly growing across various fields. As image / video data becomes higher resolution and higher quality, the amount of information or bits transmitted increases compared to traditional image data. Therefore, transmission and storage costs increase when using media such as traditional wired / wireless broadband lines to transmit image data or when using existing storage media to store image / video data.
[0004] In addition, interest in and demand for immersive media such as virtual reality (VR) and augmented reality (AR) content or holograms are increasing, and broadcasting of images / videos whose characteristics differ from those of real images, such as game images, is on the rise.
[0005] Therefore, there is a need for efficient image / video compression techniques to effectively compress, transmit, store, and reproduce high-resolution, high-quality images / videos that have the various characteristics described above.
Summary of the Invention
[0006] Technical issues
[0007] One technical objective of this disclosure is to provide methods and apparatus for increasing image coding efficiency.
[0008] Another technical objective of this disclosure is to provide methods and apparatus for increasing transform efficiency.
[0009] Another technical objective of this disclosure is to provide a method and apparatus for increasing the efficiency of residual coding through transformation.
[0010] Another technical objective of this disclosure is to provide an image coding method and apparatus based on a reduced secondary transform (RST).
[0011] Another technical objective of this disclosure is to provide an image coding method and apparatus based on transform sets that can increase coding efficiency.
[0012] Technical solution
[0013] According to an example of this disclosure, an image decoding method performed by a decoding device is provided. The method includes: deriving quantized transform coefficients of a target block from a bitstream; deriving transform coefficients by dequantization based on the quantized transform coefficients of the target block; deriving modified transform coefficients based on an inverse reduced secondary transform (RST) of the transform coefficients; deriving residual samples of the target block based on an inverse primary transform of the modified transform coefficients; and generating a reconstructed block based on the residual samples of the target block and prediction samples derived based on an intra-prediction mode of the target block, wherein the inverse RST is performed based on a transform kernel matrix selected from a transform set comprising multiple transform kernel matrices, the transform set is determined based on a mapping relationship according to the intra-prediction mode applied to the target block, and multiple intra-prediction modes, including the intra-prediction mode of the target block, are mapped to one transform set.
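The decoder-side selection and inverse transform steps in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the mode-to-set ranges, the toy 2×4 kernel, and the names `select_transform_set` and `inverse_rst` are hypothetical, and the sketch assumes that the inverse RST multiplies the R received coefficients by the transpose of an R×N kernel matrix.

```python
# Hypothetical mapping from ranges of intra-prediction mode indices to a
# transform set index. Several modes share one set, as the method requires;
# the specific ranges are illustrative and are NOT taken from the disclosure.
MODE_RANGES_TO_SET = [
    (range(0, 2), 0),    # non-directional modes (planar, DC)
    (range(2, 18), 1),
    (range(18, 34), 2),
    (range(34, 67), 3),
]

# Toy 2x4 kernel (R = 2, N = 4) with orthonormal rows; real kernels are larger.
TOY_KERNEL = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]

# Every set holds the same toy kernel twice, purely for illustration.
TRANSFORM_SETS = {s: [TOY_KERNEL, TOY_KERNEL] for s in range(4)}


def select_transform_set(intra_mode):
    """Return the transform set index mapped to an intra-prediction mode."""
    for modes, set_idx in MODE_RANGES_TO_SET:
        if intra_mode in modes:
            return set_idx
    raise ValueError(f"unsupported intra-prediction mode: {intra_mode}")


def inverse_rst(coeffs, kernel):
    """Expand R coefficients to N modified coefficients (kernel^T times coeffs)."""
    r, n = len(kernel), len(kernel[0])
    if len(coeffs) != r:
        raise ValueError("expected one coefficient per kernel row")
    return [sum(kernel[i][j] * coeffs[i] for i in range(r)) for j in range(n)]


set_idx = select_transform_set(20)
kernel = TRANSFORM_SETS[set_idx][0]  # kernel index 0, assumed to be signaled
modified = inverse_rst([4.0, 2.0], kernel)
print(set_idx, modified)
```

In this sketch the modified coefficients would then be passed to the inverse primary transform to obtain residual samples.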
[0014] According to another example of this disclosure, a decoding apparatus for performing image decoding is provided. The decoding apparatus includes: an entropy decoder that derives information about the prediction and quantized transform coefficients of a target block from a bitstream; a predictor that generates prediction samples of the target block based on an intra-prediction mode included in the information about the prediction; a dequantizer that derives transform coefficients by dequantization based on the quantized transform coefficients of the target block; an inverse transformer that includes an inverse reduced secondary transformer that derives modified transform coefficients based on an inverse RST of the transform coefficients and an inverse primary transformer that derives residual samples of the target block based on an inverse primary transform of the modified transform coefficients; and an adder that generates a reconstructed image based on the residual samples and the prediction samples, wherein the inverse reduced secondary transformer performs the inverse RST based on a transform kernel matrix included in a transform set having a mapping relationship with the intra-prediction mode, the transform set is determined based on the mapping relationship according to the intra-prediction mode applied to the target block, and multiple intra-prediction modes, including the intra-prediction mode of the target block, are mapped to one transform set.
[0015] According to an example of this disclosure, an image coding method performed by an encoding device is provided. The method includes: deriving prediction samples based on an intra-prediction mode applied to a target block; deriving residual samples of the target block based on the prediction samples; deriving transform coefficients of the target block based on a primary transform applied to the residual samples; deriving modified transform coefficients based on a reduced secondary transform (RST) applied to the transform coefficients; and deriving quantized transform coefficients by performing quantization based on the modified transform coefficients, wherein the RST is performed based on a transform kernel matrix selected from a transform set comprising multiple transform kernel matrices, the transform set is determined based on a mapping relationship according to the intra-prediction mode applied to the target block, and multiple intra-prediction modes, including the intra-prediction mode of the target block, are mapped to one transform set.
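The encoder-side steps of applying the RST and then quantizing can be sketched as follows. This is only an illustration under assumptions: the toy 2×4 kernel, the uniform quantization step, and the names `forward_rst` and `quantize` are hypothetical, and the sketch assumes the forward RST is a multiplication by an R×N kernel matrix with R smaller than N, so fewer coefficients remain to be quantized and coded.

```python
# Toy 2x4 kernel (R = 2, N = 4) with orthonormal rows; illustrative only.
TOY_KERNEL = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]


def forward_rst(coeffs, kernel):
    """Map N primary-transform coefficients to R < N modified coefficients."""
    n = len(kernel[0])
    if len(coeffs) != n:
        raise ValueError("expected one coefficient per kernel column")
    return [sum(k * c for k, c in zip(row, coeffs)) for row in kernel]


def quantize(coeffs, step):
    """Uniform scalar quantization, rounding half away from zero (assumed)."""
    levels = []
    for c in coeffs:
        magnitude = int(abs(c) / step + 0.5)
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


primary_coeffs = [3.0, 3.0, 1.0, 1.0]      # output of the primary transform
modified = forward_rst(primary_coeffs, TOY_KERNEL)
levels = quantize(modified, step=2.0)
print(modified, levels)
```

Four input coefficients become two modified coefficients here, which is the data reduction the reduced transform aims for.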
[0016] According to another example of this disclosure, a digital storage medium may be provided in which image data including encoded image information generated according to an image encoding method performed by an encoding device is stored.
[0017] According to another example of this disclosure, a digital storage medium may be provided in which image data including encoded image information that causes a decoding device to perform an image decoding method is stored.
[0018] Technical effect
[0019] According to this disclosure, the overall image / video compression efficiency can be increased.
[0020] According to this disclosure, through efficient transformation, the amount of data that must be transmitted for residual processing can be reduced, and the efficiency of residual coding can be increased.
[0021] According to this disclosure, non-zero transform coefficients can be concentrated in low-frequency components through a secondary transform in the frequency domain.
[0022] According to this disclosure, image coding efficiency can be increased by performing image coding based on a transform set.
Attached Figure Description
[0023] Figure 1 schematically illustrates an example of a video / image coding system to which this disclosure can be applied.
[0024] Figure 2 is a diagram schematically illustrating the configuration of a video / image encoding device to which this disclosure can be applied.
[0025] Figure 3 is a diagram schematically illustrating the configuration of a video / image decoding device to which this disclosure can be applied.
[0026] Figure 4 illustrates a multiple transform technique according to an example of this disclosure.
[0027] Figure 5 shows, as an example, intra-frame directional modes for 65 prediction directions.
[0028] Figure 6 is a diagram illustrating an example of RST according to this disclosure.
[0029] Figure 7 is a flowchart illustrating an example of inverse RST processing according to this disclosure.
[0030] Figure 8 is a flowchart illustrating another example of inverse RST processing according to this disclosure.
[0031] Figure 9 is a flowchart illustrating an example of the inverse RST process based on a non-separable secondary transform according to this disclosure.
[0032] Figure 10 is a diagram illustrating an example of applying RST according to this disclosure.
[0033] Figure 11 is a diagram showing the scanning order applied to 4×4 transform coefficients.
[0034] Figure 12 is a diagram illustrating the mapping of transform coefficients according to the diagonal scanning order, according to an example of this disclosure.
[0035] Figure 13 is a diagram illustrating the mapping of transform coefficients based on the diagonal scanning order, according to another example of this disclosure.
[0036] Figure 14 is a diagram illustrating a method for selecting a transform set under specific conditions, according to an example of this disclosure.
[0037] Figure 15 is a flowchart illustrating the operation of a video decoding device according to an example of this disclosure.
[0038] Figure 16 is a flowchart illustrating the operation of a video encoding device according to an example of this disclosure.
[0039] Figure 17 is a diagram illustrating the structure of a content streaming system to which this disclosure is applied.
Detailed Implementation
[0040] While this document may be readily modified and includes various embodiments, specific embodiments thereof have been illustrated by way of example in the accompanying drawings and will now be described in detail. However, this is not intended to limit this document to the specific embodiments disclosed herein. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the technical ideas of this document. Singular forms may include plural forms unless the context clearly indicates otherwise. Terms such as “comprising” and “having” are intended to indicate the presence of the features, numbers, steps, operations, elements, components, or combinations thereof used in the following description, and should therefore not be construed as pre-excluding the possibility of the presence or addition of one or more different features, numbers, steps, operations, elements, components, or combinations thereof.
[0041] Furthermore, for ease of description of their different features and functions, the components in the accompanying drawings described herein are illustrated independently; however, this does not imply that each component is implemented by a separate piece of hardware or software. For example, any two or more of these components can be combined to form a single component, and any single component can be divided into multiple components. Implementations in which components are combined and / or divided will fall within the scope of this document's patent rights, provided they do not depart from the spirit of this document.
[0042] In the following description, preferred embodiments of this document will be explained in more detail with reference to the accompanying drawings. Furthermore, in the drawings, the same reference numerals are used for the same components, and repeated descriptions of the same components will be omitted.
[0043] This document relates to video / image coding. For example, the methods / examples disclosed in this document may relate to the VVC (Versatile Video Coding) standard (ITU-T Rec. H.266), the next-generation video / image coding standard after VVC, or other video coding-related standards (e.g., the HEVC (High Efficiency Video Coding) standard (ITU-T Rec. H.265), the EVC (Essential Video Coding) standard, the AVS2 standard, etc.).
[0044] This document provides various implementations related to video / image encoding, and these implementations may be combined and performed in combination with each other unless otherwise specified.
[0045] In this document, a video can refer to a series of images over a period of time. Typically, an image (picture) is a unit representing one image in a specific time period, while a slice / tile is a unit that constitutes a part of an image. A slice / tile may include one or more coding tree units (CTUs). An image may consist of one or more slices / tiles. An image may consist of one or more tile groups. A tile group may include one or more tiles.
[0046] A pixel or pel can refer to the smallest unit that makes up a picture (or image). Alternatively, "sample" can be used as the term corresponding to a pixel. A sample can typically represent a pixel or a pixel value, and can represent only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chrominance component. Alternatively, a sample can refer to a pixel value in the spatial domain, or, when the pixel value is transformed to the frequency domain, it can refer to a transform coefficient in the frequency domain.
[0047] A unit can represent the basic unit of image processing. A unit may include a specific region and at least one of the information associated with that region. A unit may include a luminance block and two chrominance (e.g., cb, cr) blocks. Depending on the context, units and terms such as blocks and regions may be used interchangeably. Typically, an M×N block may include a set (or array) of samples or transform coefficients consisting of M columns and N rows.
[0048] In this document, the terms " / " and "," should be interpreted as indicating "and / or". For example, the expression "A / B" can mean "A and / or B". Additionally, "A, B" can mean "A and / or B". Furthermore, "A / B / C" can mean "at least one of A, B, and / or C". Additionally, "A, B, C" can mean "at least one of A, B, and / or C".
[0049] Additionally, in this document, the term "or" should be interpreted as indicating "and / or". For example, the expression "A or B" could include 1) only A, 2) only B, and / or 3) both A and B. In other words, the term "or" in this document should be interpreted as indicating "additionally or alternatively".
[0050] Figure 1 schematically illustrates an example of a video / image coding system to which this document can be applied.
[0051] Referring to Figure 1, a video / image coding system may include a first device (source device) and a second device (receiving device). The source device may transmit encoded video / image information or data to the receiving device in the form of a file or stream via a digital storage medium or network.
[0052] The source device may include a video source, an encoding device, and a transmitter. The receiving device may include a receiver, a decoding device, and a renderer. The encoding device may be referred to as a video / image encoding device, and the decoding device may be referred to as a video / image decoding device. The transmitter may be included in the encoding device. The receiver may be included in the decoding device. The renderer may include a display, and the display may be configured as a separate device or an external component.
[0053] The video source can obtain video / images through a process of capturing, synthesizing, or generating them. The video source may include a video / image capture device and / or a video / image generation device. A video / image capture device may include, for example, one or more cameras, video / image archives including previously captured video / images, etc. A video / image generation device may include, for example, computers, tablets, and smartphones, and can generate video / images (electronically). For example, virtual video / images can be generated by computers, etc. In this case, the video / image capture process can be replaced by a process that generates related data.
[0054] Encoding devices can encode input video / images. They can perform a series of processes such as prediction, transformation, and quantization for compression and coding efficiency. The encoded data (encoded video / image information) can be output as a bitstream.
[0055] A transmitter can send encoded video / image information or data, output in bitstream form, to a receiver in a receiving device via a digital storage medium or network, either as a file or a stream. Digital storage media can include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmitter can include elements for generating media files according to a predetermined file format and may include elements for transmission via a broadcast / communication network. The receiver can receive / extract the bitstream and send the received / extracted bitstream to a decoding device.
[0056] Decoding devices can decode video / images by performing a series of processes such as dequantization, inverse transform, and prediction, which correspond to the operations of encoding devices.
[0057] The renderer can render decoded video / images. The rendered video / images can then be displayed on a monitor.
[0058] Figure 2 is a schematic diagram illustrating the configuration of a video / image encoding device to which this document can be applied. In the following text, the term "video encoding device" may include an image encoding device.
[0059] Referring to Figure 2, the encoding device 200 may include an image segmenter 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270. The predictor 220 may include an inter-frame predictor 221 and an intra-frame predictor 222. The residual processor 230 may include a transformer 232, a quantizer 233, a dequantizer 234, and an inverse transformer 235. The residual processor 230 may further include a subtractor 231. The adder 250 may be referred to as a reconstructor or a reconstructed block generator. According to embodiments, the image segmenter 210, predictor 220, residual processor 230, entropy encoder 240, adder 250, and filter 260 described above may be constituted by one or more hardware components (e.g., an encoder chipset or processor). Furthermore, the memory 270 may include a decoded picture buffer (DPB) and may be constituted by a digital storage medium. The hardware components may further include the memory 270 as an internal / external component.
[0060] The image segmenter 210 can divide an input image (or picture or frame) input to the encoding device 200 into one or more processing units. As an example, a processing unit can be called a coding unit (CU). In this case, starting from a coding tree unit (CTU) or a largest coding unit (LCU), coding units can be recursively partitioned according to a quadtree binary-tree ternary-tree (QTBTTT) structure. For example, based on a quadtree structure, a binary tree structure, and / or a ternary tree structure, one coding unit can be partitioned into multiple coding units of deeper depth. In this case, for example, the quadtree structure can be applied first, and the binary tree structure and / or the ternary tree structure can be applied later. Alternatively, the binary tree structure can be applied first. The coding process according to this document can be performed based on a final coding unit that is not partitioned further. In this case, the largest coding unit can be used directly as the final coding unit based on the coding efficiency according to the image characteristics. Alternatively, the coding unit can be recursively partitioned into coding units of deeper depth as needed, so that a coding unit of optimal size is used as the final coding unit. Here, the coding process may include processes such as prediction, transform, and reconstruction, which will be described later. As another example, the processing unit may further include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may each be split or partitioned from the final coding unit described above. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and / or a unit for deriving a residual signal from the transform coefficients.
[0061] Depending on the context, units and terms such as blocks and regions can be used interchangeably. Typically, an M×N block can represent a set of samples or transform coefficients consisting of M columns and N rows. Samples can typically represent pixels or pixel values, and can represent only the pixel / pixel value of the luminance component, or only the pixel / pixel value of the chrominance component. "Sample" can be used as a term corresponding to a pixel or pel of a picture (or image).
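The recursive partitioning described above can be sketched as follows. This is a simplified illustration, not the disclosed procedure: the split-mode names, the `choose_mode` callback (standing in for the encoder's split decision), and the 1:2:1 ternary split ratio are assumptions made for the example.

```python
def split_block(x, y, w, h, mode):
    """Return the sub-blocks (x, y, w, h) produced by one split of a block."""
    if mode == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "bin_v":   # vertical split line: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "bin_h":   # horizontal split line: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "ter_v":   # three columns in an assumed 1:2:1 ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "ter_h":   # three rows in an assumed 1:2:1 ratio
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


def partition(x, y, w, h, choose_mode):
    """Recursively partition a block; leaves are the final coding units."""
    mode = choose_mode(x, y, w, h)
    if mode is None:          # no further split: this is a final coding unit
        return [(x, y, w, h)]
    leaves = []
    for sub in split_block(x, y, w, h, mode):
        leaves.extend(partition(*sub, choose_mode))
    return leaves


def demo_chooser(x, y, w, h):
    """Hypothetical decision: quad-split the CTU, then one ternary split."""
    if w == 128:
        return "quad"
    if (x, y, w, h) == (0, 0, 64, 64):
        return "ter_v"
    return None


leaves = partition(0, 0, 128, 128, demo_chooser)
print(len(leaves), leaves)
```

A real encoder would typically drive `choose_mode` with a rate-distortion decision and enforce size constraints; both are omitted here.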
[0062] The subtractor 231 subtracts the prediction signal (prediction block, prediction sample array) output from the inter-frame predictor 221 or the intra-frame predictor 222 from the input image signal (original block, original sample array) to generate a residual signal (residual block, residual sample array), and the generated residual signal is sent to the transformer 232. In this case, as shown, the unit in the encoding device 200 that subtracts the prediction signal (prediction block, prediction sample array) from the input image signal (original block, original sample array) can be referred to as the subtractor 231. The predictor can perform prediction on the processing target block (hereinafter referred to as the "current block") and can generate a prediction block that includes prediction samples of the current block. The predictor can determine whether intra-frame prediction or inter-frame prediction is applied on a current block or CU basis. As discussed later in the description of each prediction mode, the predictor can generate various information related to the prediction (e.g., prediction mode information) and send the generated information to the entropy encoder 240. The information about the prediction can be encoded in the entropy encoder 240 and output as a bitstream.
[0063] Intra-predictor 222 can predict the current block by referencing samples in the current image. Depending on the prediction mode, the reference samples can be located near or separate from the current block. In intra-prediction, the prediction mode can include multiple non-directional modes and multiple directional modes. Non-directional modes can include, for example, DC mode and planar mode. Depending on the level of detail in the prediction direction, the directional modes can include, for example, 33 or 65 directional prediction modes. However, this is just an example, and more or fewer directional prediction modes can be used depending on the settings. Intra-predictor 222 can determine the prediction mode to be applied to the current block by using the prediction modes applied to neighboring blocks.
[0064] The inter-frame predictor 221 can derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference image. In this case, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis, based on the correlation of motion information between neighboring blocks and the current block. Motion information may include a motion vector and a reference image index. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, bi-prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in the reference image. The reference image including the reference block and the reference image including the temporally neighboring block may be the same as or different from each other. The temporally neighboring block may be referred to as a collocated reference block, a collocated CU (colCU), etc., and the reference image including the temporally neighboring block may be referred to as a collocated picture (colPic). For example, the inter-frame predictor 221 can configure a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and / or reference image index of the current block. Inter-frame prediction can be performed based on various prediction modes. For example, in skip mode and merge mode, the inter-frame predictor 221 can use motion information of a neighboring block as motion information of the current block. In skip mode, unlike merge mode, the residual signal may not be transmitted. In motion vector prediction (MVP) mode, the motion vectors of neighboring blocks can be used as motion vector predictors, and the motion vector of the current block can be indicated by signaling the motion vector difference.
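The motion vector prediction idea at the end of the paragraph above can be sketched as follows. This is a minimal illustration with hypothetical names (`choose_predictor`, `reconstruct_mv`) and a simple sum-of-absolute-differences cost chosen for the example; how the candidate list is built and how the index and difference are coded are not shown.

```python
def choose_predictor(candidates, mv):
    """Encoder side: pick the candidate closest to mv and compute the difference."""
    costs = [abs(mv[0] - c[0]) + abs(mv[1] - c[1]) for c in candidates]
    idx = costs.index(min(costs))
    mvd = (mv[0] - candidates[idx][0], mv[1] - candidates[idx][1])
    return idx, mvd


def reconstruct_mv(candidates, idx, mvd):
    """Decoder side: predictor selected by the signaled index, plus the difference."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Motion vectors of neighboring blocks serve as predictor candidates.
candidates = [(4, -2), (10, 3)]
current_mv = (9, 4)

idx, mvd = choose_predictor(candidates, current_mv)
decoded_mv = reconstruct_mv(candidates, idx, mvd)
print(idx, mvd, decoded_mv)
```

Only the candidate index and the small difference need to be signaled, which is why predicting motion information reduces the transmitted data.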
[0065] The predictor 220 can generate a prediction signal based on various prediction methods. For example, the predictor can apply intra-frame prediction or inter-frame prediction to predict one block, and can also apply intra-frame prediction and inter-frame prediction simultaneously. This can be referred to as combined inter and intra prediction (CIIP). Additionally, the predictor can perform prediction on a block based on an intra block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for coding content images / videos such as games. Although IBC essentially performs prediction within the current image, it is similar to inter-frame prediction in that it derives a reference block within the current image. That is, IBC can use at least one of the inter-frame prediction techniques described in this document.
[0066] The predicted signals generated by the inter-frame predictor 221 and / or the intra-frame predictor 222 can be used to generate the reconstructed signal or the residual signal. The transformer 232 can generate transform coefficients by applying transform techniques to the residual signal. For example, the transform techniques can include at least one of Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen-Loève Transform (KLT), Graph-Based Transform (GBT), or Conditional Nonlinear Transform (CNT). Here, GBT refers to a transform obtained from a graph when the relationship information between pixels is represented as a graph. CNT refers to a transform obtained based on the predicted signal generated using all previously reconstructed pixels. Furthermore, the transform processing can be applied to square pixel blocks of the same size, or to blocks of variable size that are not square.
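As an illustration of how a transform such as the DCT compacts a residual signal, the following sketch implements a one-dimensional orthonormal DCT-II and its inverse in floating point. It is an assumption-laden teaching example: the function names are hypothetical, and practical codecs typically use integer approximations applied separably to rows and columns rather than this direct form.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for frequency index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x):
    """Forward 1-D orthonormal DCT-II of a list of samples."""
    n = len(x)
    return [
        _scale(k, n) * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)
        )
        for k in range(n)
    ]


def idct2(c):
    """Inverse of dct2 (a DCT-III with matching scaling)."""
    n = len(c)
    return [
        sum(
            _scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


flat_residual = [1.0, 1.0, 1.0, 1.0]
coeffs = dct2(flat_residual)
restored = idct2(coeffs)
print([round(c, 6) for c in coeffs])
```

For the flat input above, all of the energy lands in the first (lowest-frequency) coefficient, which is the compaction property that later stages such as quantization rely on.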
[0067] The quantizer 233 quantizes the transform coefficients and sends them to the entropy encoder 240, which encodes the quantized signal (information about the quantized transform coefficients) and outputs the encoded signal in a bitstream. The information about the quantized transform coefficients can be referred to as residual information. The quantizer 233 can rearrange the block-type quantized transform coefficients into a one-dimensional vector based on the coefficient scan order and generate information about the quantized transform coefficients based on this one-dimensional vector form. The entropy encoder 240 can perform various encoding methods such as exponential Golomb coding, context-adaptive variable-length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). The entropy encoder 240 can encode information required for video / image reconstruction other than the quantized transform coefficients (e.g., values of syntax elements), either together or separately. The encoded information (e.g., encoded video / image information) can be transmitted or stored in bitstream form in units of network abstraction layer (NAL) units. The video / image information may also include information about various parameter sets such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), and a video parameter set (VPS). Additionally, the video / image information may include general constraint information. In this document, information and / or syntax elements sent from the encoding device to / signaled to the decoding device may be included in the video / image information. The video / image information can be encoded using the encoding process described above and included in the bitstream. The bitstream can be transmitted over a network or stored in a digital storage medium.
Here, the network may include broadcast networks, communication networks, and / or the like, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmitter (not shown) that sends the signal output from the entropy encoder 240 or a memory (not shown) that stores it may be configured as an internal / external element of the encoding device 200, or the transmitter may be included in the entropy encoder 240.
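The rearrangement of a coefficient block into a one-dimensional vector and the exponential Golomb coding mentioned above can be sketched as follows. This is an illustration under assumptions: the particular diagonal traversal is one possible scan order chosen for the example, the function names are hypothetical, and real codecs may code coefficients with context-adaptive methods instead.

```python
def diagonal_scan(block):
    """Flatten a 2-D block along anti-diagonals, bottom-left to top-right."""
    h, w = len(block), len(block[0])
    out = []
    for d in range(h + w - 1):
        for y in range(min(d, h - 1), -1, -1):
            x = d - y
            if x < w:
                out.append(block[y][x])
    return out


def exp_golomb_ue(n):
    """Order-0 exponential Golomb code of a non-negative integer, as a bit string."""
    if n < 0:
        raise ValueError("unsigned code requires n >= 0")
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def exp_golomb_se(v):
    """Signed variant: positive v maps to 2v - 1, non-positive v maps to -2v."""
    return exp_golomb_ue(2 * v - 1 if v > 0 else -2 * v)


levels = [
    [3, 1],
    [-1, 0],
]
vector = diagonal_scan(levels)
bitstring = "".join(exp_golomb_se(v) for v in vector)
print(vector, bitstring)
```

Small magnitudes receive short codewords, so a block whose non-zero levels are few and small yields a short bit string.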
[0068] The quantized transform coefficients output from the quantizer 233 can be used to generate a prediction signal. For example, by applying dequantization and inverse transform to the quantized transform coefficients via the dequantizer 234 and the inverse transformer 235, the residual signal (residual block or residual samples) can be reconstructed. The adder 250 adds the reconstructed residual signal to the prediction signal output from the inter-frame predictor 221 or the intra-frame predictor 222, thereby generating a reconstructed signal (reconstructed image, reconstructed block, reconstructed sample array). When there is no residual for the processing target block, as when skip mode is applied, the prediction block can be used as the reconstructed block. The adder 250 can be referred to as a reconstructor or a reconstructed block generator. The generated reconstructed signal can be used for intra-frame prediction of the next processing target block in the current image and, after filtering as described later, for inter-frame prediction of the next image.
[0069] In addition, luma mapping with chroma scaling (LMCS) can be applied in image encoding and / or reconstruction processing.
[0070] The filter 260 can improve subjective / objective video quality by applying filtering to the reconstructed signal. For example, the filter 260 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image, and the modified reconstructed image can be stored in the memory 270, specifically in the DPB of the memory 270. The various filtering methods can include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, etc. As discussed later in the description of each filtering method, the filter 260 can generate various filtering-related information and send the generated information to the entropy encoder 240. The filtering information can be encoded in the entropy encoder 240 and output as a bitstream.
[0071] The modified reconstructed image sent to the memory 270 can be used as a reference image in the inter-frame predictor 221. Accordingly, when inter-frame prediction is applied, prediction mismatch between the encoding device 200 and the decoding device can be avoided, and encoding efficiency can also be improved.
[0072] The DPB of the memory 270 can store modified reconstructed images for use as reference images in the inter-frame predictor 221. The memory 270 can store motion information of blocks in the current image from which motion information has been derived (or encoded) and / or motion information of blocks in already reconstructed images. The stored motion information can be sent to the inter-frame predictor 221 to be used as motion information of spatially neighboring blocks or temporally neighboring blocks. The memory 270 can store reconstructed samples of reconstructed blocks in the current image and send them to the intra-frame predictor 222.
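The role of the adder in the paragraph above can be sketched as follows. This is a minimal illustration with a hypothetical function name, `reconstruct_block`; it shows only the sample-wise addition and the case where no residual is present, and it omits dequantization, the inverse transform, any clipping to the valid sample range, and in-loop filtering.

```python
def reconstruct_block(pred, residual=None):
    """Add reconstructed residual samples to prediction samples.

    When no residual is available for the block, the prediction block
    itself is returned as the reconstructed block.
    """
    if residual is None:
        return [row[:] for row in pred]
    if len(pred) != len(residual) or any(
        len(p) != len(r) for p, r in zip(pred, residual)
    ):
        raise ValueError("prediction and residual blocks must have the same shape")
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]


prediction = [[100, 102], [101, 103]]
residual = [[2, -1], [0, 3]]

with_residual = reconstruct_block(prediction, residual)
without_residual = reconstruct_block(prediction)
print(with_residual, without_residual)
```

Because the encoder runs this same reconstruction on its own output, its reference samples match what a decoder would produce.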
[0073] Figure 3 is a diagram that schematically illustrates the configuration of a video / image decoding device to which this document can be applied.
[0074] Referring to Figure 3, the video decoding device 300 may include an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The predictor 330 may include an inter-frame predictor 331 and an intra-frame predictor 332. The residual processor 320 may include a dequantizer 321 and an inverse transformer 322. According to embodiments, the entropy decoder 310, residual processor 320, predictor 330, adder 340, and filter 350 described above may be constituted by one or more hardware components (e.g., a decoder chipset or processor). Additionally, the memory 360 may include a decoded picture buffer (DPB) and may be constituted by a digital storage medium. The hardware components may also include the memory 360 as an internal / external component.
[0075] When a bitstream containing video / image information is input, the decoding device 300 can reconstruct the image in correspondence with the process by which the video / image information was processed in the encoding device of Figure 2. For example, the decoding device 300 can derive units / blocks based on information related to block partitioning obtained from the bitstream. The decoding device 300 can perform decoding using the processing unit applied in the encoding device. Therefore, the processing unit for decoding can be, for example, a coding unit, which can be partitioned from a coding tree unit or largest coding unit along a quadtree structure, binary tree structure, and / or ternary tree structure. One or more transform units can be derived from the coding unit. The reconstructed image signal decoded and output by the decoding device 300 can be reproduced by a reproduction device.
[0076] The decoding device 300 can receive the signal output from the encoding device of Figure 2 in the form of a bitstream, and the received signal can be decoded by the entropy decoder 310. For example, the entropy decoder 310 can parse the bitstream to derive the information (e.g., video / image information) required for image reconstruction (or picture reconstruction). The video / image information may also include information about various parameter sets such as the Adaptive Parameter Set (APS), Picture Parameter Set (PPS), Sequence Parameter Set (SPS), Video Parameter Set (VPS), etc. In addition, the video / image information may also include general constraint information. The decoding device can further decode the picture based on the information about the parameter sets and / or the general constraint information. The signaled / received information and / or syntax elements described later in this document can be decoded through the decoding process and obtained from the bitstream. For example, the entropy decoder 310 can decode the information in the bitstream based on a coding method such as Exponential Golomb coding, CAVLC, or CABAC, and can output the values of the syntax elements required for image reconstruction and the quantized values of the transform coefficients of the residual. More specifically, the CABAC entropy decoding method can receive a bin corresponding to each syntax element in the bitstream, determine a context model using the decoding target syntax element information and the decoding information of neighboring blocks and the decoding target block, or information about symbols / bins decoded in a previous step, predict the bin occurrence probability according to the determined context model, and perform arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. 
Here, after determining the context model, the CABAC entropy decoding method can update the context model using the information of the decoded symbol / bin for the context model of the next symbol / bin. Among the information decoded in the entropy decoder 310, the prediction-related information can be provided to the predictors (inter-frame predictor 332 and intra-frame predictor 331), and the residual values on which entropy decoding has been performed in the entropy decoder 310 (i.e., quantized transform coefficients) and the associated parameter information can be input to the residual processor 320. The residual processor 320 can derive residual signals (residual blocks, residual samples, residual sample arrays). Additionally, among the information decoded in the entropy decoder 310, the filtering information can be provided to the filter 350. Furthermore, a receiver (not shown) that receives the signal output from the encoding device can be further configured as an internal / external element of the decoding device 300, or the receiver can be a component of the entropy decoder 310. Additionally, the decoding device according to this document can be referred to as a video / image / picture decoding device, and the decoding device can be divided into an information decoder (video / image / picture information decoder) and a sample decoder (video / image / picture sample decoder). The information decoder may include the entropy decoder 310, and the sample decoder may include at least one of a dequantizer 321, an inverse transformer 322, an adder 340, a filter 350, a memory 360, an inter-frame predictor 332, and an intra-frame predictor 331.
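As a small illustration of the Exponential Golomb coding named above among the entropy coding methods, the following is a minimal sketch of an unsigned (order-0) Exp-Golomb decoder. The bitstream is modeled as a string of '0' / '1' characters, and the function names `decode_ue` and `decode_all` are hypothetical helpers introduced only for this sketch; it is not the parsing process of any particular codec.

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned order-0 Exp-Golomb code from a '0'/'1' string.

    Returns (value, next_position).
    """
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    # Skip the zeros and the terminating '1', then read as many suffix bits
    # as there were leading zeros.
    start = pos + leading_zeros + 1
    suffix = bits[start:start + leading_zeros]
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, start + leading_zeros


def decode_all(bits):
    """Decode consecutive Exp-Golomb codes until the string is consumed."""
    pos, values = 0, []
    while pos < len(bits):
        value, pos = decode_ue(bits, pos)
        values.append(value)
    return values


# Code words for 0, 1, 2, 3 are "1", "010", "011", "00100".
decoded = decode_all("1" + "010" + "011" + "00100")
```

Short code words are assigned to small values, which suits syntax elements whose small values occur most often.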
[0077] The dequantizer 321 can output transform coefficients by dequantizing the quantized transform coefficients. The dequantizer 321 can rearrange the quantized transform coefficients into two-dimensional blocks. In this case, the rearrangement can be performed based on the order of coefficient scans already performed in the encoding device. The dequantizer 321 can perform dequantization on the quantized transform coefficients using quantization parameters (e.g., quantization step size information) and obtain the transform coefficients.
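The rearrangement and dequantization described above can be sketched as follows. This is a simplified sketch under stated assumptions: a plain raster scan stands in for the coefficient scan order of the encoding device, and a single uniform step size stands in for the quantization parameter. The names `raster_scan`, `inverse_scan`, and `dequantize` are hypothetical.

```python
def raster_scan(width, height):
    """Assumed scan order: row by row, left to right (illustrative only)."""
    return [(row, col) for row in range(height) for col in range(width)]


def inverse_scan(levels_1d, scan_order, width, height):
    """Place a 1-D list of quantized levels back into a 2-D block."""
    block = [[0] * width for _ in range(height)]
    for level, (row, col) in zip(levels_1d, scan_order):
        block[row][col] = level
    return block


def dequantize(block, step):
    """Scale each quantized level by a uniform step size (assumption)."""
    return [[level * step for level in row] for row in block]


levels = [3, -1, 0, 2]
block = inverse_scan(levels, raster_scan(2, 2), width=2, height=2)
coefficients = dequantize(block, step=8)
```

Whatever scan the encoder used, the decoder has to apply the same order so that each level returns to its original position.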
[0078] The inverse transformer 322 obtains the residual signal (residual block, residual sample array) by performing an inverse transform on the transform coefficients.
[0079] The predictor can perform predictions on the current block and generate a prediction block that includes prediction samples for the current block. The predictor can determine whether to apply intra-frame prediction or inter-frame prediction to the current block based on information about the prediction output from the entropy decoder 310, and specifically, can determine the intra-frame / inter-frame prediction mode.
[0080] The predictor can generate a prediction signal based on various prediction methods. For example, the predictor can apply intra-frame prediction or inter-frame prediction to the prediction of a block, and can also apply intra-frame prediction and inter-frame prediction simultaneously. This can be called combined inter-frame and intra-frame prediction (CIIP). Additionally, the predictor can perform intra-block copy (IBC) for the prediction of a block. IBC can be used for content image / video coding of games and the like, for example screen content coding (SCC). Although IBC essentially performs prediction within the current image, it is performed similarly to inter-frame prediction in that it derives a reference block within the current image. That is, IBC can use at least one of the inter-frame prediction techniques described in this document. Palette mode can be regarded as an example of intra-frame coding or intra-frame prediction.
[0081] The intra-predictor 331 can predict the current block by referencing samples in the current image. Depending on the prediction mode, the reference samples can be located near or separate from the current block. In intra-prediction, the prediction mode can include multiple non-directional modes and multiple directional modes. The intra-predictor 331 can determine the prediction mode applied to the current block by using the prediction modes applied to neighboring blocks.
[0082] Inter-frame predictor 332 can derive the prediction block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference image. In this case, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis based on the correlation of motion information between neighboring blocks and the current block. Motion information may include motion vectors and reference image indices. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in the reference image. For example, inter-frame predictor 332 can configure a motion information candidate list based on neighboring blocks and derive the motion vector and / or reference image index of the current block based on received candidate selection information. Inter-frame prediction can be performed based on various prediction modes, and the information about the prediction may include information indicating the mode of inter-frame prediction for the current block.
[0083] Adder 340 can generate a reconstructed signal (reconstructed image, reconstructed block, reconstructed sample array) by adding the obtained residual signal to the prediction signal (prediction block, prediction sample array) output from predictor 330. When there is no residual for the processing target block, as in the case where skip mode is applied, the prediction block can be used as the reconstructed block.
[0084] Adder 340 can be referred to as a reconstructor or reconstructed block generator. The generated reconstructed signal can be used for intra-frame prediction of the next processing target block in the current image and, as described later, it can be output after filtering or used for inter-frame prediction of the next image.
[0085] In addition, luminance mapping with chroma scaling (LMCS) can be applied in image decoding processing.
[0086] Filter 350 can improve subjective / objective video quality by applying filtering to the reconstructed signal. For example, filter 350 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image, and the modified reconstructed image can be sent to memory 360, specifically to the DPB of memory 360. The various filtering methods can include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, etc.
[0087] The (modified) reconstructed image stored in the DPB of memory 360 can be used as a reference image in inter-frame predictor 331. Memory 360 can store motion information of blocks in the current image from which motion information has been derived (or decoded) and / or motion information of blocks in a reconstructed image. The stored motion information can be sent to inter-frame predictor 331 to be used as motion information of neighboring blocks or temporally neighboring blocks. Memory 360 can store reconstructed samples of reconstructed blocks in the current image and send them to intra-frame predictor 332.
[0088] The examples described in this specification in the predictor 330, dequantizer 321, inverse transformer 322 and filter 350 of the decoding device 300 can be similarly or correspondingly applied to the predictor 220, dequantizer 234, inverse transformer 235 and filter 260 of the encoding device 200, respectively.
[0089] As described above, prediction is performed to improve compression efficiency during video encoding. Accordingly, a prediction block can be generated that includes prediction samples for the current block, which is the target block for encoding. Here, the prediction block includes prediction samples in the spatial domain (or pixel domain). The prediction block can be derived identically in both the encoding and decoding devices, and the encoding device can improve image encoding efficiency by signaling to the decoding device information about the residual between the original block and the prediction block (residual information), not the original sample values of the original block itself. The decoding device can derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by adding the residual block to the prediction block, and generate a reconstructed image including the reconstructed block.
[0090] Residual information can be generated through transform and quantization processes. For example, the encoding device can derive a residual block between the original block and the prediction block, derive transform coefficients by performing a transform process on the residual samples (residual sample array) included in the residual block, and derive quantized transform coefficients by performing a quantization process on the transform coefficients. It can then signal the associated residual information to the decoding device (via a bitstream). Here, the residual information can include the value information, position information, transform technique, transform kernel, quantization parameters, etc., of the quantized transform coefficients. The decoding device can perform dequantization / inverse transform processes based on the residual information and derive residual samples (or a residual block). The decoding device can generate a reconstructed block based on the prediction block and the residual block. The encoding device can also derive the residual block by performing dequantization / inverse transform on the quantized transform coefficients, so that it can serve as a reference for inter-frame prediction of the next image, and can generate a reconstructed image based on it.
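The residual signaling loop described above can be sketched with a toy example. This is a simplified sketch under stated assumptions: the transform stage is omitted, quantization is a uniform division by a step size, and the names `quantize_residual` and `reconstruct` are hypothetical. It only shows why the encoding device reconstructs from the quantized levels rather than from the original samples.

```python
def quantize_residual(original, prediction, step):
    """Residual = original - prediction, then uniform quantization.

    The transform stage is omitted here for brevity (assumption).
    """
    return [[round((o - p) / step) for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, prediction)]


def reconstruct(prediction, levels, step):
    """Dequantize the levels and add them back to the prediction."""
    return [[p + level * step for p, level in zip(p_row, l_row)]
            for p_row, l_row in zip(prediction, levels)]


original = [[100, 104], [99, 91]]
prediction = [[96, 96], [96, 96]]
levels = quantize_residual(original, prediction, step=4)

# Both sides run the same reconstruction from the same levels, so the
# encoder-side reference matches what the decoder will actually have.
encoder_recon = reconstruct(prediction, levels, step=4)
decoder_recon = reconstruct(prediction, levels, step=4)
```

The reconstruction differs slightly from the original because quantization is lossy, but it is identical on both sides, which is what keeps later predictions aligned.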
[0091] Figure 4 schematically illustrates the multi-transform technique according to the present disclosure.
[0092] Referring to Figure 4, the transformer can correspond to the transformer in the encoding device of Figure 2 described above, and the inverse transformer can correspond to the inverse transformer in the encoding device of Figure 2 or the inverse transformer in the decoding device of Figure 3 described above.
[0093] The transformer can derive (first) transform coefficients (S410) by performing a first transform based on residual samples (residual sample array) in the residual block. This first transform can be referred to as the core transform. Here, the first transform can be based on multiple transform selection (MTS), and when multiple transforms are used as the first transform, it can be referred to as a multi-core transform.
[0094] Multi-core transform can represent a method of performing transforms by additionally using Discrete Cosine Transform (DCT) Type 2 and Discrete Sine Transform (DST) Type 7, DCT Type 8, and / or DST Type 1. In other words, multi-core transform can represent a method of transforming a spatial domain residual signal (or residual block) into frequency domain transform coefficients (or primary transform coefficients) based on multiple transform kernels selected from DCT Type 2, DST Type 7, DCT Type 8, and DST Type 1. Here, from the perspective of the transformer, the primary transform coefficients can be referred to as temporary transform coefficients.
[0095] In other words, when a conventional transform method is applied, transform coefficients can be generated by applying a spatial-to-frequency domain transform to the residual signal (or residual block) based on DCT type 2. In contrast, when the multi-core transform is applied, transform coefficients (or primary transform coefficients) can be generated by applying a spatial-to-frequency domain transform to the residual signal (or residual block) based on DCT type 2, DST type 7, DCT type 8, and / or DST type 1. Here, DCT type 2, DST type 7, DCT type 8, and DST type 1 can be referred to as transform types, transform kernels, or transform cores.
[0096] For reference, the DCT / DST transform type can be defined based on basis functions, and the basis functions can be shown in the table below.
[0097] [Table 1]
[0098]
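The contents of Table 1 are not reproduced in this text. As a hedged point of reference only, the sketch below builds N-point matrices from the commonly cited orthonormal definitions of DCT type 2, DST type 7, DCT type 8, and DST type 1; it is an assumption that these match the basis functions of Table 1. The function names are hypothetical, and row i of each matrix holds the i-th basis function sampled at j = 0 .. N-1.

```python
import math


def dct2(n):
    return [[(math.sqrt(0.5) if i == 0 else 1.0) * math.sqrt(2.0 / n)
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
             for j in range(n)] for i in range(n)]


def dst7(n):
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def dct8(n):
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]


def dst1(n):
    return [[math.sqrt(2.0 / (n + 1))
             * math.sin(math.pi * (i + 1) * (j + 1) / (n + 1))
             for j in range(n)] for i in range(n)]


def is_orthonormal(matrix, tol=1e-9):
    """Check that the rows form an orthonormal basis."""
    n = len(matrix)
    return all(
        abs(sum(matrix[a][k] * matrix[b][k] for k in range(n))
            - (1.0 if a == b else 0.0)) < tol
        for a in range(n) for b in range(n))
```

Because each of these matrices is orthonormal, the inverse transform is simply the transpose, which is why the same kernel can serve both the transformer and the inverse transformer.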
[0099] If a multi-core transform is performed, a vertical transform core and a horizontal transform core can be selected from the transform cores for the target block. A vertical transform can be performed on the target block based on the vertical transform core, and a horizontal transform can be performed on the target block based on the horizontal transform core. Here, the horizontal transform can represent the transform of the horizontal components of the target block, and the vertical transform can represent the transform of the vertical components of the target block. The vertical transform core / horizontal transform core can be adaptively determined based on the prediction mode and / or transform index of the target block (CU or sub-block), including the residual block.
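The separate vertical and horizontal transforms described above can be sketched as two matrix products. This is a minimal sketch, assuming orthonormal floating-point kernels (DST type 7 vertically and DCT type 2 horizontally, chosen arbitrarily for illustration) rather than the integer kernels of an actual codec; all function names are hypothetical.

```python
import math


def dct2(n):
    return [[(math.sqrt(0.5) if i == 0 else 1.0) * math.sqrt(2.0 / n)
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
             for j in range(n)] for i in range(n)]


def dst7(n):
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_separable(block, vertical_kernel, horizontal_kernel):
    # The vertical kernel acts on columns (left multiplication); the
    # horizontal kernel acts on rows (right multiplication by its transpose).
    return matmul(matmul(vertical_kernel, block), transpose(horizontal_kernel))


def inverse_separable(coeffs, vertical_kernel, horizontal_kernel):
    # Valid because the kernels are orthonormal (inverse equals transpose).
    return matmul(matmul(transpose(vertical_kernel), coeffs), horizontal_kernel)


residual = [[4, 3, 2, 1], [3, 3, 2, 1], [2, 2, 1, 0], [1, 1, 0, 0]]
coeffs = forward_separable(residual, dst7(4), dct2(4))
restored = inverse_separable(coeffs, dst7(4), dct2(4))
```

Choosing the two kernels independently is what lets the vertical and horizontal directions of a residual block use different transform types.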
[0100] The transformer can derive modified (secondary) transform coefficients by performing a secondary transform based on the (first) transform coefficients (S420). The primary transform is a transform from the spatial domain to the frequency domain, while the secondary transform is a transform into a more compact representation that exploits the correlation between the (first) transform coefficients. The secondary transform can include a non-separable transform. In this case, the secondary transform can be called a non-separable secondary transform (NSST) or a mode-dependent non-separable secondary transform (MDNSST). The non-separable secondary transform can represent a transform that generates modified transform coefficients (or secondary transform coefficients) for the residual signal by applying a secondary transform, based on a non-separable transform matrix, to the (first) transform coefficients derived from the primary transform. In this case, the vertical and horizontal transforms are not applied separately (or independently) to the (first) transform coefficients; instead, the transform is applied once based on the non-separable transform matrix. In other words, the non-separable secondary transform can represent a transform method in which the vertical and horizontal components of the (first) transform coefficients are not separated: for example, the two-dimensional signal (transform coefficients) is rearranged into a one-dimensional signal along a defined direction (e.g., row-first or column-first), and modified transform coefficients (or secondary transform coefficients) are then generated based on the non-separable transform matrix. The non-separable secondary transform can be applied to the upper-left region of a block containing the (first) transform coefficients (hereinafter referred to as a transform coefficient block). 
For example, if the width (W) and height (H) of the transform coefficient block are both equal to or greater than 8, an 8×8 non-separable secondary transform can be applied to the upper-left 8×8 region of the transform coefficient block. Furthermore, if the width (W) and height (H) of the transform coefficient block are both equal to or greater than 4, and the width (W) or height (H) of the transform coefficient block is less than 8, a 4×4 non-separable secondary transform can be applied to the upper-left min(8,W)×min(8,H) region of the transform coefficient block. However, the implementation is not limited to this; for example, even if only the condition that the width (W) or height (H) of the transform coefficient block is equal to or greater than 4 is met, the 4×4 non-separable secondary transform can be applied to the upper-left min(8,W)×min(8,H) region of the transform coefficient block.
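The size rule in the first example above can be sketched as a small selection function. This is a sketch of that one example only (both dimensions at least 8, or both at least 4 with one below 8); the relaxed alternative condition is not modeled, and the name `select_secondary_transform` is hypothetical.

```python
def select_secondary_transform(width, height):
    """Return (kernel_size, region_width, region_height), or None.

    kernel_size 8 means an 8x8 secondary transform; 4 means a 4x4 one.
    The region is the upper-left area of the transform coefficient block.
    """
    if width >= 8 and height >= 8:
        return 8, 8, 8
    if width >= 4 and height >= 4:
        return 4, min(8, width), min(8, height)
    return None  # Not covered by this example rule.


choices = {size: select_secondary_transform(*size)
           for size in [(16, 16), (4, 16), (8, 4), (2, 8)]}
```

For a 4×16 block, for instance, the rule yields a 4×4 kernel over a 4×8 upper-left region.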
[0101] Specifically, for example, if a 4×4 input block is used, the non-separable secondary transform can be performed as follows.
[0102] A 4×4 input block X can be represented as follows.
[0103] [Equation 1]
[0104] $$X = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} \\ X_{10} & X_{11} & X_{12} & X_{13} \\ X_{20} & X_{21} & X_{22} & X_{23} \\ X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}$$
[0105] If X is represented as a vector, the vector $\vec{X}$ can be represented as follows.
[0106] [Equation 2]
[0107] $$\vec{X} = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} & X_{10} & X_{11} & X_{12} & X_{13} & X_{20} & X_{21} & X_{22} & X_{23} & X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}^{T}$$
[0108] In this case, the non-separable secondary transform can be calculated as follows.
[0109] [Equation 3]
[0110] $$\vec{F} = T \cdot \vec{X}$$
[0111] where $\vec{F}$ represents the transform coefficient vector and T represents the 16×16 (non-separable) transform matrix.
[0112] Using Equation 3 above, the 16×1 transform coefficient vector $\vec{F}$ can be derived, and $\vec{F}$ can be reorganized into a 4×4 block according to a scan order (horizontal, vertical, diagonal, etc.). However, the above calculation is an example, and the hypercube-Givens transform (HyGT) and similar methods can also be used to calculate the non-separable secondary transform in order to reduce its computational complexity.
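The three steps above (vectorize the 4×4 block, multiply by a 16×16 matrix, reorganize into a 4×4 block) can be sketched as follows. This is a mechanical sketch only: the matrix used here is a placeholder permutation that reverses the vector, chosen so the result is easy to check by hand, and is not an actual transform kernel. Row-first vectorization and a horizontal (row-by-row) reorganization are assumed, and all names are hypothetical.

```python
def vectorize(block):
    """Row-first rearrangement of a 2-D block into a 1-D list."""
    return [value for row in block for value in row]


def apply_matrix(matrix, vector):
    """Matrix-vector product: one output per matrix row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def to_block(vector, size=4):
    """Reorganize a 1-D list into a size x size block, row by row."""
    return [vector[r * size:(r + 1) * size] for r in range(size)]


# Placeholder 16x16 matrix (NOT a real kernel): reverses the input order.
placeholder_t = [[1 if col == 15 - row else 0 for col in range(16)]
                 for row in range(16)]

x_block = [[r * 4 + c for c in range(4)] for r in range(4)]
x_vec = vectorize(x_block)
f_vec = apply_matrix(placeholder_t, x_vec)
f_block = to_block(f_vec)
```

Every output coefficient may depend on all 16 inputs at once, which is what distinguishes this from applying row and column transforms separately, and also why the cost grows with the square of the number of samples.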
[0113] Furthermore, in the non-separable secondary transform, the transform kernel (or transform type) can be selected in a mode-dependent manner. In this case, the mode can include an intra-frame prediction mode and / or an inter-frame prediction mode.
[0114] As described above, the non-separable secondary transform can be performed based on an 8×8 transform or a 4×4 transform, determined by the width (W) and height (H) of the transform coefficient block. To select the mode-dependent transform kernel, 35 sets of three non-separable secondary transform kernels each can be configured for both the 8×8 transform and the 4×4 transform. That is, 35 transform sets can be configured for the 8×8 transform, and 35 transform sets can be configured for the 4×4 transform. In this case, each of the 35 transform sets for the 8×8 transform can include three 8×8 transform kernels, and each of the 35 transform sets for the 4×4 transform can include three 4×4 transform kernels. However, the size of the transform, the number of sets, and the number of transform kernels in each set are examples; a size other than 8×8 or 4×4 can be used, n sets can be configured, and each set can include k kernels.
[0115] The transform set can be called the NSST set, and the transform kernel in the NSST set can be called the NSST kernel. For example, a specific set from the transform set can be selected based on the intra-prediction mode of the target block (CU or sub-block).
[0116] For reference, as an example, intra-prediction modes may include two non-directional (or non-angular) intra-prediction modes and 65 directional (or angular) intra-prediction modes. The non-directional intra-prediction modes may include planar intra-prediction mode 0 and DC intra-prediction mode 1, and the directional intra-prediction modes may include the 65 intra-prediction modes from intra-prediction mode 2 to intra-prediction mode 66. However, this is an example, and this disclosure can be applied to cases where the number of intra-prediction modes differs. Furthermore, depending on the circumstances, an intra-prediction mode 67 may be used, and intra-prediction mode 67 may represent a linear model (LM) mode.
[0117] Figure 5 shows, as an example, intra-frame directional modes for 65 prediction directions.
[0118] Referring to Figure 5, intra-prediction modes with horizontal directionality and intra-prediction modes with vertical directionality can be distinguished around intra-prediction mode 34, which has a top-left diagonal prediction direction. In Figure 5, H and V refer to horizontal and vertical directionality, respectively, and the numbers -32 to 32 indicate displacements in units of 1 / 32 on the sample grid positions. This can represent an offset of the mode index value. Intra-prediction modes 2 through 33 have horizontal directionality, while intra-prediction modes 34 through 66 have vertical directionality. Strictly speaking, intra-prediction mode 34 is neither horizontal nor vertical, but for the purpose of determining the transform set of the secondary transform it can be classified as horizontal. This is because the input data is transposed for the vertical modes that are symmetric about intra-prediction mode 34, while the input data alignment method for horizontal modes is used for intra-prediction mode 34. Intra-prediction modes 18 and 50 can represent the horizontal and vertical intra-prediction modes, respectively, and intra-prediction mode 2 can be called the upper-right diagonal intra-prediction mode because it uses left reference pixels and predicts in the upper-right direction. In the same way, intra-prediction mode 34 can be called the lower-right diagonal intra-prediction mode, and intra-prediction mode 66 can be called the lower-left diagonal intra-prediction mode.
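The classification used for transform set selection can be sketched as follows. This is a sketch under stated assumptions: the mirror relation `68 - mode` is inferred here from the stated symmetry about mode 34 (it pairs 18 with 50 and 2 with 66) and is not quoted from the text, and the function names are hypothetical.

```python
def classify_for_transform_set(mode):
    """Classify an intra-prediction mode for secondary transform set selection."""
    if mode in (0, 1):
        return "non-directional"  # planar (0) and DC (1)
    if 2 <= mode <= 34:
        return "horizontal"       # mode 34 is treated as horizontal here
    if 35 <= mode <= 66:
        return "vertical"
    raise ValueError("mode outside the 0..66 range handled by this sketch")


def needs_transpose(mode):
    """Vertical modes reuse the horizontal handling on transposed input."""
    return classify_for_transform_set(mode) == "vertical"


def mirrored_mode(mode):
    """Mode symmetric to `mode` about diagonal mode 34 (inferred relation)."""
    if not 2 <= mode <= 66:
        raise ValueError("only directional modes have a mirrored mode")
    return 68 - mode


def transpose_block(block):
    """Swap rows and columns of a 2-D block."""
    return [list(col) for col in zip(*block)]
```

Handling a vertical mode as its mirrored horizontal mode on transposed data means the two halves of the directional range can share the same kernels.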
[0119] In this case, for example, the mapping between the 35 transform sets and the intra-prediction modes can be shown in the table below. For reference, if the LM mode is applied to the target block, the secondary transform need not be applied to the target block.
[0120] [Table 2]
[0121]
[0122] Furthermore, once a specific set is determined to be used, one of the k transform kernels in that set can be selected using a non-separable secondary transform index. The encoding device can derive the non-separable secondary transform index indicating a specific transform kernel based on a rate-distortion (RD) check and can signal the non-separable secondary transform index to the decoding device. The decoding device can then select one of the k transform kernels in the specific set based on the non-separable secondary transform index. For example, NSST index value 0 can indicate the first non-separable secondary transform kernel, NSST index value 1 can indicate the second non-separable secondary transform kernel, and NSST index value 2 can indicate the third non-separable secondary transform kernel. Alternatively, NSST index value 0 can indicate that the non-separable secondary transform is not applied to the target block, and NSST index values 1 through 3 can indicate the three transform kernels.
[0123] Referring back to Figure 4, the transformer can perform the non-separable secondary transform based on the selected transform kernel and obtain modified (secondary) transform coefficients. As mentioned above, the modified transform coefficients can be derived as quantized transform coefficients through the quantizer, encoded and signaled to the decoding device, and passed to the dequantizer / inverse transformer in the encoding device.
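The two index conventions described above can be sketched as follows. This is a minimal sketch: kernels are represented by placeholder strings, the RD costs are made-up illustrative numbers, and the names `select_kernel` and `choose_index_by_rd` are hypothetical.

```python
def select_kernel(transform_set, nsst_index, zero_means_off=True):
    """Decoder side: map a signaled index to a kernel in the chosen set.

    zero_means_off=True  : index 0 means no secondary transform and
                           indices 1..k select the k kernels.
    zero_means_off=False : indices 0..k-1 select the k kernels directly.
    """
    if zero_means_off:
        return None if nsst_index == 0 else transform_set[nsst_index - 1]
    return transform_set[nsst_index]


def choose_index_by_rd(rd_costs):
    """Encoder side: pick the index with the lowest rate-distortion cost."""
    return min(range(len(rd_costs)), key=rd_costs.__getitem__)


kernels = ["kernel_a", "kernel_b", "kernel_c"]  # placeholder names
# Illustrative costs for indices 0..3 (index 0 = secondary transform off).
best_index = choose_index_by_rd([10.5, 9.2, 8.7, 9.9])
chosen = select_kernel(kernels, best_index)
```

Only the index travels in the bitstream; the kernels themselves are known to both devices in advance.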
[0124] Furthermore, as mentioned above, if the second transformation is omitted, the (first) transformation coefficients, which are the output of the first (separable) transformation, can be derived as the transformation coefficients quantized by the quantizer as described above, and can be encoded and signaled to the decoding device, and transmitted to the dequantizer / inverse transformer in the encoding device.
[0125] The inverse transformer can perform a series of processes in the reverse order of those already executed in the aforementioned transformers. The inverse transformer can receive (dequantized) transform coefficients and derive (first) transform coefficients by performing a second (inverse) transform (S450), and obtain residual blocks (residual samples) by performing a first (inverse) transform on the (first) transform coefficients. In this regard, from the perspective of the inverse transformer, the first transform coefficients can be referred to as modified transform coefficients. As described above, the encoding and decoding devices can generate reconstructed blocks based on the residual blocks and the prediction blocks, and can generate reconstructed images based on the reconstructed blocks.
[0126] Furthermore, as mentioned above, if the second (inverse) transform is omitted, the (dequantized) transform coefficients can be received, a first (separable) inverse transform can be performed, and a residual block (residual sample) can be obtained. As mentioned above, the encoding and decoding devices can generate a reconstructed block based on the residual block and the prediction block, and can generate a reconstructed image based on the reconstructed block.
[0127] Furthermore, in this disclosure, a reduced secondary transform (RST), in which the size of the transform matrix (kernel) is reduced, can be applied within the concept of NSST in order to reduce the computation and memory required for the non-separable secondary transform.
[0128] Furthermore, the transform kernel, the transform matrix, and the coefficients constituting the transform kernel matrix described in this disclosure (i.e., kernel coefficients or matrix coefficients) can be represented in 8 bits. This is feasible for implementation in decoding and encoding devices; compared to existing 9-bit or 10-bit representations, it reduces the memory required to store the transform kernels while keeping the performance degradation within a reasonably tolerable range. Additionally, representing the kernel matrix in 8 bits allows smaller multipliers to be used and is better suited to Single Instruction Multiple Data (SIMD) instructions used for optimal software implementation.
[0129] In this specification, the term "RST" can refer to a transform performed on the residual samples of a target block based on a transform matrix whose size is reduced according to a reduction factor. When the reduced transform is performed, the amount of computation required for the transform can be reduced because of the smaller transform matrix. In other words, RST can be used to address the computational complexity that arises when transforming large blocks or when performing a non-separable transform.
[0130] RST can be referred to by various terms such as reduced transform, reduced secondary transform, reduction transform, simplified transform, and simple transform, and the names by which RST can be called are not limited to the examples listed. Alternatively, since RST is performed primarily in the low-frequency region of the transform block that contains non-zero coefficients, it can be called low-frequency non-separable transform (LFNST).
[0131] Furthermore, when performing a second inverse transform based on RST, the inverse transformer 235 of the encoding device 200 and the inverse transformer 322 of the decoding device 300 may include an inverse reduced second transformer that derives modified transform coefficients based on the inverse RST of the transform coefficients; and an inverse first transformer that derives the residual samples of the target block based on the inverse first transform of the modified transform coefficients. The inverse first transform refers to the inverse transform of the first transform applied to the residuals.
[0132] Figure 6 is a diagram illustrating an example of RST according to this disclosure.
[0133] In this specification, the term "target block" may refer to the current block or residual block on which coding is performed.
[0134] In the example RST, an N-dimensional vector can be mapped to an R-dimensional vector located in another space, thereby determining the reduced transform matrix, where R is less than N. N can refer to the square of the side length of the block to which the transform is applied, or to the total number of transform coefficients corresponding to the block to which the transform is applied, and the reduction factor can refer to the R / N value. The reduction factor can also be called a shrinkage factor, simplification factor, or various other terms. Furthermore, R can be called a reduction coefficient, but depending on the situation, the reduction factor can refer to R. Additionally, depending on the situation, the reduction factor can refer to the N / R value.
[0135] In this example, the reduction factor or reduction coefficient can be signaled via a bitstream, but the example is not limited to this. For instance, a predetermined value for the reduction factor or reduction coefficient can be stored in each of the encoding device 200 and the decoding device 300, and in this case, the reduction factor or reduction coefficient does not need to be signaled separately.
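[0136] The size of the reduced transform matrix according to the example can be R×N, which is smaller than N×N (the size of the regular transform matrix), and the matrix can be defined as in Equation 4 below.
[0137] [Formula 4]
[0138] $$T_{R\times N} = \begin{bmatrix} t_{1,1} & t_{1,2} & t_{1,3} & \cdots & t_{1,N} \\ t_{2,1} & t_{2,2} & t_{2,3} & \cdots & t_{2,N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{R,1} & t_{R,2} & t_{R,3} & \cdots & t_{R,N} \end{bmatrix}$$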
[0139] The matrix T in the reduced transform block shown in Figure 6(a) can refer to the matrix $T_{R\times N}$ of Equation 4. As shown in Figure 6(a), when the reduced transform matrix $T_{R\times N}$ is multiplied by the residual samples of the target block, the transform coefficients of the target block can be derived.
[0140] In the example, if the size of the block to which the transform is applied is 8×8 and R = 16 (i.e., R / N = 16 / 64 = 1 / 4), the RST according to Figure 6(a) can be represented as the matrix operation shown in Equation 5. In this case, the memory and multiplication computations can be reduced to approximately 1 / 4 by the reduction factor.
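[0141] [Formula 5]
[0142] $$\begin{bmatrix} t_{1,1} & t_{1,2} & t_{1,3} & \cdots & t_{1,64} \\ t_{2,1} & t_{2,2} & t_{2,3} & \cdots & t_{2,64} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{16,1} & t_{16,2} & t_{16,3} & \cdots & t_{16,64} \end{bmatrix} \times \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_{64} \end{bmatrix}$$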
[0143] In Equation 5, $r_1$ to $r_{64}$ can represent the residual samples of the target block. As a result of the operation of Equation 5, the transform coefficients $c_i$ of the target block can be derived, and the process of deriving $c_i$ can be as shown in Equation 6.
[0144] [Formula 6]
[0145] $$c_i = \sum_{j=1}^{N} t_{i,j}\, r_j, \qquad i = 1, \ldots, R$$
[0146] As a result of Equation 6, the transform coefficients $c_1$ to $c_R$ of the target block can be derived. In other words, when R = 16, the transform coefficients $c_1$ to $c_{16}$ of the target block can be derived. If a regular transform were applied instead of the RST, so that a 64×64 (N×N) transform matrix were multiplied by the 64×1 (N×1) residual samples, 64 (N) transform coefficients would be derived for the target block; because the RST is applied, only 16 (R) transform coefficients are derived. Since the total number of transform coefficients for the target block is reduced from N to R, the amount of data sent from the encoding device 200 to the decoding device 300 is reduced, thus improving the transmission efficiency between the encoding device 200 and the decoding device 300.
[0147] In terms of the size of the transform matrix, a regular transform matrix is 64×64 (N×N), whereas the reduced transform matrix is 16×64 (R×N). Therefore, compared with performing a regular transform, the memory usage when performing the RST can be reduced by the R / N ratio. Furthermore, compared with the N×N multiplications required when using a regular transform matrix, using the reduced transform matrix reduces the number of multiplications to R×N, i.e., by the R / N ratio.
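The matrix operation of Equations 5 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `forward_rst` and the placeholder kernel values are hypothetical and serve only to show the shapes involved and the multiplication count.

```python
def forward_rst(kernel, residual):
    """Multiply an R x N reduced kernel by an N x 1 residual vector (Equation 6)."""
    n = len(residual)
    assert all(len(row) == n for row in kernel), "each kernel row needs N entries"
    coeffs = []
    for row in kernel:                          # i = 1..R
        acc = 0
        for t_ij, r_j in zip(row, residual):    # j = 1..N
            acc += t_ij * r_j
        coeffs.append(acc)
    return coeffs


N, R = 64, 16                                   # 8x8 block, R / N = 1 / 4
# Placeholder kernel (hypothetical values): row i simply picks residual sample i.
kernel = [[1 if j == i else 0 for j in range(N)] for i in range(R)]
residual = list(range(1, N + 1))                # stands in for r_1 .. r_64
coeffs = forward_rst(kernel, residual)          # R coefficients instead of N

regular_mults = N * N                           # regular 64x64 matrix
reduced_mults = R * N                           # reduced 16x64 matrix
```

With these sizes the reduced kernel needs one quarter of the multiplications and one quarter of the stored matrix entries of the regular transform, matching the R / N ratio stated above.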
[0148] In the example, the transformer 232 of the encoding device 200 can derive the transform coefficients of the target block by performing a first transform and an RST-based second transform on the residual samples of the target block. These transform coefficients can be passed to the inverse transformer of the decoding device 300, and the inverse transformer 322 of the decoding device 300 can derive the modified transform coefficients based on the inverse reduced second transform (RST) for the transform coefficients, and can derive the residual samples of the target block based on the inverse first transform for the modified transform coefficients.
[0149] The size of the inverse RST matrix $T_{N\times R}$ according to the example is N×R, which is smaller than the size N×N of the regular inverse transform matrix, and the inverse RST matrix has a transpose relationship with the reduced transform matrix $T_{R\times N}$ shown in Equation 4.
[0150] The matrix $T^{t}$ in the reduced inverse transform block shown in Figure 6(b) can refer to the inverse RST matrix $T_{N\times R} = (T_{R\times N})^{T}$ (the superscript T indicates transpose). As shown in Figure 6(b), when the inverse RST matrix is multiplied by the transform coefficients of the target block, the modified transform coefficients of the target block or the residual samples of the target block can be derived.
[0151] More specifically, when the inverse RST is applied as the second inverse transform, the modified transform coefficients of the target block can be derived when the inverse RST matrix is multiplied by the transform coefficients of the target block. The inverse RST can also be applied as the inverse first transform, in which case the residual samples of the target block can be derived when the inverse RST matrix is multiplied by the transform coefficients of the target block.
[0152] In the example, if the size of the block to which the inverse transform is applied is 8×8 and R = 16 (i.e., R / N = 16 / 64 = 1 / 4), the RST according to Figure 6(b) can be represented as the matrix operation shown in Equation 7.
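[0153] [Formula 7]
[0154] $$\begin{bmatrix} t_{1,1} & t_{2,1} & \cdots & t_{16,1} \\ t_{1,2} & t_{2,2} & \cdots & t_{16,2} \\ t_{1,3} & t_{2,3} & \cdots & t_{16,3} \\ \vdots & \vdots & \ddots & \vdots \\ t_{1,64} & t_{2,64} & \cdots & t_{16,64} \end{bmatrix} \times \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{16} \end{bmatrix}$$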
[0155] In Equation 7, $c_1$ to $c_{16}$ can represent the transform coefficients of the target block. As a result of the operation of Equation 7, $r_j$, representing the modified transform coefficients of the target block or the residual samples of the target block, can be derived, and the process of deriving $r_j$ can be as shown in Equation 8.
[0156] [Formula 8]
[0157] $$r_j = \sum_{i=1}^{R} t_{i,j}\, c_i, \qquad j = 1, \ldots, N$$
[0158] As a result of Equation 8, $r_1$ to $r_N$, representing the modified transform coefficients of the target block or the residual samples of the target block, can be derived. In terms of the size of the inverse transform matrix, a regular inverse transform matrix is 64×64 (N×N), whereas the inverse reduced transform matrix is 64×16 (N×R). Therefore, compared with performing a regular inverse transform, the memory usage when performing the inverse RST can be reduced by the R / N ratio. In addition, compared with the N×N multiplications required when using a regular inverse transform matrix, using the inverse reduced transform matrix reduces the number of multiplications to N×R, i.e., by the R / N ratio.
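The inverse operation of Equations 7 and 8 can be sketched in the same spirit. The function name `inverse_rst` and the placeholder kernel are assumptions made for illustration; the only point being shown is that the decoder can reuse the R×N forward kernel and index it in transposed order, so no separate N×R matrix has to be stored.

```python
def inverse_rst(kernel, coeffs):
    """Apply the transpose of an R x N kernel to R coefficients (Equation 8)."""
    r = len(kernel)
    n = len(kernel[0])
    assert len(coeffs) == r, "one coefficient per kernel row is expected"
    out = [0] * n
    for j in range(n):                  # j = 1..N output values
        for i in range(r):              # i = 1..R input coefficients
            out[j] += kernel[i][j] * coeffs[i]
    return out


N, R = 64, 16
# Placeholder kernel (hypothetical values): row i has a single 1 in column i.
kernel = [[1 if j == i else 0 for j in range(N)] for i in range(R)]
coeffs = list(range(1, R + 1))          # stands in for c_1 .. c_16
samples = inverse_rst(kernel, coeffs)   # N values from only R coefficients
inverse_mults = N * R                   # versus N * N for a regular inverse
```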
[0159] Figure 7 is a flowchart illustrating an example of an inverse RST process according to this disclosure.
[0160] Each step disclosed in Figure 7 can be performed by the decoding device 300 disclosed in Figure 3. More specifically, S700 can be performed by the dequantizer 321 disclosed in Figure 3, and S710 and S720 can be performed by the inverse transformer 322 disclosed in Figure 3. Therefore, descriptions of specific content that overlaps with the description given above with reference to Figure 3 will be omitted or simplified. Furthermore, in this disclosure, RST can refer to a transform applied in the forward direction, and inverse RST can refer to a transform applied in the inverse direction.
[0161] In the example, the only difference between a specific operation according to the inverse RST and a specific operation according to RST may be that their operation order is reversed, and the specific operation according to the inverse RST may be substantially similar to the specific operation according to RST. Therefore, those skilled in the art will readily understand that the description of S700 to S720 for the inverse RST given below can be applied to RST in the same or a similar manner.
[0162] According to the example, the decoding device 300 can derive the transform coefficients by performing dequantization on the quantized transform coefficients of the target block (S700).
[0163] The decoding device 300 according to the example can select a transform kernel (S710). More specifically, the decoding device 300 can select a transform kernel based on at least one of the following: the transform index, the width and height of the region to which the transform is applied, the intra-prediction mode used in image decoding, and the color component of the target block. However, the example is not limited to this; for example, the transform kernel can be predefined, in which case separate information for selecting the transform kernel need not be signaled.
[0164] In one example, CIdx can indicate information about the color components of a target block. If the target block is a luma block, CIdx can indicate 0, and if the target block is a chroma block (e.g., a Cb block or a Cr block), CIdx can indicate a non-zero value (e.g., 1).
[0165] According to the example, the decoding device 300 can apply the inverse RST to the transform coefficients based on the selected transform kernel and reduction factor (S720).
[0166] Figure 8 is a flowchart illustrating another example of the inverse RST process according to this disclosure.
[0167] Each step disclosed in Figure 8 can be performed by the decoding device 300 disclosed in Figure 3. More specifically, S800 can be performed by the dequantizer 321 disclosed in Figure 3, and S810 to S860 can be performed by the inverse transformer 322 disclosed in Figure 3. Therefore, descriptions of specific content that overlaps with the description given above with reference to Figure 3 will be omitted or simplified.
[0168] In the example, as described above, the only difference between a specific operation according to the inverse RST and a specific operation according to RST may be that their operation order is reversed, and the specific operation according to the inverse RST may be substantially similar to the specific operation according to RST. Therefore, those skilled in the art will readily understand that the description of S800 to S860 for the inverse RST given below can be applied to RST in the same or a similar manner.
[0169] According to the example, the decoding device 300 can perform dequantization on the quantized coefficients of the target block (S800). If a transform has been performed in the encoding device 200, then in S800 the decoding device 300 can derive the transform coefficients of the target block by dequantizing the quantized transform coefficients of the target block. Conversely, if no transform has been performed in the encoding device 200, then in S800 the decoding device 300 can derive the residual samples of the target block by dequantizing the quantized residual samples of the target block.
[0170] According to the example, the decoding device 300 can determine whether a transformation has been performed on the residual sample of the target block in the encoding device 200 (S810), and when it is determined that a transformation has been performed, the decoding device can parse the transformation index (or decode it from the bitstream) (S820). The transformation index may include a horizontal transformation index for horizontal transformations and a vertical transformation index for vertical transformations.
[0171] In the example, the transformation index can include a first-order transformation index, a core transformation index, and an NSST index, etc. The transformation index can be represented, for example, as `Transform_idx`, and the NSST index can be represented, for example, as `NSST_idx`. Additionally, the horizontal transformation index can be represented as `Transform_idx_h`, and the vertical transformation index can be represented as `Transform_idx_v`.
[0172] Furthermore, according to another example of this disclosure, dequantization can be performed after all transform indices have been parsed.
[0173] If it is determined in S810 that the encoding device 200 has not performed a transformation on the residual sample of the target block, the decoding device 300 according to the example may omit the operations according to S820 to S860.
[0174] According to the example, the decoding device 300 can select a transform kernel based on at least one of the following: transform index, width and height of the region to which the transform is applied, intra-prediction mode used in image decoding, and color components of the target block (S830).
[0175] The decoding device 300 according to the example can determine whether the condition for performing the inverse RST on the transform coefficients of the target block is satisfied (S840).
[0176] In the example, when the width and height of the region to which the inverse RST is applied are both greater than a first coefficient, the decoding device 300 can determine that the condition for performing the inverse RST on the transform coefficients of the target block is satisfied. The first coefficient can be 4.
[0177] In another example, when the product of the width and height of the region to which the inverse RST is applied is greater than a second coefficient, and the smaller of the width and height of that region is greater than a third coefficient, the decoding device 300 can determine that the condition for performing the inverse RST on the transform coefficients of the target block is satisfied. The second and third coefficients can be preset values.
[0178] In yet another example, when the width and height of the region to which the inverse RST is applied are both less than or equal to a fourth coefficient, the decoding device 300 can determine that the condition for performing the inverse RST on the transform coefficients of the target block is satisfied. The fourth coefficient can be 8.
[0179] In another example, when the product of the width and height of the region to which the inverse RST is applied is less than or equal to a fifth coefficient, and the smaller of the width and height of that region is less than or equal to a sixth coefficient, the decoding device 300 can determine that the condition for performing the inverse RST on the transform coefficients of the target block is satisfied. The fifth and sixth coefficients can be preset values.
[0180] In another example, the decoding device 300 can determine that the condition for performing the inverse RST on the transform coefficients of the target block is satisfied when at least one of the following holds: (i) the width and height of the region to which the inverse RST is applied are both greater than the first coefficient; (ii) the product of the width and height of that region is greater than the second coefficient and the smaller of the width and height is greater than the third coefficient; (iii) the width and height of that region are both less than or equal to the fourth coefficient; (iv) the product of the width and height of that region is less than or equal to the fifth coefficient and the smaller of the width and height is less than or equal to the sixth coefficient.
[0181] In the example above, the first through sixth coefficients can be any predefined positive integers. For example, the first through sixth coefficients could be 4, 8, 16, or 32.
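The four example conditions for S840 can be written down as a small predicate. The sketch below is only illustrative: the names `Thresholds`, `rst_conditions`, and `any_condition_met` are hypothetical, the first and fourth coefficients use the values 4 and 8 given in the text, and the remaining defaults are arbitrary picks from the listed set {4, 8, 16, 32}.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    first: int = 4      # value given in the text
    second: int = 16    # hypothetical preset
    third: int = 4      # hypothetical preset
    fourth: int = 8     # value given in the text
    fifth: int = 32     # hypothetical preset
    sixth: int = 8      # hypothetical preset


def rst_conditions(width, height, t=Thresholds()):
    """Evaluate each of the four example conditions for one region."""
    area, short_side = width * height, min(width, height)
    return {
        "both_gt_first": width > t.first and height > t.first,
        "area_gt_second_and_min_gt_third": area > t.second and short_side > t.third,
        "both_le_fourth": width <= t.fourth and height <= t.fourth,
        "area_le_fifth_and_min_le_sixth": area <= t.fifth and short_side <= t.sixth,
    }


def any_condition_met(width, height, t=Thresholds()):
    """Combined variant: true when at least one example condition holds."""
    return any(rst_conditions(width, height, t).values())
```

Keeping the conditions separate makes it easy to see that they are alternatives: a 16×16 region passes only the two "greater than" tests with these presets, while a 4×4 region passes only the two "less than or equal" tests.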
[0182] According to the example, the inverse RST can be applied to a square region included in the target block (that is, when the width and height of the region to which the inverse RST is applied are the same), and in some cases, the width and height of that region can be fixed to predefined coefficient values (e.g., 4, 8, 16, 32, etc.). Furthermore, the region to which the inverse RST is applied is not limited to a square region; the inverse RST can also be applied to a rectangular or non-rectangular region. The region to which the inverse RST is applied is described in more detail below with reference to Figure 10.
[0183] In the example, the condition for performing an inverse RST can be determined based on the transform index. That is, the transform index can indicate which transform has been performed on the target block.
[0184] If it is determined in S840 that the condition for performing the inverse RST is not satisfied, the decoding device 300 according to the example can perform a (regular) inverse transform on the transform coefficients of the target block (S850). As described above with reference to Figure 4, the (regular) inverse transform may include, but is not limited to, DCT2, DCT4, DCT5, DCT7, DCT8, DST1, DST4, DST7, NSST, and JEM-NSST (HyGT).
[0185] If it is determined in S840 that the condition for performing the inverse RST is satisfied, the decoding device 300 according to the example can perform the inverse RST on the transform coefficients of the target block (S860).
[0186] Figure 9 is a flowchart illustrating an example of an RST process based on a non-separable secondary transform (NSST) according to this disclosure.
[0187] Each step disclosed in Figure 9 can be performed by the decoding device 300 disclosed in Figure 3. More specifically, S900 can be performed by the dequantizer 321 disclosed in Figure 3, and S910 to S980 can be performed by the inverse transformer 322 disclosed in Figure 3. Additionally, S900 of Figure 9 can correspond to S800 of Figure 8, S940 of Figure 9 can correspond to S830 of Figure 8, and S950 of Figure 9 can correspond to S840 of Figure 8. Therefore, descriptions of specific content that overlaps with the description given above with reference to Figures 3 to 8 will be omitted or simplified.
[0188] In the example, as described above, the only difference between a specific operation according to the inverse RST and a specific operation according to RST may be that their operation order is reversed, and the specific operation according to the inverse RST may be substantially similar to the specific operation according to RST. Therefore, those skilled in the art will readily understand that the description of S900 to S980 for the inverse RST given below can be applied to RST in the same or a similar manner.
[0189] The decoding device 300 according to the example can perform dequantization on the quantized coefficients of the target block (S900).
[0190] According to the example, the decoding device 300 can determine whether NSST has been performed on the residual sample of the target block in the encoding device 200 (S910), and when it is determined that NSST has been performed, the decoding device can parse the NSST index (or decode it from the bit stream) (S920).
[0191] According to the example, the decoding device 300 can determine whether the NSST index is greater than 0 (S930), and when it is determined that the NSST index is greater than 0, the decoding device can select a transform kernel based on at least one of the information about the NSST index, the width and height of the region to which the NSST is applied, the intra-frame prediction mode, and the color components of the target block (S940).
[0192] The decoding device 300 according to the example can determine whether the condition for performing the inverse RST on the transform coefficients of the target block is satisfied (S950).
[0193] If it is determined in S950 that the condition for performing the inverse RST is not satisfied, the decoding device 300 according to the example can perform a (regular) inverse NSST that is not based on the inverse RST on the transform coefficients of the target block (S960).
[0194] When it is determined in S950 that the conditions for performing inverse RST are met, the example decoding device 300 can perform inverse NSST based on inverse RST on the transform coefficients of the target block (S970).
[0195] If it is determined in S910 that the encoding device 200 has not performed NSST on the residual sample of the target block, the decoding device 300 according to the example may omit the operations according to S920 to S970.
[0196] If it is determined in S930 that the NSST index is not greater than 0, the decoding device 300 according to the example can omit the operations according to S940 to S970.
[0197] The decoding device 300 according to the example can perform an inverse first transform on the modified transform coefficients of the target block derived by applying the inverse NSST (S980). When the inverse first transform is performed on the modified transform coefficients, the residual samples of the target block can be derived.
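The branching among S900 to S980 can be summarized as a short control-flow sketch. The function `figure9_path` and its three inputs are hypothetical names introduced here; the function only records which step labels are visited and performs no actual dequantization or transform.

```python
def figure9_path(nsst_performed, nsst_index, rst_condition_met):
    """Return the step labels visited for one target block."""
    steps = ["S900"]                    # dequantization
    steps.append("S910")                # did the encoder perform NSST?
    if nsst_performed:
        steps.append("S920")            # parse the NSST index
        steps.append("S930")            # is the NSST index greater than 0?
        if nsst_index > 0:
            steps.append("S940")        # select the transform kernel
            steps.append("S950")        # is the inverse RST condition satisfied?
            # S970: inverse NSST based on inverse RST; S960: regular inverse NSST
            steps.append("S970" if rst_condition_met else "S960")
    steps.append("S980")                # inverse first transform
    return steps


path_rst = figure9_path(True, 1, True)
path_regular = figure9_path(True, 2, False)
path_index_zero = figure9_path(True, 0, True)
path_no_nsst = figure9_path(False, None, False)
```

Note that S980 is reached on every path, since only S920 to S970 (or S940 to S970) are described as being omitted.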
[0198] Figure 10 is a diagram illustrating an example of applying RST according to this disclosure.
[0199] As described above with reference to Figure 8, the region of the target block to which RST is applied is not limited to a square region, and RST can also be applied to a rectangular region or a non-rectangular region.
[0200] Figure 10 shows an example of applying RST to a non-rectangular region in a target block 1000 of size 16×16. The ten shaded blocks 1010 in Figure 10 indicate the regions within the target block 1000 to which RST is applied. Since the size of each minimum unit block is 4×4, in the example of Figure 10 RST is applied to ten 4×4 blocks (that is, to 160 pixels). When R = 16, the size of the reduced transform matrix can be 16×160.
[0201] On the other hand, those skilled in the art will readily understand that the arrangement of the minimum unit blocks 1010 included in the region to which RST is applied, as shown in Figure 10, is just one of many examples. For instance, the minimum unit blocks included in the region to which RST is applied may not be adjacent to each other, or may share only a single vertex with each other.
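The sizes quoted for the non-rectangular case can be checked with a short sketch. The staircase layout of unit blocks used here is purely an assumption made for illustration (the actual arrangement is the one drawn in Figure 10), and `gather_region` is a hypothetical helper name.

```python
UNIT = 4  # side length of a minimum unit block

# Hypothetical layout: unit-block coordinates (unit_row, unit_col) forming a
# staircase of 4 + 3 + 2 + 1 = 10 unit blocks inside the 16x16 target block.
units = [(r, c) for r in range(4) for c in range(4 - r)]


def gather_region(block, units, unit=UNIT):
    """Flatten the samples covered by the listed unit blocks into one vector."""
    vec = []
    for ur, uc in units:
        for y in range(ur * unit, (ur + 1) * unit):
            for x in range(uc * unit, (uc + 1) * unit):
                vec.append(block[y][x])
    return vec


# 16x16 block whose sample values encode their own position, for easy checking.
block = [[16 * y + x for x in range(16)] for y in range(16)]
vec = gather_region(block, units)

R = 16
kernel_shape = (R, len(vec))   # shape of the reduced transform matrix
```

Whatever the layout, the reduced matrix has one column per gathered sample, so ten 4×4 unit blocks always give 160 columns.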
[0202] The following sections will describe the design of an RST or inverse RST that can be applied to a 4×4 block, the arrangement and scan order of the transform coefficients generated after applying the 4×4 RST, and the index encoding method for specifying the 4×4 RST to be applied to the transform block.
[0203] More specifically, the following presents an example of an RST design method applicable to 4×4 blocks, a configuration of the regions to which the 4×4 RST is applied, a method for arranging the transform coefficients generated after the 4×4 RST is applied, a scan order for the arranged transform coefficients, and a method for arranging and combining the transform coefficients generated for each target block.
[0204] Furthermore, as an example of the encoding method for specifying the index of the 4×4 RST applied to the target block according to this disclosure, a method is proposed to determine whether there are non-zero transform coefficients in the disallowed region when applying the 4×4 RST and to conditionally encode the corresponding index, or to omit the relevant residual encoding for the disallowed position after encoding the last non-zero transform coefficient position and then conditionally encode the corresponding index.
[0205] Furthermore, in the following sections, based on examples of this disclosure, a method is proposed for applying different index codes and residual codes to luminance and chrominance when applying 4×4 RST.
[0206] By using the 4×4 RST, the amount of computation when encoding still images or videos can be significantly reduced compared with other non-separable secondary transforms. Furthermore, because no effective transform coefficients exist in specific regions when the 4×4 RST is applied, the index specifying the 4×4 RST can be conditionally coded and the related residual coding can be optimized, ultimately improving coding performance.
[0207] According to the examples of this disclosure, a non-separable transform or RST that can be applied to a 4×4 transform block (i.e., a 4×4 target block to be transformed) is a 16×16 transform. That is, if the data elements constituting the 4×4 target block are arranged in row-major or column-major order, they form a 16×1 vector, to which the non-separable transform or RST can be applied. A forward 16×16 transform, that is, a forward transform that can be performed in an encoding device, consists of sixteen row-direction transform basis vectors, and the transform coefficient for each transform basis vector is obtained by taking the inner product of the 16×1 vector with that basis vector. Obtaining the transform coefficients for all sixteen transform basis vectors is equivalent to multiplying the 16×16 non-separable transform or RST matrix by the input 16×1 vector. The transform coefficients obtained by the matrix multiplication form a 16×1 vector, and the statistical properties of each transform coefficient may differ. For example, if the 16×1 transform coefficient vector consists of the 0th to 15th elements, the variance of the 0th element may be greater than the variance of the 15th element. In other words, an element located earlier in the vector can have a larger variance and therefore a larger energy value.
[0208] Furthermore, if an inverse 16×16 non-separable transform or inverse RST is applied (ignoring the effects of quantization and integer arithmetic), the original 4×4 target block signal before the transform can be reconstructed from the 16×1 transform coefficients. If the forward 16×16 non-separable transform is an orthogonal transform, the inverse 16×16 transform can be obtained by transposing the forward 16×16 transform matrix. Simply multiplying the 16×16 non-separable inverse transform matrix by the 16×1 transform coefficient vector yields data in the form of a 16×1 vector, and the 4×4 block signal can be reconstructed by arranging that data in the row-major or column-major order that was applied initially.
[0209] Furthermore, as mentioned above, the elements constituting the 16×1 transform coefficient vector can have different statistical properties. As in the previous example, if the transform coefficients located toward the front (closer to the 0th element) have greater energy, a signal very close to the original signal can be reconstructed by applying the inverse transform to only some of the first-appearing transform coefficients, without using all of them. For example, if the 16×16 non-separable inverse transform consists of 16 column basis vectors, a 16×L matrix can be constructed by keeping only L column basis vectors; multiplying this 16×L matrix by an L×1 vector that retains only the L most important transform coefficients (those appearing first, as in the previous example) reconstructs a 16×1 vector that differs only slightly from the original input 16×1 vector data. As a result, since only L coefficients are involved in data recovery, it is sufficient to obtain an L×1 transform coefficient vector instead of a 16×1 transform coefficient vector when deriving the transform coefficients. In other words, an L×16 transform matrix is constructed by selecting the L corresponding row-direction transform basis vectors from the forward 16×16 non-separable transform matrix, and multiplying it by the 16×1 input vector yields the L significant transform coefficients.
[0210] Here, L can take values in the range 1 ≤ L < 16, and in general L transform basis vectors can be selected from the 16 transform basis vectors by any method; however, from the perspective of encoding and decoding, it may be advantageous in terms of coding efficiency to select the transform basis vectors that are most important in terms of signal energy, as in the example above.
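The effect of keeping only L of the 16 basis vectors can be illustrated numerically. The sketch below substitutes an orthonormal DCT-II basis for the kernel purely as a stand-in orthogonal matrix; it is not the kernel of this disclosure, and the helper names `dct_matrix`, `forward`, and `inverse` are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis, used only as a stand-in orthogonal kernel."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def forward(kernel, x, L):
    """Keep only the first L basis rows: an L x 16 forward transform."""
    return [sum(t * v for t, v in zip(row, x)) for row in kernel[:L]]


def inverse(kernel, coeffs):
    """Multiply the 16 x L transpose of the kept rows by the L coefficients."""
    n = len(kernel[0])
    return [sum(kernel[i][j] * c for i, c in enumerate(coeffs)) for j in range(n)]


kernel = dct_matrix(16)
x = [10 + 0.5 * j for j in range(16)]      # smooth 16x1 input (a flattened 4x4 block)
full = inverse(kernel, forward(kernel, x, 16))
half = inverse(kernel, forward(kernel, x, 8))
err_full = max(abs(a - b) for a, b in zip(x, full))
err_half = max(abs(a - b) for a, b in zip(x, half))
```

For this smooth input, using all 16 rows reproduces the input up to floating-point rounding, while keeping 8 rows leaves a small but non-zero error, which is the trade-off the paragraphs above describe.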
[0211] As mentioned above, the 4×4 RST can be applied as a secondary transform, in which case it is applied secondarily to a block that has already undergone a primary transform such as DCT type 2. Assuming that the size of the block to which the primary transform is applied is N×N, the 4×4 RST can typically be applied when N×N is greater than or equal to 4×4. Examples of applying the 4×4 RST to an N×N block are as follows.
[0212] 1) The 4×4 RST can be applied only to certain regions of the N×N block rather than to the whole block. For example, it can be applied only to the top-left M×M region (M≤N).
[0213] 2) After the region to which the secondary transform is applied is divided into 4×4 blocks, the 4×4 RST can be applied to each of the divided blocks.
[0214] 3) The above 1) and 2) can be combined. For example, after the top-left M×M region is divided into 4×4 blocks, the 4×4 RST can be applied to each of the divided blocks.
[0215] As a specific example, the secondary transform can be applied only to the top-left 8×8 region. When the N×N block is greater than or equal to 8×8, the 8×8 RST can be applied. When the N×N block is smaller than 8×8 (4×4, 8×4, 4×8), it can be divided into 4×4 blocks as in 2) above, and the 4×4 RST can then be applied to each of them.
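The specific example above can be expressed as a small planning function. The name `plan_secondary_transform` is hypothetical, block dimensions are assumed to be multiples of 4, and the handling of shapes the text does not list (for example 16×4) is an assumption of this sketch.

```python
def plan_secondary_transform(width, height):
    """Return (rst_size, top-left corners as (x, y)) for one transform block."""
    if width >= 8 and height >= 8:
        return 8, [(0, 0)]              # one 8x8 RST on the top-left 8x8 region
    # Smaller blocks: tile the (at most 8x8) top-left region with 4x4 RSTs.
    region_w, region_h = min(width, 8), min(height, 8)
    corners = [(x, y)
               for y in range(0, region_h, 4)
               for x in range(0, region_w, 4)]
    return 4, corners


plan_16x16 = plan_secondary_transform(16, 16)
plan_4x4 = plan_secondary_transform(4, 4)
plan_8x4 = plan_secondary_transform(8, 4)
plan_4x8 = plan_secondary_transform(4, 8)
```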
[0216] Suppose that L transform coefficients (1 ≤ L < 16) are generated after applying a 4×4 RST. There is a degree of freedom in how these L transform coefficients are arranged (i.e., how they are mapped into the target block). However, since the residual coding section reads and processes the transform coefficients in a predetermined order, the coding performance can vary depending on how the L transform coefficients are arranged in the two-dimensional block. Residual coding in HEVC starts from the position farthest from the DC position. This improves coding performance by exploiting the fact that quantized coefficient values tend to be 0 or close to 0 as the distance from the DC position increases. Therefore, in terms of coding performance, it can be advantageous to arrange the L transform coefficients so that the more important coefficients with higher energy are coded later in the residual coding order.
[0217] Figure 11 shows the three forward scan orders that can be applied, in the HEVC standard, to a transform coefficient block (a 4×4 block, or coefficient group (CG)) of 4×4 transform coefficients. Figure 11(a) shows the diagonal scan, Figure 11(b) shows the horizontal scan, and Figure 11(c) shows the vertical scan.
[0218] Residual coding follows the reverse of the scan order of Figure 11, that is, coding is performed in order from 16 to 1. Because the three scan orders shown in Figure 11 are selected according to the intra-frame prediction mode, the scan order for the L transform coefficients can likewise be determined according to the intra-frame prediction mode.
[0219] Figure 12 and Figure 13 This is a diagram illustrating the mapping of transformation coefficients according to the diagonal scanning order, as exemplified by an example of this disclosure.
[0220] Assuming the same applies Figure 11 The diagonal scan order is used, and the upper left 4×8 block is divided into 4×4 blocks, and 4×4RST is applied to each divided block. When L is 8 (that is, if 8 of the 16 transform coefficients are left), the transform coefficients can be as follows: Figure 12The locations are shown, and the transform coefficients can be mapped to half of the region of each 4×4 block, while the positions marked with X can be filled with values that default to 0. That is, based on the forward diagonal scan order, the quadratic transform coefficients obtained from the quadratic transform performed based on RST are mapped to the 4×4 target blocks.
[0221] As mentioned above, the difference between specific operations based on inverse RST and specific operations based on RST may only be that their operation order is reversed, and specific operations based on inverse RST can be substantially similar to specific operations based on RST. Therefore, when performing inverse RST, the operation can be performed using the transform kernel (Equation 8) by reading the quadratic transform coefficients mapped in the target block according to the diagonal scan order.
[0222] Therefore, the L transform coefficients are arranged or mapped to each 4×4 block according to the scan order shown in Figure 11, 0 is mapped to the remaining (16-L) positions of each 4×4 block, and the corresponding residual coding (e.g., residual coding in regular HEVC) can then be applied.
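As an illustration of the mapping described above, the following Python sketch places L coefficients into a 4×4 block along a forward diagonal scan and leaves the remaining (16-L) positions at 0. Figure 11 is not reproduced here, so the exact scan pattern is an assumption: the sketch uses an up-right diagonal order in which each anti-diagonal is traversed from bottom-left to top-right. The function names are hypothetical.

```python
def diagonal_scan_4x4():
    """Forward diagonal scan of a 4x4 block as (row, col) pairs.

    Assumption: each anti-diagonal (row + col = d) is walked from its
    bottom-left end to its top-right end.
    """
    order = []
    for d in range(7):
        for row in range(min(d, 3), max(0, d - 3) - 1, -1):
            order.append((row, d - row))
    return order


def map_coeffs_to_block(coeffs, scan):
    """Place L <= 16 coefficients along the scan; other positions stay 0."""
    if len(coeffs) > len(scan):
        raise ValueError("more coefficients than scan positions")
    block = [[0] * 4 for _ in range(4)]
    for value, (row, col) in zip(coeffs, scan):
        block[row][col] = value
    return block


# L = 8: eight coefficients are kept, eight positions are zero-filled.
block = map_coeffs_to_block([9, 8, 7, 6, 5, 4, 3, 2], diagonal_scan_4x4())
for line in block:
    print(line)
```

With L equal to 8, the kept coefficients occupy the first four anti-diagonals only partially filled at the end, which corresponds to the "half of the region" described for each 4×4 block.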
[0223] Furthermore, according to another example of this disclosure, as shown in Figure 13, the L transform coefficients arranged in two 4×4 blocks (a) can be combined and mapped into one 4×4 block (b). Specifically, when L is 8, the transform coefficients of the two 4×4 blocks are mapped into one 4×4 block, so that one 4×4 target block is completely filled and no transform coefficients remain in the other 4×4 block. Therefore, since most residual coding is unnecessary for the emptied 4×4 block, the corresponding coded_sub_block_flag can be coded as 0 in the case of HEVC. The coded_sub_block_flag applied to HEVC and VVC is a flag indicating the position of the sub-blocks in the current transform block for the 4×4 array of 16 transform coefficient levels, and can be signaled as "0" for 4×4 blocks with no remaining residuals.
[0224] Furthermore, various methods are possible for mixing the transform coefficients of two 4×4 blocks. Typically, they can be combined in any order, but practical examples may include the following methods.
[0225] (1) Alternately mix the transform coefficients of the two 4×4 blocks in scan order. That is, the transform coefficients of the upper block and of the lower block in Figure 12 can be interleaved one by one: the first coefficient of the upper block, then the first coefficient of the lower block, then the second coefficient of the upper block, and so on. Of course, the order of the two blocks can be changed so that the coefficient of the lower block is mapped first.
[0226] (2) The transform coefficients of the first 4×4 block can be placed first, followed by the transform coefficients of the second 4×4 block. That is, all coefficients of the first block are arranged consecutively in scan order, followed by all coefficients of the second block. Of course, the order can be changed so that the coefficients of the second block come first.
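The two mixing methods (1) and (2) can be sketched in Python as follows. This is only an illustrative sketch: the function names and the `lower_first` / `second_first` switches are hypothetical, and each input list is assumed to hold the L coefficients of one 4×4 block in scan order.

```python
def interleave(upper, lower, lower_first=False):
    """Method (1): alternate the coefficients of the two blocks in scan order."""
    if len(upper) != len(lower):
        raise ValueError("both blocks must contribute the same number of coefficients")
    first, second = (lower, upper) if lower_first else (upper, lower)
    mixed = []
    for a, b in zip(first, second):
        mixed.extend((a, b))
    return mixed


def concatenate(first_block, second_block, second_first=False):
    """Method (2): place one block's coefficients entirely before the other's."""
    if second_first:
        return list(second_block) + list(first_block)
    return list(first_block) + list(second_block)


upper = [1, 2, 3, 4, 5, 6, 7, 8]          # L = 8 coefficients of the upper block
lower = [10, 20, 30, 40, 50, 60, 70, 80]  # L = 8 coefficients of the lower block
print(interleave(upper, lower))
print(concatenate(upper, lower))
```

Either result contains 16 values and can therefore fill one 4×4 target block completely, leaving the other block empty.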
[0227] The following sections describe methods for encoding NSST indexes for 4×4 RST. The first method involves encoding the NSST index after residual encoding, and the second method involves encoding the NSST index before residual encoding.
[0228] In addition, as shown in Figure 12, when a 4×4 RST is applied, the (L+1)th to the 16th positions of each 4×4 block, counted in the scan order of its transform coefficients, can be filled with 0. Therefore, if a non-zero value appears at any position from the (L+1)th to the 16th in even one of the two 4×4 blocks, this corresponds to the case where the 4×4 RST is not applied.
[0229] If the 4×4 RST has a structure in which one transform is selected and applied from a prepared transform set (e.g., NSST), an index indicating which transform is applied (which may be called the transform index, RST index, or NSST index) can be signaled.
[0230] Assume that the decoding device learns the NSST index by parsing the bitstream and that this parsing is performed after residual coding. If residual coding is performed and at least one non-zero transform coefficient is found between the (L+1)th and 16th positions, it is determined, as described above, that the 4×4 RST is not applied, and the NSST index can therefore be set not to be parsed. In this way the NSST index is parsed selectively, only when necessary, which increases signaling efficiency.
[0231] For example, as in Figure 12, if a 4×4 RST is applied to several 4×4 blocks within a specific region (the same 4×4 RST can be applied to all of them, or different 4×4 RSTs can be applied), the (identical or different) 4×4 RSTs applied to all of the 4×4 blocks can be specified by a single NSST index. Since the 4×4 RSTs used for all of the 4×4 blocks, and whether they are applied, are determined by one NSST index, the following configuration is possible: during residual coding, positions L+1 to 16 of all of the 4×4 blocks are checked for non-zero transform coefficients, and the NSST index is not coded if a non-zero transform coefficient exists at a disallowed position (positions L+1 to 16) in even one 4×4 block.
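The conditional signaling described above can be sketched in Python as follows. The sketch assumes that each 4×4 block is given as a list of 16 coefficients already arranged in forward scan order; the function name is hypothetical.

```python
def nsst_index_is_signaled(blocks, num_kept):
    """Return False if any block has a non-zero value at scan positions L+1..16.

    blocks   -- iterable of 16-element lists in forward scan order
    num_kept -- L, the number of coefficients a 4x4 RST can produce
    """
    for coeffs in blocks:
        if len(coeffs) != 16:
            raise ValueError("each 4x4 block must have 16 coefficients")
        # Scan positions L+1..16 correspond to list indices L..15.
        if any(c != 0 for c in coeffs[num_kept:]):
            return False
    return True


rst_like = [5, 3, 0, 1, 0, 0, 2, 1] + [0] * 8    # consistent with L = 8
not_rst = [5, 3, 0, 1, 0, 0, 2, 1] + [0, 4] + [0] * 6
print(nsst_index_is_signaled([rst_like, rst_like], 8))  # all blocks allowed
print(nsst_index_is_signaled([rst_like, not_rst], 8))   # one block violates
```

Because the check runs over every block passed in, the same function can cover several 4×4 blocks that share one NSST index.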
[0232] These NSST indices can be signaled individually for the luminance and chrominance blocks, or, in the case of the chrominance block, individual NSST indices can be signaled for Cb and Cr, or a single NSST index can be shared by signaling the NSST index only once.
[0233] If a shared NSST index is used for Cb and Cr, the 4×4 RST indicated by the same NSST index can be applied (the 4×4 RSTs for Cb and Cr can be the same, or separate 4×4 RSTs can be applied even if the NSST index is the same). To apply the above conditional signaling to the shared NSST index, all 4×4 blocks for Cb and Cr are checked for non-zero transform coefficients from the (L+1)th to the 16th position, and if any non-zero transform coefficient is found, the signaling of the NSST index can be omitted.
[0234] As another example, also in the case of combining the transform coefficients of two 4×4 blocks as in Figure 13, whether to signal the NSST index can be determined after checking whether a non-zero transform coefficient appears at a position where no valid transform coefficient exists when a 4×4 RST is applied. Specifically, when L is 8 as in Figure 13, one 4×4 block has no valid transform coefficients when the 4×4 RST is applied (the block marked with X in Figure 13(b)). The coded_sub_block_flag of that block is checked, and if its value is 1, the NSST index is not signaled.
[0235] The following describes a method of optimizing the signaling of the NSST index for the case where the NSST index is coded before residual coding, according to the second method of coding the NSST index.
[0236] If the NSST index is coded before residual coding, whether the 4×4 RST is applied is known in advance, so residual coding can be omitted for positions where the transform coefficients will definitely be filled with zeros.
[0237] In this regard, the NSST index value can be signaled to know whether 4×4 RST is applied (e.g., if the NSST index is 0, then 4×4 RST is not applied), or it can be signaled via a separate syntax element. For example, if the separate syntax element is the NSST flag, the NSST flag is first parsed to determine whether 4×4 RST is applied. Then, if the NSST flag value is 1, residual coding can be omitted for positions where no valid transform coefficients can exist.
[0238] In the case of HEVC, when residual coding is performed, the last non-zero coefficient position in the TU is coded first. If the NSST index is coded after the last non-zero coefficient position, and the last non-zero coefficient position turns out to be a position where a non-zero coefficient cannot occur under the assumption that a 4×4 RST is applied, then the NSST index need not be coded and the 4×4 RST is not applied. For example, the positions marked with X in Figure 12 hold no valid transform coefficients when a 4×4 RST is applied (e.g., they can be filled with 0), so the coding of the NSST index can be omitted if the last non-zero coefficient lies in the region marked with X. If the last non-zero coefficient is not in the region marked with X, the NSST index can be coded.
[0239] If it is known whether 4×4 RST is applied by conditionally encoding the NSST index after encoding the last non-zero coefficient position (as mentioned above, if the last non-zero coefficient position is not allowed when assuming the application of 4×4 RST, the encoding for the NSST index can be omitted), the remaining residual encoding portion can be processed in the following two ways.
[0240] (1) When the 4×4 RST is not applied, general residual coding is kept as is. That is, coding is performed under the assumption that non-zero transform coefficients can exist at any position from the last non-zero coefficient position to the DC position.
[0241] (2) When the 4×4 RST is applied, no transform coefficients exist for a specific position or a specific 4×4 block (e.g., the X positions in Figure 12), which can be filled with 0 by default, so the residual coding of the corresponding position or block can be omitted. For example, when a position marked with X in Figure 12 is reached, the coding of sig_coeff_flag (a flag, applied in HEVC and VVC, indicating whether a non-zero coefficient exists at the corresponding position) can be omitted. When the transform coefficients of two blocks are combined as shown in Figure 13, the coding of coded_sub_block_flag (which exists in HEVC) for the 4×4 block emptied to 0 can be omitted and its value can be derived as 0. That 4×4 block can be filled with zero values without separate coding.
[0242] On the other hand, when encoding the NSST index after encoding the last non-zero coefficient position, if the x-position (Px) and y-position (Py) of the last non-zero coefficient are less than Tx and Ty (a specific threshold), respectively, it can be configured to omit the NSST index encoding and not apply the 4×4 RST. For example, when Tx = 1 and Ty = 1, this means omitting the NSST index encoding when the last non-zero coefficient is present in the DC position. The method of determining whether to encode the NSST index by comparing it with a threshold can be applied differently to luma and chroma. For example, different Tx and Ty can be applied to luma and chroma, or the threshold can be applied to luma (or chroma) but not to chroma (or luma).
[0243] Of course, the two methods for omitting the NSST index coding (omitting it when the last non-zero coefficient is located in a region where no valid transform coefficients exist, and omitting it when the X and Y coordinates of the last non-zero coefficient are each less than a certain threshold) can be applied together. For example, the threshold check on the position coordinates of the last non-zero coefficient can be performed first, followed by the check of whether the last non-zero coefficient is located in a region where no valid transform coefficients exist; the reverse order is also possible.
[0244] The method of coding the NSST index before residual coding can also be applied to the 8×8 RST. That is, if the last non-zero coefficient is located inside the top-left 8×8 region but outside the top-left 4×4 region, the coding of the NSST index can be omitted; otherwise, the NSST index can be coded. Additionally, if both the X and Y coordinate values of the position of the last non-zero coefficient are less than a certain threshold, the coding of the NSST index can be omitted. Of course, both methods can be applied together.
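The two omission conditions and their combination can be sketched in Python as follows. The function names are hypothetical, the order of the two checks is one of the two orders the text allows, and the default thresholds Tx = Ty = 1 are taken from the example in the text rather than from any normative value.

```python
def below_threshold(px, py, tx, ty):
    """True when the last non-zero position satisfies Px < Tx and Py < Ty."""
    return px < tx and py < ty


def in_8x8_rst_zero_region(px, py):
    """Top-left 8x8 region excluding its top-left 4x4 region (8x8 RST case)."""
    return px < 8 and py < 8 and not (px < 4 and py < 4)


def nsst_index_is_coded(px, py, zero_region, tx=1, ty=1):
    """Combine both checks; zero_region is a predicate on (px, py)."""
    if below_threshold(px, py, tx, ty):
        return False  # e.g. last non-zero coefficient at the DC position
    if zero_region(px, py):
        return False  # position where RST leaves no valid coefficient
    return True


print(nsst_index_is_coded(0, 0, in_8x8_rst_zero_region))  # DC only
print(nsst_index_is_coded(5, 2, in_8x8_rst_zero_region))  # inside zero region
print(nsst_index_is_coded(2, 1, in_8x8_rst_zero_region))  # allowed position
```

For the 4×4 RST, a different predicate describing the X-marked positions of Figure 12 would be passed as `zero_region`; since that figure is not reproduced here, no such predicate is given.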
[0245] Furthermore, when applying RST, different NSST index encoding and residual encoding schemes can be applied to luminance and chrominance respectively.
[0246] The first method (Method 1) which performs NSST index encoding after residual encoding and the method (Method 2) which performs NSST index encoding before residual encoding can be applied differently to luminance and chrominance.
[0247] For example, luminance can follow the scheme described in Method 2, while Method 1 can be applied to chrominance. Alternatively, NSST index coding can be conditionally applied to luminance according to Method 1 or Method 2, and conditional NSST index coding can be excluded from chrominance, and vice versa. That is, NSST index coding can be conditionally applied to chrominance according to Method 1 or Method 2, and conditional NSST index coding can be excluded from luminance.
[0248] In the following, examples of this disclosure provide a Mixed (hybrid) NSST Transform Set (MNTS) for applying various NSST conditions when NSST or RST is applied, and a method for constructing the MNTS. In the following, depending on the size of the transform block to which NSST is applied, a 16×16 transform applied to the upper left 4×4 region can be represented as a 4×4 NSST, and a 64×64 transform applied to the upper left 8×8 region can be represented as an 8×8 NSST.
[0249] As described above, in the case of a non-separable transform, based on the pre-selected size of the sub-block, a 4×4 NSST set includes only 4×4 kernels (4×4 NSST), and an 8×8 NSST set includes only 8×8 kernels (8×8 NSST). Therefore, in this example, a method for constructing the MNTS is additionally proposed as follows.
[0250] (1) The size of the available NSST kernels is not fixed and can be one or more sizes depending on the NSST set (e.g., 4×4 NSST kernels (4×4 NSST) and 8×8 NSST kernels (8×8 NSST) can be used together).
[0251] (2) The number of available NSST kernels is not fixed, but varies depending on the NSST set (e.g., set 1 supports 3 kernels, set 2 supports 4 kernels).
[0252] (3) The order of NSST kernels is not fixed, but varies depending on the NSST set.
[0253] (For example, in set 1, NSST kernels 1, 2, and 3 are mapped to NSST indices 1, 2, and 3, respectively, while in set 2, NSST kernels 3, 2, and 1 are mapped to NSST indices 1, 2, and 3.)
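The three properties (1) to (3) can be pictured with a small Python data structure. Everything in this sketch is hypothetical: the set contents, the kernel identifiers, and the helper names are illustrative only and do not reproduce Table 3 or Table 4. NSST index 0 is treated as "no transform applied", following the example given earlier in the text.

```python
from typing import NamedTuple, Optional


class Kernel(NamedTuple):
    size: str       # "4x4" or "8x8"
    kernel_id: int  # hypothetical identifier of a stored kernel matrix


# Hypothetical MNTS: kernel sizes, counts, and orderings differ per set.
MNTS = {
    1: [Kernel("8x8", 1), Kernel("8x8", 2), Kernel("4x4", 1)],
    2: [Kernel("4x4", 3), Kernel("4x4", 2), Kernel("4x4", 1), Kernel("8x8", 1)],
}


def max_nsst_index(set_id: int) -> int:
    """Largest NSST index usable with the given set."""
    return len(MNTS[set_id])


def select_kernel(set_id: int, nsst_index: int) -> Optional[Kernel]:
    """Map an NSST index to a kernel; index 0 means no NSST/RST is applied."""
    if nsst_index == 0:
        return None
    if not 1 <= nsst_index <= max_nsst_index(set_id):
        raise ValueError("NSST index out of range for this set")
    return MNTS[set_id][nsst_index - 1]


print(select_kernel(1, 1))
print(select_kernel(2, 4))
```

Storing each set as an ordered list makes the index-to-kernel order a property of the set itself, which is what allows the same kernel to receive different indices in different sets.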
[0254] A more detailed description of an example of a method for constructing an MNTS is provided below.
[0255] As an example, when determining the priority of the NSST kernels available in a given set, the size of the NSST kernels (4×4 NSST vs. 8×8 NSST) and their relationships can be considered. For instance, if the transform block is large, an 8×8 NSST kernel may be more important than a 4×4 NSST kernel, and therefore a lower NSST index can be assigned to the 8×8 NSST kernel.
[0256] As another example, the priority of the NSST kernels available in a given set can be determined by considering the order of the NSST kernels (first, second, third). For example, a given 4×4 NSST kernel #1 can have a higher priority than a 4×4 NSST kernel #2.
[0257] Since NSST_index is encoded and transmitted, it is desirable to give priority to frequently occurring NSST kernels, that is, to assign them a low index so that they are coded with fewer bits.
[0258] Examples of the MNTS mentioned above can be shown in Table 3 or Table 4.
[0259] [Table 3]
[0260]
[0261] [Table 4]
[0262]
[0263]
[0264] In the following, based on examples of this disclosure, a method for determining a quadratic NSST set is proposed, taking into account intra-frame prediction mode and block size.
[0265] In the example, the transform set for the current transform block is configured by combining the MNTS described above with the intra-prediction mode, so that transform sets composed of transform kernels of various sizes can be applied to the transform block.
[0266] [Table 5]
[0267]
[0268] As shown in Table 5, a hybrid type of 0 or 1 is mapped to each intra-prediction mode. The hybrid type can be defined as an index ("hybrid type") indicating, for each intra-prediction mode, whether the regular NSST set construction method or another NSST set construction method is followed.
[0269] More specifically, for an intra-prediction mode whose hybrid type is '1' in Table 5, the transform set can be constructed according to the above MNTS rather than the conventional (JEM) NSST set construction method.
[0270] As yet another example, although Table 5 illustrates two transform set construction methods selected by the hybrid type information (flag) associated with the intra-frame prediction mode (1: conventional NSST set construction, 2: MNTS-based transform set construction), there can be one or more proposed MNTS-based set construction methods, and in that case the hybrid type information can take N (N>2) different values.
[0271] As another example, when constructing a transform set, both the intra-prediction mode and the size of the corresponding transform block are considered, and based on this, it can be determined whether to construct a hybrid type or use a regular NSST set. For example, if the mode type corresponding to the intra-prediction mode is 0, the regular NSST set configuration method can be followed unconditionally; otherwise (mode type == 1), various hybrid type NSST sets can be determined based on the size of the corresponding transform block.
[0272] Figure 14 is a diagram illustrating a method for selecting a transform set under specific conditions, according to an example of this disclosure.
[0273] As shown in the figure, a transform set is selected when the inverse quadratic transform is performed after dequantizing the coefficients. The block size and intra-prediction mode can be considered in this selection, which determines whether a regular NSST set or an MNTS-based transform set (one of several hybrid types 1, 2, 3, ...) is used.
[0274] When the transform set is determined in this way, the corresponding NSST kernel can be selected through the NSST index information.
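The selection flow of Figure 14 can be sketched roughly in Python as follows. Since Table 5 and Figure 14 are not reproduced here, the mode-to-hybrid-type table, the size rule (both sides at least 8 samples), and the returned labels are all hypothetical placeholders chosen only to show the order of the decisions.

```python
def choose_set_type(intra_mode, width, height, hybrid_type_by_mode):
    """Pick the kind of transform set for a block (illustrative only).

    hybrid_type_by_mode -- dict: intra-prediction mode -> hybrid type (0 or 1);
                           modes missing from the dict are treated as type 0.
    """
    hybrid_type = hybrid_type_by_mode.get(intra_mode, 0)
    if hybrid_type == 0:
        # Type 0: follow the regular NSST set construction unconditionally.
        return "regular_nsst_set"
    # Type 1: choose among MNTS variants based on the block size.
    # Assumption: larger blocks use a variant that also contains 8x8 kernels.
    if min(width, height) >= 8:
        return "mnts_with_8x8_kernels"
    return "mnts_4x4_kernels_only"


hypothetical_table = {0: 0, 1: 0, 34: 1}
print(choose_set_type(0, 16, 16, hypothetical_table))
print(choose_set_type(34, 16, 16, hypothetical_table))
print(choose_set_type(34, 4, 8, hypothetical_table))
```

Once the set is chosen in this way, the NSST index then selects one kernel inside it.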
[0275] According to another example of this disclosure, the fixed NSST kernel mapping shown in Table 6 below can be used for both 4×4 NSST and 8×8 NSST.
[0276] [Table 6]
[0277]
[0278] In other words, the 4×4 non-separable transform (4×4 quadratic transform, 4×4 RST, 4×4 inverse RST) and the 8×8 non-separable transform (8×8 quadratic transform, 8×8 RST, 8×8 inverse RST) can be performed using the same transform set instead of different transform sets.
[0279] The following proposes a method for efficiently coding the NSST index, given that the statistical distribution of the coded and transmitted NSST index values changes when the intra-frame prediction mode and block size are considered in constructing the transform set. To this end, the kernel actually applied to the transform block is selected using the syntax indicating the kernel size proposed above.
[0280] In this example, since the number of available NSST kernels varies for each transform set, a truncated unary binarization method based on the maximum NSST index value available for each set is proposed for efficient binarization, as shown in Table 7.
[0281] [Table 7]
[0282]
[0283] Table 7 shows the (truncated unary) binarization method for NSST index values. Since the number of NSST kernels available differs for each transform set, the NSST index is binarized based on the maximum NSST index value of the set.
[0284] In Table 7, each bin is context-coded, and the context model can be formed by taking into account variables such as the size of the corresponding transform block, the intra-prediction mode, the hybrid type value, and the maximum NSST index value of the corresponding transform set.
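Truncated unary binarization with a per-set maximum can be sketched in Python as follows. Table 7 is not reproduced here, so the bin polarity (a run of 1s terminated by a 0) is an assumption, and the function name is hypothetical.

```python
def truncated_unary(value, max_value):
    """Binarize value in [0, max_value] as truncated unary.

    Assumed convention: `value` ones followed by a terminating zero,
    where the zero is dropped when value equals max_value.
    """
    if not 0 <= value <= max_value:
        raise ValueError("value must lie in [0, max_value]")
    bins = [1] * value
    if value < max_value:
        bins.append(0)
    return bins


# The same NSST index needs fewer bins in a set with a smaller maximum index.
print(truncated_unary(2, 3))  # set whose maximum NSST index is 3
print(truncated_unary(2, 2))  # set whose maximum NSST index is 2
```

Dropping the terminating bin at the maximum value is what makes the binarization depend on the number of kernels in the set.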
[0285] Furthermore, according to another example of this disclosure, unlike Table 2, five or more intra-prediction modes can be mapped to a single transform set. As described above, inverse RST is performed based on a transform kernel matrix selected from a transform set comprising multiple transform kernel matrices, and the transform set is determined based on the mapping relationship according to the intra-prediction mode applied to the target block. According to this example, as shown in Tables 8 to 10 below, multiple intra-prediction modes, including the intra-prediction mode of the target block, can be mapped to a single transform set. That is, each intra-prediction mode is mapped to a transform set, which is a set of transform kernel matrices, and since the number of transform sets is less than the number of intra-prediction modes, multiple intra-prediction modes can be mapped to a single transform set.
[0286] In other words, when the target block includes a first block and a second block, and the first intra-prediction mode applied to the first block and the second intra-prediction mode applied to the second block are different from each other, the transform sets mapped to the first intra-prediction mode and the second intra-prediction mode can be the same.
[0287] The number of intra-prediction modes mapped to a transform set can be at least one, and five or more intra-prediction modes can be mapped to a transform set.
[0288] [Table 8]
[0289]
[0290] [Table 9]
[0291]
[0292] [Table 10]
[0293]
[0294] The NSST set index used in Tables 8 to 10 can refer to any of the 35 transform sets shown in Table 2, and the number of transform sets in Table 8 is 19 out of 35, the number of transform sets in Table 9 is 13 out of 35, and the number of transform sets in Table 10 is 6 out of 35.
[0295] This means that where the prediction directions are similar, as with adjacent intra-prediction modes, the same transform set can be applied. Therefore, in the case of Table 8, two to three adjacent intra-prediction modes are mapped to the same transform set. As an example, in Table 8, intra-prediction modes 33 to 35 of Figure 5 are mapped to the same transform set.
[0296] In the case of Table 9, intra-prediction modes 46 to 48 are mapped to transform set 20, and intra-prediction modes 29 to 39 are mapped alternately to transform set 29 and transform set 10.
[0297] In Table 10, which applies the minimum number of transform sets, intra-prediction modes 28 to 40, i.e., 13 intra-prediction modes, are mapped to transform set 32.
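A many-to-one mapping of this kind can be stored as ranges of intra-prediction modes. The Python sketch below contains only the entries stated in the text (modes 46 to 48 to set 20 for Table 9, modes 28 to 40 to set 32 for Table 10); the rest of each table is not reproduced here and is deliberately left out, so the lookup returns `None` for any other mode. The names are hypothetical.

```python
# Partial tables: only the rows quoted in the description are included.
TABLE_9_PARTIAL = [(range(46, 49), 20)]   # modes 46..48 -> transform set 20
TABLE_10_PARTIAL = [(range(28, 41), 32)]  # modes 28..40 -> transform set 32


def transform_set_for_mode(table, intra_mode):
    """Return the transform set index for a mode, or None if not listed."""
    for modes, set_index in table:
        if intra_mode in modes:
            return set_index
    return None


print(transform_set_for_mode(TABLE_10_PARTIAL, 34))
print(transform_set_for_mode(TABLE_9_PARTIAL, 47))
print(transform_set_for_mode(TABLE_9_PARTIAL, 10))  # not listed in this sketch
```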
[0298] The transform sets in Tables 8 through 10 can be applied only to 4×4 NSST, or they can be applied to both 4×4 NSST and 8×8 NSST. Alternatively, different transform set mappings can be applied to each of 4×4 NSST and 8×8 NSST (i.e., Tables 8 through 10 are applied differently). For example, the transform set mapping in Table 2 can be applied to 4×4 NSST, and the transform set mappings in Tables 8 through 10 can be applied to 8×8 NSST.
[0299] Let Test A denote the case of applying 4×4 NSST using Table 2 (with a 16×16 direct matrix quadratic transform); let Test B denote the case of applying 4×4 NSST using Table 2 together with 8×8 NSST using Table 2 and a 16×64 direct matrix quadratic transform; and let Test C denote the case of applying 4×4 NSST using Tables 8 to 10 together with 8×8 NSST using Tables 8 to 10 and a 16×64 direct matrix quadratic transform for storage reduction. The storage requirements are then as shown in Table 11 below.
[0300] [Table 11]
[0301]
[0302]
[0303] If each transform set for the planar and DC modes consists of two transform kernels, the number in the "#type" column of Table 11 should be reduced by 2. For example, the "#type" for Test B (6 transform sets) with the proposed storage reduction kernels will be 16. As shown in Table 11, the total number of transform kernels decreases significantly as the number of transform sets to be mapped decreases. Therefore, storage requirements can be reduced by making a reasonable trade-off between performance and complexity. In Table 11, the transform set mapping in Table 2 can be used for HyGT 4×4 (BMS), Test A (4×4 NSST), and Test B (4×4 NSST + 8×8 RST).
[0304] Figure 15 is a flowchart illustrating the operation of a video decoding device according to an example of this disclosure.
[0305] Each step disclosed in Figure 15 can be performed by the decoding device 300 disclosed in Figure 3. More specifically, S1510 can be performed by the entropy decoder 310 disclosed in Figure 3; S1520 by the dequantizer 321 disclosed in Figure 3; S1530 and S1540 by the inverse transformer 322 disclosed in Figure 3; and S1550 by the adder 340 disclosed in Figure 3. Furthermore, the operations according to S1510 to S1550 are based on some of the content described above with reference to Figures 6 to 10. Therefore, descriptions of specific content that overlaps with the content described above with reference to Figure 3 and Figures 6 to 10 will be omitted or summarized.
[0306] According to the example, the decoding device 300 can derive the quantized transform coefficients of the target block from the bitstream (S1510). More specifically, the decoding device 300 can decode information about the quantized transform coefficients of the target block from the bitstream, and can derive the quantized transform coefficients of the target block based on that information. The information about the quantized transform coefficients of the target block can be included in the sequence parameter set (SPS) or the slice header, and can include information about whether a reduction transform (RST) is applied, information about the reduction factor, information about the minimum transform size to which the reduction transform is applied, information about the maximum transform size to which the reduction transform is applied, and information about the size of the inverse reduction transform applied.
[0307] More specifically, information about whether a reduction transformation has been applied can be indicated by an availability flag; information about the reduction factor can be indicated by a reduction factor value; information about the minimum transformation size for which the inverse reduction transformation has been applied (i.e., the minimum allowable transformation kernel size when performing the inverse transformation) can be indicated by a minimum transformation size value; information about the maximum transformation size for which the inverse reduction transformation has been applied (i.e., the maximum transformation kernel size applied when performing the inverse transformation) can be indicated by a maximum transformation size value; and information about the size of the inverse reduction transformation for which the inverse transformation has been substantially applied (i.e., the size of the transformation kernel) can be indicated by an inverse reduction transformation size value. In this case, the availability flag can be signaled via a first syntax element; the reduction factor value can be signaled via a second syntax element; the minimum transformation size value can be signaled via a third syntax element; the maximum transformation size value can be signaled via a fourth syntax element; and the inverse reduction transformation size value can be signaled via a fifth syntax element.
[0308] In the example, the first syntax element can be represented by the syntax element Reduced_transform_enabled_flag. When a reduction transformation is applied, the syntax element Reduced_transform_enabled_flag indicates 1, and when no reduction transformation is applied, it can indicate 0. When the syntax element Reduced_transform_enabled_flag is not signaled, its value can be inferred to be 0.
[0309] Additionally, the second syntax element can be represented as the syntax element Reduced_transform_factor. The syntax element Reduced_transform_factor can indicate the value of R / N, where N can be the square of the length of one side of the block to which the transformation is applied, or the total number of transformation coefficients in the block to which the transformation is applied. R can be a reduction factor less than N. However, examples are not limited to this; for instance, Reduced_transform_factor can indicate R instead of R / N. From the perspective of the inverse reduction transformation matrix, R is the number of columns and N is the number of rows of that matrix. In this case, the number of columns in the inverse reduction transformation matrix should be less than the number of rows. R can be, for example, a value of 8, 16, or 32, but is not limited to this. When the syntax element Reduced_transform_factor is not signaled, its value can be inferred to be R / N (or R).
[0310] Furthermore, the third syntax element can be represented as the syntax element min_reduced_transform_size. When the syntax element min_reduced_transform_size is not signaled, its value can be inferred to be 0.
[0311] Additionally, the fourth syntax element can be represented as the syntax element max_reduced_transform_size. When the syntax element max_reduced_transform_size is not signaled, its value can be inferred to be 0.
[0312] Additionally, the fifth syntax element can be represented as the syntax element reduce_transform_size. The size value of the inverse reduction transformation signaled by the syntax element reduce_transform_size can represent the size of the reduction transformation matrix, i.e., the transformation kernel (the reduction transformation matrix shown in Equation 4 or 5), and can represent the dimension reduced for the reduction transformation, but is not limited thereto. When the syntax element reduce_transform_size is not signaled, its value can be inferred to be 0.
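The inference rules for the five syntax elements can be sketched in Python as follows. The syntax element names are those given in the text; the container class, the helper function, and the use of a plain dictionary for "signaled" values are hypothetical. The text gives no numeric default for Reduced_transform_factor, so the sketch leaves it as `None` when it is not signaled.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ReducedTransformParams:
    # Defaults model the value inferred when an element is not signaled.
    Reduced_transform_enabled_flag: int = 0
    Reduced_transform_factor: Optional[int] = None  # no numeric default stated
    min_reduced_transform_size: int = 0
    max_reduced_transform_size: int = 0
    reduce_transform_size: int = 0


def params_from_signaled(signaled):
    """Build the parameter set from the elements actually present."""
    known = {f.name for f in fields(ReducedTransformParams)}
    unknown = set(signaled) - known
    if unknown:
        raise ValueError(f"unknown syntax elements: {sorted(unknown)}")
    return ReducedTransformParams(**signaled)


print(params_from_signaled({}))
print(params_from_signaled({"Reduced_transform_enabled_flag": 1,
                            "Reduced_transform_factor": 16}))
```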
[0313] Table 12 below shows an example in which the information about the quantized transform coefficients of the target block is included in the SPS and signaled.
[0314] [Table 12]
[0315]
[0316] According to the example, the decoding device 300 can derive the transform coefficients by performing dequantization on the quantized transform coefficients of the target block (S1520).
[0317] According to the example, the decoding device 300 can derive the modified transform coefficients based on the inverse reduced quadratic transform (RST) of the transform coefficients (S1530).
[0318] In the example, the inverse reduction transformation can be performed based on the inverse reduction transformation matrix, and the inverse reduction transformation matrix can be a non-square matrix in which the number of columns is less than the number of rows.
[0319] In the example, S1530 may include: decoding the transform index; determining whether the conditions for applying the inverse reduction transform are met based on the transform index; selecting a transform kernel; and when the conditions for applying the inverse reduction transform are met, applying the inverse reduction transform to the transform coefficients based on the selected transform kernel and the reduction factor. In this case, the size of the inverse reduction transform matrix can be determined based on the reduction factor.
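The core arithmetic of the inverse reduction transform, an N×R matrix applied to R transform coefficients to produce N modified transform coefficients, can be sketched in Python as follows. The matrix values below are arbitrary placeholders and not an actual transform kernel; the function name is hypothetical.

```python
def inverse_reduced_transform(inv_matrix, coeffs):
    """Multiply an N x R matrix (list of N rows) by a length-R vector.

    The matrix must be non-square with fewer columns (R) than rows (N).
    """
    n = len(inv_matrix)
    r = len(coeffs)
    if any(len(row) != r for row in inv_matrix):
        raise ValueError("every matrix row must have R entries")
    if r >= n:
        raise ValueError("R must be smaller than N")
    return [sum(m * c for m, c in zip(row, coeffs)) for row in inv_matrix]


# Toy example with N = 4 and R = 2 (placeholder values only).
toy_matrix = [
    [1, 0],
    [0, 1],
    [1, 1],
    [2, -1],
]
print(inverse_reduced_transform(toy_matrix, [3, 4]))
```

In practice N would be 16 or 64 and the matrix would be the kernel selected by the transform index; only the shape of the computation is shown here.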
[0320] If the inverse reduction transform of S1530 is based on the inverse NSST, the modified transformation coefficients of the target block can be derived by performing the inverse reduction transform on the transformation coefficients of the target block.
[0321] According to the example, the decoding device 300 can derive the residual sample of the target block based on the inverse transform for the modified transform coefficients (S1540).
[0322] The decoding device 300 can perform an inverse first-order transform on the modified transform coefficients of the target block. In this case, the inverse reduction transform can be used as the inverse first-order transform, or a regular separable transform can be used.
[0323] According to the example, the decoding device 300 can generate a reconstructed sample based on the residual sample of the target block and the predicted sample of the target block (S1550).
[0324] Referring to S1530, it can be confirmed that the residual samples of the target block are derived based on the inverse reduced transform of the transform coefficients of the target block. A conventional inverse transform matrix has size N×N, whereas the inverse reduced transform matrix has size N×R. Therefore, compared to the conventional transform, the memory needed to perform the reduced transform is reduced to a fraction R / N. Likewise, the number of multiplications drops from N×N with the conventional inverse transform matrix to N×R with the inverse reduced transform matrix, again a fraction R / N. Additionally, when the inverse reduced transform is applied, only R transform coefficients need to be decoded, rather than the N transform coefficients required by the conventional inverse transform, so the total number of transform coefficients for the target block is reduced from N to R, which increases decoding efficiency. In summary, according to S1530, the (inverse) transform efficiency and decoding efficiency of the decoding device 300 can be increased through the inverse reduced transform.
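As a worked instance of the R / N ratio, consider the illustrative values N = 64 and R = 16. These values are chosen for this example only and are not taken from the paragraph above. The same ratio holds for the matrix storage and for the multiplication count:

```latex
\[
\frac{N \times R}{N \times N} = \frac{R}{N}
\quad\Longrightarrow\quad
\frac{64 \times 16}{64 \times 64} = \frac{1024}{4096} = \frac{16}{64} = \frac{1}{4}.
\]
```

With these values, the N×R matrix holds 1024 entries instead of 4096, and the number of coefficients to decode falls from 64 to 16.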
[0325] Figure 16 is a flowchart illustrating the operation of a video encoding device according to an example of this disclosure.
[0326] Each step disclosed in Figure 16 can be performed by the encoding device 200 disclosed in Figure 2. More specifically, S1610 can be performed by the predictor 220 disclosed in Figure 2; S1620 can be performed by the subtractor 231 disclosed in Figure 2; S1630 and S1640 can be performed by the transformer 232 disclosed in Figure 2; and S1650 can be performed by the quantizer 233 and the entropy encoder 240 disclosed in Figure 2. Furthermore, the operations according to S1610 to S1650 are based on some of the content described above with reference to Figures 6 to 10. Therefore, descriptions of specific content that overlaps with what was described above with reference to Figure 2 and Figures 6 to 10 will be omitted or simplified.
[0327] According to the example, the encoding device 200 can derive prediction samples based on the intra-prediction mode applied to the target block (S1610).
[0328] According to the example, the encoding device 200 can derive the residual samples of the target block (S1620).
[0329] According to the example, the encoding device 200 can derive the transform coefficients of the target block based on a primary transform of the residual samples (S1630). The primary transform can be performed using multiple transform kernels, and in this case, the transform kernel can be selected based on the intra-prediction mode.
[0330] The encoding device 200 can perform NSST on the transform coefficients of the target block, and in this case, NSST can be performed either based on the reduced transform or without it. If NSST is performed based on the reduced transform, it can correspond to the operation according to S1640.
[0331] According to the example, the encoding device 200 can derive the modified transform coefficients of the target block based on the reduced transform (RST) for the transform coefficients (S1640). In the example, the reduced transform can be performed based on the reduced transform matrix, and the reduced transform matrix can be a non-square matrix in which the number of rows is less than the number of columns.
[0332] In the example, S1640 may include: determining whether the conditions for applying the reduced transform are met; generating and encoding the transform index based on the determination; selecting a transform kernel; and, when the conditions are met, applying the reduced transform to the residual samples based on the selected transform kernel and the reduction factor. In this case, the size of the reduced transform matrix may be determined based on the reduction factor.
[0333] According to the example, the encoding device 200 can derive the quantized transform coefficients by performing quantization based on the modified transform coefficients of the target block, and encode information about the quantized transform coefficients (S1650).
[0334] More specifically, the encoding device 200 can generate information about the quantized transform coefficients and encode the generated information. The information about the quantized transform coefficients may include residual information.
[0335] In the example, the information about the quantized transform coefficients may include at least one of the following: information about whether a reduced transform is applied, information about the reduction factor, information about the minimum transform size for which a reduced transform is applied, and information about the maximum transform size for which a reduced transform is applied.
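The four items listed above could be grouped as in the following sketch. The class name, field names, and the rule that both block dimensions must fall between the minimum and maximum sizes are hypothetical; the text only names the pieces of information, not their syntax or how they are combined.

```python
from dataclasses import dataclass


@dataclass
class ReducedTransformInfo:
    """Hypothetical container for the four signalled items."""

    enabled: bool          # whether a reduced transform is applied
    reduction_factor: int  # R in the notation of this description
    min_size: int          # minimum transform size for the reduced transform
    max_size: int          # maximum transform size for the reduced transform

    def applies_to(self, width: int, height: int) -> bool:
        # Assumed rule: both dimensions must lie in [min_size, max_size].
        return self.enabled and all(
            self.min_size <= dim <= self.max_size for dim in (width, height)
        )


info = ReducedTransformInfo(enabled=True, reduction_factor=16, min_size=4, max_size=32)
```

Under these assumptions, `info.applies_to(8, 8)` is true, while a 64×8 block falls outside the signalled size range.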
[0336] Referring to S1640, it can be confirmed that the transform coefficients of the target block are derived based on the reduced transform of the residual samples. A regular transform matrix has size N×N, whereas the reduced transform matrix has size R×N. Therefore, compared to a regular transform, the memory needed to perform the reduced transform is reduced to a fraction R / N. Likewise, the number of multiplications drops from N×N with a regular transform matrix to R×N with the reduced transform matrix, again a fraction R / N. Additionally, when the reduced transform is applied, only R transform coefficients are derived, rather than the N transform coefficients derived by a regular transform, so the total number of transform coefficients of the target block is reduced from N to R, which reduces the amount of data sent from the encoding device 200 to the decoding device 300. In summary, according to S1640, the transform efficiency and encoding efficiency of the encoding device 200 can be increased through the reduced transform.
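The effect of an R×N matrix can be checked with a small numerical sketch. The kernel used here, the first R rows of an orthonormal DCT-II matrix, is only a stand-in chosen so that the example is self-contained; it is not one of the transform kernels of this disclosure, and the function names and the values N = 16, R = 4 are likewise assumptions.

```python
import math
from typing import List, Sequence, Tuple


def dct_rows(n: int, r: int) -> List[List[float]]:
    """First r rows of the orthonormal n-point DCT-II matrix (an R x N matrix)."""
    rows = []
    for k in range(r):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
        )
    return rows


def forward_reduced(
    matrix: List[List[float]], inputs: Sequence[float]
) -> Tuple[List[float], int]:
    """Apply an R x N matrix to N inputs; return the R outputs and the multiplication count."""
    outputs = [sum(m * v for m, v in zip(row, inputs)) for row in matrix]
    return outputs, len(matrix) * len(inputs)


N, R = 16, 4
kernel = dct_rows(N, R)
inputs = [float(i) for i in range(N)]
coeffs, mults = forward_reduced(kernel, inputs)
regular_mults = N * N  # cost of a full N x N matrix on the same input
```

With these values, 16 inputs yield 4 coefficients using 64 multiplications instead of 256, which is the R / N ratio described above.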
[0337] In the above embodiments, the method is explained based on a flowchart using a series of steps or blocks. However, this disclosure is not limited to the order of the steps, and a step may be performed in a different order or sequence than described above, or a step may be performed concurrently with other steps. Furthermore, those skilled in the art will understand that the steps shown in the flowchart are not exclusive, and another step may be incorporated or one or more steps in the flowchart may be deleted without affecting the scope of this disclosure.
[0338] The methods described above according to this disclosure can be implemented in software form, and the encoding and / or decoding devices according to this disclosure can be included in devices for image processing such as televisions, computers, smartphones, set-top boxes, and display devices.
[0339] When the embodiments of this disclosure are implemented by software, the above methods can be implemented as modules (steps, functions, etc.) for performing the above functions. These modules can be stored in memory and can be executed by a processor. The memory can be internal or external to the processor and can be connected to the processor in various well-known ways. The processor may include application-specific integrated circuits (ASICs), other chipsets, logic circuits, and / or data processing devices. The memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media, and / or other storage devices. That is, the embodiments described in this disclosure can be implemented and executed on a processor, microprocessor, controller, or chip. For example, the functional units shown in each figure can be implemented and executed on a computer, processor, microprocessor, controller, or chip.
[0340] Furthermore, the decoding and encoding devices using this disclosure can include multimedia broadcast transceivers, mobile communication terminals, home theater video devices, digital cinema video devices, surveillance cameras, video chat devices, real-time communication devices (such as video communication), mobile streaming devices, storage media, portable video cameras, video-on-demand (VoD) service providers, over-the-top (OTT) video devices, internet streaming service providers, three-dimensional (3D) video devices, video telephony devices, and medical video devices, and can be used to process video signals or data signals. For example, over-the-top (OTT) video devices can include game consoles, Blu-ray players, internet access TVs, home theater systems, smartphones, tablet PCs, digital video recorders (DVRs), etc.
[0341] Furthermore, the processing methods of this disclosure can be produced in the form of a computer-executable program and can be stored in a computer-readable recording medium. Multimedia data having the data structure according to this disclosure can also be stored in a computer-readable recording medium. Computer-readable recording media include various storage devices and distributed storage devices for storing computer-readable data. Computer-readable recording media can include, for example, Blu-ray discs (BD), Universal Serial Bus (USB), ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, computer-readable recording media include media implemented in the form of a carrier wave (e.g., transmission over the Internet). Furthermore, bitstreams generated by encoding methods can be stored in computer-readable recording media or transmitted via wired or wireless communication networks. Additionally, embodiments of this disclosure can be implemented as computer program products by program code, and the program code can be executed on a computer according to embodiments of this disclosure. The program code can be stored on a computer-readable carrier.
[0342] Figure 17 illustrates the architecture of a content streaming system to which this disclosure is applied.
[0343] Furthermore, the content streaming system using this disclosure can generally include an encoding server, a streaming server, a web server, a media storage device, a user device, and a multimedia input device.
[0344] An encoding server is used to compress content input from multimedia input devices such as smartphones, cameras, and camcorders into digital data to generate a bitstream, and then sends it to a streaming server. As another example, in cases where the multimedia input device, such as a smartphone, camera, or camcorder, directly generates the bitstream, the encoding server can be omitted. The bitstream can be generated by applying the encoding method or bitstream generation method disclosed herein. Furthermore, the streaming server can temporarily store the bitstream during the sending or receiving process.
[0345] The streaming server sends multimedia data to the user's device via a web server based on the user's request. The web server acts as a tool to notify the user of available services. When a user requests a desired service, the web server transmits the request to the streaming server, and the streaming server sends the multimedia data to the user. In this context, the content streaming system may include a separate control server, which in this case controls the commands / responses between the corresponding devices within the content streaming system.
[0346] A streaming server can receive content from media storage devices and / or encoding servers. For example, when receiving content from an encoding server, the content can be received in real time. In this case, to provide a smooth streaming service, the streaming server can store the bitstream for a predetermined period of time.
[0347] For example, user devices may include mobile phones, smartphones, laptop computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, head-mounted displays (HMDs)), digital TVs, desktop computers, digital signage, etc. The servers in the content streaming system can operate as distributed servers, and in this case, data received by each server can be processed in a distributed manner.
Claims
1. An image decoding method performed by a decoding device, the image decoding method comprising the following steps: deriving quantized transform coefficients of a target block from a bitstream that includes residual information; deriving transform coefficients by performing dequantization based on the quantized transform coefficients of the target block; deriving modified transform coefficients based on an inverse reduced secondary transform (RST) of the transform coefficients, the RST also being called a low-frequency non-separable transform (LFNST); deriving residual samples of the target block based on an inverse primary transform of the modified transform coefficients; and generating a reconstructed image based on the residual samples and prediction samples of the target block, wherein the prediction samples are derived based on the intra-prediction mode of the target block, wherein the inverse RST derives more output data than input data through a matrix operation between a transform kernel matrix and the input data for the upper-left region of the target block, wherein the input data for the upper-left region exists within the upper-left 4×4 region of the target block, and the output data for the upper-left region exists within either the upper-left 4×4 region or the upper-left 8×8 region of the target block, wherein the inverse RST is performed based on the transform kernel matrix, which is selected, based on a transform index, from a transform set comprising multiple transform kernel matrices, the transform index indicating one of the multiple transform kernel matrices, wherein the transform set is determined based on a mapping relationship according to the intra-prediction mode applied to the target block, wherein ten or more directional intra-prediction modes, including that of the target block, are mapped to one transform set, and wherein the transform set mapped to the directional intra-prediction mode with index 2 is equal to the transform set mapped to the directional intra-prediction mode with index 3.
2. The image decoding method according to claim 1, wherein the modified transform coefficients are mapped into the target block based on a scan order determined according to the intra-prediction mode of the target block.
3. The image decoding method according to claim 1, further comprising the following step: obtaining an RST enable flag from a sequence parameter set (SPS) in the bitstream.
4. The image decoding method according to claim 1, wherein the matrix coefficients in the transform kernel matrix are represented in 8-bit form.
5. The image decoding method according to claim 1, wherein the inverse RST is performed based on an inverse 4×4 RST or an inverse 8×8 RST determined based on the width and height of the target block, and the inverse 4×4 RST or the inverse 8×8 RST is performed based on the same transform set.
6. An image encoding method performed by an image encoding device, the image encoding method comprising the following steps: deriving prediction samples based on the intra-prediction mode of a target block; deriving residual samples of the target block based on the prediction samples; deriving transform coefficients of the target block based on a primary transform applied to the residual samples; deriving modified transform coefficients based on a reduced secondary transform (RST) applied to the transform coefficients; deriving quantized transform coefficients by performing quantization based on the modified transform coefficients; and generating residual information about the quantized transform coefficients, wherein the RST derives less output data than input data through a matrix operation between a transform kernel matrix and the input data for the upper-left region of the target block, wherein the input data for the upper-left region exists within the upper-left 4×4 region or the upper-left 8×8 region of the target block, and the output data for the upper-left region exists within the upper-left 4×4 region of the target block, wherein the RST is performed based on the transform kernel matrix selected from a transform set comprising multiple transform kernel matrices, wherein a transform index indicating the selected transform kernel matrix is encoded into the bitstream, wherein the transform set is determined based on a mapping relationship according to the intra-prediction mode of the target block, wherein ten or more directional intra-prediction modes, including that of the target block, are mapped to one transform set, and wherein the transform set mapped to the directional intra-prediction mode with index 2 is equal to the transform set mapped to the directional intra-prediction mode with index 3.
7. The image encoding method according to claim 6, further comprising the following step: encoding an RST enable flag into a sequence parameter set (SPS) in the bitstream.
8. The image encoding method according to claim 6, wherein the RST is performed based on a 4×4 RST or an 8×8 RST determined based on the width and height of the target block, and the 4×4 RST or the 8×8 RST is performed based on the same transform set.
9. A method for transmitting data for an image, the method comprising the following steps: generating a bitstream of the image by performing the method according to any one of claims 6 to 8; and transmitting data including the bitstream.