Decoding device, encoding device, and data transmitting device
By using a palette-based coding technique and optimizing quantization escape values with minimum quantization parameters and sequence parameter sets, the problem of efficient compression of high-resolution image/video data is solved, improving coding efficiency and transmission efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- LG ELECTRONICS INC
- Filing Date
- 2020-08-26
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to efficiently compress and transmit high-resolution, high-quality image/video data, especially for virtual reality and immersive media, leading to increased transmission and storage costs.
A palette-based coding technique is adopted that derives the quantization parameter for escape values from minimum quantization parameter information, limits the range of the quantized escape value, and signals palette size information through the sequence parameter set, thereby improving coding efficiency.
It improves image/video compression efficiency, enhances the efficiency of palette mode encoding and the accuracy of escape encoding, and enables efficient information configuration and signal notification.
Figure CN116684583B_ABST
Description
[0001] This application is a divisional application of the original invention patent application No. 202080073115.7 (International Application No.: PCT/KR2020/011383, Application Date: August 26, 2020, Invention Title: Image or Video Coding Based on Palette Mode).
Technical Field
[0002] This disclosure relates to video or image coding, and, for example, to a palette mode-based coding technique.
Background Technology
[0003] Recently, there has been a growing demand for high-resolution, high-quality images / videos, such as 4K or 8K Ultra High Definition (UHD) images / videos, across various fields. As image / video resolution or quality increases, relatively more information or bits are transmitted compared to traditional image / video data. Therefore, if image / video data is transmitted via media such as existing wired / wireless broadband lines or stored in traditional storage media, the costs of transmission and storage can easily increase.
[0004] In addition, there is growing interest and demand for virtual reality (VR) and artificial reality (AR) content, as well as immersive media such as holograms; and the broadcasting of images / videos that exhibit characteristics different from actual images / videos (e.g., game images / videos) is also increasing.
[0005] Therefore, highly efficient image / video compression technology is needed to effectively compress and send, store, or play high-resolution, high-quality images / videos that exhibit the various characteristics described above.
[0006] Furthermore, a palette mode coding technique for improving the coding efficiency of screen content, such as computer-generated video containing large amounts of text and graphics, is under discussion. To apply this technique efficiently, methods for coding and signaling the relevant information are needed.
Summary of the Invention
[0007] Technical Purpose
[0008] The purpose of this disclosure is to provide methods and apparatus for improving the efficiency of video / image coding.
[0009] Another object of this disclosure is to provide methods and apparatus for improving the efficiency of palette mode coding.
[0010] Another object of this disclosure is to provide methods and apparatus for efficiently configuring and signaling various types of information used in palette mode coding.
[0011] Another object of this disclosure is to provide a method and apparatus for efficiently applying escape coding in palette mode.
[0012] Technical solution
[0013] According to embodiments of this disclosure, the quantization parameter used in scaling the quantized escape values for palette mode can be derived based on the minimum quantization parameter information for transform skip mode. The quantization parameter can have a value equal to or greater than the minimum quantization parameter value for transform skip mode.
[0014] According to embodiments of this disclosure, the range of quantized escape values in palette mode can be limited based on bit depth. For example, the quantized escape value information for the luminance component can have a value between 0 and (1 << BitDepthY) - 1, and the quantized escape value information for the chrominance components can have a value between 0 and (1 << BitDepthC) - 1.
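The two constraints in [0013] and [0014] can be sketched in Python as follows. This is a minimal illustration rather than the normative derivation: the function and parameter names are hypothetical, and clamping an out-of-range escape value (instead of treating it as a nonconforming bitstream) is an assumption made for the example.

```python
def derive_escape_qp(block_qp: int, min_qp: int) -> int:
    """Keep the QP used to scale escape values at or above the signaled minimum QP."""
    return max(min_qp, block_qp)


def limit_escape_value(value: int, bit_depth: int) -> int:
    """Constrain an escape value to the range [0, (1 << bit_depth) - 1].

    Clamping is an assumption of this sketch; the text only states the range.
    """
    max_value = (1 << bit_depth) - 1
    return min(max(value, 0), max_value)


# Hypothetical inputs: a block QP below the minimum is raised to the minimum,
# and a 10-bit escape value is limited to at most 1023.
escape_qp = derive_escape_qp(block_qp=2, min_qp=4)
limited = limit_escape_value(1500, bit_depth=10)
```

With a bit depth of 10, the upper bound evaluates to (1 << 10) - 1 = 1023.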
[0015] According to embodiments of this disclosure, palette size information about the maximum index of the palette table can be defined and signaled via a sequence parameter set (SPS).
[0016] According to embodiments of this disclosure, a video / image decoding method performed by a decoding device is provided. This video / image decoding method may include the methods disclosed in embodiments of this disclosure.
[0017] According to embodiments of this disclosure, a decoding apparatus is provided for performing video / image decoding. This decoding apparatus can perform the methods disclosed in embodiments of this disclosure.
[0018] According to embodiments of the present disclosure, a video / image encoding method performed by an encoding device is provided. This video / image encoding method may include the methods disclosed in embodiments of the present disclosure.
[0019] According to embodiments of the present disclosure, an encoding apparatus for performing video / image encoding is provided. This encoding apparatus can perform the methods disclosed in embodiments of the present disclosure.
[0020] According to embodiments of the present disclosure, a computer-readable digital storage medium is provided for storing encoded video / image information generated by a video / image encoding method disclosed in at least one embodiment of the present disclosure.
[0021] According to embodiments of the present disclosure, a computer-readable digital storage medium is provided that stores encoded information or encoded video / image information that enables a decoding device to perform at least one of the video / image decoding methods disclosed in the embodiments of the present disclosure.
[0022] Beneficial effects
[0023] This disclosure offers various advantages. For example, embodiments of this disclosure can improve overall image / video compression efficiency. Additionally, embodiments of this disclosure can improve the efficiency of palette mode coding. Furthermore, embodiments of this disclosure allow for efficient configuration and signaling of various types of information used in palette mode coding. Furthermore, embodiments of this disclosure can improve the accuracy of escape samples and coding efficiency by efficiently applying escape coding in palette mode.
[0024] The effects that can be obtained through specific embodiments of this disclosure are not limited to those listed above. For example, there may be various technical effects that can be understood or deduced by those skilled in the art from this disclosure. Therefore, the specific effects of this disclosure are not limited to those explicitly described in this disclosure, but may include various effects that can be understood or deduced from the technical features of this disclosure.
Attached Figure Description
[0025] Figure 1 schematically illustrates an example of a video / image coding system applicable to embodiments of this disclosure.
[0026] Figure 2 schematically illustrates the configuration of a video / image encoding apparatus to which embodiments of the present disclosure are applicable.
[0027] Figure 3 schematically illustrates the configuration of a video / image decoding device to which embodiments of the present disclosure are applicable.
[0028] Figure 4 shows an example of a video / image encoding process applicable to embodiments of this disclosure.
[0029] Figure 5 shows an example of a video / image decoding process applicable to embodiments of this disclosure.
[0030] Figure 6 shows an example of the basic structure used to describe palette coding.
[0031] Figure 7 shows examples used to describe the horizontal and vertical traverse scan methods for coding a palette index map.
[0032] Figure 8 is a diagram illustrating an example of a palette mode-based coding method.
[0033] Figure 9 illustrates an example of a video / image encoding method according to an embodiment of the present disclosure.
[0034] Figure 10 illustrates an example of a video / image decoding method according to an embodiment of the present disclosure.
[0035] Figure 11 shows an example of a content streaming system to which the embodiments disclosed in this disclosure are applicable.
Detailed Implementation
[0036] This disclosure may be modified in various forms, and specific embodiments thereof will be described and illustrated in the accompanying drawings. However, these embodiments are not intended to limit this disclosure. The terminology used in the following description is for the purpose of describing specific embodiments only and is not intended to limit this disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as “comprising” and “having” are intended to indicate the presence of the features, quantities, steps, operations, elements, components, or combinations thereof used in the following description, and therefore it should be understood that the possibility of having or adding one or more different features, quantities, steps, operations, elements, components, or combinations thereof is not excluded.
[0037] Furthermore, the various configurations described in the accompanying drawings are illustrated independently to illustrate functions that are distinct features from one another, and do not imply that the configurations are implemented using different hardware or different software. For example, two or more configurations may be combined to form one configuration, and a configuration may be divided into multiple configurations. Embodiments in which configurations are combined and / or separated are included within the scope of the claims without departing from the spirit of this document.
[0038] In this disclosure, the term "A or B" may mean "A only", "B only", or "both A and B". In other words, in this disclosure, the term "A or B" may be interpreted as indicating "A and / or B". For example, in this disclosure, the term "A, B or C" may mean "A only", "B only", "C only", or "any combination of A, B, and C".
[0039] The forward slash " / " or comma used in this disclosure can mean "and / or". For example, "A / B" can mean "A and / or B". Therefore, "A / B" can mean "A only", "B only", or "both A and B". For example, "A, B, C" can mean "A, B, or C".
[0040] In this disclosure, "at least one of A and B" may mean "only A", "only B" or "both A and B". Furthermore, in this disclosure, the expression "at least one of A or B" or "at least one of A and / or B" may be interpreted as the same as "at least one of A and B".
[0041] Additionally, in this disclosure, "at least one of A, B, and C" may mean "A only", "B only", "C only" or "any combination of A, B, and C". Furthermore, "at least one of A, B, or C" or "at least one of A, B, and / or C" may mean "at least one of A, B, and C".
[0042] Furthermore, parentheses used in this disclosure may mean "for example". Specifically, when "prediction (intra-frame prediction)" is written, it may indicate that "intra-frame prediction" is proposed as an example of "prediction". In other words, the term "prediction" in this disclosure is not limited to "intra-frame prediction", and "intra-frame prediction" is proposed as an example of "prediction". In addition, even when "prediction (i.e., intra-frame prediction)" is written, it may indicate that "intra-frame prediction" is proposed as an example of "prediction".
[0043] This disclosure relates to video / image coding. For example, the methods / implementations disclosed in this disclosure can be applied to methods disclosed in the Versatile Video Coding (VVC) standard. Additionally, the methods / implementations disclosed in this disclosure can be applied to methods disclosed in the Essential Video Coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the second-generation Audio Video Coding Standard (AVS2), or next-generation video / image coding standards (e.g., H.267 or H.268).
[0044] This disclosure presents various implementations of video / image coding, and unless otherwise mentioned, these implementations can be combined with each other.
[0045] In this disclosure, video can refer to a collection of images over time. A picture (image) generally refers to a unit representing one image in a specific time period, and a slice / tile is a unit that constitutes part of an image in coding. A slice / tile may include one or more coding tree units (CTUs). An image may consist of one or more slices / tiles. A tile is a rectangular area of CTUs within a specific tile column and a specific tile row in an image. A tile column is a rectangular area of CTUs with a height equal to the height of the image and a width specified by a syntax element in the picture parameter set. A tile row is a rectangular area of CTUs with a height specified by a syntax element in the picture parameter set and a width equal to the width of the image. A tile scan is a specific sequential ordering of the CTUs partitioning an image, in which CTUs are ordered consecutively by a CTU raster scan within a tile, and tiles within an image are ordered consecutively by a raster scan of the image's tiles. A slice comprises an integer number of complete tiles, or an integer number of consecutive complete CTU rows within a tile of an image, that can be exclusively contained within a single NAL unit.
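The tile scan described in [0045] can be sketched in Python as follows. This is an illustrative sketch only: the function name and the representation of the tile grid as lists of tile column widths and tile row heights (both in CTUs) are assumptions made for the example, not syntax from the disclosure.

```python
def tile_scan_order(col_widths, row_heights):
    """Return CTU raster-scan addresses listed in tile-scan order.

    col_widths and row_heights give the width of each tile column and the
    height of each tile row, both measured in CTUs (hypothetical inputs).
    """
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for tile_h in row_heights:          # tiles are visited in raster order
        x0 = 0
        for tile_w in col_widths:
            for y in range(y0, y0 + tile_h):      # CTU raster scan inside the tile
                for x in range(x0, x0 + tile_w):
                    order.append(y * pic_width + x)
            x0 += tile_w
        y0 += tile_h
    return order


# A picture 4 CTUs wide and 2 CTUs high, split into two 2x2 tiles side by side.
order = tile_scan_order(col_widths=[2, 2], row_heights=[2])
# order == [0, 1, 4, 5, 2, 3, 6, 7]
```

All CTUs of the left tile are listed before any CTU of the right tile, which is what distinguishes the tile scan from a plain raster scan of the whole image.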
[0046] Furthermore, an image can be divided into two or more sub-images. A sub-image can be a rectangular region of one or more slices within the image.
[0047] A pixel or pel can refer to the smallest unit that makes up a picture (or image). Additionally, "sample" can be used as the term corresponding to a pixel. A sample can typically represent a pixel or pixel value, and can represent only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chrominance component.
[0048] A unit can represent a basic unit of image processing. A unit may include at least one of a specific region of an image and information associated with that region. A unit may include one luminance block and two chrominance (e.g., Cb, Cr) blocks. In some cases, the term "unit" may be used interchangeably with terms such as "block" or "region." In general, an M×N block may include a set (or array) of samples or transform coefficients in M columns and N rows. Alternatively, a sample may refer to a pixel value in the spatial domain, and when such a pixel value is transformed to the frequency domain, it may refer to a transform coefficient in the frequency domain.
[0049] Furthermore, in this disclosure, at least one of quantization / dequantization and / or transform / inverse transform can be omitted. When quantization / dequantization is omitted, the quantized transform coefficients can be referred to as transform coefficients. When transform / inverse transform is omitted, the transform coefficients can be referred to as coefficients or residual coefficients, or, for consistency of expression, they can still be referred to as transform coefficients.
[0050] In this disclosure, the quantized transform coefficients and the transform coefficients can be referred to as transform coefficients and scaled transform coefficients, respectively. In this case, the residual information can include information about the transform coefficients, and this information can be signaled via residual coding syntax. The transform coefficients can be derived based on the residual information (or information about the transform coefficients), and the scaled transform coefficients can be derived by performing an inverse transform (scaling) on the transform coefficients. The residual samples can be derived based on an inverse transform (transform) of the scaled transform coefficients. This can also be applied / expressed in other parts of this disclosure.
[0051] In this disclosure, the technical features described in a single accompanying drawing may be implemented individually or simultaneously.
[0052] In the following, preferred embodiments of the present disclosure are described in more detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used for the same elements, and redundant descriptions of the same elements may be omitted.
[0053] Figure 1 illustrates an example of a video / image coding system to which the embodiments described herein can be applied.
[0054] Referring to Figure 1, a video / image coding system may include a source device and a receiving device. The source device may transmit encoded video / image information or data to the receiving device in the form of a file or stream via a digital storage medium or network.
[0055] The source device may include a video source, an encoding device, and a transmitter. The receiving device may include a receiver, a decoding device, and a renderer. The encoding device may be referred to as a video / image encoding device, and the decoding device may be referred to as a video / image decoding device. The transmitter may be included in the encoding device. The receiver may be included in the decoding device. The renderer may include a display, and the display may be configured as a separate device or an external component.
[0056] Video sources can acquire video / images through processes that capture, synthesize, or generate video / images. Video sources may include video / image capture devices and / or video / image generation devices. For example, a video / image capture device may include one or more cameras, a video / image archive containing previously captured video / images, etc. For example, a video / image generation device may include a computer, tablet computer, and smartphone, and may generate video / images (electronically). For example, virtual video / images may be generated via a computer, etc. In this case, the video / image capture process may be replaced by a process that generates related data.
[0057] Encoding devices can encode input video / images. For compression and encoding efficiency, encoding devices can perform a series of processes such as prediction, transformation, and quantization. The encoded data (encoded video / image information) can be output as a bitstream.
[0058] The transmitter can transmit the encoded video / image information or data, output as a bitstream, to the receiver of the receiving device in the form of a file or stream via a digital storage medium or network. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmitter may include elements for generating media files according to a predetermined file format and may include elements for transmission over a broadcast / communication network. The receiver can receive / extract the bitstream and transmit the received bitstream to a decoding device.
[0059] Decoding devices can decode video / images by performing a series of processes such as inverse quantization, inverse transform, and prediction, which correspond to the operations of encoding devices.
[0060] The renderer can render decoded video / images. The rendered video / images can be displayed on a monitor.
[0061] Figure 2 schematically illustrates the configuration of a video / image encoding apparatus applicable to embodiments of the present disclosure. Hereinafter, the encoding apparatus may include an image encoding apparatus and / or a video encoding apparatus.
[0062] Referring to Figure 2, the encoding device 200 includes an image segmenter 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270. The predictor 220 may include an inter-frame predictor 221 and an intra-frame predictor 222. The residual processor 230 may include a transformer 232, a quantizer 233, an inverse quantizer 234, and an inverse transformer 235. The residual processor 230 may also include a subtractor 231. The adder 250 may be referred to as a reconstructor or a reconstruction block generator. According to embodiments, the image segmenter 210, predictor 220, residual processor 230, entropy encoder 240, adder 250, and filter 260 may be configured by at least one hardware component (e.g., an encoder chipset or processor). Additionally, the memory 270 may include a decoded picture buffer (DPB) or may be configured by a digital storage medium. The hardware component may also include the memory 270 as an internal / external component.
[0063] Image segmenter 210 can partition an input image (or picture or frame) input to encoding device 200 into one or more processing units. For example, a processing unit may be referred to as a coding unit (CU). In this case, the coding unit may be recursively partitioned from a coding tree unit (CTU) or largest coding unit (LCU) according to a quadtree binary-tree ternary-tree (QTBTTT) structure. For example, a coding unit may be partitioned into multiple coding units of deeper depth based on a quadtree structure, a binary tree structure, and / or a ternary tree structure. In this case, for example, a quadtree structure may be applied first, followed by a binary tree structure and / or a ternary tree structure. Alternatively, a binary tree structure may be applied first. The coding process according to this disclosure may be performed based on the final coding unit that is no longer partitioned. In this case, the largest coding unit may be used as the final coding unit based on image characteristics, coding efficiency, etc., or if necessary, the coding unit may be recursively partitioned into coding units of deeper depth, and the coding unit with the optimal size may be used as the final coding unit. Here, the coding process may include prediction, transform, and reconstruction processes (described later). As another example, the processing unit may also include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may be split or partitioned from the final coding unit described above. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and / or a unit for deriving residual signals from transform coefficients.
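The recursive partitioning described in [0063] can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the mode names, the `choose_split` callback, and the 1:2:1 proportions used for the ternary split are choices made for the example, and the sketch does not enforce any ordering between quadtree and binary / ternary splits.

```python
def split(x, y, w, h, mode):
    """Return the child blocks (x, y, w, h) produced by one split mode."""
    if mode == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "bin_ver":
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "bin_hor":
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "tern_ver":  # assumed 1:2:1 split of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "tern_hor":  # assumed 1:2:1 split of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


def partition(block, choose_split):
    """Recursively partition a block; choose_split returns a mode or None."""
    mode = choose_split(block)
    if mode is None:
        return [block]  # final coding unit, no longer partitioned
    leaves = []
    for child in split(*block, mode):
        leaves.extend(partition(child, choose_split))
    return leaves


def choose(block):
    """Hypothetical decision rule: quad-split the 128x128 root, then apply a
    vertical ternary split to the top-left 64x64 child only."""
    x, y, w, h = block
    if w == 128:
        return "quad"
    if block == (0, 0, 64, 64):
        return "tern_ver"
    return None


final_units = partition((0, 0, 128, 128), choose)
```

In an encoder the decision rule would typically come from a rate-distortion search; a decoder would instead read the split decisions from the bitstream.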
[0064] In some cases, a unit can be used interchangeably with terms such as a block or region. Generally, an M×N block can represent a set of samples or transform coefficients consisting of M columns and N rows. A sample can typically represent a pixel or pixel value, either representing only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chrominance component. A sample can be used as a term corresponding to a pixel or pel of a picture (or image).
[0065] In the encoding device 200, a residual signal (residual block, residual sample array) is generated by subtracting the prediction signal (prediction block, prediction sample array) output from the inter-frame predictor 221 or the intra-frame predictor 222 from the input image signal (original block, original sample array), and the generated residual signal is sent to the transformer 232. In this case, as shown, the unit in the encoding device 200 that subtracts the prediction signal (prediction block, prediction sample array) from the input image signal (original block, original sample array) may be called the subtractor 231. The predictor can perform prediction on the block to be processed (hereinafter referred to as the current block) and generate a prediction block including the prediction samples of the current block. The predictor can determine whether intra-frame prediction or inter-frame prediction is applied on a per-block or per-CU basis. As described later in the description of the various prediction modes, the predictor can generate various types of information related to the prediction (e.g., prediction mode information) and send the generated information to the entropy encoder 240. The information about the prediction can be encoded in the entropy encoder 240 and output in the form of a bitstream.
[0066] Intra-frame predictor 222 can refer to samples in the current image to predict the current block. Depending on the prediction mode, the referenced samples may be located near or separated from the current block. In intra-frame prediction, the prediction modes may include multiple non-directional modes and multiple directional modes. For example, non-directional modes may include DC mode and planar mode. For example, depending on the level of detail in the prediction direction, the directional modes may include 33 or 65 directional prediction modes. However, this is just an example, and more or fewer directional prediction modes may be used depending on the settings. Intra-frame predictor 222 can use the prediction modes applied to neighboring blocks to determine the prediction mode applied to the current block.
[0067] Inter-frame predictor 221 can derive the prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference image. Here, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis based on the correlation of motion information between neighboring blocks and the current block. Motion information may include motion vectors and reference image indices. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in the reference image. The reference image including the reference block and the reference image including the temporally neighboring block may be the same or different. The temporally neighboring block may be referred to as a collocated reference block, collocated CU (colCU), etc., and the reference image including the temporally neighboring block may be referred to as a collocated picture (colPic). For example, inter-frame predictor 221 can configure a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and / or reference image index of the current block. Inter-frame prediction can be performed based on various prediction modes. For example, in skip mode and merge mode, the inter-frame predictor 221 can use motion information from neighboring blocks as motion information for the current block. In skip mode, unlike merge mode, residual signals may not be sent. In motion vector prediction (MVP) mode, motion vectors from neighboring blocks can be used as motion vector predictors, and the motion vector of the current block can be indicated by signaling the motion vector difference.
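The MVP mode described in [0067] can be sketched in Python as follows. This is a minimal illustration: the function names are hypothetical, motion vectors are plain integer pairs, and choosing the predictor with the smallest absolute difference is an assumption for the example (an actual encoder would typically use a rate-distortion criterion).

```python
def encode_mvd(mv, candidates):
    """Pick a predictor from the candidate list and return (index, difference)."""
    def cost(i):
        px, py = candidates[i]
        return abs(mv[0] - px) + abs(mv[1] - py)

    idx = min(range(len(candidates)), key=cost)
    px, py = candidates[idx]
    return idx, (mv[0] - px, mv[1] - py)


def decode_mv(idx, mvd, candidates):
    """Rebuild the motion vector as predictor plus signaled difference."""
    px, py = candidates[idx]
    return (px + mvd[0], py + mvd[1])


# Hypothetical candidate predictors taken from neighboring blocks.
candidates = [(4, -2), (10, 3)]
idx, mvd = encode_mvd((9, 4), candidates)       # idx == 1, mvd == (-1, 1)
restored = decode_mv(idx, mvd, candidates)      # restored == (9, 4)
```

Only the candidate index and the small difference need to be signaled, provided the encoder and decoder build the same candidate list.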
[0068] Predictor 220 can generate a prediction signal based on various prediction methods described below. For example, the predictor can apply not only intra-frame prediction or inter-frame prediction to predict a block, but also both intra-frame prediction and inter-frame prediction simultaneously. This can be referred to as combined intra-frame and inter-frame prediction (CIIP). Alternatively, the predictor can predict blocks based on an intra-block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for content image / video coding of games and the like, such as screen content coding (SCC). IBC essentially performs prediction within the current image, but can be performed similarly to inter-frame prediction in that a reference block is derived within the current image. That is, IBC can use at least one inter-frame prediction technique described herein. The palette mode can be considered as an example of intra-frame coding or intra-frame prediction. When palette mode is applied, sample values within the image can be signaled based on information about the palette table and palette index.
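How sample values can be recovered from a palette table and palette indices, as mentioned in [0068], can be sketched in Python as follows. The sketch makes several assumptions that are not stated in this passage: samples are single-component values, an index equal to the palette size marks an escape sample, and the escape values passed in are already reconstructed sample values. All names are hypothetical.

```python
def reconstruct_palette_block(palette, index_map, escape_values):
    """Rebuild sample values from a palette table and a palette index map.

    An index equal to len(palette) is treated as the escape index (an
    assumption of this sketch): its sample is taken from escape_values,
    in scan order, instead of from the table.
    """
    escape_index = len(palette)
    escapes = iter(escape_values)
    block = []
    for row in index_map:
        out_row = []
        for idx in row:
            if idx == escape_index:
                out_row.append(next(escapes))
            else:
                out_row.append(palette[idx])
        block.append(out_row)
    return block


palette = [16, 235]                       # two representative sample values
index_map = [[0, 0, 1],
             [1, 2, 0]]                   # index 2 marks an escape sample
block = reconstruct_palette_block(palette, index_map, escape_values=[128])
# block == [[16, 16, 235], [235, 128, 16]]
```

Blocks with few distinct values, such as text and graphics, need only a short table plus an index per sample, which is why palette mode suits screen content.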
[0069] The prediction signal generated by the predictor (including inter-frame predictor 221 and / or intra-frame predictor 222) can be used to generate a reconstructed signal or a residual signal. Transformer 232 can generate transform coefficients by applying transform techniques to the residual signal. For example, the transform technique may include at least one of Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen–Loève Transform (KLT), Graph-Based Transform (GBT), or Conditionally Non-linear Transform (CNT). Here, GBT refers to a transform obtained from a graph when the relationship information between pixels is represented by the graph. CNT refers to a transform generated based on a prediction signal generated using all previously reconstructed pixels. Furthermore, the transform processing can be applied to square pixel blocks of the same size or to blocks of variable size other than square.
[0070] Quantizer 233 quantizes the transform coefficients and sends them to entropy encoder 240, which encodes the quantized signal (information about the quantized transform coefficients) and outputs a bitstream. This information about the quantized transform coefficients can be referred to as residual information. Quantizer 233 can rearrange the block-form quantized transform coefficients into a one-dimensional vector based on the coefficient scan order, and generate information about the quantized transform coefficients based on this one-dimensional vector. Entropy encoder 240 can perform various encoding methods, such as exponential Golomb coding, context-adaptive variable-length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). Entropy encoder 240 can encode information required for video / image reconstruction other than the quantized transform coefficients (e.g., values of syntax elements, etc.) together or separately. The encoded information (e.g., encoded video / image information) can be sent or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The video / image information may also include information about various parameter sets, such as an Adaptation Parameter Set (APS), Picture Parameter Set (PPS), Sequence Parameter Set (SPS), or Video Parameter Set (VPS). Additionally, the video / image information may include general constraint information. In this document, information and / or syntax elements transmitted / signaled from the encoding device to the decoding device may be included in the video / image information. The video / image information may be encoded by the encoding process described above and included in the bitstream. The bitstream may be transmitted via a network or stored in a digital storage medium. The network may include broadcast networks and / or communication networks, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmitter (not shown) that transmits the signal output from the entropy encoder 240 and / or a storage unit (not shown) that stores the signal may be included as an internal / external element of the encoding device 200, and alternatively, the transmitter may be included in the entropy encoder 240.
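Of the entropy coding methods named in [0070], exponential Golomb coding is the simplest to show. The Python sketch below implements the common order-0 code for non-negative integers and represents bits as a string for readability; the function names are hypothetical, and the choice of order 0 is an assumption, since the text does not say which order is used.

```python
def exp_golomb_encode(value: int) -> str:
    """Order-0 exponential Golomb code of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(binary) - 1) + binary  # prefix of zeros, then the binary form


def exp_golomb_decode(bits: str) -> int:
    """Decode a single order-0 exponential Golomb codeword."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[leading_zeros:2 * leading_zeros + 1], 2) - 1


codewords = [exp_golomb_encode(v) for v in range(5)]
# codewords == ["1", "010", "011", "00100", "00101"]
```

Small values receive short codewords, which fits syntax elements whose small values are the most frequent.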
[0071] The quantized transform coefficients output from quantizer 233 can be used to generate a prediction signal. For example, the residual signal (residual block or residual sample) can be reconstructed by applying inverse quantization and inverse transform to the quantized transform coefficients via inverse quantizer 234 and inverse transformer 235. Adder 250 adds the reconstructed residual signal to the prediction signal output from inter-frame predictor 221 or intra-frame predictor 222 to generate a reconstructed signal (reconstructed image, reconstructed block, reconstructed sample array). If the block to be processed has no residual (e.g., in the case of applying skip mode), the prediction block can be used as a reconstructed block. Adder 250 may be referred to as a reconstructor or reconstructed block generator. As described below, the generated reconstructed signal can be used for intra-frame prediction of the next block to be processed in the current image and can be filtered for inter-frame prediction of the next image.
[0072] In addition, luminance mapping with chroma scaling (LMCS) can be applied during image encoding and / or reconstruction.
[0073] Filter 260 can improve subjective / objective image quality by applying filtering to the reconstructed signal. For example, filter 260 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image and store the modified reconstructed image in memory 270 (specifically, the DPB of memory 270). The various filtering methods may include deblocking filtering, sample adaptive offset (SAO), adaptive loop filtering, bilateral filtering, etc. Filter 260 can generate various types of filtering-related information and send the generated information to entropy encoder 240, as described later in the description of each filtering method. The filtering-related information can be encoded by entropy encoder 240 and output as a bitstream.
[0074] The modified reconstructed image sent to memory 270 can be used as a reference image in inter-frame predictor 221. When inter-frame prediction is applied by the encoding device, prediction mismatch between the encoding device 200 and the decoding device can be avoided and encoding efficiency can be improved.
[0075] The DPB of memory 270 can store a modified reconstructed image used as a reference image in inter-frame predictor 221. Memory 270 can store motion information of blocks in the current image from which motion information is derived (or encoded) and / or motion information of already reconstructed blocks in the image. The stored motion information can be sent to inter-frame predictor 221 and used as motion information for spatially or temporally neighboring blocks. Memory 270 can store reconstructed samples of reconstructed blocks in the current image and can transmit the reconstructed samples to intra-frame predictor 222.
[0076] Figure 3 is a schematic illustration of the configuration of a video / image decoding apparatus to which embodiments of the present disclosure are applicable. Hereinafter, the decoding apparatus may include an image decoding apparatus and / or a video decoding apparatus.
[0077] Referring to Figure 3, the decoding device 300 may include an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The predictor 330 may include an inter-frame predictor 332 and an intra-frame predictor 331. The residual processor 320 may include an inverse quantizer 321 and an inverse transformer 322. According to embodiments, the entropy decoder 310, residual processor 320, predictor 330, adder 340, and filter 350 may be configured by hardware components (e.g., a decoder chipset or processor). Additionally, the memory 360 may include a decoded picture buffer (DPB) or may be configured by a digital storage medium. The hardware components may also include the memory 360 as an internal / external component.
[0078] When a bitstream containing video / image information is input, the decoding device 300 can reconstruct the image in correspondence with the process by which the video / image information was processed in the encoding device of Figure 2. For example, the decoding device 300 can derive units / blocks based on block segmentation information obtained from the bitstream. The decoding device 300 can perform decoding using the processing unit applied in the encoding device. Therefore, for example, the processing unit for decoding can be a coding unit, and the coding unit can be segmented from a coding tree unit or largest coding unit according to a quadtree structure, binary tree structure, and / or ternary tree structure. One or more transform units can be derived from the coding unit. The reconstructed image signal decoded and output by the decoding device 300 can be reproduced by a reproduction device.
[0079] The decoding device 300 can receive a signal output from the encoding device of Figure 2 in the form of a bitstream, and the received signal can be decoded by the entropy decoder 310. For example, the entropy decoder 310 can parse the bitstream to derive information (e.g., video / image information) required for image reconstruction (or picture reconstruction). The video / image information may also include information about various parameter sets, such as adaptive parameter sets (APS), picture parameter sets (PPS), sequence parameter sets (SPS), or video parameter sets (VPS). In addition, the video / image information may also include general constraint information. The decoding device can also decode the picture based on the information about the parameter sets and / or the general constraint information. The signaled / received information and / or syntax elements described later herein can be decoded through the decoding process and obtained from the bitstream. For example, the entropy decoder 310 decodes the information in the bitstream based on coding methods such as exponential Golomb coding, CAVLC, or CABAC, and outputs the values of syntax elements required for image reconstruction and the quantized values of transform coefficients for the residual. More specifically, the CABAC entropy decoding method receives bins corresponding to each syntax element in the bitstream, determines a context model using information about the target syntax element, decoding information about the target block, or information about symbols / bins decoded in a previous stage, predicts the probability of bin occurrence based on the determined context model, and performs arithmetic decoding on the bins to generate symbols corresponding to the value of each syntax element.
In this case, the CABAC entropy decoding method can update the context model after determining the context model by using the information of the decoded symbols / bins for the context model of the next symbol / bin. Information related to prediction from the information decoded by the entropy decoder 310 can be provided to the predictors (inter-frame predictor 332 and intra-frame predictor 331), and the residual values (i.e., quantized transform coefficients and related parameter information) from the entropy decoder 310 can be input to the residual processor 320. The residual processor 320 can derive residual signals (residual blocks, residual samples, residual sample arrays). Additionally, information about filtering from the information decoded by the entropy decoder 310 can be provided to the filter 350. Furthermore, a receiver (not shown) for receiving signals output from the encoding device may be configured as an internal / external element of the decoding device 300, or the receiver may be a component of the entropy decoder 310. Additionally, the decoding device according to this document may be referred to as a video / image / picture decoding device, and the decoding device may be classified as an information decoder (video / image / picture information decoder) and a sample decoder (video / image / picture sample decoder). The information decoder may include the entropy decoder 310, and the sample decoder may include at least one of a dequantizer 321, an inverse transformer 322, an adder 340, a filter 350, a memory 360, an inter-frame predictor 332, and an intra-frame predictor 331.
[0080] The dequantizer 321 can dequantize the quantized transform coefficients and output the transform coefficients. The dequantizer 321 can rearrange the quantized transform coefficients in a two-dimensional block format. In this case, the rearrangement can be performed based on the coefficient scan order performed in the encoding device. The dequantizer 321 can use quantization parameters (e.g., quantization step size information) to perform dequantization on the quantized transform coefficients and obtain the transform coefficients.
[0081] The inverse transformer 322 performs inverse transformation on the transformation coefficients to obtain the residual signal (residual block, residual sample array).
[0082] Predictor 330 can perform prediction on the current block and generate a prediction block that includes prediction samples of the current block. The predictor can determine whether to apply intra-frame prediction or inter-frame prediction to the current block based on information about the prediction output from entropy decoder 310, and can determine a specific intra-frame / inter-frame prediction mode.
[0083] The predictor can generate a prediction signal based on various prediction methods. For example, the predictor can apply not only intra-frame prediction or inter-frame prediction to predict a block, but also both intra-frame and inter-frame prediction simultaneously. This can be referred to as combined intra-frame and inter-frame prediction (CIIP). Alternatively, the predictor can predict blocks based on an intra-block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for coding content images / videos such as games, for example, screen content coding (SCC). IBC essentially performs prediction in the current image, but can be performed similarly to inter-frame prediction in that a reference block is derived in the current image. That is, IBC can use at least one inter-frame prediction technique described herein. The palette mode can be considered an example of intra-frame coding or intra-frame prediction. When the palette mode is applied, sample values within the image can be signaled based on information about the palette table and palette index.
[0084] Intra-predictor 331 can refer to samples in the current image to predict the current block. Depending on the prediction mode, the referenced samples may be located near or separated from the current block. In intra-prediction, the prediction mode may include multiple non-directional modes and multiple directional modes. Intra-predictor 331 can use the prediction modes applied to neighboring blocks to determine the prediction mode applied to the current block.
[0085] Inter-frame predictor 332 can deduce the predicted block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference image. In this case, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis based on the correlation of motion information between neighboring blocks and the current block. Motion information may include motion vectors and reference image indices. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in the reference image. For example, inter-frame predictor 332 can configure a motion information candidate list based on neighboring blocks and deduce the motion vector and / or reference image index of the current block based on the received candidate selection information. Inter-frame prediction can be performed based on various prediction modes, and the information about the prediction may include information indicating the inter-frame prediction mode of the current block.
[0086] Adder 340 generates a reconstruction signal (reconstructed image, reconstruction block, reconstruction sample array) by adding the obtained residual signal to the prediction signal (prediction block, prediction sample array) output from the predictor (including inter-frame predictor 332 and / or intra-frame predictor 331). If the block to be processed has no residual, such as when a skip mode is applied, the prediction block can be used as the reconstruction block.
[0087] Adder 340 can be referred to as a reconstructor or reconstruction block generator. The generated reconstructed signal can be used for intra-frame prediction of the next block to be processed in the current image, and can be output through filtering as described below, or it can be used for inter-frame prediction of the next image.
[0088] In addition, Luminance Mapping with Chroma Scaling (LMCS) can be applied in image decoding processing.
[0089] Filter 350 can improve subjective / objective image quality by applying filtering to the reconstructed signal. For example, filter 350 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image and store the modified reconstructed image in memory 360 (specifically, the DPB of memory 360). For example, the various filtering methods may include deblocking filtering, sample adaptive offset (SAO), adaptive loop filtering, bilateral filtering, etc.
[0090] The (modified) reconstructed image stored in the DPB of memory 360 can be used as a reference image in inter-frame predictor 332. Memory 360 can store motion information of blocks in the current image from which motion information is derived (or decoded) and / or motion information of already reconstructed blocks in the image. The stored motion information can be sent to inter-frame predictor 332 as motion information of spatially or temporally neighboring blocks. Memory 360 can store reconstructed samples of reconstructed blocks in the current image and transmit the reconstructed samples to intra-frame predictor 331.
[0091] In this disclosure, the embodiments described for the filter 260, inter-frame predictor 221, and intra-frame predictor 222 of the encoding device 200 can be applied identically or correspondingly to the filter 350, inter-frame predictor 332, and intra-frame predictor 331 of the decoding device 300, respectively.
[0092] Furthermore, as mentioned above, prediction is performed during video encoding to enhance compression efficiency. A prediction block, including predicted samples of the current block (i.e., the target coding block), can be generated through prediction. In this case, the prediction block includes predicted samples in the spatial domain (or pixel domain). The prediction block is derived identically in both the encoding and decoding devices. The encoding device can enhance image coding efficiency by signaling to the decoding device not the original sample values of the original block themselves, but information (residual information) about the residual between the original block and the prediction block. The decoding device can derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by adding the residual block and the prediction block, and generate a reconstructed image including the reconstructed block.
[0093] Residual information can be generated through transform and quantization processes. For example, an encoding device can derive a residual block between the original block and the prediction block, derive transform coefficients by performing a transform process on residual samples (residual sample arrays) included in the residual block, derive quantized transform coefficients by performing a quantization process on the transform coefficients, and signal the relevant residual information (via a bitstream) to the decoding device. In this case, the residual information may include information such as the values of the quantized transform coefficients, their locations, the transform scheme, the transform kernel, and the quantization parameters. The decoding device can perform inverse quantization / inverse transform processes based on the residual information and derive residual samples (or residual blocks). The decoding device can generate a reconstructed image based on the prediction block and the residual block. Furthermore, so that the result can serve as a reference for inter-frame prediction of subsequent images, the encoding device can also derive the residual block by performing inverse quantization / inverse transform on the quantized transform coefficients and can generate a reconstructed image based on it.
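The residual signaling and reconstruction described above can be illustrated with a minimal sketch. It assumes single-component 8-bit samples, omits the transform (an identity transform is assumed for brevity), and uses a plain uniform quantizer; the function names `encode_residual` and `reconstruct` and the sample values are hypothetical and are not taken from this disclosure.

```python
def clip8(value):
    """Clamp a sample to the 8-bit range [0, 255] (bit depth is an assumption)."""
    return max(0, min(255, value))


def encode_residual(original, prediction, step):
    """Encoder side: residual = original - prediction, then uniform quantization.

    The transform stage is omitted here, so the residual samples are
    quantized directly.
    """
    levels = []
    for o, p in zip(original, prediction):
        r = o - p
        sign = -1 if r < 0 else 1
        levels.append(sign * ((abs(r) + step // 2) // step))
    return levels


def reconstruct(levels, prediction, step):
    """Decoder side (the encoder runs the same steps to stay in sync):
    dequantize the levels and add them to the prediction."""
    return [clip8(p + lv * step) for lv, p in zip(levels, prediction)]


original = [52, 55, 61, 66]
prediction = [50, 50, 60, 60]
step = 4

levels = encode_residual(original, prediction, step)   # [1, 1, 0, 2]
recon = reconstruct(levels, prediction, step)           # [54, 54, 60, 68]
```

Because the encoder reconstructs from the same quantized levels as the decoder, both sides hold identical reference samples even though the reconstruction differs slightly from the original.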
[0094] Figure 4 shows an example of an illustrative video / image encoding process applicable to embodiments of this disclosure. In Figure 4, S400 may be executed by the predictor 220 of the encoding apparatus described above with reference to Figure 2, S410 may be executed by the residual processor 230, and S420 may be executed by the entropy encoder 240. S400 may include the inter-frame / intra-frame prediction process described herein, S410 may include the residual processing process described herein, and S420 may include the information encoding process described herein.
[0095] Referring to Figure 4, the video / image encoding process can include not only the process of encoding information used for image reconstruction (e.g., prediction information, residual information, segmentation information, etc.) and outputting the encoded information as a bitstream, as described with reference to Figure 2, but also the process of generating a reconstructed image for the current image and, optionally, the process of applying loop filtering to the reconstructed image. The encoding device can derive (modified) residual samples from the quantized transform coefficients through the dequantizer 234 and the inverse transformer 235, and generate a reconstructed image based on the prediction samples output in S400 and the (modified) residual samples. Therefore, the generated reconstructed image can be the same as the reconstructed image generated by the decoding device described above. The modified reconstructed image can be generated by applying a loop filtering process to the reconstructed image, can be stored in the decoded picture buffer or memory 270, and, as in the case of the decoding device, can be used as a reference image in the inter-frame prediction process during subsequent image encoding. As mentioned above, in some cases, part or all of the loop filtering process can be omitted. When the loop filtering process is performed, the (loop) filtering-related information (parameters) can be encoded by the entropy encoder 240 and output as a bitstream, and the decoding device can perform the loop filtering process in the same way as the encoding device based on the filtering-related information.
[0096] This loop filtering process reduces noise generated during image / video encoding, such as block artifacts and ringing artifacts, and improves both subjective and objective visual quality. Furthermore, by performing the loop filtering process in both the encoding and decoding devices, both devices can derive the same prediction results, which improves the reliability of image coding and reduces the amount of data transmitted for image coding.
[0097] As described above, the image reconstruction process can be performed not only in the decoding device but also in the encoding device. Reconstructed blocks can be generated based on intra-frame prediction / inter-frame prediction for each block, and a reconstructed image including the reconstructed blocks can be generated. When the current image / slice / tile group is an I-image / slice / tile group, the blocks included in the current image / slice / tile group can be reconstructed based solely on intra-frame prediction. On the other hand, when the current image / slice / tile group is a P-image / slice / tile group or a B-image / slice / tile group, the blocks included in the current image / slice / tile group can be reconstructed based on either intra-frame prediction or inter-frame prediction. In this case, inter-frame prediction can be applied to some blocks in the current image / slice / tile group, and intra-frame prediction can be applied to the remaining blocks. The color components of the image can include luma and chroma components, and unless explicitly limited in this disclosure, the methods and implementations proposed in this disclosure can be applied to both luma and chroma components.
[0098] Figure 5 shows an example of an illustrative video / image decoding process applicable to embodiments of this disclosure. In Figure 5, S500 may be executed by the entropy decoder 310 of the decoding device described above with reference to Figure 3, S510 may be executed by the predictor 330, S520 by the residual processor 320, S530 by the adder 340, and S540 by the filter 350. Step S500 may include the information decoding process described herein, step S510 may include the inter-frame / intra-frame prediction process described herein, step S520 may include the residual processing process described herein, step S530 may include the block / picture reconstruction process described herein, and step S540 may include the loop filtering process described herein.
[0099] Referring to Figure 5, the image decoding process may include the process of obtaining image / video information from the bitstream (through decoding) (S500), the image reconstruction process (S510 to S530), and the loop filtering process for the reconstructed image (S540), as described with reference to Figure 3. The image reconstruction process can be performed based on prediction samples and residual samples obtained through inter-frame / intra-frame prediction (S510) and residual processing (S520; inverse quantization and inverse transform of the quantized transform coefficients) as described in this disclosure. A modified reconstructed image can be generated by a loop filtering process applied to the reconstructed image generated by the image reconstruction process, and the modified reconstructed image can be output as a decoded image. Furthermore, the modified reconstructed image can be stored in a decoded picture buffer or the memory 360 of the decoding device and used as a reference image in the inter-frame prediction process during subsequent image decoding.
[0100] In some cases, the loop filtering process can be omitted. In this case, the reconstructed image can be output as a decoded image, stored in the memory 360 of the decoding device or in the decoded picture buffer, and used as a reference image in inter-frame prediction during subsequent image decoding. The loop filtering process (S540) as described above may include a deblocking filtering process, a sample adaptive offset (SAO) process, an adaptive loop filter (ALF) process, and / or a bilateral filter process, and some or all of them may be omitted. Alternatively, one or more of the deblocking filtering process, the SAO process, the ALF process, and the bilateral filter process may be applied sequentially, or all of them may be applied sequentially. For example, the SAO process may be performed after the deblocking filtering process is applied to the reconstructed image. As another example, the ALF process may be performed after the deblocking filtering process is applied to the reconstructed image. This can be performed in the same manner in the encoding device.
[0101] As described above, the encoding device can derive residual blocks (residual samples) based on blocks (predicted samples) predicted through intra / inter / IBC prediction, and apply transform and quantization to the derived residual samples to derive the quantized transform coefficients. Information about the quantized transform coefficients (residual information) can be included in the residual coding syntax and output as a bitstream after encoding. The decoding device can obtain information about the quantized transform coefficients (residual information) from the bitstream and decode this information to derive the quantized transform coefficients. The decoding device can derive the residual samples through inverse quantization / inverse transform based on the quantized transform coefficients. As described above, at least one of quantization / inverse quantization and / or transform / inverse transform can be skipped. When transform / inverse transform is omitted, the transform coefficients can be referred to as coefficients or residual coefficients, or for consistency, they can still be referred to as transform coefficients. A signal can be sent based on `transform_skip_flag` to indicate whether the transform / inverse transform is omitted. For example, when the value of `transform_skip_flag` is 1, it can indicate that the transform / inverse transform is skipped; this can be called transform skip mode.
[0102] In video / image coding, the quantization rate can typically be changed, and the compression ratio can be adjusted using the changed quantization rate. From an implementation perspective, considering complexity, a quantization parameter (QP) can be used instead of the quantization rate itself. For example, a quantization parameter with integer values from 0 to 63 can be used, and each quantization parameter value can correspond to an actual quantization rate. Furthermore, the quantization parameter $QP_Y$ for the luma component (luma samples) and the quantization parameter $QP_C$ for the chroma components (chroma samples) can be set differently.
[0103] Quantization can be performed by taking the transform coefficients C as input and dividing them by the quantization rate $Q_{step}$, whereby the quantized transform coefficients C' are obtained. In this case, considering computational complexity, the quantization rate can be multiplied by a certain scale to form an integer, and a shift operation can be performed by the value corresponding to the scale value. The quantization scale can be derived based on the product of the quantization rate and the scale value. That is, the quantization scale can be derived from QP. For example, the quantized transform coefficients C' can be derived by applying the quantization scale to the transform coefficients C.
[0104] Inverse quantization is the inverse process of quantization: the quantized transform coefficients C' are multiplied by the quantization rate $Q_{step}$ to obtain the reconstructed transform coefficients C''. In this case, the level scale can be derived from the quantization parameter, and the reconstructed transform coefficients C'' can be derived by applying the level scale to the quantized transform coefficients C'. Due to losses in the transform and / or quantization processes, the reconstructed transform coefficients C'' may differ slightly from the original transform coefficients C. Therefore, inverse quantization is performed in the encoding device in the same manner as in the decoding device.
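The relationship among QP, the quantization rate, and the scale-and-shift form of quantization can be sketched as follows. The mapping from QP to step size used here (the step doubles every 6 QP and equals 1 at QP 4) is the convention commonly used in HEVC / VVC-style codecs and is an assumption, since this disclosure does not state the mapping; the 14-bit shift and all function names are likewise hypothetical.

```python
import math

SHIFT = 14  # hypothetical fixed-point precision for the integer variant


def qstep(qp):
    """Assumed QP-to-step mapping: doubles every 6 QP, equals 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(c, qp):
    """C' = round(C / Q_step), rounding half away from zero."""
    sign = -1 if c < 0 else 1
    return sign * int(math.floor(abs(c) / qstep(qp) + 0.5))


def dequantize(level, qp):
    """C'' = C' * Q_step (generally differs slightly from the original C)."""
    return level * qstep(qp)


def quantize_int(c, qp):
    """Same quantizer using an integer scale and a right shift instead of
    a division, in the spirit of the scale-and-shift description above."""
    scale = int(round((1 << SHIFT) / qstep(qp)))
    sign = -1 if c < 0 else 1
    return sign * ((abs(c) * scale + (1 << (SHIFT - 1))) >> SHIFT)


level = quantize(100, 22)          # Q_step is 8.0 at QP 22, so level is 13
restored = dequantize(level, 22)   # 104.0, not the original 100
```

The difference between `restored` and the input is the quantization loss mentioned in the text, and it is why the encoder must run the same inverse quantization as the decoder when building its reference samples.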
[0105] Furthermore, prediction can be performed based on palette encoding. Palette encoding is a useful technique for representing blocks that comprise a small number of unique color values. Instead of applying prediction and transformation to blocks, a palette mode is used to signal the index used to indicate the color value of each sample. This palette mode is useful for saving video memory buffer space. Blocks can be encoded using a palette mode (e.g., MODE_PLT). To decode blocks encoded in this way, the decoder needs to decode both the palette colors and the indices. Palette colors can be represented by a palette table and can be encoded using a palette table encoding tool.
[0106] Figure 6 shows an example for describing the basic structure of palette coding.
[0107] Referring to Figure 6, image 600 can be represented by histogram 610. Here, the dominant color values are typically mapped to color indices (620), and the image can be encoded using a color index map (630).
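The histogram-to-index-map flow of Figure 6 can be sketched in a few lines. This is only an illustration under simplifying assumptions: samples are single-component integers, the palette is simply the most frequent values, and every sample in the example block is present in the palette. The names `build_palette` and `index_map` are hypothetical.

```python
from collections import Counter


def build_palette(samples, max_size):
    """Histogram the samples and keep the most frequent values as the palette."""
    histogram = Counter(samples)
    return [color for color, _count in histogram.most_common(max_size)]


def index_map(samples, palette):
    """Replace each sample by the index of its color in the palette."""
    lookup = {color: i for i, color in enumerate(palette)}
    return [lookup[s] for s in samples]


block = [10, 10, 10, 200, 200, 10, 90, 10]
palette = build_palette(block, 3)      # [10, 200, 90]
indices = index_map(block, palette)    # [0, 0, 0, 1, 1, 0, 2, 0]
```

The index map carries small integers instead of full color values, which is where the saving comes from when a block has only a few distinct colors.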
[0108] Palette coding can be referred to as (intra) palette mode, (intra) palette coding scheme, etc. The current block can be reconstructed based on palette coding or palette mode. Palette coding can be considered an example of intra coding, or one of the intra prediction methods. However, similar to the skip mode described above, the additional residual values for the corresponding block may not be signaled.
[0109] For example, the palette mode can be used to improve the coding efficiency of screen content such as computer-generated videos containing large amounts of text and graphics. Typically, local areas of screen content have a few colors separated by sharp edges. To take advantage of this property, the palette mode can represent samples of a block based on indices indicating color entries in a palette table.
[0110] For example, information about the palette table can be signaled. The palette table may include index values corresponding to each color. Palette index prediction data can be received, and the palette index prediction data may include data indicating index values of at least a portion of a palette index map that maps pixels of video data to color indices in the palette table. The palette index prediction data may include run value data that associates at least a portion of the index values of the palette index map with run values. A run value may be associated with an escape color index. The palette index map can be generated from the palette index prediction data, at least in part, by determining whether to adjust the index values of the palette index prediction data based on the last index value. The current block in the image can be reconstructed based on the palette index map.
[0111] When using palette mode, the pixel values of a CU can be represented by a set of representative color values. This set can be called a palette. When a pixel has a value close to a color value in the palette, the palette index corresponding to that color value can be signaled. When a pixel has a color value other than that in the palette, the pixel can be represented by an escape symbol and the quantized pixel value can be signaled directly. In this disclosure, a pixel or pixel value can be referred to as a sample or sample value.
[0112] To decode blocks encoded in palette mode, the decoder needs to decode both the palette colors and the indices. Palette colors can be represented using a palette table and encoded using a palette table encoding tool. An escape flag can be signaled for each CU to indicate whether an escape symbol is present in the current CU. If an escape symbol is present, the palette table size is incremented by 1, and the last index can be assigned to the escape mode. The palette indices of all pixels in the CU can form a palette index map, which can be encoded using a palette index map encoding tool.
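The index assignment with an escape symbol can be sketched as follows. The sketch assumes single-component samples, a non-empty palette, a hypothetical closeness threshold `tolerance`, and a plain uniform quantizer with step `step` for escape values; how the quantized escape value is actually derived and range-limited is outside the scope of this illustration.

```python
def map_with_escape(samples, palette, tolerance, step):
    """Map each sample to a palette index, or to the escape index.

    The escape index is len(palette), i.e. the last index once the table
    size has been incremented by 1. Escaped samples are quantized and
    collected separately so they can be signaled directly.
    """
    escape_index = len(palette)
    indices = []
    escape_values = []
    for s in samples:
        best = min(range(len(palette)), key=lambda i: abs(palette[i] - s))
        if abs(palette[best] - s) <= tolerance:
            indices.append(best)
        else:
            indices.append(escape_index)
            escape_values.append((s + step // 2) // step)
    return indices, escape_values


palette = [10, 200]
samples = [10, 12, 200, 97, 198]
indices, escape_values = map_with_escape(samples, palette, tolerance=3, step=4)
escape_present = len(escape_values) > 0   # would drive the per-CU escape flag
```

Here the sample 97 is far from both palette colors, so it receives index 2 (the escape index) and its quantized value 24 is kept for direct signaling.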
[0113] For example, a palette predictor can be maintained for encoding the palette table. The predictor can be initialized at the beginning of each slice where the predictor is reset to zero. For each entry in the palette predictor, a reuse flag can be signaled to indicate whether it is part of the current palette. Zero-run-length encoding can be used to send the reuse flag. Then, zero-order exponent Golomb encoding can be used to signal the number of new palette entries. Finally, the component values of the new palette entries can be signaled. After encoding the current CU, the palette predictor can be updated with the current palette, and entries from old palette predictors that are not reused in the current palette can be added to the end of the new palette predictor until the maximum allowed size (palette filling) is reached.
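The predictor handling described above can be sketched as follows. All names (`build_current_palette`, `update_predictor`, `exp_golomb0`) and the example values are hypothetical; the sketch treats palette entries as single integers and does not model the run-length coding of the reuse flags.

```python
def build_current_palette(old_predictor, reuse_flags, new_entries):
    """Current palette = reused predictor entries followed by new entries."""
    reused = [e for e, flag in zip(old_predictor, reuse_flags) if flag]
    return reused + list(new_entries)


def update_predictor(current_palette, old_predictor, reuse_flags, max_size):
    """New predictor = current palette, then the non-reused old entries,
    appended until the maximum allowed size is reached."""
    new_predictor = list(current_palette)
    for entry, flag in zip(old_predictor, reuse_flags):
        if len(new_predictor) >= max_size:
            break
        if not flag:
            new_predictor.append(entry)
    return new_predictor[:max_size]


def exp_golomb0(value):
    """0th-order exponential Golomb code of a non-negative integer, as a bit string."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


old_predictor = [10, 20, 30, 40]
reuse_flags = [1, 0, 1, 0]
new_entries = [99]

current = build_current_palette(old_predictor, reuse_flags, new_entries)
predictor = update_predictor(current, old_predictor, reuse_flags, max_size=5)
count_bits = exp_golomb0(len(new_entries))   # code for "one new entry"
```

With a maximum size of 5 the updated predictor is `[10, 30, 99, 20, 40]`; with a smaller limit the trailing non-reused entries would be dropped first.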
[0114] For example, to code the palette index map, the indices can be coded using horizontal and vertical traverse scans. The scan order can be explicitly signaled in the bitstream using flag information (e.g., palette_transpose_flag).
[0115] Figure 7 shows examples for describing the horizontal and vertical traverse scan methods used to code a palette index map.
[0116] Figure 7(a) shows an example of coding a palette index map using a horizontal traverse scan, and Figure 7(b) shows an example of coding a palette index map using a vertical traverse scan.
[0117] As shown in Figure 7(a), when the horizontal scan is used, the palette indices can be coded by scanning samples in the horizontal direction from the samples in the first row (top row) to the samples in the last row (bottom row) of the current block (i.e., the current CU).
[0118] As shown in Figure 7(b), when the vertical scan is used, the palette indices can be coded by scanning samples in the vertical direction from the samples in the first column (leftmost column) to the samples in the last column (rightmost column) of the current block (i.e., the current CU).
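The two scan orders can be illustrated with a short Python sketch. Figure 7 is not reproduced in this text, so the sketch assumes the usual meaning of a traverse scan, in which the scan direction reverses on every other row (or column). That reversal, the function name, and the `vertical` parameter are assumptions.

```python
def traverse_scan_positions(width, height, vertical=False):
    """Return the (x, y) sample positions of a block in traverse scan order.

    Assumption: the direction alternates on every other row (horizontal scan)
    or every other column (vertical scan), i.e. a snake-like order.
    """
    positions = []
    if not vertical:
        for y in range(height):  # top row to bottom row
            xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
            positions.extend((x, y) for x in xs)
    else:
        for x in range(width):  # leftmost column to rightmost column
            ys = range(height) if x % 2 == 0 else range(height - 1, -1, -1)
            positions.extend((x, y) for y in ys)
    return positions
```

In this sketch, `vertical=True` would correspond to the case signaled by the scan-order flag (e.g., palette_transpose_flag).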
[0119] Palette indices can be coded using two palette sample modes, "INDEX" and "COPY_ABOVE". The palette sample mode can be signaled using a flag indicating whether the mode is "INDEX" or "COPY_ABOVE". This flag can be signaled except for the top row when the horizontal scan is used, except for the first column when the vertical scan is used, and except when the previous mode was "COPY_ABOVE". In "COPY_ABOVE" mode, the palette index of the sample in the row above can be copied. In "INDEX" mode, the palette index can be explicitly signaled. For both "INDEX" and "COPY_ABOVE" modes, a run value indicating the number of pixels coded using the same mode can be signaled.
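To make the two sample modes concrete, the following Python sketch expands a list of runs into a palette index map. It is a simplified illustration: it assumes plain row-by-row raster order instead of the traverse scan, treats the run value as the number of samples in the run, and uses a hypothetical `(mode, index, run_length)` tuple layout.

```python
def decode_index_runs(runs, width, height):
    """Expand (mode, index, run_length) triples into a 2-D palette index map.

    Assumption: samples are visited in raster order, and run_length counts
    the samples coded with the same mode.
    """
    flat = []
    for mode, index, run_length in runs:
        for _ in range(run_length):
            pos = len(flat)
            if mode == "COPY_ABOVE":
                if pos < width:
                    raise ValueError("COPY_ABOVE is not available in the top row")
                flat.append(flat[pos - width])  # copy the index from the row above
            elif mode == "INDEX":
                flat.append(index)  # explicitly signaled palette index
            else:
                raise ValueError(f"unknown palette sample mode: {mode}")
    if len(flat) != width * height:
        raise ValueError("runs do not cover the block exactly")
    return [flat[r * width:(r + 1) * width] for r in range(height)]
```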
[0120] The coding order of the index map is as follows. First, the number of index values for the CU can be signaled. Then, the actual index values for the entire CU can be signaled using truncated binary (TB) coding. Both the number of indices and the index values can be coded in bypass mode, so that the index-related bypass bins are grouped together. Next, the palette mode ("INDEX" or "COPY_ABOVE") and the run can be signaled in an interleaved manner. Finally, the component escape values corresponding to the escape samples of the entire CU can be grouped together and coded in bypass mode. After the index values are signaled, an additional syntax element, `last_run_type_flag`, can be signaled. Together with the number of indices, this syntax element removes the need to signal the run value corresponding to the last run in the block.
[0121] Furthermore, in the VVC standard, a dual tree can be enabled for I slices, which separates the coding unit partitioning for luma and chroma. Palette coding (palette mode) can be applied to luma (Y component) and chroma (Cb and Cr components) separately or jointly. When the dual tree is disabled, palette coding (palette mode) can be applied to luma (Y component) and chroma (Cb and Cr components) jointly.
[0122] Figure 8 is a diagram illustrating an example of a palette mode-based coding method.
[0123] Referring to Figure 8, the decoding device can obtain palette information based on the bitstream and / or previous palette information (S800).
[0124] In one implementation, the decoding device can receive, from the bitstream, palette index information, traverse direction (scan order) information, palette mode information for each sample position, and run-length information for each palette mode while traversing the samples in the CU.
[0125] The decoding device can configure the palette based on the palette information (S810).
[0126] In one implementation, the decoding device can configure a palette predictor. Palette information used in a previous block can be stored for the next palette CU (i.e., a CU coded in palette mode) to be generated later, and this can be defined as a palette predictor entry. The decoding device can then receive new palette entry information and configure the palette for the current CU. For example, upon receiving the palette predictor reuse information and the new palette entry information to be used in the current CU, the decoding device can combine the two pieces of entry information to form one palette representing the current CU.
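The combination step in S810 can be sketched as follows. This Python sketch assumes that reused predictor entries are placed first and new entries are appended after them; that ordering, the function name, and the `max_palette_size` check are illustrative assumptions.

```python
def build_current_palette(predictor, reuse_flags, new_entries, max_palette_size):
    """Form the palette of the current CU from reused and new entries.

    predictor:    palette predictor entries kept from previous palette CUs.
    reuse_flags:  one flag per predictor entry (truthy means reused).
    new_entries:  newly signaled palette entries for the current CU.
    """
    # Keep only the predictor entries flagged for reuse (assumed to come first).
    palette = [entry for entry, reused in zip(predictor, reuse_flags) if reused]
    palette.extend(new_entries)
    if len(palette) > max_palette_size:
        raise ValueError("palette exceeds the maximum allowed size")
    return palette
```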
[0127] The decoding device can derive the sample values (sample prediction values) in the current block based on the palette (S820).
[0128] In one implementation, the decoding device can configure samples from the obtained palette information while traversing the samples in the CU in the horizontal or vertical direction based on the traverse direction (scan order) information. If the palette mode information indicates COPY_ABOVE mode, each sample value in the CU can be derived by copying the index information of the left sample position in the vertical scan, or the index information of the upper sample position in the horizontal scan. That is, the prediction samples in the CU can be derived by deriving the color value of each sample from the configured palette table based on the index information of each sample in the CU. Then, the decoding device can reconstruct each sample in the CU using the palette information and update the palette predictor.
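The lookup in S820 can be illustrated with a small Python sketch. It assumes single-component samples, assumes the escape index equals the palette size, and assumes the escape values have already been dequantized and are supplied in the same order in which the index map is visited. All names are hypothetical.

```python
def reconstruct_block(index_map, palette, escape_values):
    """Derive sample values from a palette index map.

    Samples whose index equals len(palette) are escape samples and take
    their value from escape_values (assumed already dequantized).
    """
    escape_index = len(palette)
    escapes = iter(escape_values)
    block = []
    for row in index_map:
        out_row = []
        for idx in row:
            if idx == escape_index:
                out_row.append(next(escapes))  # escape-coded sample
            else:
                out_row.append(palette[idx])  # color value of the palette entry
        block.append(out_row)
    return block
```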
[0129] In addition, for the aforementioned palette coding (palette mode or palette coding mode), information indicating whether the current CU is coded in palette mode can be signaled, and the CU can be coded by applying palette mode based on that information.
[0130] As an example, the Sequence Parameter Set (SPS) shown in Table 1 below can be used to signal whether a palette encoding mode is available.
[0131] [Table 1]
[0132]
[0133] The semantics of the syntax elements included in the syntax of Table 1 can be represented as shown in Table 2 below.
[0134] [Table 2]
[0135]
[0136] Referring to Tables 1 and 2, the `sps_palette_enabled_flag` syntax element can be parsed / signaled in the SPS. The `sps_palette_enabled_flag` syntax element can indicate whether the palette coding mode is available. For example, when `sps_palette_enabled_flag` is 1, it indicates that the palette coding mode is available, and in this case, information in the coding unit syntax indicating whether to apply the palette coding mode to the current coding unit (e.g., `pred_mode_plt_flag`) can be parsed / signaled. When `sps_palette_enabled_flag` is 0, it indicates that the palette coding mode is unavailable, and in this case, information in the coding unit syntax indicating whether to apply the palette coding mode to the current coding unit (e.g., `pred_mode_plt_flag`) may not be parsed / signaled.
[0137] Additionally, for example, information about whether to perform coding by applying the palette mode can be signaled based on information about whether the palette coding mode is available (e.g., sps_palette_enabled_flag), and this information can be signaled through the coding unit syntax shown in Table 3 below.
[0138] [Table 3]
[0139]
[0140] The semantics of the syntax elements included in the syntax of Table 3 can be represented as shown in Table 4 below.
[0141] [Table 4]
[0142]
[0143] Referring to Tables 3 and 4, the `pred_mode_plt_flag` syntax element can be parsed / signaled in the coding unit syntax. The `pred_mode_plt_flag` syntax element indicates whether to apply the palette mode to the current coding unit. For example, a value of 1 for `pred_mode_plt_flag` indicates that the palette mode is applied to the current coding unit, and a value of 0 indicates that the palette mode is not applied to the current coding unit.
[0144] In this case, the pred_mode_plt_flag can be parsed / signaled based on information about whether a palette encoding mode is available (e.g., sps_palette_enabled_flag). For example, when the value of sps_palette_enabled_flag is 1 (i.e., when a palette encoding mode is available), the pred_mode_plt_flag can be parsed / signaled.
[0145] In addition, coding can be performed by applying the palette mode to the current coding unit based on the pred_mode_plt_flag. For example, when the value of pred_mode_plt_flag is 1, the palette mode can be applied to the current coding unit to generate reconstructed samples by parsing / signaling the palette_coding() syntax.
[0146] As an example, Table 5 below shows the palette coding syntax.
[0147] [Table 5]
[0148]
[0149]
[0150]
[0151] The semantics of the syntax elements included in the syntax of Table 5 can be represented as shown in Table 6 below.
[0152] [Table 6]
[0153]
[0154]
[0155]
[0156]
[0157]
[0158] Referring to Tables 5 and 6, when applying a palette mode to the current block (i.e., the current coding unit), the palette coding syntax (e.g., palette_coding()) can be parsed / signaled as shown in Table 5.
[0159] For example, a palette table can be configured based on palette entry information. Palette entry information can include syntax elements such as palette_predictor_run, num_signalled_palette_entries, and new_palette_entries.
[0160] Additionally, a palette index map can be configured for the current block based on palette index information. Palette index information can include syntax elements such as `num_palette_indices_minus1`, `palette_idx_idc`, `copy_above_indices_for_final_run_flag`, and `palette_transpose_flag`. Based on the palette index information described above, palette index values (e.g., `PaletteIndexIdc`) can be derived for samples in the current block while traversing the scan direction (vertical or horizontal) to configure a palette index map (e.g., `PaletteIndexMap`).
[0161] Furthermore, sample values of palette entries in the palette table can be derived based on the palette index map, and reconstructed samples of the current block can be generated based on the sample values mapped to the palette entries (i.e., color values).
[0162] When the current block contains a sample having an escape value (i.e., when `palette_escape_val_present_flag` is 1), the escape value for the current block can be derived based on the escape information. The escape information can include syntax elements such as `palette_escape_val_present_flag` and `palette_escape_val`. For example, the escape value of an escape-coded sample in the current block can be derived based on quantized escape value information (e.g., `palette_escape_val`). A reconstructed sample of the current block can be generated based on the escape value.
[0163] As described above, the information (syntax elements) in the syntax tables disclosed in this disclosure can be included in image / video information, configured / encoded according to the coding technique (including palette coding) performed in the encoding device, and transmitted to the decoding device in the form of a bitstream. The decoding device can parse / decode the information (syntax elements) in the syntax tables. The decoding device can perform a coding technique such as palette coding based on the decoded information, and can perform a block / image / video reconstruction (decoding) process based on this. Below, this disclosure proposes syntax tables and syntax elements for efficiently coding blocks / images / videos based on palette coding.
[0164] This disclosure presents a method for efficiently coding and signaling escape values in palette mode coding. In palette mode, an escape value can be used to additionally transmit the sample value of a sample whose value differs from those of its neighboring samples in the block. Since such escape values are additional data, they can be quantized to reduce the amount of data to be signaled. Furthermore, in escape coding in palette mode, no transform is applied, and the quantized escape values can be signaled directly. This can be considered similar to transform skip mode, in which no transform is applied to the coding unit (CU).
[0165] In the current VVC standard, the full range of quantization parameter (QP) values is applied to escape values in palette mode. However, this disclosure proposes a method to limit the range of QP values to prevent the quantization step size of escape value encoding in palette mode from becoming less than 1. In one implementation, the same constraint as the minimum QP for transform skipping can be applied to escape value encoding in palette mode. The minimum QP for transform skipping can be used to clip the minimum QP for palette mode.
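The reason a lower bound on QP keeps the step size from dropping below 1 can be illustrated numerically. The Python sketch below assumes the step-size model commonly used in HEVC/VVC-style codecs, in which the step size doubles every 6 QP and equals 1 at QP 4. This formula is an assumption introduced here for illustration and is not stated in this disclosure.

```python
def quantization_step(qp):
    """Approximate quantization step size for a given QP.

    Assumed model: the step doubles every 6 QP and equals 1.0 at QP 4.
    """
    return 2.0 ** ((qp - 4) / 6.0)


# Under this model, any QP below 4 yields a step size smaller than 1,
# which is what the proposed QP floor avoids for escape values.
for qp in (0, 4, 10):
    print(qp, quantization_step(qp))
```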
[0166] As an example, information about the minimum QP for transform skip can be signaled through the sequence parameter set (SPS) shown in Table 7 below.
[0167] [Table 7]
[0168]
[0169] The semantics of the syntax elements included in the syntax of Table 7 can be represented as shown in Table 8 below.
[0170] [Table 8]
[0171]
[0172] Referring to Tables 7 and 8, the `min_qp_prime_ts_minus4` syntax element can be parsed / signaled in the SPS. The `min_qp_prime_ts_minus4` syntax element indicates the minimum quantization parameter allowed in the transform skip mode. In other words, the minimum quantization parameter value (e.g., `QpPrimeTsMin`) in the transform skip mode can be derived based on the `min_qp_prime_ts_minus4` syntax element. For example, the minimum quantization parameter value (e.g., `QpPrimeTsMin`) can be derived by adding 4 to the value of `min_qp_prime_ts_minus4`.
[0173] As described above, based on the min_qp_prime_ts_minus4 syntax element signaled via the SPS, the QP for escape values in palette mode can be derived as in the algorithm disclosed in Table 9 below. That is, the QP value used for escape value reconstruction in the palette mode-based decoding process can be derived as in the algorithm disclosed in Table 9 below.
[0174] [Table 9]
[0175]
[0176] Referring to Table 9, when a palette mode escape value exists, the QP value can be derived. That is, the QP of the palette mode escape value can be derived based on the minimum quantization parameter value (e.g., QpPrimeTsMin) in the transform skip mode derived from the min_qp_prime_ts_minus4 syntax element described above. For example, as shown in Table 9, the QP of the palette mode escape value can be derived as the larger value between QpPrimeTsMin and the quantization parameter Qp (Qp'Y for the luminance component and Qp'Cb or Qp'Cr for the chrominance component). Then, escape values can be derived based on the QP of the palette mode escape value to reconstruct samples in the block.
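Table 9 is not reproduced in this text, but the derivation described in the paragraph above can be sketched in Python as follows. The function name is hypothetical, and `qp` stands for Qp'Y, Qp'Cb, or Qp'Cr depending on the component.

```python
def derive_escape_qp(min_qp_prime_ts_minus4, qp):
    """Derive the QP used for palette-mode escape values.

    QpPrimeTsMin is obtained by adding 4 to min_qp_prime_ts_minus4, and the
    escape QP is the larger of QpPrimeTsMin and the component QP.
    """
    qp_prime_ts_min = 4 + min_qp_prime_ts_minus4
    return max(qp_prime_ts_min, qp)
```

For instance, with `min_qp_prime_ts_minus4` equal to 0, a component QP of 2 would be raised to 4, while a component QP of 30 would be left unchanged.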
[0177] Furthermore, as described above in this disclosure, when the QP range in palette mode is limited to a value greater than or equal to the minimum quantization parameter value in transform skip mode (e.g., QpPrimeTsMin), the range of escape values quantized in palette mode can be limited. As an implementation, the range of escape values quantized in palette mode can be determined based on bit depth and can be limited such that it is not greater than, for example, (1 << BitDepth) - 1.
[0178] For example, the escape value quantized in palette mode can be represented by the syntax element palette_escape_val. The syntax element palette_escape_val can be signaled using the palette encoding syntax shown in Table 10 below.
[0179] [Table 10]
[0180]
[0181] The semantics of the syntax elements included in the syntax of Table 10 can be represented as shown in Table 11 below.
[0182] [Table 11]
[0183]
[0184] Referring to Tables 10 and 11, the `palette_escape_val` syntax element can be parsed / signaled in the palette encoding syntax. The `palette_escape_val` syntax element can indicate the quantization escape value. Additionally, as shown in Table 10, the value of the `palette_escape_val` syntax element can be set to `PaletteEscapeVal`, and `PaletteEscapeVal` can indicate the escape value for samples where `PaletteIndexMap` equals `MaxPaletteIndex` and the value of `palette_escape_val_present_flag` is 1. Here, a value of 1 for `palette_escape_val_present_flag` can mean that at least one escape-coded sample (escape value) is included in the current CU. For example, for the luma component, `PaletteEscapeVal` can be restricted to the range of 0 to (1 << BitDepthY) - 1. For the chroma component, `PaletteEscapeVal` can be restricted to the range of 0 to (1 << BitDepthC) - 1.
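The range restriction on the quantized escape value can be expressed as a simple check. The helper names below are hypothetical. `bit_depth` would be BitDepthY for the luma component and BitDepthC for the chroma components.

```python
def max_escape_val(bit_depth):
    """Largest allowed quantized escape value for the given bit depth."""
    return (1 << bit_depth) - 1


def is_valid_escape_val(value, bit_depth):
    """Check that a quantized escape value lies in 0 .. (1 << bit_depth) - 1."""
    return 0 <= value <= max_escape_val(bit_depth)
```

For an 8-bit component the allowed range is 0 to 255, and for a 10-bit component it is 0 to 1023.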
[0185] Furthermore, this disclosure proposes a method for defining a palette size and signaling that size. The palette size can indicate the number of entries in the palette table (i.e., the number of indices in the palette table). As an implementation, in this disclosure, the number of entries in the palette can be indicated by defining the palette size using one or more constants.
[0186] As an example, the palette size can be represented by the syntax element `palette_max_size`, and this element can be the same for the entire sequence, or it can vary depending on the CU size (i.e., the number of pixels in the CU). For instance, the palette size (`palette_max_size`) can indicate the maximum allowed index of the palette table and can be defined as 31. As another example, the palette size (`palette_max_size`) can indicate the maximum allowed index of the palette table and can be defined according to the CU size as shown in Table 12 below.
[0187] [Table 12]
[0188]
[0189] The palette sizes 63, 31, 15, etc. and CU sizes 1024, 256, etc. disclosed in Table 12 are for illustrative purposes only and may be changed to other numbers.
[0190] As an implementation, information indicating the size of the palette (e.g., palette_max_size) can be signaled via SPS, as shown in Table 13 below.
[0191] [Table 13]
[0192]
[0193] The semantics of the syntax elements included in the syntax of Table 13 can be represented as shown in Table 14 below.
[0194] [Table 14]
[0195] palette_max_size specifies the maximum allowed index of the palette table and should be in the range of 1 to 63, inclusive.
[0196] Referring to Tables 13 and 14 above, the palette_max_size syntax element can be parsed / signaled in SPS. The palette_max_size syntax element can indicate the maximum allowed index of the palette table and can be limited to a range from 1 to 63.
[0197] In this case, the palette_max_size syntax element can be parsed / signaled based on the sps_palette_enabled_flag syntax element, which serves as information indicating whether palette mode is enabled. For example, the palette_max_size syntax element can be parsed / signaled when the value of sps_palette_enabled_flag is 1 (i.e., when it indicates that palette mode is enabled).
[0198] Alternatively, as an implementation, information indicating the palette size (e.g., log2_palette_max_size) can be signaled via SPS, as shown in Table 15 below.
[0199] [Table 15]
[0200]
[0201] The semantics of the syntax elements included in the syntax of Table 15 can be represented as shown in Table 16 below.
[0202] [Table 16]
[0203]
[0204] Referring to Tables 15 and 16, the log2_palette_max_size syntax element can be parsed / signaled in the SPS. The log2_palette_max_size syntax element can indicate the log2 value of the palette size (i.e., palette_max_size + 1). Therefore, the palette_max_size that indicates the maximum allowable index of the palette table can be derived by calculating (1 << log2_palette_max_size) - 1 and can be restricted to the range from 1 to 63.
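The derivation of palette_max_size from the log2-coded value can be sketched in Python as follows. The function name and the choice to raise an error for out-of-range values are illustrative assumptions.

```python
def palette_max_size_from_log2(log2_palette_max_size):
    """Derive palette_max_size as (1 << log2_palette_max_size) - 1.

    The result is checked against the stated range of 1 to 63, inclusive.
    """
    palette_max_size = (1 << log2_palette_max_size) - 1
    if not 1 <= palette_max_size <= 63:
        raise ValueError("palette_max_size must be in the range 1 to 63")
    return palette_max_size
```

Under this derivation, the range of 1 to 63 corresponds to log2_palette_max_size values of 1 to 6.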
[0205] In this case, the log2_palette_max_size syntax element can be parsed / signaled based on the sps_palette_enabled_flag syntax element which is information for indicating whether the palette mode is enabled. For example, when the value of the sps_palette_enabled_flag is 1 (i.e., when it indicates that the palette mode is enabled), the log2_palette_max_size syntax element can be parsed / signaled.
[0206] Alternatively, as an implementation, the information indicating the palette size (e.g., log2_palette_CU_size_TH1, log2_palette_max_size_TH1, log2_palette_max_size_default) can be signaled through the SPS as shown in Table 17 below.
[0207] [Table 17]
[0208]
[0209] The semantics of the syntax elements included in the syntax of Table 17 can be represented as shown in Table 18 below.
[0210] [Table 18]
[0211]
[0212] Referring to Tables 17 and 18, the log2_palette_CU_size_TH1, log2_palette_max_size_TH1, and log2_palette_max_size_default syntax elements can be parsed / signaled in the SPS.
[0213] The log2_palette_CU_size_TH1 syntax element indicates the log2 value of the CU size threshold associated with palette_max_size_TH1, and Palette_CU_size_TH1 can be derived as 1 << log2_palette_CU_size_TH1.
[0214] The log2_palette_max_size_TH1 syntax element indicates the log2 value of (palette_max_size_TH1 + 1), and palette_max_size_TH1 can be derived as (1 << log2_palette_max_size_TH1) - 1. palette_max_size_TH1 indicates the maximum allowable index of the palette table for CUs with a size larger than Palette_CU_size_TH1 and can be restricted to the range of 1 to 63.
[0215] The log2_palette_max_size_default syntax element indicates the log2 value of (palette_max_size_default + 1), and palette_max_size_default can be derived as (1 << log2_palette_max_size_default) - 1. palette_max_size_default indicates the maximum allowable index of the palette table and can be restricted to the range of 1 to 63.
[0216] Here, the log2_palette_CU_size_TH1, log2_palette_max_size_TH1, and log2_palette_max_size_default syntax elements can be parsed / signaled based on the sps_palette_enabled_flag syntax element which is information indicating whether the palette mode is enabled. For example, when the value of sps_palette_enabled_flag is 1 (i.e., when it indicates that the palette mode is enabled), the log2_palette_CU_size_TH1, log2_palette_max_size_TH1, and log2_palette_max_size_default syntax elements can be parsed / signaled.
[0217] In addition, one or more sets of palette_CU_size_TH and palette_max_size_TH can be signaled and used to indicate palette_max_size.
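How the three syntax elements of Table 17 could be combined to select a palette size for a given CU can be sketched as follows. This Python sketch assumes a single threshold, measures the CU size as the number of pixels in the CU, and uses a hypothetical function name. The selection rule follows the semantics described above: CUs larger than Palette_CU_size_TH1 use palette_max_size_TH1, and all other CUs use palette_max_size_default.

```python
def select_palette_max_size(cu_size,
                            log2_palette_cu_size_th1,
                            log2_palette_max_size_th1,
                            log2_palette_max_size_default):
    """Select the maximum palette index for a CU from SPS-level values.

    cu_size is assumed to be the number of pixels in the CU.
    """
    palette_cu_size_th1 = 1 << log2_palette_cu_size_th1
    palette_max_size_th1 = (1 << log2_palette_max_size_th1) - 1
    palette_max_size_default = (1 << log2_palette_max_size_default) - 1
    if cu_size > palette_cu_size_th1:
        return palette_max_size_th1  # CUs larger than the threshold
    return palette_max_size_default  # all other CUs
```

With more than one signaled set of thresholds, the same comparison could be repeated per set. That extension is not shown here.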
[0218] The following figures were created to illustrate specific examples of this disclosure. The names of specific devices or specific terms or names (e.g., names of syntaxes / syntax elements, etc.) illustrated in the figures are presented by way of example, and therefore the technical features of this disclosure are not limited to the specific names used in the figures below.
[0219] Figure 9 illustrates an example of a video / image encoding method according to an embodiment of the present disclosure.
[0220] The method disclosed in Figure 9 can be performed by the encoding device 200 illustrated in Figure 2. Specifically, steps S900 and S910 of Figure 9 can be performed by the predictor 220 illustrated in Figure 2, and step S920 of Figure 9 can be performed by the entropy encoder 240 illustrated in Figure 2. Additionally, the method disclosed in Figure 9 may include the embodiments described above. Therefore, in the description of Figure 9, detailed descriptions of parts that overlap with the above-described embodiments will be omitted or simplified.
[0221] Referring to Figure 9, the encoding device can derive an escape value in palette mode for the current block (S900).
[0222] As an implementation, the encoding device can determine the prediction mode for the current block and perform prediction. For example, the encoding device can determine whether to perform inter-frame prediction or intra-frame prediction for the current block. Alternatively, the encoding device can determine whether to perform prediction for the current block based on a CIIP mode, an IBC mode, or a palette mode. The encoding device can determine the prediction mode based on RD cost. The encoding device can perform prediction according to the determined prediction mode to derive prediction samples for the current block. In addition, the encoding device can generate and encode information related to the prediction applied to the current block (e.g., prediction mode information).
[0223] When performing palette mode-based prediction on the current block, the encoding device can apply the palette mode coding disclosed in the above embodiments. That is, the encoding device can derive palette entries, palette indices, escape values, etc., by applying palette mode coding to the current block.
[0224] As an example, the encoding device can generate palette entry information based on the sample values of the current block. That is, the encoding device can derive the palette predictor entries and palette entry reuse information used in a previous block coded in palette mode to configure the palette table, and can derive the palette entries for the current block. For example, as shown in Tables 5 and 6, the encoding device can derive palette entry information such as `palette_predictor_run`, `num_signalled_palette_entries`, and `new_palette_entries` for configuring the palette table.
[0225] Additionally, the encoding device can generate palette index information for the current block based on the palette entry information. That is, the encoding device can derive the palette index value for each sample and configure the palette index map while traversing the samples of the current block in the traverse scan direction (vertical or horizontal). For example, as shown in Tables 5 and 6 above, the encoding device can derive palette index information such as `palette_transpose_flag`, `palette_idx_idc`, `copy_above_indices_for_final_run_flag`, and `num_palette_indices_minus1` for configuring the palette index map.
[0226] Here, the palette table may include representative color values (palette entries) for samples in the current block, and may consist of palette index values corresponding to the respective color values. That is, the encoding device can derive the palette index value corresponding to the entry (color value) in the palette table for each sample in the current block and signal it to the decoding device.
[0227] The encoding device can encode image information including the palette entry information and the palette index information, and signal it to the decoding device.
[0228] In addition, when performing palette mode-based prediction on the current block, the encoding device can derive an escape value for the current block, which includes at least one escape-coded sample.
[0229] As described above, since it is efficient in terms of coding to additionally send the sample values of samples whose values differ from those of neighboring samples in the current block in palette mode, such sample values can be signaled as escape values. In this case, since the escape value is additional data, it can be quantized to reduce the amount of data to be signaled. In addition, no transform is applied to the escape value in palette mode, and the quantized value can be signaled directly.
[0230] The encoding device can derive the quantization escape value based on the escape value (S910).
[0231] As an implementation method, the encoding device can derive a quantized escape value by applying a quantization parameter for the escape value to the escape value.
[0232] Here, the quantization parameters can be derived based on the minimum quantization parameter information regarding the transform skip mode. For example, the quantization parameters can be derived based on the minimum quantization parameter information (e.g., min_qp_prime_ts_minus4) for the transform skip modes shown in Tables 7 to 9. As mentioned above, since no transform is applied to the escape values in the palette mode, the escape values can be quantized based on the minimum quantization parameter information used in the transform skip mode.
[0233] As a specific example, as shown in Table 9, the encoding device can first derive the minimum quantization parameter value (e.g., QpPrimeTsMin) based on the minimum quantization parameter information regarding the transform skip mode (e.g., min_qp_prime_ts_minus4). Then, the encoding device can select the larger value between the minimum quantization parameter value (e.g., QpPrimeTsMin) and the quantization parameter Qp (Qp'Y for the luminance component and Qp'Cb or Qp'Cr for the chrominance component) and use it as the quantization parameter in the palette mode.
[0234] In other words, the quantization parameter in palette mode can have a value greater than or equal to the minimum quantization parameter value (e.g., QpPrimeTsMin) derived from the minimum quantization parameter information about the transform skip mode (e.g., min_qp_prime_ts_minus4).
[0235] The encoding device can derive quantization escape values using the quantization parameters in palette mode as described above. The encoding device can generate the quantization escape values as the `palette_escape_val` syntax elements shown in Tables 5 and 6, and signal them accordingly. Additionally, the encoding device can generate information indicating the presence of a sample with an escape value in the current block (e.g., `palette_escape_val_present_flag`), and signal it accordingly.
[0236] According to the implementation method, the encoding device can limit the quantized escape values to a specific range. Since the escape values have characteristics different from those of neighboring samples, they are quantized and directly signaled. However, this can lead to errors due to quantization. To reduce such errors and encode more accurate values, the range of quantized escape values can be limited based on bit depth.
[0237] For example, the range of information on quantization escape values can be determined based on the bit depths shown in Tables 10 and 11, and can be restricted so that it is not greater than, for example, (1 << BitDepth) – 1. Additionally, the bit depth can include a bit depth BitDepthY for the luminance component and a bit depth BitDepthC for the chrominance component. Here, the range of quantization escape value information for the luminance component can have values between 0 and (1 << BitDepthY) – 1, and the range of quantization escape value information for the chrominance component can have values between 0 and (1 << BitDepthC) – 1.
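Steps S900 to S910 on the encoder side can be put together in a short, simplified Python sketch: derive the escape QP, quantize the escape value, and clip the result to the bit-depth-based range. The step-size formula (doubling every 6 QP, equal to 1 at QP 4), the floating-point rounding, and the function name are assumptions made for illustration. An actual encoder would use integer scaling.

```python
def quantize_escape_value(escape_value, qp, min_qp_prime_ts_minus4, bit_depth):
    """Quantize a non-negative escape value and clip it to the allowed range."""
    # QP floor taken from the transform skip minimum QP (QpPrimeTsMin).
    escape_qp = max(4 + min_qp_prime_ts_minus4, qp)
    # Assumed step-size model; never smaller than 1 because escape_qp >= 4.
    step = 2.0 ** ((escape_qp - 4) / 6.0)
    # Round to nearest for non-negative input.
    level = int(escape_value / step + 0.5)
    # Restrict the quantized escape value to 0 .. (1 << bit_depth) - 1.
    return min(max(level, 0), (1 << bit_depth) - 1)
```

Because the escape QP is never below 4 in this sketch, the quantized level never exceeds the original sample value.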
[0238] Additionally, in one embodiment, the encoding device can define the number of entries in the palette table (i.e., the number of indices of the palette table) and signal it to the decoding device. That is, the encoding device can determine the palette size information regarding the maximum index of the palette table and signal it. The palette size information can be a preset value or can be determined based on the size of the coding unit.
[0239] For example, the palette size can be represented as palette_max_size as shown in Table 12, can be the same for the entire sequence, or can be determined differently according to the CU size (i.e., the number of pixels in the CU).
[0240] For example, the palette size can be represented as palette_max_size as shown in Tables 13 and 14 and can be signaled via the SPS. In this case, the palette size (e.g., palette_max_size) can indicate the maximum allowable index of the palette table and can be restricted within the range from 1 to 63. Additionally, the palette size (e.g., palette_max_size) can be signaled based on the information (e.g., sps_palette_enabled_flag) for indicating whether the palette mode is enabled.
[0241] Additionally, for example, the palette size can be represented as log2_palette_max_size as shown in Tables 15 and 16, and can be signaled via the SPS. In this case, the palette size (e.g., log2_palette_max_size) can indicate the log2 value of the palette size (i.e., palette_max_size + 1). Thus, palette_max_size, which indicates the maximum allowable index of the palette table, can be derived by calculating (1 << log2_palette_max_size) - 1 and can be restricted to the range from 1 to 63. Additionally, the palette size (e.g., log2_palette_max_size) can be signaled based on information (e.g., sps_palette_enabled_flag) indicating whether the palette mode is enabled.
[0242] Additionally, for example, the palette size can be derived based on log2_palette_CU_size_TH1, log2_palette_max_size_TH1, and log2_palette_max_size_default as shown in Tables 17 and 18, and can be signaled via the SPS. Since the specific embodiments of deriving and signaling the palette size have been described above in Tables 17 and 18, the description thereof will be omitted herein.
[0243] The encoding device can encode image information (or video information) (S920). Here, the image information can include various types of information for the above-mentioned palette mode encoding.
[0244] As an example, the encoding device can generate and encode image information including information about quantization escape values. Additionally, the encoding device can generate and encode image information including palette entry information and palette index information. Additionally, the encoding device can generate and encode image information including information about the minimum quantization parameter for the transform skip mode. In this case, the image information can include the SPS, and the SPS can include information about the minimum quantization parameter for the transform skip mode.
[0245] Additionally, according to an embodiment, the encoding device can determine whether to perform encoding on the current block using the above-mentioned palette mode based on information indicating whether the palette mode is enabled.
[0246] For example, as shown in Tables 1 to 4, the encoding device can determine whether the palette mode is enabled, generate information about whether the palette mode is enabled (e.g., sps_palette_enabled_flag) based on the determination, and signal the information via SPS.
[0247] Additionally, the encoding device can generate information indicating whether to encode the current block by applying the palette mode to it, based on information about whether palette mode is enabled (e.g., `sps_palette_enabled_flag`), and signal this information via the coding unit syntax. For example, when the value of `pred_mode_plt_flag` is 1, the `palette_coding()` syntax can be signaled to apply the palette mode to the current block to generate a reconstructed sample.
[0248] Information about whether palette mode is enabled (e.g., sps_palette_enabled_flag) and information indicating whether the current block is encoded by applying palette mode to the current block (e.g., pred_mode_plt_flag) can be encoded by being included in the image information.
[0249] Image information, including the various types of information described above, can be encoded and output as a bitstream. The bitstream can be sent to a decoding device via a network or (digital) storage medium. Here, the network can include broadcast networks and / or communication networks, and the digital storage medium can include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD.
[0250] Figure 10 illustrates an example of a video / image decoding method according to an embodiment of the present disclosure.
[0251] The method disclosed in Figure 10 can be performed by the decoding device 300 illustrated in Figure 3. Specifically, step S1000 of Figure 10 can be performed by the entropy decoder 310 illustrated in Figure 3, and steps S1010 and S1020 of Figure 10 can be performed by the predictor 330 illustrated in Figure 3. Additionally, the method disclosed in Figure 10 may include the embodiments described above. Therefore, in describing Figure 10, a detailed description of the parts that overlap with the above embodiments will be omitted or simplified.
[0252] Referring to Figure 10, the decoding device can receive image information (or video information) from the bitstream (S1000).
[0253] The decoding device can parse the bitstream to derive the information necessary for image reconstruction (or picture reconstruction) (e.g., video / image information). In this case, the image information may include prediction-related information (e.g., prediction mode information). Additionally, the image information may include various types of information used for the aforementioned palette mode coding. For example, the image information may include information about quantization escape values, palette entry information, palette index information, minimum quantization parameter information for the transform skip mode, etc. That is, the image information may include various types of information required in the decoding process and can be decoded based on coding methods such as Exponential Golomb coding, CAVLC, or CABAC.
[0254] As an implementation, the decoding device can obtain image information, including quantization escape value information in palette mode, from the bitstream. For example, the quantization escape value information can be the `palette_escape_val` syntax element as shown in Tables 5 and 6. In this case, the quantization escape value information (e.g., `palette_escape_val`) can be obtained based on information indicating whether a sample with an escape value exists in the current block (e.g., `palette_escape_val_present_flag`). For example, when a sample with an escape value exists in the current block (i.e., when the value of `palette_escape_val_present_flag` is 1), the decoding device can obtain the quantization escape value information (e.g., `palette_escape_val`) from the bitstream.
[0255] The decoding device can derive the escape value for the current block based on the quantization escape value information (S1010).
[0256] As an implementation method, the decoding device can derive the escape value by performing dequantization (scaling) on the quantized escape value based on the quantization parameters.
[0257] Here, the quantization parameters can be derived based on the minimum quantization parameter information regarding the transform skip mode. For example, the quantization parameters can be derived based on the minimum quantization parameter information (e.g., min_qp_prime_ts_minus4) for the transform skip modes shown in Tables 7 to 9. As mentioned above, since no transform is applied to the escape values in the palette mode, the escape values can be quantized based on the minimum quantization parameter information used in the transform skip mode. Here, the minimum quantization parameter information (e.g., min_qp_prime_ts_minus4) for the transform skip mode can be parsed / signaled from the SPS.
[0258] As a specific example, as shown in Table 9, firstly, the decoding device can derive the minimum quantization parameter value (e.g., QpPrimeTsMin) based on the minimum quantization parameter information regarding the transform skip mode (e.g., min_qp_prime_ts_minus4). Additionally, the decoding device can choose the larger value between the minimum quantization parameter value (e.g., QpPrimeTsMin) and the quantization parameter Qp (Qp'Y for the luminance component and Qp'Cb or Qp'Cr for the chrominance component) and use it as the quantization parameter in the palette mode.
[0259] In other words, the quantization parameter in palette mode can have a value greater than or equal to the minimum quantization parameter value (e.g., QpPrimeTsMin) derived from the minimum quantization parameter information for the transform skip mode (e.g., min_qp_prime_ts_minus4).
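The selection of the palette-mode quantization parameter can be sketched as follows. Table 9 is not reproduced in this passage, so the relation QpPrimeTsMin = 4 + min_qp_prime_ts_minus4 is an assumption inferred from the "_minus4" suffix of the syntax element. The function names are hypothetical.

```python
def qp_prime_ts_min(min_qp_prime_ts_minus4: int) -> int:
    """Minimum QP for transform skip mode.

    Assumption: QpPrimeTsMin = 4 + min_qp_prime_ts_minus4, as the
    '_minus4' suffix of the SPS syntax element suggests.
    """
    return 4 + min_qp_prime_ts_minus4


def palette_escape_qp(qp: int, min_qp_prime_ts_minus4: int) -> int:
    """Pick the larger of QpPrimeTsMin and the component QP (Qp'Y, Qp'Cb or Qp'Cr)."""
    return max(qp_prime_ts_min(min_qp_prime_ts_minus4), qp)


low = palette_escape_qp(qp=2, min_qp_prime_ts_minus4=0)    # 4, raised to the minimum
high = palette_escape_qp(qp=30, min_qp_prime_ts_minus4=0)  # 30, unchanged
```

The effect is a floor on the quantization parameter: a component QP below the minimum is raised to it, and a QP at or above the minimum is used as is.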
[0260] The decoding device can derive escape values from quantization escape values based on the quantization parameters in the palette mode derived as described above.
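A possible dequantization (scaling) step is sketched below. The text does not give the scaling formula, so this sketch borrows the level-scale table and rounding commonly used in HEVC/VVC-style scaling. The table, the rounding offset and the final clip are assumptions, and `dequantize_escape` is a hypothetical name.

```python
# Assumption: HEVC/VVC-style level scale table, indexed by qp % 6.
LEVEL_SCALE = (40, 45, 51, 57, 64, 72)


def dequantize_escape(quantized: int, qp: int, bit_depth: int) -> int:
    """Scale a quantized escape value back to a sample value.

    qp is the palette-mode quantization parameter, i.e. already
    floored at the transform-skip minimum QP.
    """
    scaled = (((quantized * LEVEL_SCALE[qp % 6]) << (qp // 6)) + 32) >> 6
    # Clip the result to the valid sample range for this bit depth.
    return max(0, min(scaled, (1 << bit_depth) - 1))


identity = dequantize_escape(100, qp=4, bit_depth=8)   # 100: scale 64 with shift 0
doubled = dequantize_escape(100, qp=10, bit_depth=8)   # 200: scale 64 with shift 1
```

Under these assumptions a quantization parameter of 4 corresponds to a scaling factor of exactly 1, so the quantized escape value is reproduced unchanged.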
[0261] According to the implementation method, the decoding device can limit the quantized escape values to a specific range. Since the escape values have characteristics different from those of neighboring samples, they are quantized and directly signaled. However, this can lead to errors due to quantization. To reduce such errors and encode more accurate values, the range of quantized escape values can be limited based on bit depth.
[0262] For example, the range of information on quantization escape values can be determined based on the bit depths shown in Tables 10 and 11, and can be restricted so that it is not greater than, for example, (1<<BitDepth)–1. Additionally, the bit depth can include the bit depth BitDepthY for the luminance component and the bit depth BitDepthC for the chrominance component. Here, the range of quantization escape value information for the luminance component can have values between 0 and (1<<BitDepthY)–1, and the range of quantization escape value information for the chrominance component can have values between 0 and (1<<BitDepthC)–1.
[0263] Additionally, in one embodiment, the decoding device can obtain image information including the number of entries in the palette table (i.e., the number of indices of the palette table). That is, the decoding device can obtain image information including palette size information regarding the maximum index of the palette table. Here, the palette size information can be a preset value, or can be determined based on the size of the coding unit.
[0264] For example, the palette size can be represented as palette_max_size as shown in Table 12, can be the same for the entire sequence, or can be determined differently according to the CU size (i.e., the number of pixels in the CU).
[0265] For example, the palette size can be represented as palette_max_size as shown in Tables 13 and 14, and can be parsed / signaled via the SPS. In this case, the palette size (e.g., palette_max_size) can indicate the maximum allowable index of the palette table and can be restricted to a range from 1 to 63. Additionally, the palette size (e.g., palette_max_size) can be parsed / signaled based on information (e.g., sps_palette_enabled_flag) for indicating whether the palette mode is enabled.
[0266] Additionally, for example, the palette size can be represented as log2_palette_max_size as shown in Tables 15 and 16, and can be parsed / signaled via the SPS. In this case, the palette size (e.g., log2_palette_max_size) can indicate the log2 value of the palette size (i.e., palette_max_size + 1). Thus, the palette_max_size, which indicates the maximum allowable index of the palette table, can be derived by calculating (1 << log2_palette_max_size) - 1 and can be restricted within the range from 1 to 63. Additionally, the palette size (e.g., log2_palette_max_size) can be parsed / signaled based on information (e.g., sps_palette_enabled_flag) indicating whether the palette mode is enabled.
[0267] Additionally, for example, the palette size can be derived based on log2_palette_CU_size_TH1, log2_palette_max_size_TH1, and log2_palette_max_size_default as shown in Tables 17 and 18, and can be parsed / signaled via the SPS. Since the specific implementation manners of deriving and parsing / signaling the palette size have been described above in Tables 17 and 18, the description thereof will be omitted herein.
[0268] The decoding device generates reconstructed samples based on the escape value (S1020).
[0269] As an implementation manner, the decoding device can generate reconstructed samples based on the escape value related to the current block including at least one escape-encoded sample. For example, if there is a sample with an escape value in the current block (i.e., when the value of palette_escape_val_present_flag is 1), the decoding device can derive the escape value as described above to generate the reconstructed sample of the escape-encoded sample.
[0270] Additionally, when performing palette-mode-based prediction on the current block (i.e., when the palette mode is applied to the current block), for samples other than the escape-encoded samples in the current block, the decoding device can obtain image information including palette entry information and palette index information, and generate reconstructed samples based on the obtained image information.
[0271] As an example, the decoding device can configure the palette table for the current block based on palette entry information. For instance, palette entry information may include `palette_predictor_run`, `num_signalled_palette_entries`, `new_palette_entries`, etc., as shown in Tables 5 and 6. That is, the decoding device can derive the palette predictor entries used in previously palette-coded blocks and the palette entry reuse information, and derive the palette entries for the current block to configure the palette table. Alternatively, the decoding device can configure the palette table based on previous palette predictor entries and current palette entries.
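The palette table construction can be sketched as follows. The sketch assumes the reuse information has already been decoded into one boolean flag per predictor entry (the run-length coding carried by `palette_predictor_run` is not modeled), and that newly signalled entries are appended after the reused ones. The names `build_palette` and `max_entries` are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, ...]  # one value per color component


def build_palette(
    predictor: Sequence[Color],
    reuse_flags: Sequence[bool],
    new_entries: Sequence[Color],
    max_entries: int,
) -> List[Color]:
    """Build the current palette from reused predictor entries plus new entries."""
    # Keep only the predictor entries whose reuse flag is set.
    palette = [color for color, reuse in zip(predictor, reuse_flags) if reuse]
    # Assumption: newly signalled entries follow the reused ones.
    palette.extend(new_entries)
    if len(palette) > max_entries:
        raise ValueError("palette exceeds the signalled size limit")
    return palette


palette = build_palette(
    predictor=[(0, 0, 0), (255, 255, 255), (128, 0, 0)],
    reuse_flags=[True, False, True],
    new_entries=[(10, 20, 30)],
    max_entries=63,
)
# palette == [(0, 0, 0), (128, 0, 0), (10, 20, 30)]
```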
[0272] Additionally, the decoding device can configure a palette index map for the current block based on palette index information. For example, the palette index information may include `palette_transpose_flag`, `palette_idx_idc`, `copy_above_indices_for_final_run_flag`, `num_palette_indices_minus1`, etc., as shown in Tables 5 and 6 for configuring the palette index map. That is, the decoding device can configure the palette index map (e.g., `PaletteIndexMap`) based on information indicating the traversal scan direction (vertical or horizontal) (e.g., `palette_transpose_flag`) while simultaneously configuring the palette index map based on information indicating the palette index value of each sample (e.g., `palette_idx_idc`).
[0273] Additionally, the decoding device can derive sample values for palette entries in the palette table based on the palette index map. The decoding device can then generate reconstructed samples based on the palette index map and the sample values of the palette entries.
[0274] Here, the palette table may include representative color values (palette entries) for samples in the current block, and may consist of palette index values corresponding to the respective color values. Therefore, the decoding device can deduce the sample values (i.e., color values) of the entries in the palette table corresponding to the index values of the palette index map, and generate them as reconstructed sample values for the current block.
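Reconstruction from the palette index map can be sketched as follows for a single color component. The sketch assumes that an index equal to the palette size marks an escape-coded sample and that the escape values have already been dequantized and are supplied in scan order. Both are assumptions made for illustration, and `reconstruct_block` is a hypothetical name.

```python
from typing import Iterable, List, Sequence


def reconstruct_block(
    index_map: Sequence[Sequence[int]],
    palette: Sequence[int],
    escape_values: Iterable[int],
) -> List[List[int]]:
    """Map each palette index to its color value; fill escape samples directly."""
    escape_index = len(palette)  # assumption: this index marks an escape sample
    escapes = iter(escape_values)
    block = []
    for row in index_map:
        out_row = []
        for idx in row:
            if idx == escape_index:
                out_row.append(next(escapes))  # dequantized escape value
            else:
                out_row.append(palette[idx])   # representative color value
        block.append(out_row)
    return block


block = reconstruct_block(
    index_map=[[0, 1], [2, 0]],
    palette=[10, 200],
    escape_values=[77],
)
# block == [[10, 200], [77, 10]]
```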
[0275] In addition, according to an embodiment, the decoding device can determine whether the above-mentioned palette mode is used to code the current block, based on information about whether the palette mode is enabled.
[0276] For example, as shown in Tables 1 to 4, the decoding device can obtain image information including information about whether the palette mode is enabled (e.g., sps_palette_enabled_flag), and based on this information, obtain palette entry information, palette index information, quantization escape value information, etc. from the bitstream.
[0277] Additionally, for example, the decoding device can obtain information from the bitstream indicating whether to encode the current block by applying the palette mode to it, based on information about whether palette mode is enabled (e.g., `sps_palette_enabled_flag`). For instance, when the value of `pred_mode_plt_flag` is 1, the decoding device can also obtain the `palette_coding()` syntax and apply the palette mode to the current block based on the information included in the `palette_coding()` syntax to obtain a reconstructed sample.
[0278] In the exemplary system described above, a method is described according to a flowchart using a series of steps and blocks. However, this disclosure is not limited to a specific order of steps, and some steps may be performed together with different steps and in a different order or simultaneously with the steps described above. In addition, those skilled in the art should understand that the steps shown in the flowchart are not exclusive, and may include other steps, or one or more steps in the flowchart may be deleted without affecting the technical scope of this disclosure.
[0279] The method according to this disclosure can be implemented in software, and the encoding and / or decoding devices according to this disclosure can be included in devices performing image processing, such as TVs, computers, smartphones, set-top boxes, and display devices.
[0280] When the embodiments of this disclosure are implemented in software, the aforementioned methods can be implemented by modules (processes or functions) that perform the aforementioned functions. Modules can be stored in memory and executed by a processor. Memory can be installed inside or outside the processor and can be connected to the processor via various known means. The processor may include application-specific integrated circuits (ASICs), other chipsets, logic circuits, and / or data processing devices. Memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media, and / or other storage devices. In other words, embodiments according to this disclosure can be implemented and executed on a processor, microprocessor, controller, or chip. For example, the functional units illustrated in the figures can be implemented and executed on a computer, processor, microprocessor, controller, or chip. In this case, information about the implementation (e.g., information about instructions) or algorithms can be stored in a digital storage medium.
[0281] Furthermore, the decoding and encoding devices to which this disclosure applies may include: multimedia broadcast transceivers, mobile communication terminals, home theater video devices, digital cinema video devices, surveillance cameras, video chat devices, and real-time communication devices such as video communication, mobile streaming devices, storage media, cameras, video-on-demand (VoD) service providers, over-the-top (OTT) video devices, internet streaming service providers, 3D video devices, virtual reality (VR) devices, augmented reality (AR) devices, image telephony video devices, vehicle terminals (e.g., vehicle (including autonomous vehicle) terminals, aircraft terminals, or ship terminals), and medical video devices; and may be used to process image signals or data. For example, OTT video devices may include game consoles, Blu-ray players, internet-connected TVs, home theater systems, smartphones, tablet PCs, and digital video recorders (DVRs).
[0282] Furthermore, the processing methods to which this disclosure applies can be generated in the form of a computer-executable program and can be stored in a computer-readable recording medium. Multimedia data having the data structure according to this disclosure can also be stored in a computer-readable recording medium. Computer-readable recording media include all kinds of storage devices and distributed storage devices in which computer-readable data is stored. Computer-readable recording media may include, for example, Blu-ray discs (BD), Universal Serial Bus (USB), ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. Computer-readable recording media also include media embodied in the form of a carrier wave (e.g., transmission over the Internet). Additionally, bitstreams generated by encoding methods can be stored in computer-readable recording media or transmitted via wired or wireless communication networks.
[0283] Alternatively, embodiments of this disclosure can be embodied as a computer program product based on program code, and the program code can be executed on a computer according to embodiments of this disclosure. The program code can be stored on a computer-readable medium.
[0284] Figure 11 shows an example of a content streaming system to which the embodiments of this disclosure are applicable.
[0285] Referring to Figure 11, the content streaming system to which the embodiments of this disclosure are applied may include an encoding server, a streaming server, a web server, a media storage device, a user device, and a multimedia input device.
[0286] An encoding server is used to compress content input from multimedia input devices such as smartphones, cameras, and camcorders into digital data, generate a bitstream, and transmit it to a streaming server. As another example, if the multimedia input device, such as a smartphone, camera, or camcorder, directly generates the bitstream, the encoding server can be omitted.
[0287] Bitstreams can be generated using the encoding methods or bitstream generation methods described herein. Furthermore, the streaming server can temporarily store the bitstream during transmission or reception.
[0288] A streaming server transmits multimedia data to a user's device via a web server based on a user's request. The web server acts as a tool to notify the user of available services. When a user requests a desired service, the web server forwards the request to the streaming server, which then delivers the multimedia data to the user. In this respect, the content streaming system may include a separate control server, which in this case controls the commands / responses between the various devices in the content streaming system.
[0289] A streaming server can receive content from media storage devices and / or encoding servers. For example, if content is received from an encoding server, it can be received in real time. In this case, the streaming server can store the bitstream for a predetermined period of time to provide a smooth streaming service.
[0290] For example, user equipment may include mobile phones, smartphones, laptops, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation systems, slate PCs, tablet PCs, ultrabooks, wearable devices (e.g., watch-type terminals (smartwatches), glasses-type terminals (smart glasses), head-mounted displays (HMDs)), digital TVs, desktop computers, digital signage, etc.
[0291] Each server in the content streaming system can be operated as a distributed server, and in this case, the data received by each server can be processed in a distributed manner.
[0292] The claims described in this disclosure can be combined in various ways. For example, the technical features of the method claims of this disclosure can be combined and implemented as a device, and the technical features of the device claims of this disclosure can be combined and implemented as a method. Furthermore, the technical features of the method claims and the device claims of this disclosure can be combined and implemented as a device, and the technical features of the method claims and the device claims of this disclosure can be combined and implemented as a method.
Claims
1. A decoding device for image decoding, the decoding device comprising: a memory; and at least one processor connected to the memory, the at least one processor being configured to: obtain image information, including quantization escape value information in palette mode, from a bitstream; derive an escape value for a current block based on the quantization escape value information; and generate reconstructed samples based on the escape value, wherein the escape value is derived based on the quantization escape value information and a quantization parameter, wherein the quantization parameter is derived based on information about a minimum quantization parameter for a transform skip mode, wherein the information about the minimum quantization parameter for the transform skip mode is a syntax element obtained from a sequence parameter set (SPS) included in the image information, and wherein the quantization parameter has a value greater than or equal to a minimum quantization parameter value derived from the information about the minimum quantization parameter for the transform skip mode.
2. An encoding device for image encoding, the encoding device comprising: a memory; and at least one processor connected to the memory, the at least one processor being configured to: derive an escape value for a current block in palette mode; derive a quantized escape value based on the escape value; and encode image information including quantization escape value information, wherein the quantized escape value is derived based on a quantization parameter for the escape value in the current block, wherein the quantization parameter is derived based on information about a minimum quantization parameter for a transform skip mode, wherein the information about the minimum quantization parameter for the transform skip mode is a syntax element included in a sequence parameter set (SPS) that is included in the image information, and wherein the quantization parameter has a value greater than or equal to a minimum quantization parameter value derived from the information about the minimum quantization parameter for the transform skip mode.
3. An apparatus for transmitting data for an image, the apparatus comprising: at least one processor configured to obtain a bitstream, wherein the bitstream is generated based on the following operations: deriving an escape value for a current block in palette mode, deriving a quantized escape value based on the escape value, and encoding image information including quantization escape value information to generate the bitstream; and a transmitter configured to transmit the data including the bitstream, wherein the quantized escape value is derived based on a quantization parameter for the escape value in the current block, wherein the quantization parameter is derived based on information about a minimum quantization parameter for a transform skip mode, wherein the information about the minimum quantization parameter for the transform skip mode is a syntax element included in a sequence parameter set (SPS) that is included in the image information, and wherein the quantization parameter has a value greater than or equal to a minimum quantization parameter value derived from the information about the minimum quantization parameter for the transform skip mode.