Computer-implemented method of video coding and computer-readable storage medium
By determining the maximum transform size of the prediction block in video coding and representing this size in SPS, the transformation process of the prediction residual is skipped, thus optimizing video coding, overcoming the limitations of coding efficiency improvement in existing technologies, and achieving higher compression efficiency and subjective quality.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIBABA (CHINA) CO LTD
- Filing Date
- 2020-08-13
- Publication Date
- 2026-06-12
AI Technical Summary
Existing video coding standards have limitations in improving coding efficiency, especially in terms of further improvements based on the High Efficiency Video Coding (HEVC/H.265) standard.
By determining the maximum transform size of the prediction block and representing this maximum transform size in the Sequence Parameter Set (SPS), the transformation process of the prediction residuals can be skipped, thus optimizing the video coding process.
It improves the compression efficiency of video encoding, achieves the same subjective quality as HEVC/H.265 under the same bandwidth, and enhances encoding performance.
Figure CN119996692B_ABST
Abstract
Description
[0001] Cross-references to related applications
[0002] This application claims priority to U.S. Provisional Application No. 62/899,738, filed September 12, 2019, and U.S. Provisional Application No. 62/904,880, filed September 24, 2019, both of which are incorporated herein by reference.
Background
[0003] Video is a set of still images (or "frames") that capture visual information. To reduce storage memory and transmission bandwidth, video can be compressed before storage or transmission and decompressed before display. The compression process is usually called encoding, and the decompression process is usually called decoding. Currently, there are various video coding formats that use standardized video coding technologies, the most common being those based on prediction, transform, quantization, entropy coding, and in-loop filtering. Video coding standards that specify particular video coding formats, such as the High Efficiency Video Coding (HEVC / H.265) standard, the Versatile Video Coding (VVC / H.266) standard, and the AVS standard, are developed by standardization organizations. As more and more advanced video coding technologies are adopted in video standards, the coding efficiency of new video coding standards keeps increasing.
Summary of the Invention
[0004] This application provides a video processing method and apparatus. In one example embodiment, a method includes: determining, based on a maximum transform size of a prediction block, to skip a transform process for the prediction residuals of the prediction block; and signaling the maximum transform size in a sequence parameter set (SPS).
[0005] In another embodiment, an apparatus includes a memory configured to store instructions and a processor configured to execute the instructions to cause the apparatus to perform: determining, based on a maximum transform size of a prediction block, to skip a transform process for the prediction residuals of the prediction block; and signaling the maximum transform size in a sequence parameter set (SPS).
[0006] In another example embodiment, a non-transitory computer-readable medium stores a set of instructions executable by at least one processor of a device to cause the device to perform a method. The method includes: determining, based on a maximum transform size of a prediction block, to skip a transform process for the prediction residuals of the prediction block; and signaling the maximum transform size in a sequence parameter set (SPS).
[0007] In another example embodiment, a method includes: receiving a bitstream of a video sequence; determining a maximum transform size for a prediction block based on a sequence parameter set (SPS) of the video sequence; and determining, based on the maximum transform size, to skip a transform process for the prediction residuals of the prediction block.
[0008] In another embodiment, an apparatus includes a memory configured to store instructions and a processor configured to execute the instructions to cause the apparatus to perform: receiving a bitstream of a video sequence; determining a maximum transform size for a prediction block based on a sequence parameter set (SPS) of the video sequence; and determining, based on the maximum transform size, to skip a transform process for the prediction residuals of the prediction block.
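The decoder-side decision described above can be pictured with a minimal Python sketch. The field name `log2_max_ts_size_minus2`, the log2-minus-2 encoding, and the rule "both block dimensions must not exceed the signaled maximum" are assumptions made for illustration only; the text states only that the decision is based on a maximum transform size carried in the SPS.

```python
from dataclasses import dataclass


@dataclass
class Sps:
    # Hypothetical SPS field: log2 of the maximum size, minus 2.
    log2_max_ts_size_minus2: int


def max_transform_size(sps: Sps) -> int:
    """Derive the maximum size in samples from the hypothetical SPS field."""
    return 1 << (sps.log2_max_ts_size_minus2 + 2)


def may_skip_transform(sps: Sps, width: int, height: int) -> bool:
    """Assumed rule: skipping is allowed only if both dimensions fit the limit."""
    limit = max_transform_size(sps)
    return width <= limit and height <= limit


sps = Sps(log2_max_ts_size_minus2=3)      # maximum size of 32 samples
print(max_transform_size(sps))            # 32
print(may_skip_transform(sps, 16, 32))    # True
print(may_skip_transform(sps, 64, 8))     # False
```

Because the limit is read once per sequence from the SPS, every block in the sequence is checked against the same value in this sketch.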
[0009] In another example embodiment, a non-transitory computer-readable medium stores a set of instructions executable by at least one processor of a device to cause the device to perform a method. The method includes: receiving a bitstream of a video sequence; determining a maximum transform size for a prediction block based on a sequence parameter set (SPS) of the video sequence; and determining, based on the maximum transform size, to skip a transform process for the prediction residuals of the prediction block.
Brief Description of the Drawings
[0010] Embodiments and aspects of this application are illustrated in the following detailed description and accompanying drawings. Various features shown in the figures are not drawn to scale.
[0011] Figure 1 is a schematic diagram of the structure of an example video sequence according to some embodiments of this application.
[0012] Figure 2A is a schematic diagram of an example encoding process of a hybrid video coding system consistent with embodiments of this application.
[0013] Figure 2B is a schematic diagram of another example encoding process of a hybrid video coding system consistent with embodiments of this application.
[0014] Figure 3A is a schematic diagram of an example decoding process of a hybrid video coding system consistent with embodiments of this application.
[0015] Figure 3B is a schematic diagram of another example decoding process of a hybrid video coding system consistent with embodiments of this application.
[0016] Figure 4 is a block diagram of an example apparatus for encoding or decoding video according to some embodiments of this application.
[0017] Figure 5 shows Table 1, which illustrates an example syntax structure of a sequence parameter set (SPS) according to some embodiments of this application.
[0018] Figure 6 shows Table 2, which illustrates an example syntax structure of a sequence parameter set (SPS) according to some embodiments of this application.
[0019] Figure 7 shows Table 3, which illustrates an example syntax structure of a transform unit according to some embodiments of this application.
[0020] Figure 8 shows an example syntax structure related to signaling the block differential pulse code modulation (BDPCM) mode according to some embodiments of this application.
[0021] Figure 9 shows Table 5, which illustrates another example syntax structure of the SPS according to some embodiments of this application.
[0022] Figure 10 shows Table 6, which illustrates another example syntax structure of the transform unit according to some embodiments of this application.
[0023] Figure 11 is a schematic diagram of an example diagonal scan of a 64×64 transform block (TB) according to some embodiments of this application.
[0024] Figures 12A-12D show example residual units (RUs) according to some embodiments of this application.
[0025] Figure 13 is a schematic diagram of an example diagonal scan of a 64×64 TB according to some embodiments of this application, wherein the TB is divided into four 32×32 RUs.
[0026] Figures 14A-14D show Table 7, which illustrates an example syntax structure for residual coding when the TB is partitioned into RUs, according to some embodiments of this application.
[0027] Figures 15A-15D show Table 8, which illustrates another example syntax structure for residual coding according to some embodiments of this application.
[0028] Figure 16 shows Table 9, which illustrates example parameter values derived from a chroma format according to some embodiments of this application.
[0029] Figure 17 shows Table 10, which illustrates an example syntax structure for residual coding in which inverse level mapping is performed, as in Versatile Video Coding (VVC) Draft 6, according to some embodiments of this application.
[0030] Figure 18 is a flowchart of an example decoding method according to some embodiments of this application.
[0031] Figure 19 shows Table 11, which illustrates an example syntax structure for residual coding without performing inverse level mapping, according to some embodiments of this application.
[0032] Figure 20 shows Table 12, which illustrates an example lookup table for selecting Rice parameters according to some embodiments of this application.
[0033] Figure 21 is a flowchart of an example process for video processing according to some embodiments of this application.
[0034] Figure 22 is a flowchart of another example process for video processing according to some embodiments of this application.
Detailed Description
[0035] Reference can now be made to exemplary embodiments, examples of which are shown in the accompanying drawings. The following description refers to the accompanying drawings, wherein the same numbers in different drawings denote the same or similar elements unless otherwise stated. The embodiments described in the following example embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with the aspects related to this application as described in the appended claims. Specific aspects of this application are described in more detail below. In case of conflict with terms and / or definitions incorporated by reference, the terms and definitions provided herein shall prevail.
[0036] The Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (ITU-T VCEG) and the ISO / IEC Moving Picture Experts Group (ISO / IEC MPEG) is currently developing the Versatile Video Coding (VVC / H.266) standard. The VVC standard aims to double the compression efficiency of its predecessor, the High Efficiency Video Coding (HEVC / H.265) standard. In other words, VVC aims to achieve the same subjective quality as HEVC / H.265 using half the bandwidth.
[0037] To achieve the same subjective quality as HEVC / H.265 using half the bandwidth, JVET has been developing techniques beyond HEVC using the Joint Exploration Model (JEM) reference software. As coding techniques were incorporated into JEM, JEM achieved higher coding performance than HEVC.
[0038] The VVC standard was developed recently and continues to include more coding techniques that provide better compression performance. VVC is based on the same hybrid video coding system used in modern video compression standards such as HEVC, H.264 / AVC, MPEG2, H.263, etc.
[0039] Video is a set of still images (or "frames") arranged in a time sequence to store visual information. Video capture devices (e.g., cameras) can be used to capture and store these images in chronological order, and video playback devices (e.g., televisions, computers, smartphones, tablets, video players, or any end-user terminal with a display capability) can be used to display such images in chronological order. Furthermore, in some applications, video capture devices can transmit captured video in real time to video playback devices (e.g., computers with monitors), such as for surveillance, conferencing, or live streaming.
[0040] To reduce the storage space and transmission bandwidth required for such applications, video can be compressed before storage and transmission, and decompressed before display. Compression and decompression can be implemented by software executed by a processor (e.g., a processor in a general-purpose computer) or dedicated hardware. The module used for compression is typically called an "encoder," and the module used for decompression is typically called a "decoder." Encoders and decoders can be collectively referred to as a "codec." Encoders and decoders can be implemented as any of a variety of suitable hardware, software, or combinations thereof. For example, hardware implementations of encoders and decoders can include circuits such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, or any combination thereof. Software implementations of encoders and decoders can include program code, computer-executable instructions, firmware, or any suitable computer-implemented algorithm or process embedded in a computer-readable medium. Video compression and decompression can be implemented using various algorithms or standards, such as MPEG-1, MPEG-2, MPEG-4, H.26x series, etc. In some applications, a codec can decompress video from a first encoding standard and recompress the decompressed video using a second encoding standard; in this case, the codec can be called a "transcoder."
[0041] Video encoding processes identify and retain useful information that can be used to reconstruct images, while ignoring unimportant information during the reconstruction process. If the ignored, unimportant information cannot be fully reconstructed, this encoding process can be called "lossy." Otherwise, it can be called "lossless." Most encoding processes are lossy, a trade-off made to reduce required storage space and transmission bandwidth.
[0042] Useful information about the image being encoded (referred to as the "current image") includes changes relative to a reference image (e.g., a previously encoded and reconstructed image). These changes can include variations in pixel position, brightness, or color, with positional changes being of most interest. The positional changes of a set of pixels representing an object can reflect the object's movement between the reference and current images.
[0043] An image encoded without referencing another image (i.e., it is its own reference image) is called an "I-image". An image encoded using a previous image as a reference image is called a "P-image". An image encoded using both a previous image and a future image as reference images (i.e., the reference is "bidirectional") is called a "B-image".
[0044] Figure 1 shows the structure of an example video sequence 100 according to some embodiments of this application. The video sequence 100 can be live video or video that has been captured and archived. The video sequence 100 can be real-life video, computer-generated video (e.g., computer game video), or a combination thereof (e.g., real-life video with augmented reality effects). The video sequence 100 can be input from a video capture device (e.g., a camera), a video archive containing previously captured video (e.g., a video file stored on a storage device), or a video input interface (e.g., a video broadcast transceiver) that receives video from a video content provider.
[0045] As shown in Figure 1, video sequence 100 may include a series of images arranged chronologically along a timeline, including images 102, 104, 106, and 108. Images 102-106 are consecutive, and there are more images between images 106 and 108. In Figure 1, image 102 is an I-image, whose reference image is image 102 itself. Image 104 is a P-image, whose reference image is image 102, as indicated by the arrow. Image 106 is a B-image, whose reference images are images 104 and 108, as indicated by the arrows. In some embodiments, the reference image of an image (e.g., image 104) need not immediately precede or follow that image. For example, the reference image of image 104 may be an image preceding image 102. It should be noted that the reference images of images 102-106 are merely examples, and this application does not limit the embodiments of the reference images to the example shown in Figure 1.
[0046] Due to the computational complexity of such tasks, video codecs typically do not encode or decode an entire image at once. Instead, they can split the image into basic segments and encode or decode the image segment by segment. In this application, these basic segments are referred to as basic processing units ("BPUs"). For example, structure 110 in Figure 1 illustrates an example structure of a picture (e.g., any one of pictures 102-108) of video sequence 100. In structure 110, the picture is divided into 4×4 basic processing units, the boundaries of which are shown as dashed lines. In some embodiments, the basic processing unit may be referred to as a "macroblock" in some video coding standards (e.g., MPEG series, H.261, H.263, or H.264 / AVC), or as a "coding tree unit" ("CTU") in some other video coding standards (e.g., H.265 / HEVC or H.266 / VVC). The basic processing units in a picture can have variable sizes, such as 128×128, 64×64, 32×32, 16×16, 4×8, or 16×32 pixels, or any other shape and size. The size and shape of the basic processing units can be selected for a picture based on a balance between coding efficiency and the level of detail to be preserved in the basic processing units.
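As an illustration of splitting a picture into basic processing units, the following Python sketch tiles a picture with square units in raster order. The function name and the choice to clip units at the right and bottom picture edges are assumptions for this example; the text does not prescribe how partial units at the picture boundary are handled.

```python
def partition_into_bpus(pic_width, pic_height, bpu_size):
    """Return (x, y, width, height) for each unit, in raster order.

    Units on the right and bottom edges are clipped to the picture
    boundary (an assumption made for this sketch).
    """
    units = []
    for y in range(0, pic_height, bpu_size):
        for x in range(0, pic_width, bpu_size):
            w = min(bpu_size, pic_width - x)
            h = min(bpu_size, pic_height - y)
            units.append((x, y, w, h))
    return units


# A 256x256 picture with 64x64 units gives the 4x4 grid of structure 110.
grid = partition_into_bpus(256, 256, 64)
print(len(grid))   # 16
print(grid[0])     # (0, 0, 64, 64)
```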
[0047] A basic processing unit can be a logical unit that may include a set of different types of video data stored in computer memory (e.g., in a video frame buffer). For example, a basic processing unit for a color picture may include a luminance component (Y) representing achromatic luminance information, one or more chrominance components (e.g., Cb and Cr) representing color information, and associated syntax elements, where the luminance and chrominance components may have the same size as the basic processing unit. In some video coding standards (e.g., H.265 / HEVC or H.266 / VVC), the luminance and chrominance components may be referred to as "coding tree blocks" ("CTBs"). Any operation performed on a basic processing unit can be repeated for each of its luminance and chrominance components.
[0048] Video encoding involves multiple operational stages, examples of which are shown in Figures 2A-2B and Figures 3A-3B. For each stage, the basic processing unit may still be too large to process, and it can therefore be further divided into segments referred to herein as "basic processing subunits". In some embodiments, the basic processing subunit may be referred to as a "block" in some video coding standards (e.g., MPEG series, H.261, H.263, or H.264 / AVC), or as a "coding unit" ("CU") in some other video coding standards (e.g., H.265 / HEVC or H.266 / VVC). A basic processing subunit may be the same size as, or smaller than, the basic processing unit. Similar to the basic processing unit, the basic processing subunit is also a logical unit, which may include a set of different types of video data stored in computer memory (e.g., in a video frame buffer). Any operation performed on the basic processing subunit may be repeated for each of its luma and chroma components. It should be noted that this division can be carried to further levels as needed for processing. It should also be noted that different stages may use different schemes to divide the basic processing units.
[0049] For example, in the mode decision stage (an example of which is shown in Figure 2B), the encoder can decide which prediction mode (e.g., intra-image prediction or inter-image prediction) to use for a basic processing unit, which may be too large for such a decision to be made at once. The encoder can break down the basic processing unit into multiple basic processing subunits (e.g., CUs in H.265 / HEVC or H.266 / VVC) and determine the prediction type for each individual basic processing subunit.
[0050] As another example, in the prediction stage (examples of which are shown in Figures 2A-2B), the encoder can perform prediction operations at the level of a basic processing subunit (e.g., a CU). However, in some cases, the basic processing subunit may still be too large to process. The encoder can further break down the basic processing subunit into smaller segments (e.g., referred to as "prediction blocks" or "PBs" in H.265 / HEVC or H.266 / VVC), at the level of which prediction operations can be performed.
[0051] As another example, in the transform stage (examples of which are shown in Figures 2A-2B), the encoder can perform transform operations on residual basic processing subunits (e.g., CUs). However, in some cases, these basic processing subunits may still be too large to process. The encoder can further divide the basic processing subunits into smaller segments (e.g., referred to as "transform blocks" or "TBs" in H.265 / HEVC or H.266 / VVC), at the level of which transform operations can be performed. It should be noted that the partitioning scheme of the same basic processing subunit can differ between the prediction and transform stages. For example, in H.265 / HEVC or H.266 / VVC, the prediction blocks and transform blocks of the same CU can have different sizes and numbers.
[0052] In structure 110 of Figure 1, the basic processing unit 112 is further divided into 3×3 basic processing subunits, the boundaries of which are shown by dashed lines. Different basic processing units of the same image can be divided into basic processing subunits using different schemes.
[0053] In some implementations, to provide parallel processing and fault tolerance for video encoding and decoding, an image can be divided into multiple processing regions, such that the encoding or decoding process for one region of the image does not depend on information from any other region. In other words, each region of the image can be processed independently. In this way, the codec can process different regions of the image in parallel, thereby improving coding efficiency. Furthermore, when data in one region is corrupted during processing or lost during network transmission, the codec can correctly encode or decode other regions of the same image without relying on the corrupted or lost data, thus providing fault tolerance. In some video coding standards, images can be divided into different types of regions. For example, H.265 / HEVC and H.266 / VVC provide two types of regions: "slices" and "tiles." It should also be noted that different images in the video sequence 100 can have different partitioning schemes for dividing the images into regions.
[0054] For example, in Figure 1, structure 110 is divided into three regions 114, 116, and 118, whose boundaries are shown as solid lines within structure 110. Region 114 comprises four basic processing units. Each of regions 116 and 118 comprises six basic processing units. It should be noted that the basic processing units, basic processing subunits, and regions of structure 110 in Figure 1 are merely examples, and this application does not limit its embodiments to them.
[0055] Figure 2A shows a schematic diagram of an example encoding process 200A consistent with embodiments of this application. For example, encoding process 200A can be performed by an encoder. As shown in Figure 2A, the encoder can encode the video sequence 202 into a video bitstream 228 according to process 200A. Similar to video sequence 100 in Figure 1, video sequence 202 may include a set of images (referred to as "original images") arranged in chronological order. Similar to structure 110 in Figure 1, the encoder can divide each original image of video sequence 202 into basic processing units, basic processing subunits, or regions for processing. In some embodiments, the encoder can perform process 200A at the basic processing unit level for each original image of video sequence 202. For example, the encoder can perform process 200A iteratively, encoding one basic processing unit in each iteration of process 200A. In some embodiments, the encoder can perform process 200A in parallel for the regions (e.g., regions 114-118) of each original image of video sequence 202.
[0056] In Figure 2A, the encoder can input a basic processing unit (referred to as the "original BPU") of an original image of video sequence 202 into prediction stage 204 to generate prediction data 206 and predicted BPU 208. The encoder can subtract predicted BPU 208 from the original BPU to generate residual BPU 210. The encoder can input residual BPU 210 into transform stage 212 and quantization stage 214 to generate quantized transform coefficients 216. The encoder can input prediction data 206 and quantized transform coefficients 216 into binary encoding stage 226 to generate video bitstream 228. Components 202, 204, 206, 208, 210, 212, 214, 216, 226, and 228 can be referred to as the "forward path". During process 200A, after quantization stage 214, the encoder can input quantized transform coefficients 216 into inverse quantization stage 218 and inverse transform stage 220 to generate reconstructed residual BPU 222. The encoder can add reconstructed residual BPU 222 to predicted BPU 208 to generate prediction reference 224, which is used in prediction stage 204 in the next iteration of process 200A. Components 218, 220, 222, and 224 of process 200A can be referred to as the "reconstruction path". The reconstruction path can be used to ensure that both the encoder and decoder use the same reference data for prediction.
[0057] The encoder can iteratively execute process 200A to encode each original BPU of the original image (in the forward path) and to generate the prediction reference 224 used for encoding the next original BPU of the original image (in the reconstruction path). After encoding all original BPUs of the original image, the encoder can continue to encode the next image in the video sequence 202.
[0058] Referring to process 200A, the encoder may receive a video sequence 202 generated by a video capture device (e.g., a camera). The term "receive" as used herein may refer to any action that receives, inputs, acquires, retrieves, obtains, reads, accesses, or in any way is used to input data.
[0059] In prediction stage 204, during the current iteration, the encoder can receive the original BPU and prediction reference 224, and perform prediction operations to generate prediction data 206 and predicted BPU 208. Prediction reference 224 can be generated from the reconstruction path of the previous iteration of process 200A. The purpose of prediction stage 204 is to reduce information redundancy by extracting prediction data 206, which, together with prediction reference 224, can be used to reconstruct the original BPU as predicted BPU 208.
[0060] Ideally, the predicted BPU 208 should be identical to the original BPU. However, due to non-ideal prediction and reconstruction operations, the predicted BPU 208 is typically slightly different from the original BPU. To record this difference, after generating the predicted BPU 208, the encoder can subtract it from the original BPU to generate the residual BPU 210. For example, the encoder can subtract the value (e.g., grayscale or RGB value) of each pixel of the predicted BPU 208 from the value of the corresponding pixel of the original BPU. Each pixel in the residual BPU 210 can have a residual value that is the result of this subtraction between the corresponding pixels of the original BPU and the predicted BPU 208. Compared to the original BPU, the prediction data 206 and the residual BPU 210 can have fewer bits, but they can be used to reconstruct the original BPU without significantly reducing quality, thus compressing the original BPU.
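The subtraction that yields the residual BPU, and the addition that undoes it, can be sketched in a few lines of Python. Blocks are represented here as lists of rows of integer sample values; the function names are illustrative and do not come from the application.

```python
def residual_bpu(original, predicted):
    """Per-pixel difference: original minus predicted."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]


def reconstruct_bpu(predicted, residual):
    """Per-pixel sum: predicted plus residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, residual)]


original = [[10, 12], [11, 13]]
predicted = [[9, 12], [12, 13]]
residual = residual_bpu(original, predicted)
print(residual)                              # [[1, 0], [-1, 0]]
print(reconstruct_bpu(predicted, residual))  # [[10, 12], [11, 13]]
```

With an exact residual the original block is recovered bit for bit; loss enters only when the residual is later transformed and quantized.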
[0061] To further compress the residual BPU 210, in the transform phase 212, the encoder can reduce its spatial redundancy by decomposing the residual BPU 210 into a set of two-dimensional “fundamental patterns,” each of which is associated with “transform coefficients.” The fundamental patterns can have the same size (e.g., the size of the residual BPU 210). Each fundamental pattern can represent a frequency component of the residual BPU 210 (e.g., the frequency of brightness variation). No fundamental pattern can be reproduced from any combination of any other fundamental patterns (e.g., a linear combination). In other words, the decomposition decomposes the variations of the residual BPU 210 into the frequency domain. This decomposition is analogous to the discrete Fourier transform of a function, where the fundamental patterns are analogous to the basis functions of the discrete Fourier transform (e.g., trigonometric functions), and the transform coefficients are analogous to the coefficients associated with the basis functions.
[0062] Different transform algorithms can use different fundamental patterns. Various transform algorithms, such as the discrete cosine transform, the discrete sine transform, etc., can be used in transform stage 212. The transform in transform stage 212 is invertible. That is, the encoder can recover the residual BPU 210 through the inverse operation of the transform (called the "inverse transform"). For example, to recover a pixel of the residual BPU 210, the inverse transform can multiply the value of the corresponding pixel in each fundamental pattern by the associated transform coefficient and sum the products to produce a weighted sum. For a given video coding standard, both the encoder and decoder can use the same transform algorithm (and therefore the same fundamental patterns). Therefore, the encoder can record only the transform coefficients, and the decoder can reconstruct the residual BPU 210 from the transform coefficients without receiving the fundamental patterns from the encoder. Compared to the residual BPU 210, the transform coefficients can have fewer bits, but they can be used to reconstruct the residual BPU 210 without significantly degrading the quality. Therefore, the residual BPU 210 is further compressed.
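To make the decomposition and the weighted-sum inverse concrete, here is a small pure-Python sketch that uses the orthonormal two-dimensional DCT-II as the transform. The choice of DCT-II, the square block shape, and all function names are assumptions for illustration; the application does not tie transform stage 212 to one algorithm.

```python
import math


def dct_basis(n, u, v):
    """Return the n x n DCT-II pattern for vertical/horizontal frequency (u, v)."""
    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[scale(u) * scale(v)
             * math.cos((2 * y + 1) * u * math.pi / (2 * n))
             * math.cos((2 * x + 1) * v * math.pi / (2 * n))
             for x in range(n)] for y in range(n)]


def forward_transform(block):
    """Project the block onto every pattern to obtain the coefficients."""
    n = len(block)
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            basis = dct_basis(n, u, v)
            coeffs[u][v] = sum(block[y][x] * basis[y][x]
                               for y in range(n) for x in range(n))
    return coeffs


def inverse_transform(coeffs):
    """Rebuild each pixel as the coefficient-weighted sum of the patterns."""
    n = len(coeffs)
    block = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            basis = dct_basis(n, u, v)
            for y in range(n):
                for x in range(n):
                    block[y][x] += coeffs[u][v] * basis[y][x]
    return block


flat = [[5] * 4 for _ in range(4)]
print(round(forward_transform(flat)[0][0], 6))   # 20.0
```

A flat block has no variation, so all of its energy lands in the single zero-frequency coefficient, which is why smooth residuals tend to need few non-zero coefficients.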
[0063] The encoder can further compress the transform coefficients in the quantization stage 214. In the transform process, different fundamental patterns can represent different variation frequencies (e.g., brightness variation frequencies). Since the human eye is generally better at recognizing low-frequency variations, the encoder can ignore information about high-frequency variations without causing a significant degradation in decoding quality. For example, in the quantization stage 214, the encoder can generate quantized transform coefficients 216 by dividing each transform coefficient by an integer value (called a "quantization parameter") and rounding the quotient to its nearest integer. This operation converts some transform coefficients of high-frequency fundamental patterns to zero, while transform coefficients of low-frequency fundamental patterns can be converted to smaller integers. The encoder can ignore zero-valued quantized transform coefficients 216, further compressing the transform coefficients. The quantization process can also be inverted, where the quantized transform coefficients 216 can be reconstructed into transform coefficients in the inverse operation of quantization (called "inverse quantization").
[0064] Because the encoder discards the remainder of such division in the rounding operation, quantization stage 214 can be lossy. Typically, quantization stage 214 can cause the greatest information loss in process 200A. The greater the information loss, the fewer bits the quantized transform coefficients 216 may require. To obtain different levels of information loss, the encoder can use different quantization parameter values or any other parameters of the quantization process.
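The divide-and-round operation and its lossy inverse can be sketched as follows. The divisor is named `qp` after the "quantization parameter" in the text; the tie-breaking rule (halves round away from zero) and the function names are assumptions, since the text says only that the quotient is rounded to the nearest integer.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with ties going away from zero (assumed)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, qp):
    """Divide each coefficient by qp and round the quotient."""
    return [[round_half_away(c / qp) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Inverse quantization: scale back up; the rounding remainder is lost."""
    return [[level * qp for level in row] for row in levels]


coeffs = [[100.0, -7.0], [3.0, 0.4]]
levels = quantize(coeffs, 8)
print(levels)                 # [[13, -1], [0, 0]]
print(dequantize(levels, 8))  # [[104, -8], [0, 0]]
```

The two small coefficients collapse to zero and cannot be recovered, while the large ones come back only approximately; a larger `qp` widens both effects.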
[0065] In the binary encoding stage 226, the encoder may encode the prediction data 206 and the quantized transform coefficients 216 using binary encoding techniques such as entropy coding, variable-length coding, arithmetic coding, Huffman coding, context-adaptive binary arithmetic coding, or any other lossless or lossy compression algorithm. In some embodiments, in addition to the prediction data 206 and the quantized transform coefficients 216, the encoder may encode other information in the binary encoding stage 226, such as the prediction mode used in the prediction stage 204, the parameters of the prediction operation, the transform type of the transform stage 212, the parameters of the quantization process (e.g., quantization parameters), encoder control parameters (e.g., bitrate control parameters), etc. The encoder may use the output data of the binary encoding stage 226 to generate the video bitstream 228. In some embodiments, the video bitstream 228 may be further packetized for network transmission.
[0066] Referring to the reconstruction path of process 200A, in the inverse quantization stage 218, the encoder can perform inverse quantization on the quantized transform coefficients 216 to generate reconstructed transform coefficients. In the inverse transform stage 220, the encoder can generate reconstructed residual BPU 222 based on the reconstructed transform coefficients. The encoder can add the reconstructed residual BPU 222 to the predicted BPU 208 to generate a prediction reference 224 that will be used in the next iteration of process 200A.
[0067] It should be noted that other variations of process 200A can also be used to encode video sequence 202. In some embodiments, the encoder may execute the various stages of process 200A in a different order. In some embodiments, one or more stages of process 200A may be combined into a single stage. In some embodiments, a single stage of process 200A may be divided into multiple stages. For example, transform stage 212 and quantization stage 214 may be combined into a single stage. In some embodiments, process 200A may include additional stages. In some embodiments, process 200A may omit one or more of the stages shown in Figure 2A.
[0068] Figure 2B shows a schematic diagram of another example encoding process 200B consistent with embodiments of this application. Process 200B can be modified from process 200A. For example, process 200B can be used by an encoder conforming to a hybrid video coding standard (e.g., H.26x series). Compared to process 200A, the forward path of process 200B additionally includes a mode determination stage 230 and divides the prediction stage 204 into a spatial prediction stage 2042 and a temporal prediction stage 2044. The reconstruction path of process 200B additionally includes a loop filter stage 232 and a buffer 234.
[0069] Generally, prediction techniques can be categorized into two types: spatial prediction and temporal prediction. Spatial prediction (e.g., intra-image prediction or "intra-frame prediction") uses pixels from one or more encoded neighboring BPUs within the same image to predict the current BPU. That is, the prediction reference 224 in spatial prediction can include neighboring BPUs. Spatial prediction can reduce the inherent spatial redundancy of images. Temporal prediction (e.g., inter-image prediction or "inter-frame prediction") uses regions from one or more encoded images to predict the current BPU. That is, the prediction reference 224 in temporal prediction can include encoded images. Temporal prediction can reduce the inherent temporal redundancy of images.
[0070] Referring to process 200B, in the forward path, the encoder can perform prediction operations in spatial prediction stage 2042 and temporal prediction stage 2044. For example, in spatial prediction stage 2042, the encoder may perform intra-frame prediction. For the original BPU of the picture being encoded, prediction reference 224 may include one or more adjacent BPUs that have already been encoded (in the forward path) and reconstructed (in the reconstruction path) in the same picture. The encoder can generate the predicted BPU 208 by extrapolating the adjacent BPUs. Extrapolation techniques may include, for example, linear extrapolation or interpolation, polynomial extrapolation or interpolation, etc. In some embodiments, the encoder may perform extrapolation at the pixel level, for example by extrapolating the value of a corresponding pixel for each pixel of the predicted BPU 208. The adjacent BPUs used for extrapolation may be positioned relative to the original BPU in various directions, such as the vertical direction (e.g., above the original BPU), the horizontal direction (e.g., to the left of the original BPU), a diagonal direction (e.g., to the lower left, lower right, upper left, or upper right of the original BPU), or any direction defined in the video coding standard used. For intra-frame prediction, prediction data 206 may include, for example, the location (e.g., coordinates) of the neighboring BPU used, the size of the neighboring BPU used, extrapolation parameters, and the orientation of the neighboring BPU used relative to the original BPU.
[0071] In another example, in the temporal prediction stage 2044, the encoder can perform inter-frame prediction. For the original BPU of the current image, the prediction reference 224 can include one or more images (referred to as "reference images") that have been encoded (in the forward path) and reconstructed (in the reconstruction path). In some embodiments, a reference image can be encoded and reconstructed BPU by BPU. For example, the encoder can add the reconstructed residual BPU 222 to the predicted BPU 208 to generate a reconstructed BPU. When all reconstructed BPUs of the same image are generated, the encoder can generate the reconstructed image as the reference image. The encoder can perform a "motion estimation" operation to search for a matching region within a range of the reference image (referred to as a "search window"). The position of the search window in the reference image can be determined based on the position of the original BPU in the current image. For example, the search window can be centered in the reference image at a position with the same coordinates as the original BPU in the current image and can extend outward by a predetermined distance. When the encoder identifies a region similar to the original BPU in the search window (e.g., by using a pel-recursive algorithm, a block matching algorithm, etc.), the encoder can determine such a region as the matching region. The matching region can differ in size from the original BPU (e.g., it can be smaller than, equal to, or larger than the original BPU, or have a different shape). Because the reference image and the current image are temporally separated in the timeline (e.g., as shown in Figure 1), it can be assumed that, over time, the matching region "moves" to the position of the original BPU. The encoder can record the direction and distance of this movement as a "motion vector". When multiple reference images are used (e.g., as for image 106 in Figure 1), the encoder can search for a matching region and determine the associated motion vector for each reference image. In some embodiments, the encoder can assign weights to the pixel values of the matching regions of the respective matching reference images.
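The search-window procedure can be sketched as an exhaustive block-matching search. The text names block matching only as one possible technique, so the similarity measure (sum of absolute differences), the square block shape, the helper names `sad`, `crop`, and `motion_search`, and the convention that the motion vector is the (row, column) offset from the co-located position to the matching region are all assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(image, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def motion_search(current, reference, top, left, size, search_range):
    """Full search over a window centered on the co-located position.

    Returns ((dy, dx), cost): the offset of the best matching region in the
    reference image relative to the original block, and its SAD cost.
    """
    original = crop(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate region falls outside the reference image
            cost = sad(original, crop(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

A practical encoder would typically use faster search patterns and sub-pixel refinement; the exhaustive loop is kept here only because it makes the role of the search window explicit.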
[0072] Motion estimation can be used to identify various types of motion, such as translation, rotation, scaling, etc. For inter-frame prediction, prediction data 206 may include, for example, the location (e.g., coordinates) of the matching region, the motion vector associated with the matching region, the number of reference images, the weights associated with the reference images, etc.
[0073] To generate the predicted BPU 208, the encoder can perform a "motion compensation" operation. Motion compensation can be used to reconstruct the predicted BPU 208 based on the prediction data 206 (e.g., motion vectors) and the prediction reference 224. For example, the encoder can move the matching region of the reference image according to the motion vector, whereby the encoder can predict the original BPU of the current image. When multiple reference images are used (e.g., as for image 106 in Figure 1), the encoder can move the matching regions of the reference images according to their respective motion vectors and average the pixel values of the matching regions. In some embodiments, if the encoder has assigned weights to the pixel values of the matching regions of the respective matching reference images, the encoder can compute a weighted sum of the pixel values of the moved matching regions.
[0074] In some embodiments, inter-frame prediction can be unidirectional or bidirectional. Unidirectional inter-frame prediction can use one or more reference images in the same temporal direction relative to the current image. For example, image 104 in Figure 1 is a unidirectional inter-frame prediction image, where the reference image (i.e., image 102) precedes image 104. Bidirectional inter-frame prediction can use one or more reference images in both temporal directions relative to the current image. For example, image 106 in Figure 1 is a bidirectional inter-frame prediction image, in which the reference images (i.e., images 104 and 108) are in both temporal directions relative to image 106.
[0075] Referring again to the forward path of process 200B, after spatial prediction stage 2042 and temporal prediction stage 2044, in mode determination stage 230, the encoder can select a prediction mode (e.g., one of intra-frame prediction or inter-frame prediction) for the current iteration of process 200B. For example, the encoder can perform rate-distortion optimization techniques, whereby the encoder selects a prediction mode based on the bit rate of candidate prediction modes and the distortion of the reference image reconstructed under the candidate prediction modes, in order to minimize the value of the cost function. Based on the selected prediction mode, the encoder can generate the corresponding prediction BPU 208 and prediction data 206.
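As a rough illustration of the mode decision, the sketch below picks the candidate with the lowest rate-distortion cost. The text does not give the form of the cost function; the Lagrangian form J = D + λ·R used here is a common choice and is an assumption, as are the function name `select_mode`, the parameter `lam`, and the sample distortion and bit figures.

```python
def select_mode(candidates, lam):
    """Return the prediction mode with the lowest cost D + lam * R.

    candidates maps a mode name to a (distortion, bits) pair.
    lam is the assumed Lagrange multiplier trading bits against distortion.
    """
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])


# Hypothetical measurements for one BPU.
candidates = {"intra": (120.0, 40), "inter": (90.0, 64)}
low_lambda_choice = select_mode(candidates, 1.0)   # distortion dominates
high_lambda_choice = select_mode(candidates, 2.0)  # bit cost dominates
```

Raising `lam` penalizes bits more heavily, so the cheaper-to-signal mode wins even though its distortion is higher.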
[0076] In the reconstruction path of process 200B, if intra-frame prediction mode is selected in the forward path, the encoder can directly input prediction reference 224 (e.g., the current BPU that has been encoded and reconstructed in the current image) into spatial prediction stage 2042 for later use (e.g., for extrapolating the next BPU of the current image) after generating prediction reference 224. If inter-frame prediction mode has been selected in the forward path, the encoder can input prediction reference 224 (e.g., the current image in which all BPUs have been encoded and reconstructed) into loop filter stage 232 after generating prediction reference 224. At this point, the encoder can apply loop filters to prediction reference 224 to reduce or eliminate distortions (e.g., blocking artifacts) introduced by inter-frame prediction. The encoder can apply various loop filter techniques in loop filter stage 232, such as deblocking, sample adaptive offset, adaptive loop filter, etc. The loop-filtered reference image can be stored in buffer 234 (or "decoded image buffer") for later use (e.g., as an inter-frame prediction reference image for future images of video sequence 202). The encoder may store one or more reference images in buffer 234 for use in the temporal prediction stage 2044. In some embodiments, the encoder may encode parameters of the loop filter (e.g., loop filter strength), along with the quantized transform coefficients 216, prediction data 206, and other information, in the binary encoding stage 226.
[0077] Figure 3A shows a schematic diagram of an example decoding process 300A consistent with an embodiment of this application. Process 300A may be a decompression process corresponding to the compression process 200A in Figure 2A. In some embodiments, process 300A can be similar to the reconstruction path of process 200A. The decoder can decode the video stream 228 into video stream 304 according to process 300A. Video stream 304 can be very similar to video sequence 202. However, due to information loss during compression and decompression (e.g., in the quantization stage 214 in Figures 2A-2B), the video stream 304 is typically not identical to the video sequence 202. Similar to processes 200A and 200B in Figures 2A-2B, the decoder can perform process 300A on each image encoded in the video stream 228 at the basic processing unit (BPU) level. For example, the decoder can perform process 300A iteratively, where the decoder can decode one basic processing unit in one iteration of process 300A. In some embodiments, the decoder can perform process 300A in parallel on regions (e.g., regions 114-118) of each image encoded in the video stream 228.
[0078] In Figure 3A, the decoder may input a portion of the video stream 228 associated with a basic processing unit (referred to as an "encoded BPU") of the encoded image into binary decoding stage 302. In binary decoding stage 302, the decoder may decode this portion into prediction data 206 and quantized transform coefficients 216. The decoder may input the quantized transform coefficients 216 into inverse quantization stage 218 and inverse transform stage 220 to generate a reconstructed residual BPU 222. The decoder may input the prediction data 206 into prediction stage 204 to generate a predicted BPU 208. The decoder may add the reconstructed residual BPU 222 to the predicted BPU 208 to generate a prediction reference 224. In some embodiments, the prediction reference 224 may be stored in a buffer (e.g., a decoded image buffer in computer memory). The decoder may input the prediction reference 224 into prediction stage 204 for performing a prediction operation in the next iteration of process 300A.
[0079] The decoder can iteratively execute process 300A to decode each encoded BPU of the encoded image and generate a prediction reference 224 for decoding the next encoded BPU of the encoded image. After decoding all encoded BPUs of the encoded image, the decoder can output the image to video stream 304 for display and continue decoding the next encoded image in video stream 228.
[0080] In binary decoding stage 302, the decoder can perform the inverse operation of the binary encoding technique used by the encoder (e.g., entropy coding, variable-length coding, arithmetic coding, Huffman coding, context-adaptive binary arithmetic coding, or any other lossless compression algorithm). In some embodiments, in addition to the prediction data 206 and quantized transform coefficients 216, the decoder can also decode other information in binary decoding stage 302, such as the prediction mode, parameters of the prediction operation, transform type, parameters of the quantization process (e.g., quantization parameters), encoder control parameters (e.g., bit rate control parameters), etc. In some embodiments, if the video stream 228 is transmitted in packets over a network, the decoder can unpack the video stream 228 before inputting it into binary decoding stage 302.
[0081] Figure 3B shows a schematic diagram of another example decoding process 300B consistent with embodiments of this application. Process 300B can be modified from process 300A. For example, process 300B can be used by a decoder conforming to a hybrid video coding standard (e.g., H.26x series). Compared to process 300A, process 300B further divides the prediction stage 204 into a spatial prediction stage 2042 and a temporal prediction stage 2044, and further includes a loop filter stage 232 and a buffer 234.
[0082] In process 300B, for the encoded basic processing unit (referred to as the "current BPU") of the encoded picture being decoded (referred to as the "current picture"), the prediction data 206 decoded by the decoder in binary decoding stage 302 can include various types of data depending on the prediction mode used by the encoder to encode the current BPU. For example, if the encoder uses intra-frame prediction to encode the current BPU, the prediction data 206 can include a prediction mode indicator (e.g., a flag value) indicating intra-frame prediction, parameters of the intra-frame prediction operation, etc. The parameters of the intra-frame prediction operation can include, for example, the positions (e.g., coordinates) of one or more neighboring BPUs used as references, the sizes of the neighboring BPUs, extrapolation parameters, the orientations of the neighboring BPUs relative to the original BPU, etc. As another example, if the encoder uses inter-frame prediction to encode the current BPU, the prediction data 206 may include a prediction mode indicator (e.g., a flag value) indicating inter-frame prediction, parameters of the inter-frame prediction operation, etc. The parameters of the inter-frame prediction operation may include, for example, the number of reference images associated with the current BPU, the weights respectively associated with the reference images, the positions (e.g., coordinates) of one or more matching regions in the corresponding reference images, one or more motion vectors respectively associated with the matching regions, etc.
[0083] Based on the prediction mode indicator, the decoder can decide whether to perform spatial prediction (e.g., intra-frame prediction) in the spatial prediction stage 2042 or temporal prediction (e.g., inter-frame prediction) in the temporal prediction stage 2044. The details of performing such spatial or temporal prediction are described with reference to Figure 2B and are not repeated here. After performing the spatial or temporal prediction, the decoder can generate the predicted BPU 208. The decoder can add the predicted BPU 208 and the reconstructed residual BPU 222 to generate the prediction reference 224, as described with reference to Figure 3A.
[0084] In process 300B, the decoder can input prediction reference 224 into spatial prediction stage 2042 or temporal prediction stage 2044 to perform prediction operations in the next iteration of process 300B. For example, if the current BPU is decoded using intra-frame prediction in spatial prediction stage 2042, the decoder can directly input prediction reference 224 into spatial prediction stage 2042 for later use (e.g., for extrapolating the next BPU of the current image) after generating prediction reference 224 (e.g., the decoded current BPU). If the current BPU is decoded using inter-frame prediction in temporal prediction stage 2044, the decoder can input prediction reference 224 into loop filter stage 232 after generating prediction reference 224 (e.g., a reference image in which all BPUs have been decoded) to reduce or eliminate distortion (e.g., blocking artifacts). The decoder can apply the loop filter to prediction reference 224 in the manner described with reference to Figure 2B. The loop-filtered reference image can be stored in buffer 234 (e.g., a decoded image buffer in computer memory) for later use (e.g., as an inter-frame prediction reference image for a future encoded image of video stream 228). The decoder can store one or more reference images in buffer 234 for use in the temporal prediction stage 2044. In some embodiments, the prediction data can further include parameters of the loop filter (e.g., loop filter strength) when the prediction mode indicator of the prediction data 206 indicates that inter-frame prediction is used to encode the current BPU.
[0085] Figure 4 is a block diagram of an example device 400 for encoding or decoding video, consistent with embodiments of this application. As shown in Figure 4, device 400 may include processor 402. When processor 402 executes the instructions described herein, device 400 may become a dedicated machine for video encoding or decoding. Processor 402 may be any type of circuit capable of manipulating or processing information. For example, processor 402 may include any number of central processing units (or “CPU”), graphics processing units (or “GPU”), neural processing units (“NPU”), microcontroller units (“MCU”), optical processors, programmable logic controllers, microcontrollers, microprocessors, digital signal processors, intellectual property (IP) cores, programmable logic arrays (PLAs), programmable array logic (PALs), generic array logic (GALs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), system-on-a-chip (SoCs), application-specific integrated circuits (ASICs), etc. In some embodiments, processor 402 may also be a group of processors grouped into a single logical component. For example, as shown in Figure 4, processor 402 may include multiple processors, including processor 402a, processor 402b and processor 402n.
[0086] Device 400 may also include memory 404 configured to store data (e.g., a set of instructions, computer code, intermediate data, etc.). For example, as shown in Figure 4, the stored data may include program instructions (e.g., program instructions for implementing stages in processes 200A, 200B, 300A, or 300B) and data for processing (e.g., video sequence 202, video stream 228, or video stream 304). Processor 402 can access the program instructions and data for processing (e.g., via bus 410) and execute the program instructions to manipulate or control the data for processing. Memory 404 may include a high-speed random access memory device or a non-volatile memory device. In some embodiments, memory 404 may include any number of random access memories (RAM), read-only memories (ROM), optical discs, magnetic disks, hard disks, solid-state drives, flash drives, secure digital cards (SD cards), memory sticks, compact flash (CF) cards, etc. Memory 404 may also be a group of memories (not shown in Figure 4) grouped into a single logical component.
[0087] Bus 410 may be a communication device for transmitting data between components within device 400, such as an internal bus (e.g., CPU-memory bus), an external bus (e.g., a Universal Serial Bus port, a Peripheral Component Interconnect Express port), etc.
[0088] For ease of explanation and to avoid ambiguity, the processor 402 and other data processing circuits are collectively referred to as "data processing circuits" in this application. The data processing circuits can be implemented entirely as hardware, or as a combination of software, hardware, or firmware. Furthermore, the data processing circuits can be a single, independent module, or can be wholly or partially integrated into any other component of the device 400.
[0089] Device 400 may also include a network interface 406 to provide wired or wireless communication with a network (e.g., the Internet, intranet, local area network, mobile communication network, etc.). In some embodiments, network interface 406 may include any number or any combination of network interface controllers (NICs), radio frequency (RF) modules, repeaters, transceivers, modems, routers, gateways, wired network adapters, wireless network adapters, Bluetooth adapters, infrared adapters, near field communication (“NFC”) adapters, cellular network chips, etc.
[0090] In some embodiments, optionally, the device 400 may further include a peripheral interface 408 to provide connectivity to one or more peripheral devices. As shown in Figure 4, peripheral devices may include, but are not limited to, cursor control devices (such as mice, touchpads, or touchscreens), keyboards, displays (such as cathode ray tube displays, liquid crystal displays, or light-emitting diode displays), video input devices (such as cameras or input interfaces coupled to video files), etc.
[0091] It should be noted that the video codec (e.g., the codec for executing processes 200A, 200B, 300A, or 300B) can be implemented as any combination of any software or hardware modules in device 400. For example, some or all stages of processes 200A, 200B, 300A, or 300B can be implemented as one or more software modules of device 400, such as program instructions that can be loaded into memory 404. As another example, some or all stages of processes 200A, 200B, 300A, or 300B can be implemented as one or more hardware modules of device 400, such as dedicated data processing circuitry (e.g., FPGA, ASIC, NPU, etc.).
[0092] In the quantization and inverse quantization function blocks (e.g., quantization 214 and inverse quantization 218 in Figure 2A or Figure 2B, and inverse quantization 218 in Figure 3A or Figure 3B), the amount of quantization (and inverse quantization) applied to the prediction residual is determined by a quantization parameter (QP). For example, initial QP values for picture coding or slice coding can be signaled at a higher level using the init_qp_minus26 syntax element in the Picture Parameter Set (PPS) and the slice_qp_delta syntax element in the slice header. Furthermore, delta QP values can be signaled at the local level for each CU at the granularity of quantization groups.
[0093] In VVC draft 6 (“VVC 6”), the residual of a transform block (TB) of video data can be coded using a transform skip (TS) mode, which skips the transform stage. For example, to decode video data coded in TS mode, a decoder can decode the video data to obtain the residual and then perform inverse quantization and reconstruction without an inverse transform. VVC 6 limits the applicability of the TS mode by a maximum block size: the TS mode is applicable only to TBs whose width and height are each at most 32 pixels. The maximum block size for applying the TS mode can be specified by a picture parameter set (PPS) level syntax element, log2_transform_skip_max_size_minus2 (transform skip parameter), which ranges from 0 to 3. When not present, the value of log2_transform_skip_max_size_minus2 (transform skip parameter) is inferred to be 0. The maximum width or height of a block in TS mode, MaxTsSize (maximum transform size), can be determined according to formula (1):
[0094] MaxTsSize = 1 << ( log2_transform_skip_max_size_minus2 + 2 ) Formula 1
[0095] In other words, when log2_transform_skip_max_size_minus2 (transform skip parameter) is 0, TS mode is allowed if the TB width and height are at most 4. In the current VVC 6 design, the maximum allowed value for MaxTsSize is 32 because the maximum allowed value for log2_transform_skip_max_size_minus2 (transform skip parameter) is 3. If the TB width and height are at most MaxTsSize, a parameter transform_skip_flag (transform skip flag) can be sent to specify whether TS mode is selected. If the TB width or height is greater than 32, TS mode is not allowed for that TB.
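Formula (1) and the resulting size check can be written out as a short sketch. The function names `max_ts_size` and `ts_allowed` are hypothetical; the 0 to 3 range and the default of 0 when the syntax element is absent follow the VVC 6 behavior described above.

```python
def max_ts_size(log2_transform_skip_max_size_minus2=0):
    """Formula (1): MaxTsSize = 1 << (log2_transform_skip_max_size_minus2 + 2).

    The default of 0 mirrors the value inferred when the element is absent.
    """
    if not 0 <= log2_transform_skip_max_size_minus2 <= 3:
        raise ValueError("VVC 6 restricts the syntax element to the range 0..3")
    return 1 << (log2_transform_skip_max_size_minus2 + 2)


def ts_allowed(tb_width, tb_height, log2_transform_skip_max_size_minus2=0):
    """TS mode may be signaled only if both TB dimensions are at most MaxTsSize."""
    size = max_ts_size(log2_transform_skip_max_size_minus2)
    return tb_width <= size and tb_height <= size
```

For example, `ts_allowed(32, 32, 3)` holds, while `ts_allowed(64, 32, 3)` does not, which is the limitation on large TBs discussed in this description.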
[0096] In VVC 6, the residuals of the TS mode are coded using non-overlapping 4×4 coefficient groups (CGs). The transform skip coefficient levels of a CG are coded in three traversals over the scan positions.
[0097] The first traversal can be represented by the following pseudocode:
[0098]
[0099] The second traversal can be represented by the following pseudocode:
[0100]
[0101]
[0102] The third traversal can be represented by the following pseudocode:
[0103] for(n=0;n<=numSbCoeff-1;n++)
[0104] rice=cctx.templateAbsSumTS(n,coeff);
[0105] decode abs_remainder_using_RG_Coding
[0106] In the above description, the syntax elements used for residual coding in the TS mode (referred to as "TS residual coding") can be either context-coded (labeled "context") or bypass-coded (labeled "bypass").
[0107] In some embodiments, a coding tool called "level mapping" can be used for TS residual coding. The absolute coefficient level, absCoeffLevel, can be mapped to a modified level to be coded, depending on the values of the quantized residual samples to the left of and above the current residual sample. Let X0 represent the absolute coefficient level to the left of the current coefficient, and let X1 represent the absolute coefficient level above the current coefficient. To represent a coefficient with an absolute coefficient level ("absCoeff"), a mapped parameter absCoeffMod can be coded. absCoeffMod can be derived using the following pseudocode:
[0108]
[0109] Several challenges exist in the current design of TS mode. In VVC 6, TS mode is a coding tool that can achieve mathematically lossless compression of blocks under two conditions: selecting an appropriate quantization parameter value and turning off the loop filter stage. Because VVC 6 does not allow TBs with a width or height greater than 32 to use TS mode, the current design in VVC 6 cannot achieve mathematically lossless compression of blocks when the TB width or height is greater than 32.
[0110] Furthermore, the newly adopted level mapping process significantly impacts the throughput of Context Adaptive Binary Arithmetic Coding (CABAC) because, for each coefficient level, the decoder needs to compute a prediction from both the above and left neighbors. Since the derivation of the Rice parameter depends on the actual levels, the computation of the actual levels involved in the inverse mapping must be performed within the CABAC parsing loop. This interleaving of parsing and level decoding is undesirable because it reduces the throughput of decoder hardware implementations.
[0111] In VVC 6, in addition to log2_transform_skip_max_size_minus2 (transform skip parameter) as mentioned above, a Sequence Parameter Set (SPS) level flag, sps_max_luma_transform_size_64_flag, can specify the maximum TB size in luma samples. When sps_max_luma_transform_size_64_flag equals 1, the maximum TB size in luma samples is equal to 64. When sps_max_luma_transform_size_64_flag equals 0, the maximum TB size in luma samples is equal to 32. When the luma coding tree block (CTB) size of a coding tree unit (CTU) is less than 64, the value of sps_max_luma_transform_size_64_flag is equal to 0. Based on sps_max_luma_transform_size_64_flag, the parameter MaxTbLog2SizeY and the maximum TB size MaxTbSizeY can be derived according to formulas (2) and (3):
[0112] MaxTbLog2SizeY = sps_max_luma_transform_size_64_flag ? 6 : 5 Formula 2
[0113] MaxTbSizeY = 1 << MaxTbLog2SizeY Formula 3
[0114] Based on formulas (2) to (3), the maximum value of the PPS level syntax element log2_transform_skip_max_size_minus2 (transform skip parameter) can depend on the SPS level flag sps_max_luma_transform_size_64_flag. log2_transform_skip_max_size_minus2 specifies the maximum block size used by the TS mode, and its value can be in the range of 0 to (3 + sps_max_luma_transform_size_64_flag). The encoder can be configured to ensure that the value of log2_transform_skip_max_size_minus2 is within the allowed range. When not present, the value of log2_transform_skip_max_size_minus2 can be inferred to be 0. The maximum allowed MaxTsSize can be determined using formula (1). If the width and height of the TB are less than or equal to MaxTsSize, the TS mode can be allowed for coding the TB.
[0115] As can be seen from the above description, in VVC 6, log2_transform_skip_max_size_minus2 is signaled only when sps_transform_skip_enabled_flag is 1. sps_transform_skip_enabled_flag equal to 0 indicates that transform_skip_flag is not present in the transform unit syntax. Therefore, when sps_transform_skip_enabled_flag is 0, there is no need to signal log2_transform_skip_max_size_minus2. As a result, this signaling in VVC 6 has a parsing dependency between the SPS and the PPS. The above embodiment also has a parsing dependency between the PPS syntax element log2_transform_skip_max_size_minus2 and the SPS flag sps_max_luma_transform_size_64_flag. Such parsing dependencies are generally undesirable.
[0116] This application provides technical solutions to the aforementioned technical problems. To achieve lossless compression of large TBs using the TS mode, this application provides embodiments in which the TS mode can be extended to apply to TB sizes up to the maximum TB size allowed for the coded video sequence. Different coefficient scanning methods are also provided for TS residual coding.
[0117] Consistent with some embodiments of this application, in order to remove the parsing dependency between the SPS and the PPS, log2_transform_skip_max_size_minus2 can be moved from the PPS to the SPS. For example, Figure 5 shows Table 1, which illustrates an example syntax structure of the Sequence Parameter Set (SPS) according to some embodiments of this disclosure. Figure 6 shows Table 2, which illustrates an example syntax structure of the Picture Parameter Set (PPS) according to some embodiments of this application. Tables 1 and 2 show that log2_transform_skip_max_size_minus2 is moved from the PPS to the SPS, as shown in row 502 of Table 1 and rows 602-604 of Table 2.
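The dependency of the transform skip parameter's range on the SPS flag can be sketched as follows. The function names `max_tb_size_y` and `max_ts_size_extended` are hypothetical, and the range check is only one way a conformance test might be expressed; the arithmetic itself follows formulas (1) to (3).

```python
def max_tb_size_y(sps_max_luma_transform_size_64_flag):
    """Formulas (2) and (3): MaxTbLog2SizeY is 6 if the flag is set, else 5."""
    max_tb_log2_size_y = 6 if sps_max_luma_transform_size_64_flag else 5
    return 1 << max_tb_log2_size_y


def max_ts_size_extended(log2_transform_skip_max_size_minus2,
                         sps_max_luma_transform_size_64_flag):
    """Formula (1) with the upper bound 3 + sps_max_luma_transform_size_64_flag."""
    upper = 3 + sps_max_luma_transform_size_64_flag
    if not 0 <= log2_transform_skip_max_size_minus2 <= upper:
        raise ValueError("syntax element outside the allowed range")
    return 1 << (log2_transform_skip_max_size_minus2 + 2)
```

With the flag set to 1, the value 4 becomes legal and yields a MaxTsSize of 64, matching MaxTbSizeY; with the flag set to 0, the value 4 is rejected.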
[0118] Consistent with some embodiments of this application, the maximum block size for a block applying TS mode can be set to the maximum TB size (MaxTbSizeY), in which case log2_transform_skip_max_size_minus2 is not signaled. This allows TS mode to be enabled if the width and height of the TB are less than or equal to MaxTbSizeY. In some embodiments, MaxTbSizeY can be determined based on formulas (2) to (3).
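As a minimal sketch of the check described in paragraph [0118], the following assumes that formulas (2) to (3), which are not reproduced in this section, yield MaxTbSizeY = 64 when sps_max_luma_transform_size_64_flag is 1 and 32 otherwise. The function names are illustrative only.

```python
def max_tb_size_y(sps_max_luma_transform_size_64_flag: int) -> int:
    """Assumed result of formulas (2)-(3): 64 if the SPS flag is set, else 32."""
    return 64 if sps_max_luma_transform_size_64_flag else 32


def ts_mode_allowed(tb_width: int, tb_height: int,
                    sps_transform_skip_enabled_flag: int,
                    sps_max_luma_transform_size_64_flag: int) -> bool:
    """TS mode is allowed when enabled and both TB dimensions fit MaxTbSizeY."""
    if not sps_transform_skip_enabled_flag:
        return False
    limit = max_tb_size_y(sps_max_luma_transform_size_64_flag)
    return tb_width <= limit and tb_height <= limit
```

Because the limit is derived from the SPS flag alone, no separate transform skip size parameter has to be parsed in this sketch.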
[0119] As an example, Table 3 of Figure 7 shows an example syntax structure of the transform unit according to some embodiments of this application. Table 3 shows that, according to the example syntax structure of the transform unit, the width and height of a TB can be less than or equal to the maximum value MaxTbSizeY (i.e., 32), as shown in line 706. By doing so, since the maximum block size for applying TS mode is the same as MaxTbSizeY, all TBs can be allowed to use TS mode without additional checks to determine whether the width and height of a TB are less than or equal to MaxTbSizeY, as shown in lines 702-704. It should be noted that VVC 6 also uses a multiple transform selection (MTS) scheme for residual coding of inter-coded and intra-coded blocks. MTS uses multiple transforms selected from DCT8 and DST7. During MTS coding, however, an additional check is still required, because MTS is allowed only when both tbWidth and tbHeight are less than or equal to 32.
[0120] VVC 6 provides an alternative coding tool called block differential pulse code modulation (BDPCM). In BDPCM mode, horizontal or vertical differential pulse code modulation (DPCM) is applied in the residual domain, and the transform stage is skipped. The maximum block width or height allowed for BDPCM mode is the same as for TS mode.
[0121] According to some embodiments of this application, the maximum block size for applying BDPCM mode can also be extended to the maximum block size for applying TS mode. By doing so, BDPCM mode can be allowed if the width and height of the coding unit (CU) are less than or equal to MaxTbSizeY. For example, Table 4 of Figure 8 shows an example syntax structure related to signaling of the block differential pulse code modulation (BDPCM) mode according to some embodiments of this application. Table 4 shows that the maximum block size for applying the BDPCM mode can be extended to the maximum block size for applying the TS mode, as shown in line 802.
[0122] In some cases, the allowed values of log2_transform_skip_max_size_minus2 (the transform skip parameter) can depend on the profile of the codec. For example, the main profile might specify that the size indicated by log2_transform_skip_max_size_minus2 must be the same as the maximum TB size. Any bitstream in which log2_transform_skip_max_size_minus2 indicates a size different from the maximum TB size can then be treated as an invalid bitstream by the codec. In an extended profile that goes beyond the main profile, the size indicated by log2_transform_skip_max_size_minus2 may differ from the maximum TB size.
[0123] Consistent with some embodiments of this application, this document provides methods and syntax structures to ensure that the size indicated by log2_transform_skip_max_size_minus2 is always the same as the maximum TB size, for example, by not signaling log2_transform_skip_max_size_minus2 and inferring it to be equal to the maximum TB size, or by imposing the constraint through a profile. By doing so, the burden on the decoder implementation is reduced because fewer combinations of syntax element values need to be tested.
[0124] In some embodiments, an SPS flag can be signaled to indicate whether the maximum block size for applying TS mode is 32 or 64. For example, the SPS flag can be signaled in the same way as the maximum TB size. For instance, `sps_max_transform_skip_size_64_flag` can be set to 0 to specify a maximum block size of 32 for applying TS mode. As another example, `sps_max_transform_skip_size_64_flag` can be set to 1 to specify a maximum block size of 64 for applying TS mode. In some embodiments, when `sps_max_transform_skip_size_64_flag` is not signaled, its value can be inferred to be 0.
[0125] In some embodiments, the maximum block size for applying TS mode can be determined based on formula (4):
[0126] MaxTsSize = sps_max_transform_skip_size_64_flag ? 64 : 32 (4)
[0127] In some embodiments, sps_max_transform_skip_size_64_flag can be signaled if both sps_max_luma_transform_size_64_flag and sps_transform_skip_enabled_flag are equal to 1.
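The conditional signaling and formula (4) can be sketched as follows. The `read_bit` callable is a hypothetical stand-in for a bitstream reader and is not part of any real decoder API; the inference to 0 follows paragraph [0124].

```python
def parse_sps_max_transform_skip_size_64_flag(read_bit,
                                              sps_max_luma_transform_size_64_flag,
                                              sps_transform_skip_enabled_flag):
    """Read the flag only when both enabling flags are 1; otherwise infer 0.

    read_bit is a hypothetical callable returning the next bit (0 or 1).
    """
    if sps_max_luma_transform_size_64_flag and sps_transform_skip_enabled_flag:
        return read_bit()
    return 0  # not signaled, inferred to be 0


def max_ts_size_from_flag(sps_max_transform_skip_size_64_flag):
    """Formula (4): MaxTsSize = flag ? 64 : 32."""
    return 64 if sps_max_transform_skip_size_64_flag else 32
```

With this condition, a 64-sample TS limit can only be selected when 64-sample luma transforms are themselves enabled.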
[0128] As an example, Table 5 of Figure 9 shows an example syntax structure of the SPS for signaling sps_max_transform_skip_size_64_flag according to some embodiments of this application, and Table 6 of Figure 10 shows an example syntax structure of the transform unit that uses sps_max_transform_skip_size_64_flag according to some embodiments of this application. Tables 5 and 6 show implementations of the signaling of sps_max_transform_skip_size_64_flag, as shown in row 902 of Table 5 and rows 1002-1006 of Table 6.
[0129] Consistent with some embodiments of this application, because the maximum block size for applying TS mode or BDPCM mode can be extended to the maximum TB size, residual coding in TS mode or BDPCM mode can also be extended to allow coding of blocks up to the maximum TB size. According to some embodiments of this application, residual coding can be directly extended to allow blocks up to the maximum TB size without modifying any scan pattern.
[0130] In some embodiments, similar to VVC draft 6, the transform block can be divided into coefficient groups (CGs), and a diagonal scan can be performed. For example, Figure 11 is a schematic diagram of an example diagonal scan of a 64×64 transform block (TB) according to some embodiments of this application. Figure 11 shows the diagonal scan pattern of a 64×64 TB (e.g., MaxTbSizeY = 64), indicated by zigzag arrow lines. Each cell in Figure 11 can represent a 4×4 CG. It should be noted that, although Figure 11 uses a 64×64 TB to illustrate the diagonal scanning process, a TB can be of any size or shape and is not limited to the example illustrated herein. For example, when a TB is rectangular rather than square, only one of its dimensions may be equal to 64.
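A possible way to enumerate the CGs of a TB in diagonal order is sketched below. Figure 11 is not reproduced here, so the direction of travel along each diagonal is an assumption borrowed from the up-right diagonal scan used in HEVC and VVC; the function names are illustrative.

```python
def up_right_diagonal_scan(blk_width, blk_height):
    """Return (x, y) positions in up-right diagonal order.

    Each anti-diagonal is walked from its bottom-left end to its top-right end.
    """
    order = []
    for d in range(blk_width + blk_height - 1):
        for x in range(d + 1):
            y = d - x
            if x < blk_width and y < blk_height:
                order.append((x, y))
    return order


def cg_scan_order(tb_width, tb_height, cg_size=4):
    """Scan order of the coefficient groups of a TB, in units of CGs."""
    return up_right_diagonal_scan(tb_width // cg_size, tb_height // cg_size)
```

For a 64×64 TB with 4×4 CGs this visits a 16×16 grid of 256 CGs, and the same function handles rectangular TBs such as 64×16.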
[0131] One challenge of scanning the entire TB in residual coding (e.g., the 64×64 TB of Figure 11) is the need to modify the current VVC residual coding to support this extension, because the current VVC residual coding only supports a maximum block size of 32×32. In the current VVC design, even when a transform is applied to a 64×64 TB (e.g., in non-transform-skip mode), the decoder only needs to apply residual coding to the 32×32 coefficient block representing the top-left 32×32 region of the 64×64 TB. In this case, all remaining high-frequency coefficients are forced to zero (and therefore do not need to be coded). For example, for an M×N TB (where M is the block width and N is the block height), when M equals 64, only the left 32 columns of transform coefficients are coded. Similarly, when N equals 64, only the top 32 rows of transform coefficients are coded.
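The zero-out rule for transformed 64-sample blocks can be expressed compactly. The following is a minimal sketch of the rule as described in this paragraph, with illustrative function names.

```python
def coded_region(tb_width, tb_height, max_coded=32):
    """Width and height of the top-left region whose coefficients are coded."""
    return min(tb_width, max_coded), min(tb_height, max_coded)


def is_forced_zero(x, y, tb_width, tb_height):
    """True if the coefficient at column x, row y lies outside the coded region."""
    coded_w, coded_h = coded_region(tb_width, tb_height)
    return x >= coded_w or y >= coded_h
```

This is the behavior that cannot be reused for TS mode on large blocks, since every residual sample of a transform skip block carries information.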
[0132] Consistent with some embodiments of this application, in order to reuse the existing VVC 6 residual coding techniques, large TBs can be divided into smaller residual units (RUs). For example, if the width of the TB is greater than 32, the TB can be horizontally split into two partitions. As another example, if the height of the TB is greater than 32, the TB can be vertically split into two partitions. In another example, if both dimensions of the TB are greater than 32, the TB can be split both horizontally and vertically into four RUs. After splitting, each RU is no larger than 32×32 and can be coded.
[0133] As an example, Figures 12A-12D show example residual units (RUs) according to some embodiments of this application. In Figure 12A, a 64×64 TB is divided into four 32×32 RUs (represented by dashed lines). In Figure 12B, a 64×16 TB is horizontally divided into two 32×16 RUs (represented by dashed lines). In Figure 12C, a 32×64 TB is vertically divided into two 32×32 RUs (represented by dashed lines). In Figure 12D, since neither the height nor the width is greater than 32, no partitioning is performed, and the RU size is the same as the TB size. In some embodiments, the maximum allowed RU size is 32×32.
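The splitting rule illustrated by Figures 12A-12D can be sketched as below. The raster ordering of the returned RUs and the function name are assumptions made for illustration.

```python
def split_into_rus(tb_width, tb_height, max_ru=32):
    """Split a TB into residual units of at most max_ru x max_ru samples.

    Returns a list of (x0, y0, width, height) tuples in raster order
    (the ordering is an assumption of this sketch).
    """
    ru_w = min(tb_width, max_ru)
    ru_h = min(tb_height, max_ru)
    return [(x0, y0, ru_w, ru_h)
            for y0 in range(0, tb_height, ru_h)
            for x0 in range(0, tb_width, ru_w)]
```

For example, `split_into_rus(64, 16)` yields two 32×16 RUs, and a TB whose dimensions are both at most 32 is returned as a single RU of the same size.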
[0134] As an example, Figure 13 is a schematic diagram of an example diagonal scan of a 64×64 TB according to some embodiments of this application, wherein the TB is divided into four 32×32 RUs. In Figure 13, the 64×64 TB is divided into four RUs (represented by thick solid lines within the TB), and the coefficients of each RU are scanned individually (e.g., independently) within the RU, in the same order as in the scan pattern of a 32×32 TB. As shown in Figure 13, the context model and Rice parameter derivation of one RU can be independent of those of another RU. In some embodiments, a maximum number of context-coded bins can also be allocated independently for each RU. This scheme differs from VVC 6, where the maximum number of context-coded bins is defined at the TB level.
[0135] As an example, Table 7 of Figures 14A-14D shows an example syntax structure for residual coding when the TB is divided into RUs according to some embodiments of this application.
[0136] In VVC 6, `coded_sub_block_flag` is signaled for each coefficient group (CG) of a TS mode block. `coded_sub_block_flag = 0` indicates that all coefficients of the CG are zero. `coded_sub_block_flag = 1` indicates that at least one coefficient within the CG is non-zero. However, the `coded_sub_block_flag` of the last CG is not signaled and is inferred to be 1 if the `coded_sub_block_flag` values of all previously coded CGs (i.e., those before the last CG) are zero. This means that parsing the last CG of a TB depends on all previously decoded CGs. To eliminate the dependency between RUs, `coded_sub_block_flag` can be signaled for all CGs of an RU (including the last CG).
[0137] Consistent with some embodiments of this application, an additional syntax element, `coded_RU_flag`, can be introduced. In some embodiments, `coded_RU_flag` can be signaled when the number of RUs within a TB is greater than 1. In some embodiments, if `coded_RU_flag` is not present, it can be inferred to be 1. `coded_RU_flag = 0` can specify that all coefficients of the RU are zero. `coded_RU_flag = 1` can specify that at least one coefficient of the RU is non-zero. In some embodiments, if the `coded_RU_flag` values of all RUs except the last RU are zero, `coded_RU_flag` need not be signaled for the last RU, and it can be inferred to be 1. As an example, the following pseudocode shows an example signaling for `coded_RU_flag`:
[0138]
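A minimal sketch of the signaling rule of paragraph [0137], written from the decoder's point of view, is given below. The `read_flag` callable is a hypothetical stand-in for entropy decoding of one flag, and the sketch assumes the TB is already known to contain at least one non-zero coefficient, which is what makes the inference for the last RU valid.

```python
def parse_coded_ru_flags(num_rus, read_flag):
    """Return one coded_RU_flag per RU, reading only the flags that are signaled.

    read_flag is a hypothetical callable returning the next decoded flag (0 or 1).
    """
    if num_rus <= 1:
        # Not signaled when the TB holds a single RU; inferred to be 1.
        return [1] * num_rus
    flags = []
    for ru_idx in range(num_rus):
        is_last = ru_idx == num_rus - 1
        if is_last and not any(flags):
            # All earlier RUs are zero, so the last RU is inferred to be coded.
            flags.append(1)
        else:
            flags.append(read_flag())
    return flags
```

In this sketch, a TB split into four RUs consumes three flags when the first three RUs are all zero, and four flags otherwise.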
[0139] As an example, Table 8 of Figures 15A-15D shows another example syntax structure for residual coding when coded_RU_flag is signaled, according to some embodiments of this application. In some embodiments, if coded_RU_flag is signaled, the inference for the last CG can be maintained in the same manner as in VVC 6. That is, if coded_sub_block_flag is 0 for all preceding CGs in the same RU, the coded_sub_block_flag of the last CG can be inferred to be 1.
[0140] The Joint Video Experts Team (JVET) ad hoc group on lossless and near-lossless coding tools (AHG18) released lossless software based on VTM-6.0. The lossless software introduces a CU-level flag called `cu_transquant_bypass_flag`. `cu_transquant_bypass_flag = 1` means that the transform and quantization of that CU are skipped, and the CU is coded in lossless mode. In the current version of the lossless software, `sps_max_luma_transform_size_64_flag` is set to 0, meaning that the maximum TB size in luma samples is limited to 32×32. For chroma samples, the maximum TB size is adjusted according to the YUV color format (e.g., 16×16 for YUV 420). In some embodiments, when `cu_transquant_bypass_flag = 1`, the luma transform block size can be increased to 64×64, and the residual coding techniques described above can be used.
[0141] In some embodiments, the maximum TB size of the chroma components can be derived from MaxTbSizeY, which is determined using formulas (2) and (3). Based on formulas (2) and (3), the maximum TB width maxTbWidth and the maximum TB height maxTbHeight can be determined using formulas (5) and (6):
[0142] maxTbWidth = (cIdx == 0) ? MaxTbSizeY : MaxTbSizeY / SubWidthC (5)
[0144] maxTbHeight = (cIdx == 0) ? MaxTbSizeY : MaxTbSizeY / SubHeightC (6)
[0146] In formulas (5) and (6), cIdx = 0 represents the luma component, and cIdx = 1 and cIdx = 2 represent the two chroma components. The values of SubWidthC and SubHeightC can be derived from the chroma format. Table 9 of Figure 16 shows example parameter values derived from the chroma format according to some embodiments of this application.
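Formulas (5) and (6) can be evaluated as in the sketch below. Table 9 is not reproduced in this text, so the SubWidthC and SubHeightC values used here are the conventional ones for the 4:2:0, 4:2:2, and 4:4:4 chroma formats and should be read as an assumption; the dictionary and function names are illustrative.

```python
# Assumed (SubWidthC, SubHeightC) per chroma format; Table 9 is not shown here.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def max_tb_dims(c_idx, max_tb_size_y, chroma_format):
    """Formulas (5) and (6): maximum TB width and height for component c_idx."""
    sub_width_c, sub_height_c = CHROMA_SUBSAMPLING[chroma_format]
    if c_idx == 0:
        return max_tb_size_y, max_tb_size_y  # luma component
    return max_tb_size_y // sub_width_c, max_tb_size_y // sub_height_c
```

With MaxTbSizeY = 32 and the 4:2:0 format, the chroma result is 16×16, which matches the example given for the lossless software above.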
[0147] In VVC 6, the inverse level mapping is embedded in the CABAC module. Table 10 of Figure 17 shows an example syntax structure in VVC 6 for residual coding that performs the inverse level mapping.
[0148] Consistent with some embodiments of this application, to improve the throughput of CABAC parsing of transform skip residuals, the Rice parameter can be derived based on the mapped level value rather than the actual level value. In some embodiments, both the context model and the Rice parameter can depend on the mapped value, and the inverse mapping operation need not be performed during residual parsing. By doing so, the inverse mapping is decoupled from the residual parsing process. The inverse mapping can be performed after the residual parsing of the entire TB is complete. In some embodiments, the inverse mapping and the residual parsing can instead be performed together in a single pass, so that an implementation can decide whether to interleave parsing and mapping or to separate them into two passes.
[0149] As an example, Figure 18 is a flowchart of an example decoding method 1800 according to some embodiments of this application. Method 1800 can be performed with parsing and inverse mapping separated. As can be seen from Figure 18, the inverse mapping is performed after the residual parsing of the entire TB is completed and before the dequantization, and is thus decoupled from the residual parsing.
[0150] Consistent with some embodiments of this application, Table 11 of Figure 19 shows an example syntax structure for residual coding without performing the inverse level mapping. In some embodiments, the inverse level mapping may be moved to the decoding process, as described below.
[0151] Consistent with some embodiments of this application, the following pseudocode illustrates an inverse level mapping process that can be performed after residual parsing and before inverse quantization (e.g., as shown in Figure 18). In the following pseudocode, TransCoeffLevel[xC][yC] represents the coefficient value at position (xC, yC) after residual parsing, and TransCoeffLevelInvMapped[xC][yC] represents the coefficient value at position (xC, yC) after inverse mapping:
[0152]
[0153]
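As a hedged illustration of an inverse level mapping that runs as a separate pass after parsing, the sketch below assumes the mapping rule follows the VVC 6 style, in which the predictor is the larger of the already reconstructed left and above absolute levels. The raster traversal order and the function name are assumptions, and the sketch is not a reproduction of the pseudocode referenced above.

```python
def inverse_level_mapping(trans_coeff_level):
    """Map parsed levels TransCoeffLevel[xC][yC] to TransCoeffLevelInvMapped.

    Assumed rule (VVC 6 style): pred = max(|left|, |above|) of the already
    inverse-mapped neighbors; a mapped value of 1 means "equal to pred",
    and mapped values in 2..pred mean "one less".
    """
    width = len(trans_coeff_level)
    height = len(trans_coeff_level[0])
    inv = [[0] * height for _ in range(width)]
    for yC in range(height):
        for xC in range(width):
            mapped = trans_coeff_level[xC][yC]
            abs_left = abs(inv[xC - 1][yC]) if xC > 0 else 0
            abs_above = abs(inv[xC][yC - 1]) if yC > 0 else 0
            pred = max(abs_left, abs_above)
            abs_level = abs(mapped)
            if abs_level == 1 and pred > 0:
                abs_level = pred
            elif 0 < abs_level <= pred:
                abs_level -= 1
            inv[xC][yC] = -abs_level if mapped < 0 else abs_level
    return inv
```

Because the loop only reads the parsed array and previously reconstructed neighbors, it can run after the whole TB has been parsed and before dequantization.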
[0154] According to some embodiments of this application, the Rice parameter can be derived based on the mapped value, which differs from the Rice parameter in VVC 6, which is derived based on the actual level value. Assuming the array TransCoeffLevel[xC][yC] is the mapped level value of the TB for a given color component at position (xC, yC), the variable locSumAbs can be derived according to the following pseudocode:
[0155] locSumAbs = 0
[0156] AbsLevel[xC][yC] = abs(TransCoeffLevel[xC][yC])
[0157] if (xC > 0)
[0158]     locSumAbs += AbsLevel[xC - 1][yC]
[0159] if (yC > 0)
[0160]     locSumAbs += AbsLevel[xC][yC - 1]
[0161] locSumAbs = Clip3(0, 31, locSumAbs)
[0162] Consistent with some embodiments of this application, Table 12 of Figure 20 shows an example lookup table for selecting the Rice parameter. In some embodiments, the value of locSumAbs can be adjusted by a predefined offset value. In some embodiments, the offset value is determined by offline training. The following example pseudocode uses an offset value of 2.
[0163] locSumAbs = 0
[0164] offset = 2
[0165] AbsLevel[xC][yC] = abs(TransCoeffLevel[xC][yC])
[0166] if (xC > 0)
[0167]     locSumAbs += AbsLevel[xC - 1][yC]
[0168] if (yC > 0)
[0169]     locSumAbs += AbsLevel[xC][yC - 1]
[0170] locSumAbs -= offset
locSumAbs = Clip3(0, 31, locSumAbs)
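The two derivations above differ only in the offset, so they can be expressed as a single function. This is a minimal sketch in which the array holds mapped level values indexed as [xC][yC]; the lookup from locSumAbs to the Rice parameter (Table 12) is not reproduced here and is therefore omitted.

```python
def clip3(low, high, value):
    """Clamp value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def loc_sum_abs(trans_coeff_level, xC, yC, offset=0):
    """Sum of the absolute mapped levels to the left and above, minus offset."""
    total = 0
    if xC > 0:
        total += abs(trans_coeff_level[xC - 1][yC])
    if yC > 0:
        total += abs(trans_coeff_level[xC][yC - 1])
    return clip3(0, 31, total - offset)
```

Calling the function with `offset=0` corresponds to the first derivation and with `offset=2` to the second; the clamp keeps the result non-negative when the neighbor sum is smaller than the offset.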
[0171] Consistent with some embodiments of this application, Figures 21-22 show flowcharts of example processes 2100-2200 for video processing. In some embodiments, processes 2100-2200 may be performed by a codec (e.g., the encoder of Figures 2A-2B or the decoder of Figures 3A-3B). For example, the codec can be implemented as one or more software or hardware components of an apparatus for video processing (e.g., apparatus 400).
[0172] For example, Figure 21 shows a flowchart of an example process 2100 for video processing according to some embodiments of this application. In step 2102, the codec (e.g., the encoder of Figures 2A-2B) can determine to skip the transform process for a prediction residual based on either the maximum size of the luma samples of the prediction block or the maximum size of the prediction block. The transform process can be the transform stage 212 of Figures 2A-2B. The prediction residual can be the residual BPU 210 of Figures 2A-2B. The prediction block can be included in the predicted data 206 of Figures 2A-2B, for example as a transform block (e.g., any of the transform blocks shown in Figures 11-13). The size of the prediction block can include a height or a width.
[0173] In some embodiments, the codec may determine to skip the transform process for the prediction residual based on a determination that the size of the prediction block is not greater than a threshold. In some embodiments, the threshold may be MaxTbSizeY, as shown and described in connection with formulas (2) to (3), and the maximum value of the threshold may be equal to one of the maximum value of the size of the luma samples (e.g., 32, 64, or any other number) or the maximum value of the size of the prediction block (e.g., 32, 64, or any other number). In some embodiments, the maximum value of the size of the luma samples or the maximum value of the size of the prediction block may be a dynamic value (e.g., not a constant).
[0174] In some embodiments, the threshold is equal to the maximum size of the luma samples indicating the luma information of the prediction block. In some embodiments, the maximum value of the threshold is 64. In some embodiments, the maximum value of the threshold is 32. In some embodiments, the minimum value of the threshold is 4. In some embodiments, the threshold may be equal to the maximum size of the prediction block for which the transform process is allowed to be performed (e.g., MaxTsSize as shown and described in connection with formula (1)).
[0175] Still referring to Figure 21, in step 2104, the codec can generate residual coefficients. In some embodiments, the maximum value of the threshold is determined based at least on a first parameter in a first parameter set. For example, the first parameter set may be a sequence parameter set (SPS). In some embodiments, the value of the first parameter is 0 or 1. For example, the first parameter may be `sps_max_luma_transform_size_64_flag`, as shown and described in connection with Table 5 of Figure 9. In some embodiments, the threshold can be determined based on the value of the first parameter. For example, if the first parameter is `sps_max_luma_transform_size_64_flag` and the threshold is `MaxTbSizeY`, then `MaxTbSizeY` can be equal to 64 when `sps_max_luma_transform_size_64_flag` equals 1, and equal to 32 when `sps_max_luma_transform_size_64_flag` equals 0.
[0176] In some embodiments, the maximum value of the threshold can be determined based at least on a first parameter in a first parameter set. In some embodiments, the threshold can be determined based on the value of a second parameter in a second parameter set. In some embodiments, the second parameter set is a sequence parameter set (SPS). In some embodiments, the second parameter set is a picture parameter set (PPS). The second parameter can be log2_transform_skip_max_size_minus2 (e.g., as shown and described in connection with Table 1 of Figure 5). The value of the second parameter can be determined based on the value of the first parameter. In some embodiments, the second parameter (e.g., log2_transform_skip_max_size_minus2) has a minimum value of 0 and a maximum value equal to the sum of 3 and the value of the first parameter (e.g., sps_max_luma_transform_size_64_flag). For example, log2_transform_skip_max_size_minus2 can be in the range of 0 to (3 + sps_max_luma_transform_size_64_flag). In some embodiments, the second parameter can have a first value in a first profile of the encoder (e.g., the main profile) and a second value in a second profile of the encoder (e.g., the extended profile), and the first and second values are different.
[0177] Still referring to Figure 21, in step 2104, the codec generates residual coefficients for the prediction residual by performing at least one of a lossless compression process or a quantization process on the prediction residual. As described herein, the residual coefficients can be coefficients associated with the residual coding process. The quantization process can be the quantization stage 214 of Figures 2A-2B. The lossless compression process may include generating the residual coefficients using coefficient groups (CGs). For example, the coefficient groups may be non-overlapping. In some embodiments, each coefficient group has a size of 4×4.
[0178] In some embodiments, the codec can use a multiple transform selection (MTS) scheme to generate the residual coefficients. For example, the codec can determine whether the size of the prediction block is greater than 32. If the size of the prediction block is not greater than 32, the codec can use the MTS scheme to generate the residual coefficients.
[0179] In some embodiments, the codec may further determine a transform skip coefficient level of the coefficient group using either context coding or bypass coding. The codec may also determine the Rice parameter based on the transform skip coefficient level. The codec may also generate the bitstream by entropy coding at least one of the coefficient group, the transform skip coefficient level, or the Rice parameter.
[0180] In some embodiments, the codec may also map the transform skip coefficient level to a modified transform skip coefficient level based on a first value of a first residual coefficient of a first prediction block to the left of the prediction block and a second value of a second residual coefficient of a second prediction block above the prediction block.
[0181] In some embodiments, the codec may determine the transform skip coefficient level of the coefficient group using either context coding or bypass coding; map the transform skip coefficient level to a modified transform skip coefficient level based on a first value of a first residual coefficient of a first prediction block to the left of the prediction block and a second value of a second residual coefficient of a second prediction block above the prediction block; generate a context model for the context coding based on the modified transform skip coefficient level; determine the Rice parameter based on the modified transform skip coefficient level; generate the residual coefficients using the coefficient group; and generate a bitstream by entropy coding at least one of the coefficient group, the transform skip coefficient level, or the Rice parameter.
[0182] Still referring to Figure 21, in step 2106, the codec can generate a bitstream by entropy coding at least the residual coefficients. The bitstream can be the video bitstream 228 of Figures 2A-2B.
[0183] Figure 22 shows a flowchart of another example process 2200 for video processing according to some embodiments of this application. For example, process 2200 may be performed by the decoder of Figures 3A-3B.
[0184] As shown in Figure 22, in step 2202, the decoder receives a bitstream that includes coding information of a video sequence. The bitstream includes the sequence parameter set (SPS) of the video sequence.
[0185] In step 2204, the decoder determines the maximum transform size of the prediction block based on a parameter in the sequence parameter set (SPS) of the video sequence. The prediction block can be a block included in the predicted data 206 of Figures 2A-2B, such as a transform block (e.g., any of the transform blocks shown in Figures 11-13). In some embodiments, the maximum transform size may correspond to the maximum size of the luma samples of the prediction block, or to the maximum size of the prediction block. The size of the prediction block may include a height or a width. Methods for determining the maximum transform size based on parameters in the SPS are described in detail above in connection with Figures 5-10.
[0186] In step 2206, the decoder determines, based on the maximum transform size, to skip the transform process for the prediction residual of the prediction block. The transform process can be the transform stage 212 of Figures 2A-2B.
[0187] In some embodiments, a non-transitory computer-readable storage medium including instructions is also provided, and the instructions can be executed by a device (e.g., the encoders and decoders disclosed herein) to perform the methods described above. Common forms of non-transitory media include, for example, floppy disks, hard disks, solid-state drives, magnetic tape or any other magnetic data storage media, CD-ROMs, any other optical data storage media, any physical media with patterns of holes, RAM, PROMs and EPROMs, FLASH-EPROMs or any other flash memory, NVRAM, caches, registers, any other memory chips or cartridges, and networked versions of the same. The device may include one or more processors (CPUs), input / output interfaces, network interfaces, and / or memory.
[0188] The above embodiments may be further described using the following clauses:
[0189] 1. A video processing method, comprising:
[0190] determining, based on a maximum transform size of a prediction block, to skip a transform process for a prediction residual; and
[0191] signaling the maximum transform size in a sequence parameter set (SPS).
[0192] 2. The method according to clause 1, wherein determining to skip the transform process for the prediction residual comprises:
[0193] determining to skip the transform process based on a size of the prediction block not exceeding a threshold, wherein a maximum value of the threshold is equal to one of:
[0194] a maximum value of the size of luma samples of the prediction block, or
[0195] a maximum value of the size of the prediction block.
[0196] 3. The method according to clause 2, wherein one of the maximum value of the size of the luma samples or the maximum value of the size of the prediction block is a dynamic value.
[0197] 4. The method according to any one of the preceding clauses, further comprising:
[0198] determining to skip the transform process further based on a parameter indicating a transform skip mode.
[0199] 5. The method according to clause 2, wherein the size of the prediction block includes a height or a width.
[0200] 6. The method according to clause 2, wherein the maximum value of the threshold is determined based at least on a first parameter in a first parameter set.
[0201] 7. The method according to clause 6, wherein the first parameter set is a sequence parameter set (SPS).
[0202] 8. The method according to any one of clauses 6-7, wherein the value of the first parameter is 0 or 1.
[0203] 9. The method according to any one of clauses 2-8, wherein the maximum value of the threshold is 64.
[0204] 10. The method according to any one of clauses 2-8, wherein the maximum value of the threshold is 32.
[0205] 11. The method according to any one of clauses 2-10, wherein the maximum value of the threshold is determined based at least on a first parameter in the first parameter set and a third parameter in the first parameter set.
[0206] 12. The method according to any one of clauses 2-11, wherein the minimum value of the threshold is 4.
[0207] 13. The method according to any one of clauses 2-12, wherein the threshold is equal to the maximum value of the size of the luma samples indicating the luma information of the prediction block.
[0208] 14. The method according to any one of clauses 6-13, wherein the maximum value of the threshold is determined based on the value of a second parameter in a second parameter set, and the value of the second parameter is determined based on the value of the first parameter.
[0209] 15. The method according to clause 14, wherein the value of the second parameter is at least 0 and at most equal to the sum of 3 and the value of the first parameter.
[0210] 16. The method according to clause 14, wherein the second parameter has a first value in a first profile of the encoder and a second value in a second profile of the encoder, the first value and the second value being different.
[0211] 17. The method according to any one of clauses 14-16, wherein the second parameter set is an SPS.
[0212] 18. The method according to any one of clauses 14-16, wherein the second parameter set is a picture parameter set (PPS).
[0213] 19. The method according to any one of clauses 2-12, wherein the threshold is equal to the maximum value of the size of the prediction block for which the transform process is allowed to be performed.
[0214] 20. The method according to clause 19, wherein the threshold is determined based on the value of the first parameter.
[0215] 21. The method according to any one of the preceding clauses further includes:
[0216] The residual coefficients for the prediction block are generated using the Multiple Transform Selection (MTS) scheme.
[0217] 22. The method described in accordance with Clause 21 further includes:
[0218] Determine if the size of the predicted block is not greater than 32; and
[0219] Based on the premise that the size of the predicted block is no greater than 32, the MTS scheme is used to generate residual coefficients.
[0220] 23. The method according to any one of clauses 2-22, further comprising:
[0221] Determine whether the size of the predicted block is not greater than the threshold; and
[0222] Based on the judgment that the size of the prediction block is not greater than the threshold, the prediction residual is subjected to block differential pulse code modulation (BDPCM) before generating residual coefficients for the prediction block.
[0223] 24. The method according to any one of the preceding clauses further includes:
[0224] By performing a lossless compression process or generating residual coefficients for predicting the residuals, wherein the lossless compression process includes generating residual coefficients using a set of non-overlapping coefficients.
[0225] 25. The method according to Clause 24, wherein the size of the coefficient group is 4×4.
[0226] 26. The method according to any one of clauses 24-25, further comprising:
[0227] Determining a transform skip coefficient level of the coefficient group using either context coding or bypass coding;
[0228] Determining a Rice parameter based on the transform skip coefficient level; and
[0229] The bitstream is generated by entropy encoding of at least one of the coefficient groups, the transform skip coefficient level, or the Rice parameter.
[0230] 27. The method according to any one of clauses 24-26, further comprising:
[0231] Based on the first value of the first residual coefficient of the first prediction block to the left of the prediction block and the second value of the second residual coefficient of the second prediction block to the top of the prediction block, the transform skip coefficient level is mapped to the modified transform skip coefficient level.
[0232] 28. The method according to any one of clauses 24-26, further comprising:
[0233] Determining a transform skip coefficient level of the coefficient group using either context coding or bypass coding;
[0234] Based on the first value of the first residual coefficient of the first prediction block to the left of the prediction block and the second value of the second residual coefficient of the second prediction block to the top of the prediction block, mapping the transform skip coefficient level to a modified transform skip coefficient level;
[0235] Generating a context model for the context coding technique based on the modified transform skip coefficient level;
[0236] Determining a Rice parameter based on the modified transform skip coefficient level;
[0237] Generating the residual coefficients using the coefficient group; and
[0238] The bitstream is generated by entropy encoding at least one of the coefficient group, the transform skip coefficient level, or the Rice parameter.
[0239] 29. The method according to Clause 28, further comprising:
[0240] After the quantization process is performed and during the generation of the residual coefficients, mapping the transform skip coefficient level to the modified transform skip coefficient level.
[0241] 30. The method according to Clause 28, further comprising:
[0242] After the quantization process is performed and before the residual coefficients are generated, mapping the transform skip coefficient level to the modified transform skip coefficient level.
[0243] 31. The method according to any one of clauses 28-30, wherein determining the Rice parameter comprises:
[0244] The Rice parameter is determined based on the modified transform skip coefficient level of a color component of the prediction block.
[0245] 32. The method according to Clause 31, wherein the modified transform skip coefficient level of the color component is offset by a predetermined offset value.
[0246] 33. The method according to Clause 32, wherein the predetermined offset value is determined using a machine learning model during offline training.
[0247] 34. The method according to any one of clauses 23-33, wherein generating the residual coefficients comprises:
[0248] At least one of lossless compression or BDPCM is performed on the prediction residuals using diagonal scanning, wherein the maximum size of the prediction block used for diagonal scanning is 64.
[0249] 35. The method according to any one of clauses 23-33, wherein generating the residual coefficients comprises:
[0250] Based on a determination that a dimension of the prediction block is greater than 32, dividing the prediction block into multiple sub-blocks along that dimension; and
[0251] For each specific sub-block among multiple sub-blocks, at least one of lossless compression or BDPCM is performed on the prediction residuals associated with the specific sub-block using a diagonal scan, wherein the corresponding parameters and output results of the lossless compression process or BDPCM associated with the multiple sub-blocks are independent.
[0252] 36. The method described under Clause 35 further includes:
[0253] Based on a determination that both dimensions of the prediction block are greater than 32, dividing the prediction block into multiple sub-blocks along both dimensions.
[0254] 37. The method according to any one of clauses 35-36, wherein the corresponding parameters and output results of the lossless compression process or BDPCM associated with the plurality of sub-blocks include at least one of the following: a context model associated with the context coding technique, a Rice parameter, or a maximum number of context coding bins associated with the context coding technique.
[0255] 38. The method according to any one of clauses 34-37, wherein the unit of the diagonal scan is the coefficient group.
[0256] 39. The method described under Clause 38 further includes:
[0257] Set the first indicator parameter for each coefficient group of a specific sub-block to indicate the coefficient values in the coefficient group.
[0258] 40. The method described under Clause 38 further includes:
[0259] A second indicator parameter is set for each specific sub-block among multiple sub-blocks. This second indicator parameter is used to indicate the value of all coefficient groups in the specific sub-block.
[0260] 41. The method described under clause 40 further includes:
[0261] For each coefficient group of the specific sub-block, a first indicator parameter is set to indicate the coefficient values in the coefficient group; and
[0262] Based on the fact that the first indicator parameter of all coefficient groups preceding the last coefficient group of a specific sub-block is zero, the first indicator parameter of the last coefficient group is set to 1.
[0263] 42. The method according to any one of clauses 24-41, wherein generating the residual coefficients comprises:
[0264] Based on the parameters indicating the lossless coding mode, the residual coefficients are generated by lossless compression of the prediction residuals, wherein the maximum size of the luminance sample is 64.
[0265] 43. The method according to any one of the preceding clauses further includes:
[0266] Receiving a video image;
[0267] Dividing the video image into multiple blocks;
[0268] A prediction block is generated by performing either intra-frame prediction or inter-frame prediction on the block; and
[0269] The prediction residual is generated by subtracting the prediction block from the block.
[0270] 44. An apparatus comprising:
[0271] The memory is configured to store instructions; and
[0272] The processor is configured to execute the following instructions:
[0273] Determining, based on a maximum transform size of a prediction block, to skip a transform process on a prediction residual of the prediction block; and
[0274] Signaling the maximum transform size in a Sequence Parameter Set (SPS).
[0275] 45. A non-transitory computer-readable medium storing an instruction set, the instruction set being executable by at least one processor of a device to cause the device to perform a method, the method comprising:
[0276] Determining, based on a maximum transform size of a prediction block, to skip a transform process on a prediction residual of the prediction block; and
[0277] Signaling the maximum transform size in a Sequence Parameter Set (SPS).
[0278] 46. A video processing method, comprising:
[0279] Receive the bitstream of the video sequence;
[0280] Based on the sequence parameter set (SPS) of the video sequence, the maximum transform size of the prediction block is determined; and
[0281] Based on the maximum transform size, determining to skip the transform process on the prediction residual of the prediction block.
[0282] 47. The method according to clause 46, wherein determining to skip the transform process on the prediction residual comprises:
[0283] In response to determining that the size of the prediction block is not greater than a threshold, determining to skip the transform process, the threshold having a maximum value equal to one of the following:
[0284] The maximum size of the luma samples of the prediction block, or
[0285] The maximum size of the prediction block.
[0286] 48. The method according to Clause 47, wherein the size of the prediction block includes height or width.
[0287] 49. The method according to Clause 47, wherein the maximum value of the threshold is determined based at least on a first parameter in the SPS.
[0288] 50. The method according to Clause 49, wherein the value of the first parameter is 0 or 1.
[0289] 51. The method according to any one of clauses 47-50, wherein the maximum value of the threshold is 64.
[0290] 52. The method according to any one of clauses 47-50, wherein the maximum value of the threshold is 32.
[0291] 53. The method according to any one of clauses 47-52, wherein the maximum value of the threshold is determined based at least on a first parameter and a third parameter in the SPS.
[0292] 54. The method according to any one of clauses 47-53, wherein the minimum value of the threshold is 4.
[0293] 55. The method according to any one of clauses 47-54, wherein the threshold is equal to the maximum value of the size of the luma samples used to indicate the luma information of the prediction block.
[0294] 56. The method according to any one of clauses 49-54, wherein the maximum value of the threshold is determined based on the value of a second parameter in a second parameter set, and the value of the second parameter is determined based on the value of the first parameter.
[0295] 57. The method according to Clause 56, wherein the value of the second parameter is at least 0 and at most equal to the sum of the values of the first parameter and 3.
[0296] 58. The method according to Clause 56, wherein the second parameter has a first value in a first profile of the encoder and a second value in a second profile of the encoder, the first value and the second value being different.
[0297] 59. The method according to any one of clauses 56-58, wherein the second parameter set is SPS.
[0298] 60. The method according to any one of clauses 56-58, wherein the second parameter set is a picture parameter set (PPS).
[0299] 61. An apparatus comprising:
[0300] Memory, used to store instructions; and
[0301] The processor is configured to execute the following instructions:
[0302] Receive the bitstream of the video sequence;
[0303] Based on the sequence parameter set (SPS) of the video sequence, the maximum transform size of the prediction block is determined; and
[0304] Based on the maximum transform size, determining to skip the transform process on the prediction residual of the prediction block.
[0305] 62. A non-transitory computer-readable medium storing an instruction set executable by at least one processor of a device to cause the device to perform a method, the method comprising:
[0306] Receive the bitstream of the video sequence;
[0307] The maximum transform size of the prediction block is determined based on the sequence parameter set (SPS) of the video sequence; and
[0308] Based on the maximum transform size, determining to skip the transform process on the prediction residual of the prediction block.
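As an illustration of clauses 25 and 34-38 above, the following Python sketch divides a prediction block whose dimension exceeds 32 into sub-blocks and orders the 4×4 coefficient groups of a sub-block along diagonals. It is a minimal sketch and not the disclosed implementation: the function names `split_into_sub_blocks` and `diagonal_cg_scan` are hypothetical, block dimensions are assumed to be powers of two no smaller than 4, and the traversal direction within each diagonal is an assumption made only for illustration.

```python
from typing import List, Tuple

CG_SIZE = 4          # coefficient group size stated in clause 25
MAX_SCAN_SIZE = 32   # split threshold stated in clauses 35-36


def split_into_sub_blocks(width: int, height: int,
                          max_size: int = MAX_SCAN_SIZE
                          ) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) sub-blocks of a prediction block.

    A dimension is split only when it exceeds max_size; block dimensions
    are assumed to be powers of two (assumption, not stated in the text).
    """
    step_w = min(width, max_size)
    step_h = min(height, max_size)
    return [(x, y, step_w, step_h)
            for y in range(0, height, step_h)
            for x in range(0, width, step_w)]


def diagonal_cg_scan(width: int, height: int,
                     cg: int = CG_SIZE) -> List[Tuple[int, int]]:
    """Order the coefficient groups of one (sub-)block along diagonals.

    Positions are (cg_x, cg_y) in units of coefficient groups. The direction
    taken within each diagonal is a hypothetical choice for this sketch.
    """
    cols, rows = width // cg, height // cg
    order = []
    for d in range(cols + rows - 1):      # index of the anti-diagonal
        for cy in range(rows):
            cx = d - cy
            if 0 <= cx < cols:
                order.append((cx, cy))
    return order


if __name__ == "__main__":
    # A 64x32 block is split along its width only.
    for x, y, w, h in split_into_sub_blocks(64, 32):
        groups = diagonal_cg_scan(w, h)
        print((x, y, w, h), "has", len(groups), "coefficient groups")
```

Because each sub-block is scanned on its own, per-sub-block state such as a context model or Rice parameter could be kept independent, as clause 37 describes; the sketch does not model that state.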
[0309] It should be noted that the relational terms such as "first" and "second" used in this document are used only to distinguish one entity or operation from another, and do not require or imply any actual relationship or order between these entities or operations. Furthermore, words such as "comprising," "having," "containing," and "including," as well as other similar forms, have the same meaning and are open-ended: the one or more items following any of these words are not intended to be an exhaustive list of such items, nor is the list limited to the listed items.
[0310] As used herein, unless otherwise expressly stated, the term "or" covers all possible combinations unless impractical. For example, if a component is declared to include A or B, then unless otherwise expressly stated or impractical, the component may include A, or B, or A and B. As a second example, if a component is declared to include A, B, or C, then unless otherwise expressly stated or impractical, the component may include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.
[0311] It is understood that the above embodiments can be implemented by hardware or software (program code) or a combination of hardware and software. If implemented by software, it can be stored in the above-described computer-readable medium. When executed by a processor, the software can perform the disclosed methods. The computing units and other functional units described in this invention can be implemented by hardware, by software, or by a combination of hardware and software. It will also be understood by those skilled in the art that the above-described multiple modules / units can be combined into one module / unit, and each of the above-described modules / units can be further divided into multiple sub-modules / sub-units.
[0312] In the foregoing specification, embodiments have been described with reference to numerous specific details, which may vary with different implementations. Certain adjustments and modifications can be made to the described embodiments. Other embodiments will be apparent to those skilled in the art in consideration of the specifications and practices of the invention disclosed herein. The foregoing specification and embodiments are considered merely examples, and the true scope and spirit of the invention are indicated by the claims. The sequence of steps shown in the figures is also intended for illustrative purposes only and is not intended to limit to any particular order of steps. Therefore, those skilled in the art will understand that these steps may be performed in a different order while implementing the same method.
[0313] Exemplary embodiments have been disclosed in the accompanying drawings and description. However, many variations and modifications can be made to these embodiments. Therefore, although specific terms have been used, they are used in a general and descriptive sense only and not for limiting purposes.
Claims
1. A computer-implemented method for video encoding, comprising:
signaling a sps_log2_transform_skip_max_size_minus2 parameter in a sequence parameter set, wherein the sps_log2_transform_skip_max_size_minus2 parameter specifies the maximum block size used for transform skip, and the value of the sps_log2_transform_skip_max_size_minus2 parameter is in the range of 0 to 3;
determining a MaxTsSize parameter based on the sps_log2_transform_skip_max_size_minus2 parameter;
determining, based on the MaxTsSize parameter, whether a transform skip mode (TS mode) or a block differential pulse code modulation mode (BDPCM mode) is allowed for a current block, wherein the maximum block size allowed for both the TS mode and the BDPCM mode is set equal to the MaxTsSize parameter, which is the maximum transform skip size parameter;
determining the width and height of a transform block used for the chroma component Cr; and
based on a determination that one of the width and height of the transform block used for the chroma component Cr is greater than the MaxTsSize parameter, not signaling a transform_skip_flag parameter, wherein the transform_skip_flag parameter specifies whether the transform skip mode is selected.
2. The computer-implemented method for video encoding according to claim 1, characterized in that: The range of the value of the sps_log2_transform_skip_max_size_minus2 parameter is instead 0 to 3 + sps_max_luma_transform_size_64_flag.
3. The computer-implemented method for video encoding according to claim 1, characterized in that: The value of the MaxTsSize parameter is 32 or 64.
4. A computer-implemented method for video decoding, comprising:
receiving a sps_log2_transform_skip_max_size_minus2 parameter from a sequence parameter set, wherein the sps_log2_transform_skip_max_size_minus2 parameter specifies the maximum block size used for transform skip, and the value of the sps_log2_transform_skip_max_size_minus2 parameter is in the range of 0 to 3;
determining a MaxTsSize parameter based on the sps_log2_transform_skip_max_size_minus2 parameter;
determining, based on the MaxTsSize parameter, whether a transform skip mode (TS mode) or a block differential pulse code modulation mode (BDPCM mode) is allowed for a current block, wherein the maximum block size allowed for both the TS mode and the BDPCM mode is set equal to the MaxTsSize parameter, which is the maximum transform skip size parameter;
determining the width and height of a transform block used for the chroma component Cr; and
based on a determination that one of the width and height of the transform block used for the chroma component Cr is greater than the MaxTsSize parameter, skipping reception of a transform_skip_flag parameter, wherein the transform_skip_flag parameter specifies whether the transform skip mode is selected.
5. The computer-implemented method for video decoding according to claim 4, characterized in that: The range of the value of the sps_log2_transform_skip_max_size_minus2 parameter is instead 0 to 3 + sps_max_luma_transform_size_64_flag.
6. The computer-implemented method for video decoding according to claim 4, characterized in that: The value of the MaxTsSize parameter is 32 or 64.
7. A non-transitory computer-readable storage medium storing an instruction set and a video bitstream, the instruction set being executable by one or more processors to perform a method of generating the video bitstream, the method comprising:
signaling a sps_log2_transform_skip_max_size_minus2 parameter in a sequence parameter set, wherein the sps_log2_transform_skip_max_size_minus2 parameter specifies the maximum block size used for transform skip, and the value of the sps_log2_transform_skip_max_size_minus2 parameter is in the range of 0 to 3;
determining a MaxTsSize parameter based on the sps_log2_transform_skip_max_size_minus2 parameter;
determining, based on the MaxTsSize parameter, whether a transform skip mode (TS mode) or a block differential pulse code modulation mode (BDPCM mode) is allowed for a current block, wherein the maximum block size allowed for both the TS mode and the BDPCM mode is set equal to the MaxTsSize parameter, which is the maximum transform skip size parameter;
determining the width and height of a transform block used for the chroma component Cr; and
based on a determination that one of the width and height of the transform block used for the chroma component Cr is greater than the MaxTsSize parameter, not signaling a transform_skip_flag parameter, wherein the transform_skip_flag parameter specifies whether the transform skip mode is selected.
8. The non-transitory computer-readable storage medium according to claim 7, characterized in that: The range of the value of the sps_log2_transform_skip_max_size_minus2 parameter is instead 0 to 3 + sps_max_luma_transform_size_64_flag.
9. The non-transitory computer-readable storage medium according to claim 7, characterized in that: The value of the MaxTsSize parameter is 32 or 64.
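For illustration only, the size check recited in the claims can be sketched in Python as follows. The claims state that MaxTsSize is determined from sps_log2_transform_skip_max_size_minus2 but do not give the formula in this section; the sketch assumes MaxTsSize = 1 << (sps_log2_transform_skip_max_size_minus2 + 2), which is inferred from the "log2 ... minus2" naming and is consistent with the values 32 and 64 recited in claims 3, 6, and 9. The function names `max_ts_size` and `transform_skip_flag_is_signaled` are hypothetical, and only the block-size condition is modeled; any other conditions a codec may apply to transform_skip_flag are omitted.

```python
def max_ts_size(sps_log2_transform_skip_max_size_minus2: int,
                sps_max_luma_transform_size_64_flag: int = 0) -> int:
    """Derive MaxTsSize from the SPS parameter.

    Assumed formula (not stated in this section):
        MaxTsSize = 1 << (sps_log2_transform_skip_max_size_minus2 + 2)
    The allowed range is 0..3, or 0..3 + sps_max_luma_transform_size_64_flag
    under the dependent claims.
    """
    upper = 3 + sps_max_luma_transform_size_64_flag
    if not 0 <= sps_log2_transform_skip_max_size_minus2 <= upper:
        raise ValueError("sps_log2_transform_skip_max_size_minus2 out of range")
    return 1 << (sps_log2_transform_skip_max_size_minus2 + 2)


def transform_skip_flag_is_signaled(tb_width: int, tb_height: int,
                                    max_ts: int) -> bool:
    """Return False when either transform block dimension exceeds MaxTsSize.

    Only the size condition from the claims is modeled here.
    """
    return tb_width <= max_ts and tb_height <= max_ts


if __name__ == "__main__":
    limit = max_ts_size(3)                                  # 32 under the assumed formula
    print(transform_skip_flag_is_signaled(32, 16, limit))   # within the limit
    print(transform_skip_flag_is_signaled(64, 32, limit))   # width exceeds the limit
```

Under the assumed formula, a parameter value of 4 is accepted only when sps_max_luma_transform_size_64_flag is 1, which yields a MaxTsSize of 64.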