Method and apparatus for transform and coefficient signaling
By placing the LFNST and MTS indices ahead of the chroma transform samples in the bitstream, video data can be decoded more efficiently, which addresses the low coding efficiency of existing methods on high-resolution video data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
- Filing Date
- 2021-04-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing video coding technologies are inefficient when processing high-resolution video data, making it difficult to effectively compress data volume while maintaining image quality.
A unified data structure places the Low-Frequency Non-Separable Transform (LFNST) index and the Multiple Transform Selection (MTS) index before the chroma transform samples, so that the inverse transform of the luma transform samples can start as soon as these indices are received, thereby improving decoding efficiency.
It speeds up the video data decoding process and improves the encoding efficiency of high-resolution video data.
[Figure CN115398911B_ABST: abstract drawing]
Description
[0001] Cross-reference to related applications
[0002] This application claims priority to U.S. Provisional Application No. 63/005,420, filed April 5, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This application generally relates to video encoding, decoding, and compression. More specifically, this application relates to methods and apparatus for improving existing designs of transform and coefficient encoding/decoding methods in the Versatile Video Coding (VVC) standard.
Background
[0004] Various electronic devices (such as digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video game consoles, smartphones, video conferencing equipment, video streaming devices, etc.) support digital video. Electronic devices transmit, receive, encode, decode, and/or store digital video data by implementing video compression/decompression as defined in the MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), and Versatile Video Coding (VVC) standards. Video compression typically involves performing spatial (intra-frame) prediction and/or temporal (inter-frame) prediction to reduce or remove redundancy inherent in the video data. For block-based video coding, a video frame is divided into one or more slices, each slice containing multiple video blocks, also known as coding tree units (CTUs). Each CTU may contain a single coding unit (CU) or be recursively divided into smaller CUs until a predefined minimum CU size is reached. Each CU (also called a leaf CU) contains one or more transform units (TUs) and one or more prediction units (PUs). Each CU can be coded in intra-frame, inter-frame, or intra block copy (IBC) mode. Video blocks in an intra-coded (I) slice of a video frame are encoded using spatial prediction with respect to reference samples in adjacent blocks within the same video frame. Video blocks in an inter-coded (P or B) slice of a video frame can use spatial prediction with respect to reference samples in adjacent blocks within the same video frame or temporal prediction with respect to reference samples in other previous and/or future reference video frames.
[0005] A prediction block for the current video block to be encoded is derived based on spatial or temporal prediction of previously encoded reference blocks (e.g., neighboring blocks). The process of finding the reference block can be accomplished using a block-matching algorithm. The residual data representing the pixel difference between the current block to be encoded and the prediction block is called the residual block or prediction error. Inter-frame coded blocks are encoded based on the residual block and the motion vector pointing to the reference block forming the prediction block in the reference frame. The process of determining the motion vector is typically called motion estimation. Intra-frame coded blocks are encoded based on the intra-frame prediction mode and the residual block. For further compression, the residual block is transformed from the pixel domain to the transform domain (e.g., the frequency domain) to obtain residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, can be scanned to produce a one-dimensional vector of transform coefficients, which is then entropy-encoded into the video bitstream for even more compression.
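The chain described in this paragraph (residual, transform, quantization, scan) can be sketched in a few lines. This is a minimal illustration only, not the codec's actual implementation: the orthonormal floating-point DCT-II, the fixed quantization step, the diagonal scan order, and all function names (`dct_2d`, `quantize`, `diagonal_scan`, `encode_block`) are assumptions chosen for readability, and entropy coding is omitted.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a 1-D sequence (illustrative, floating point)."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform rows, then columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def quantize(coeffs, step):
    """Uniform scalar quantization with a fixed step (assumed, not from the source)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def diagonal_scan(block):
    """Flatten a square 2-D array along anti-diagonals, low frequencies first."""
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    return [block[r][c] for r, c in order]

def encode_block(current, prediction, step=8.0):
    """Residual -> transform -> quantize -> 1-D coefficient vector."""
    residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, prediction)]
    return diagonal_scan(quantize(dct_2d(residual), step))

# A flat residual concentrates all energy in the first (DC) coefficient.
current = [[110] * 4 for _ in range(4)]
prediction = [[100] * 4 for _ in range(4)]
print(encode_block(current, prediction))
```

A good prediction leaves a small, smooth residual, so most quantized coefficients are zero and the scanned vector is cheap to entropy-code.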
[0006] The encoded video bitstream is then stored in a computer-readable storage medium (e.g., flash memory) for access by another electronic device with digital video capabilities or transmitted directly to the electronic device via wired or wireless means. The electronic device then performs video decompression (a process opposite to the video compression described above), for example, by parsing the encoded video bitstream to obtain syntax elements from the bitstream, and reconstructs the digital video data from the encoded video bitstream to its original format based at least in part on the syntax elements obtained from the bitstream, and presents the reconstructed digital video data on the display of the electronic device.
[0007] As digital video quality evolves from high definition to 4K×2K or even 8K×4K, the amount of video data to be encoded/decoded grows exponentially. Maintaining image quality while efficiently encoding/decoding video data remains a long-standing challenge.
Summary of the Invention
[0008] This application describes implementations related to video data encoding and decoding, and more specifically, methods and apparatuses for improving existing designs of transform and coefficient coding methods. A unified data structure for transform samples and syntax elements is applied to both single-tree and split-tree partitions of a transform unit. This unified data structure places a Low-Frequency Non-Separable Transform (LFNST) index before the chroma transform skip flag and the chroma transform samples, such that the inverse LFNST operation defined by the LFNST index can be applied at least partially before, or concurrently with, reception of the chroma transform samples. In some embodiments, the unified data structure also places a Multiple Transform Selection (MTS) index before the chroma transform skip flag and the chroma transform samples, such that the inverse primary transform selected by the MTS index can be applied at least partially before, or concurrently with, reception of the chroma transform samples. In this way, the inverse LFNST operation or inverse primary transform of the luma transform samples can be initiated immediately after the LFNST or MTS index is received, rather than waiting for an LFNST or MTS index that arrives only after the chroma transform samples. This unified data structure speeds up decoding of the transform unit.
[0009] In one aspect of this application, a method for decoding video data is implemented. The method includes: receiving, via a bitstream of a transform unit, a luma transform skip flag and a plurality of luma transform samples of the transform unit; and receiving, via the bitstream, an LFNST index associated with the transform unit. The method further includes: after receiving the LFNST index, receiving, via the bitstream, a chroma transform skip flag and a plurality of chroma transform samples associated with the transform unit. The method further includes: based on a determination that the LFNST index is not zero and the luma transform skip flag is zero, applying an inverse LFNST to the luma transform samples to generate a plurality of first decoded luma samples for the transform unit.
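The ordering in the method above can be illustrated with a toy parser. This is a sketch under stated assumptions, not the VVC syntax: the bitstream is modeled as a flat list of already entropy-decoded values, the names `TransformUnit` and `parse_transform_unit` are hypothetical, and the inverse LFNST is represented only by a logged event rather than a real transform.

```python
from dataclasses import dataclass, field

@dataclass
class TransformUnit:
    luma_ts_flag: int = 0
    luma_samples: list = field(default_factory=list)
    lfnst_idx: int = 0
    chroma_ts_flag: int = 0
    chroma_samples: list = field(default_factory=list)
    events: list = field(default_factory=list)  # order in which work could start

def parse_transform_unit(tokens, n_luma, n_chroma):
    """Read values in the order: luma transform skip flag, luma samples,
    LFNST index, chroma transform skip flag, chroma samples."""
    it = iter(tokens)
    tu = TransformUnit()
    tu.luma_ts_flag = next(it)
    tu.luma_samples = [next(it) for _ in range(n_luma)]
    tu.lfnst_idx = next(it)
    # The LFNST index is known here, before any chroma data has been read,
    # so the luma inverse LFNST can be scheduled right away.
    if tu.lfnst_idx != 0 and tu.luma_ts_flag == 0:
        tu.events.append("inverse_lfnst_luma")  # placeholder for the real transform
    tu.chroma_ts_flag = next(it)
    tu.chroma_samples = [next(it) for _ in range(n_chroma)]
    tu.events.append("chroma_received")
    return tu

tu = parse_transform_unit([0, 1, 2, 3, 4, 1, 0, 5, 6], n_luma=4, n_chroma=2)
print(tu.events)
```

Because the index precedes the chroma data, the luma inverse LFNST appears in the event log before the chroma samples have arrived, which is the latency benefit the method targets.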
[0010] According to another aspect of this application, an electronic device includes one or more processing units, a memory, and a plurality of programs stored in the memory. When executed by the one or more processing units, these programs cause the electronic device to perform the aforementioned method for decoding video data.
[0011] According to another aspect of this application, a non-transitory computer-readable storage medium stores a plurality of programs for execution by an electronic device including one or more processing units. When executed by the one or more processing units, these programs cause the electronic device to implement the above-described method for decoding video data.
Brief Description of the Drawings
[0012] The accompanying drawings, which are incorporated in and form part of this specification, are included to provide a further understanding of the embodiments and serve to illustrate the described embodiments and, together with this description, to explain the principles of this disclosure. The same reference numerals denote corresponding parts.
[0013] Figure 1 is a block diagram illustrating an exemplary video encoding and decoding system according to some embodiments.
[0014] Figure 2 is a block diagram illustrating an exemplary video encoder according to some embodiments.
[0015] Figure 3 is a block diagram illustrating an exemplary video decoder according to some embodiments.
[0016] Figures 4A to 4E are block diagrams illustrating how, according to some embodiments, a frame is recursively divided into multiple video blocks of different sizes and shapes.
[0017] Figure 5 is a block diagram illustrating an example of transform coefficient encoding/decoding utilizing context encoding/decoding and bypass encoding/decoding according to some embodiments.
[0018] Figure 6 is a block diagram illustrating an example of a context-adaptive binary arithmetic coding (CABAC) engine according to some embodiments.
[0019] Figure 7 is a block diagram illustrating an exemplary low-frequency non-separable transform (LFNST) process according to some embodiments, which is a secondary transform that compacts the energy of the transform coefficients of an intra-coded block after the primary transform.
[0020] Figure 8 is a block diagram illustrating an exemplary transform block with non-zero transform coefficients according to some embodiments.
[0021] Figure 9 is a table illustrating exemplary multiple transform selection (MTS) schemes for transforming the residuals of inter-frame and intra-frame coded blocks according to some embodiments.
[0022] Figure 10 is a flowchart illustrating an exemplary process by which a video encoder, according to some embodiments, conditionally signals the LFNST based on different components of a transform block.
[0023] Figure 11A is an exemplary split-tree data structure for encoding a bitstream of a transform unit, according to some embodiments. Figure 11B is an exemplary single-tree data structure for encoding a bitstream of video data for a transform unit, according to some embodiments.
[0024] Figure 12 is a flowchart illustrating a method for decoding video data according to some embodiments.
Detailed Description
[0025] Reference will now be made in detail to specific embodiments, examples of which are illustrated in the accompanying drawings. Numerous non-limiting specific details are set forth in the following detailed description to aid in understanding the subject matter presented herein. However, it will be apparent to those skilled in the art that various alternatives may be used without departing from the scope of the claims, and that the subject matter may be practiced without these specific details. For example, it will be apparent to those skilled in the art that the subject matter presented herein can be implemented on many types of electronic devices with digital video capabilities.
[0026] Figure 1 is a block diagram illustrating an exemplary system for encoding and decoding video blocks in parallel according to some embodiments. As shown in Figure 1, system 10 includes source device 12, which generates and encodes video data that will later be decoded by target device 14. Source device 12 and target device 14 can be any of a wide variety of electronic devices, including desktop or laptop computers, tablet computers, smartphones, set-top boxes, digital televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, etc. In some embodiments, source device 12 and target device 14 are equipped with wireless communication capabilities.
[0027] In some implementations, target device 14 may receive encoded video data to be decoded via link 16. Link 16 may include any type of communication medium or device capable of moving encoded video data from source device 12 to target device 14. In one example, link 16 may include a communication medium enabling source device 12 to transmit encoded video data directly to target device 14 in real time. The encoded video data may be modulated according to a communication standard (e.g., a wireless communication protocol) and transmitted to target device 14. The communication medium may include any wireless or wired communication medium, such as radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the Internet). The communication medium may include a router, switch, base station, or any other means that may facilitate communication from source device 12 to target device 14.
[0028] In some other implementations, encoded video data can be sent from output interface 22 to storage device 32. The target device 14 can then access the encoded video data in storage device 32 via input interface 28. Storage device 32 can include any of a variety of distributed or locally accessed data storage media, such as hard disk drives, Blu-ray discs, digital versatile discs (DVDs), compact disc read-only memories (CD-ROMs), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another example, storage device 32 can correspond to a file server or another intermediate storage device that can hold the encoded video data generated by source device 12. Target device 14 can access the stored video data from storage device 32 via streaming or downloading. The file server can be any type of computer capable of storing encoded video data and sending it to target device 14. Exemplary file servers include web servers (e.g., for websites), file transfer protocol (FTP) servers, network attached storage (NAS) devices, or local disk drives. Target device 14 can access the encoded video data via any standard data connection suitable for accessing encoded video data stored on the file server. Standard data connections include wireless channels (e.g., Wi-Fi connections), wired connections (e.g., digital subscriber line (DSL), cable modems, etc.), or a combination of both. Transfer of encoded video data from storage device 32 can be streaming, downloading, or a combination of both.
[0029] As shown in Figure 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Video source 18 may include a source such as a video capture device (e.g., a camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of such sources. As an example, if video source 18 is a camera in a security monitoring system, source device 12 and target device 14 may form a camera phone or video phone. However, the embodiments described in this application are generally applicable to video encoding and decoding and can be applied to wireless and/or wired applications.
[0030] The captured, pre-captured, or computer-generated video can be encoded by the video encoder 20. The encoded video data can be sent directly to the target device 14 via the output interface 22 of the source device 12. Alternatively, the encoded video data can be stored on the storage device 32 for later access by the target device 14 or other devices for decoding and / or playback. The output interface 22 may further include a modem and / or transmitter.
[0031] Target device 14 includes an input interface 28, a video decoder 30, and a display device 34. Input interface 28 may include a receiver and / or a modem, and receives encoded video data via link 16. The encoded video data transmitted via link 16 or provided on storage device 32 may include various syntax elements generated by video encoder 20 for use by video decoder 30 when decoding the video data. Such syntax elements may be included within encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.
[0032] In some embodiments, the target device 14 may include a display device 34, which may be an integrated display device or an external display device configured to communicate with the target device 14. The display device 34 displays decoded video data to a user and may include any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
[0033] Video encoder 20 and video decoder 30 can operate according to proprietary or industry standards (e.g., VVC, HEVC, MPEG-4 Part 10 Advanced Video Coding (AVC)) or extensions of such standards. It should be understood that this application is not limited to a specific video encoding/decoding standard and can be applied to other video encoding/decoding standards. It is generally contemplated that the video encoder 20 of source device 12 can be configured to encode video data according to any of these current or future standards. Similarly, it is generally contemplated that the video decoder 30 of target device 14 can be configured to decode video data according to any of these current or future standards.
[0034] The video encoder 20 and video decoder 30 can be implemented as any circuit of a variety of suitable encoder and / or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic devices, software, hardware, firmware, or any combination thereof. When partially implemented in software, the electronic device may store instructions for software in a suitable non-transitory computer-readable medium and use one or more processors to execute the instructions in hardware to perform the video encoding / decoding operations disclosed in this disclosure. Each of the video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, and either encoder or decoder may be integrated as part of a combined encoder / decoder (CODEC) in the respective device.
[0035] Figure 2 is a block diagram illustrating an exemplary video encoder 20 according to some embodiments described in this application. The video encoder 20 can perform intra-frame predictive coding and inter-frame predictive coding on video blocks within a video frame. Intra-frame predictive coding relies on spatial prediction to reduce or remove spatial redundancy in the video data within a given video frame or picture. Inter-frame predictive coding relies on temporal prediction to reduce or remove temporal redundancy in the video data within neighboring video frames or pictures of a video sequence.
[0036] As shown in Figure 2, the video encoder 20 includes a video data memory 40, a prediction processing unit 41, a decoded picture buffer (DPB) 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy coding unit 56. The prediction processing unit 41 further includes a motion estimation unit 42, a motion compensation unit 44, a segmentation unit 45, an intra-frame prediction processing unit 46, and an intra-frame block copy (BC) unit 48. In some embodiments, the video encoder 20 also includes an inverse quantization unit 58, an inverse transform processing unit 60, and an adder 62 for video block reconstruction. A loop filter 63, such as a deblocking filter, can be located between the adder 62 and the DPB 64 to filter block boundaries and remove blocking artifacts from the reconstructed video. In addition to the deblocking filter, another loop filter (e.g., a sample adaptive offset (SAO) filter and/or an adaptive loop filter (ALF)) can be used to filter the output of the adder 62. In some examples, the loop filter can be omitted, and the decoded video blocks can be provided directly to the DPB 64 by the adder 62. The video encoder 20 may take the form of a fixed or programmable hardware unit, or may be distributed among one or more of the fixed or programmable hardware units described.
[0037] The video data memory 40 can store video data to be encoded by the components of the video encoder 20. The video data in the video data memory 40 can be obtained, for example, from the video source 18 shown in Figure 1. The DPB 64 is a buffer that stores reference video data (e.g., reference frames or pictures) used by the video encoder 20 (e.g., in intra-frame or inter-frame predictive coding modes) when encoding the video data. The video data memory 40 and DPB 64 can be formed from any of a variety of memory devices. In various examples, the video data memory 40 may be on-chip along with other components of the video encoder 20, or off-chip relative to those components.
[0038] As shown in Figure 2, after receiving video data, the segmentation unit 45 within the prediction processing unit 41 segments the video data into video blocks. This segmentation may also include segmenting the video frame into slices, tiles, or other larger coding units (CUs) according to a predefined splitting structure (e.g., a quadtree (QT) structure) associated with the video data. The prediction processing unit 41 may select one of several feasible predictive coding modes for the current video block based on error results (e.g., coding rate and distortion level), such as one of multiple intra-frame predictive coding modes or one of multiple inter-frame predictive coding modes. The prediction processing unit 41 may provide the resulting intra-frame or inter-frame predictive coded block to adder 50 to generate a residual block, and to adder 62 to reconstruct the coded block for subsequent use as part of a reference frame. The prediction processing unit 41 also provides syntax elements (e.g., motion vectors, intra-frame mode indicators, segmentation information, and other such syntax information) to entropy coding unit 56.
[0039] To select a suitable intra-predictive coding mode for the current video block, the intra-predictive processing unit 46 within the prediction processing unit 41 can perform intra-predictive coding of the current video block in relation to one or more adjacent blocks in the same frame as the current block to be encoded to provide spatial prediction. The motion estimation unit 42 and motion compensation unit 44 within the prediction processing unit 41 perform inter-predictive coding of the current video block in relation to one or more prediction blocks in one or more reference frames to provide temporal prediction. The video encoder 20 can perform multiple coding passes, for example, to select a suitable coding mode for each block of video data.
[0040] In some implementations, motion estimation unit 42 determines an inter-frame prediction mode for the current video frame by generating motion vectors according to a predetermined pattern within the video frame sequence. The motion vectors indicate the displacement of a video block within the current video frame relative to a prediction block within a reference video frame. Motion estimation performed by motion estimation unit 42 is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of the prediction unit (PU) of a video block within the current video frame or picture relative to a prediction block within a reference frame (or another coded unit), relative to the current block being encoded within the current frame (or another coded unit). The predetermined pattern may designate video frames in the sequence as P-frames or B-frames. Intra-frame BC unit 48 may determine vectors (e.g., block vectors) for intra-frame BC encoding in a manner similar to how motion estimation unit 42 determines motion vectors for inter-frame prediction, or the block vectors may be determined using motion estimation unit 42.
[0041] A prediction block is a block of a reference frame that is deemed to closely match the video block to be encoded in terms of pixel difference. Pixel difference can be determined by the sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some implementations, the video encoder 20 can compute values for sub-integer pixel positions of the reference frame stored in the DPB 64. For example, the video encoder 20 can interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference frame. Therefore, the motion estimation unit 42 can perform a motion search relative to full-pixel positions and fractional pixel positions and output a motion vector with fractional-pixel accuracy.
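The difference metrics named here can be sketched together with a simple block-matching search. This is a minimal sketch, assuming an exhaustive full-pixel search over a small window; the fractional-pixel interpolation mentioned in the text is not modeled, and the helper names (`sad`, `ssd`, `crop`, `full_search`) and the search range are illustrative choices rather than anything specified by the source.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def crop(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def full_search(current_block, ref_frame, top, left, search_range=2, metric=sad):
    """Exhaustive full-pixel search around (top, left) in the reference frame.

    Returns (best_cost, (dx, dy)), where (dx, dy) is the motion vector.
    """
    size = len(current_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate falls outside the reference frame
            cost = metric(current_block, crop(ref_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best

# Reference frame with distinct sample values; the current block is a copy of
# the reference content located one sample right and one sample down.
ref = [[10 * r + c for c in range(6)] for r in range(6)]
block = crop(ref, 2, 3, 2)
print(full_search(block, ref, top=1, left=2))
```

SAD is cheaper to compute, while SSD penalizes large errors more heavily; either can be passed as `metric`.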
[0042] The motion estimation unit 42 calculates the motion vector for a video block in an inter-frame predictive coding frame by comparing the position of the video block with the position of the predicted block in a reference frame selected from either a first reference frame list (list 0) or a second reference frame list (list 1), where each reference frame list identifies one or more reference frames stored in the DPB 64. The motion estimation unit 42 sends the calculated motion vector to the motion compensation unit 44, and then to the entropy coding unit 56.
[0043] Motion compensation performed by motion compensation unit 44 may involve fetching or generating prediction blocks based on motion vectors determined by motion estimation unit 42. Upon receiving the motion vector for the current video block, motion compensation unit 44 may locate the prediction block pointed to by the motion vector in one of the reference frame lists, retrieve the prediction block from DPB 64, and forward the prediction block to adder 50. Adder 50 then forms a residual video block of pixel differences by subtracting the pixel values of the prediction block provided by motion compensation unit 44 from the pixel values of the current video block being encoded. The pixel differences forming the residual video block may include a luma difference component, a chroma difference component, or both. Motion compensation unit 44 may also generate syntax elements associated with video blocks of a video frame for use by video decoder 30 when decoding video blocks of the video frame. Syntax elements may include, for example, syntax elements defining motion vectors for identifying prediction blocks, any flags indicating prediction modes, or any other syntax information described herein. It should be noted that motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are described separately for conceptual purposes.
[0044] In some implementations, the intra-BC unit 48 can generate vectors and fetch prediction blocks in a manner similar to that described above in conjunction with the motion estimation unit 42 and the motion compensation unit 44; however, these prediction blocks are in the same frame as the current block being encoded, and these vectors are referred to as block vectors rather than motion vectors. Specifically, the intra-BC unit 48 can determine the intra-prediction mode to be used for encoding the current block. In some examples, the intra-BC unit 48 can encode the current block using various intra-prediction modes, for example during separate encoding passes, and test their performance through rate-distortion analysis. Next, the intra-BC unit 48 can select a suitable intra-prediction mode from the tested intra-prediction modes and generate an intra-mode indicator accordingly. For example, the intra-BC unit 48 can use rate-distortion analysis to calculate rate-distortion values for the tested intra-prediction modes and select the intra-prediction mode with the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between a coded block and the original uncoded block that was encoded to produce the coded block, as well as the bit rate (i.e., the number of bits) used to produce the coded block. Intra-frame BC unit 48 can calculate ratios from the distortions and rates of the various coded blocks to determine which intra-frame prediction mode exhibits the best rate-distortion value for the block.
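The mode decision described above can be sketched as follows. The source does not give the exact formula used to combine distortion and rate; the Lagrangian cost J = D + λ·R below is a common formulation and is an assumption here, as are the function names, the λ values, and the example distortion and bit figures, which are made-up illustrative numbers.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed formulation)."""
    return distortion + lam * bits

def select_mode(candidates, lam):
    """Pick the mode with the lowest cost.

    candidates maps a mode name to a (distortion, bits) pair measured by
    encoding the block once per tested mode.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))

# Hypothetical measurements for three tested intra-prediction modes.
tested = {
    "planar": (120, 10),
    "dc": (100, 30),
    "angular_18": (60, 55),
}
print(select_mode(tested, lam=1.0))  # low lambda favors low distortion
print(select_mode(tested, lam=4.0))  # high lambda favors low bit cost
```

The multiplier λ sets the trade-off: a larger value makes bits more expensive relative to distortion, so cheaper modes win even if they reconstruct the block less accurately.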
[0045] In other examples, the intra-frame BC unit 48 may use, in whole or in part, the motion estimation unit 42 and the motion compensation unit 44 to perform such functions for intra-frame BC prediction according to the embodiments described herein. In any case, for intra-frame block copying, in terms of pixel differences, the predicted block may be a block considered to closely match the block to be encoded, the pixel differences may be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD), or other difference metrics, and identifying the predicted block may include calculating values for sub-integer pixel positions.
[0046] Regardless of whether the prediction block comes from the same frame (intra-frame prediction) or from a different frame (inter-frame prediction), the video encoder 20 can form a residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being encoded. The pixel differences forming the residual video block can include both luma and chroma component differences.
[0047] As an alternative to the inter-frame prediction performed by the motion estimation unit 42 and the motion compensation unit 44 as described above, or the intra-block copy prediction performed by the intra-BC unit 48, the intra-prediction processing unit 46 can perform intra-frame prediction on the current video block. Specifically, the intra-prediction processing unit 46 can determine an intra-prediction mode for encoding the current block. To this end, the intra-prediction processing unit 46 can use various intra-prediction modes to encode the current block, for example, during individual encoding passes, and the intra-prediction processing unit 46 (or, in some examples, the mode selection unit) can select a suitable intra-prediction mode from the tested intra-prediction modes for use. The intra-prediction processing unit 46 can provide information indicating the intra-prediction mode selected for the block to the entropy coding unit 56. The entropy coding unit 56 can encode the information indicating the selected intra-prediction mode into the bitstream.
[0048] After prediction processing unit 41 determines the prediction block for the current video block via inter-frame prediction or intra-frame prediction, adder 50 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more TUs and provided to transform processing unit 52. Transform processing unit 52 uses a transform (e.g., discrete cosine transform (DCT) or a conceptually similar transform) to transform the residual video data into residual transform coefficients.
[0049] The transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can also reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting the quantization parameters. In some examples, the quantization unit 54 can subsequently perform a scan on the matrix including the quantized transform coefficients. Alternatively, the entropy coding unit 56 can perform the scan.
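The effect of the quantization parameter on the degree of quantization can be pictured with a uniform scalar quantizer. The following is a minimal sketch only: the function names are hypothetical, and the step-size relation (step doubling every 6 QP values, as in H.264/HEVC-style codecs) is an assumption for illustration rather than the quantizer of this application.

```python
def q_step(qp):
    """Approximate quantization step size (assumed model: doubles every 6 QP)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels by rounding to the nearest step."""
    step = q_step(qp)
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, qp):
    """Scale integer levels back to approximate coefficient values."""
    step = q_step(qp)
    return [lvl * step for lvl in levels]


coeffs = [100.0, -37.0, 12.0, 3.0]
levels = quantize(coeffs, qp=28)    # step size is 16 at QP 28 in this model
recon = dequantize(levels, qp=28)   # lossy: small coefficients collapse to zero
```

A larger QP gives a larger step, so more coefficients become zero and fewer bits are needed, at the cost of a larger reconstruction error.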
[0050] After quantization, the entropy coding unit 56 entropy codes the quantized transform coefficients into a video bitstream using, for example, context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. The encoded bitstream can then be sent to the video decoder 30 shown in Figure 1, or archived in the storage device 32 shown in Figure 1 for later transmission to or retrieval by the video decoder 30. The entropy coding unit 56 can also entropy code the motion vectors and other syntax elements for the current video frame being encoded.
[0051] The inverse quantization unit 58 and the inverse transform processing unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct residual video blocks in the pixel domain for generating reference blocks to predict other video blocks. As noted above, the motion compensation unit 44 can generate motion-compensated prediction blocks from one or more reference blocks of frames stored in the DPB 64. The motion compensation unit 44 can also apply one or more interpolation filters to the prediction blocks to compute sub-integer pixel values for use in motion estimation.
[0052] Adder 62 adds the reconstructed residual block to the motion-compensated prediction block generated by motion compensation unit 44 to generate a reference block for storage in DPB 64. The reference block can then be used as a prediction block by intra-frame BC unit 48, motion estimation unit 42, and motion compensation unit 44 for inter-frame prediction of another video block in subsequent video frames.
[0053] Figure 3 is a block diagram illustrating an exemplary video decoder 30 according to some embodiments of this application. The video decoder 30 includes a video data memory 79, an entropy decoding unit 80, a prediction processing unit 81, an inverse quantization unit 86, an inverse transform processing unit 88, an adder 90, and a DPB 92. The prediction processing unit 81 further includes a motion compensation unit 82, an intra-frame prediction unit 84, and an intra-frame BC unit 85. The video decoder 30 can perform a decoding process that is essentially the inverse of the encoding process described above for the video encoder 20 in conjunction with Figure 2. For example, the motion compensation unit 82 can generate prediction data based on the motion vectors received from the entropy decoding unit 80, while the intra-frame prediction unit 84 can generate prediction data based on the intra-frame prediction mode indicator received from the entropy decoding unit 80.
[0054] In some examples, units of the video decoder 30 may be assigned tasks to perform embodiments of this application. Furthermore, in some examples, embodiments of this disclosure may be distributed across one or more units of the video decoder 30. For example, the intra-frame BC unit 85 may perform embodiments of this application individually or in combination with other units of the video decoder 30 (e.g., motion compensation unit 82, intra-frame prediction unit 84, and entropy decoding unit 80). In some examples, the video decoder 30 may not include the intra-frame BC unit 85, and the functionality of the intra-frame BC unit 85 may be performed by other components of the prediction processing unit 81 (e.g., motion compensation unit 82).
[0055] Video data memory 79 can store video data, such as encoded video bitstreams, to be decoded by other components of video decoder 30. The video data stored in video data memory 79 can be obtained, for example, from storage device 32, from a local video source (e.g., a camera), via wired or wireless network communication of video data, or by accessing a physical data storage medium (e.g., a flash drive or hard disk). Video data memory 79 may include a coded picture buffer (CPB) that stores encoded video data from the encoded video bitstream. The DPB 92 of video decoder 30 stores reference video data for use by video decoder 30 (e.g., in intra-frame or inter-frame predictive coding modes) when decoding video data. Video data memory 79 and DPB 92 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) (including synchronous DRAM (SDRAM)), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. For illustrative purposes, video data memory 79 and DPB 92 are depicted in Figure 3 as two distinct components of the video decoder 30. However, it will be apparent to those skilled in the art that the video data memory 79 and DPB 92 may be provided by the same memory device or by separate memory devices. In some examples, the video data memory 79 may be on-chip along with other components of the video decoder 30, or off-chip relative to those components.
[0056] During the decoding process, the video decoder 30 receives an encoded video bitstream representing video blocks of encoded video frames and associated syntax elements. The video decoder 30 may receive syntax elements at the video frame level and / or the video block level. The entropy decoding unit 80 of the video decoder 30 performs entropy decoding on the bitstream to generate quantization coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. The entropy decoding unit 80 then forwards the motion vectors or intra-prediction mode indicators, and other syntax elements to the prediction processing unit 81.
[0057] When a video frame is encoded as an intra-predictive coded (I) frame, or for intra-coded prediction blocks in other types of frames, the intra-prediction unit 84 of the prediction processing unit 81 can generate prediction data for a video block of the current video frame based on the signaled intra-prediction mode and reference data from previously decoded blocks of the current frame.
[0058] When a video frame is encoded as an inter-frame predictive coded (i.e., B or P) frame, the motion compensation unit 82 of the prediction processing unit 81 generates one or more prediction blocks for the current video frame based on motion vectors and other syntax elements received from the entropy decoding unit 80. Each of the prediction blocks can be generated from a reference frame within a reference frame list. The video decoder 30 can construct the reference frame list, i.e., list 0 and list 1, based on the reference frames stored in the DPB 92 using a default construction technique.
[0059] In some examples, when a video block is encoded according to the intra-frame BC mode described herein, the intra-frame BC unit 85 of the prediction processing unit 81 generates a prediction block for the current video block based on the block vector and other syntax elements received from the entropy decoding unit 80. The prediction block can be located within a reconstructed region of the same image as the current video block, as defined by the video encoder 20.
[0060] Motion compensation unit 82 and / or intra-frame BC unit 85 determine prediction information for video blocks in the current video frame by parsing motion vectors and other syntax elements, and then use this prediction information to generate prediction blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine the prediction mode (e.g., intra-frame prediction or inter-frame prediction) used to encode the video blocks of the video frame, the inter-frame prediction frame type (e.g., B or P), construction information for one or more of the reference frame lists for the frame, the motion vector for each inter-frame prediction encoded video block in the frame, the inter-frame prediction state for each inter-frame prediction encoded video block in the frame, and other information for decoding video blocks in the current video frame.
[0061] Similarly, the intra-BC unit 85 can use some of the received syntax elements, such as flags, to determine which video blocks in the frame are predicted using the intra-BC mode, which video blocks in the frame are in the reconstruction region and should be stored in the DPB 92, the block vector for each intra-BC predicted video block in the frame, the intra-BC prediction state for each intra-BC predicted video block in the frame, and other information for decoding video blocks in the current video frame.
[0062] The motion compensation unit 82 can also perform interpolation using interpolation filters, such as those used by the video encoder 20 during encoding of video blocks, to calculate interpolated values for sub-integer pixels of the reference block. In this case, the motion compensation unit 82 can determine the interpolation filters used by the video encoder 20 based on the received syntax elements and use these interpolation filters to generate the prediction block.
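The calculation of sub-integer pixel values can be pictured with a deliberately simple two-tap (bilinear) filter. This is a sketch under stated assumptions: the function name `interpolate_row`, the quarter-sample precision, and the bilinear filter itself are hypothetical choices for illustration, whereas practical codecs use longer interpolation filters.

```python
def interpolate_row(samples, frac, precision=4):
    """Interpolate between neighboring integer samples of one row.

    `frac` is the fractional offset in units of 1/precision
    (e.g., frac=2 with precision=4 is the half-sample position).
    Integer arithmetic with a rounding offset is used.
    """
    out = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        weighted = (precision - frac) * a + frac * b
        out.append((weighted + precision // 2) // precision)
    return out


row = [10, 20, 40]
half_pel = interpolate_row(row, frac=2)      # values halfway between neighbors
quarter_pel = interpolate_row(row, frac=1)   # values one quarter of the way
```

The encoder and decoder must apply the same filter so that both derive identical prediction blocks.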
[0063] The dequantization unit 86 dequantizes the quantized transform coefficients provided in the bitstream and entropy-decoded by the entropy decoding unit 80 using the same quantization parameters calculated by the video encoder 20 for each video block in the video frame to determine the degree of quantization. The inverse transform processing unit 88 applies an inverse transform (e.g., inverse DCT, inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to reconstruct the residual block in the pixel domain.
[0064] After the motion compensation unit 82 or the intra-frame BC unit 85 generates a prediction block for the current video block based on vectors and other syntax elements, the adder 90 reconstructs the decoded video block for the current video block by adding the residual block from the inverse transform processing unit 88 to the corresponding prediction block generated by the motion compensation unit 82 and the intra-frame BC unit 85. A loop filter 91 (e.g., a deblocking filter, a SAO filter, and / or an ALF) may be located between the adder 90 and the DPB 92 for further processing of the decoded video block. In some examples, the loop filter 91 may be omitted, and the decoded video block may be directly provided to the DPB 92 by the adder 90. The decoded video block in a given frame is then stored in the DPB 92, which stores reference frames for subsequent motion compensation of the next video block. The DPB 92, or a separate memory device, may also store the decoded video for later presentation on a display device (e.g., the display device 34 of Figure 1).
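The operation of the adder 90 can be sketched as a sample-wise addition of the residual block to the prediction block. The function name `reconstruct_block` is hypothetical, and the clipping of the result to the valid sample range for an assumed bit depth is an assumption of this sketch, not something stated above.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add a residual block to a prediction block, sample by sample.

    Results are clipped to [0, 2**bit_depth - 1] (assumed behavior).
    Both blocks are lists of rows with identical dimensions.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


pred = [[250, 10], [128, 0]]
resid = [[10, -20], [5, -3]]
decoded = reconstruct_block(pred, resid)
```

In a full decoder, the block produced here would then pass through the loop filter 91, when present, before being stored in the DPB 92.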
[0065] In a typical video coding process, a video sequence usually consists of an ordered set of frames or images. Each frame may include three sample arrays, denoted as SL, SCb, and SCr. SL is a two-dimensional array of luminance samples. SCb is a two-dimensional array of Cb chrominance samples. SCr is a two-dimensional array of Cr chrominance samples. In other instances, a frame may be monochrome and therefore consist of only a two-dimensional array of luminance samples.
[0066] As shown in Figure 4A, the video encoder 20 (or more specifically, the segmentation unit 45) generates an encoded representation of a frame by first segmenting the frame into a set of CTUs. A video frame may include an integer number of CTUs ordered consecutively from left to right and top to bottom in raster scan order. Each CTU is the largest logical coding unit, and the width and height of the CTU are signaled by the video encoder 20 in a sequence parameter set, such that all CTUs in the video sequence have the same size, which is one of 128×128, 64×64, 32×32, and 16×16. However, it should be noted that this application is not limited to a specific size. As shown in Figure 4B, each CTU may include one coding tree block (CTB) of luma samples, two corresponding coding tree blocks of chroma samples, and syntax elements for encoding the samples of the coding tree blocks. The syntax elements describe the properties of the different types of units of a coded pixel block and how the video sequence can be reconstructed at the video decoder 30, including inter-frame prediction or intra-frame prediction, the intra-frame prediction mode, motion vectors, and other parameters. In a monochrome image or an image with three separate color planes, a CTU may include a single coding tree block and syntax elements for encoding the samples of that coding tree block. A coding tree block can be an N×N block of samples.
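The raster-scan ordering of CTUs can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that CTUs on the right and bottom borders may extend past the frame boundary when the frame dimensions are not multiples of the CTU size.

```python
def ctu_grid(frame_width, frame_height, ctu_size):
    """Return (columns, rows) of CTUs needed to cover the frame (ceiling division)."""
    cols = -(-frame_width // ctu_size)
    rows = -(-frame_height // ctu_size)
    return cols, rows


def ctu_origin(ctu_index, frame_width, ctu_size):
    """Return the top-left (x, y) luma position of a CTU given its raster-scan index."""
    cols = -(-frame_width // ctu_size)
    return (ctu_index % cols) * ctu_size, (ctu_index // cols) * ctu_size


cols, rows = ctu_grid(1920, 1080, 128)   # grid for an example 1920x1080 frame
origin = ctu_origin(16, 1920, 128)       # second CTU of the second CTU row
```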
[0067] To achieve better performance, the video encoder 20 can recursively perform tree partitioning, such as binary tree partitioning, ternary tree partitioning, quadtree partitioning, or a combination thereof, on the coding tree blocks of the CTU and divide the CTU into smaller CUs. As depicted in Figure 4C, the 64×64 CTU 400 is first divided into four smaller CUs, each with a block size of 32×32. Of these four smaller CUs, CU 410 and CU 420 are each divided into four CUs with a block size of 16×16. The two 16×16 CUs 430 and 440 are each further divided into four CUs with a block size of 8×8. Figure 4D depicts a quadtree data structure showing the final result of the partitioning process of the CTU 400 depicted in Figure 4C, where each leaf node of the quadtree corresponds to one CU of a size ranging from 32×32 to 8×8. Similar to the CTU depicted in Figure 4B, each CU can include a coding block (CB) of luminance samples and two corresponding coding blocks of chrominance samples of a frame of the same size, as well as syntax elements for encoding the samples of the coding blocks. In monochrome images or images with three separate color planes, a CU can include a single coding block and a syntax structure for encoding the samples of the coding block. It should be noted that the quadtree partitioning depicted in Figure 4C and Figure 4D is for illustrative purposes only, and a CTU can be split into multiple CUs based on quadtree / ternary tree / binary tree partitioning to adapt to varying local characteristics. In a multi-type tree structure, a CTU is partitioned according to a quadtree structure, and each quadtree leaf CU can be further partitioned according to binary and ternary tree structures. As shown in Figure 4E, a coding block with width W and height H has five possible partitioning types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
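The five partitioning types can be sketched as a function that returns the sub-blocks of a W×H block. The function name and mode strings are hypothetical. The 1:2:1 ratio used for the ternary splits follows VVC practice and is not stated in the text above, so it should be read as an assumption of this sketch.

```python
def split_block(x, y, w, h, mode):
    """Return the sub-blocks (x, y, width, height) produced by one split of a block."""
    if mode == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "hor_binary":      # split by a horizontal line: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "ver_binary":      # split by a vertical line: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "hor_ternary":     # assumed 1:2:1 split of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "ver_ternary":     # assumed 1:2:1 split of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError("unknown split mode: " + mode)


# First step of the example above: a 64x64 CTU split into four 32x32 CUs.
first_level = split_block(0, 0, 64, 64, "quad")
```

Applying `split_block` recursively to selected sub-blocks yields a partitioning tree whose leaves are the CUs.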
[0068] In some implementations, the video encoder 20 may further segment the coded blocks of the CU into one or more (M×N) PBs. A PB is a rectangular (square or non-square) sample block to which the same prediction (inter-frame or intra-frame) is applied. The PU of the CU may include a PB for luma samples, two corresponding PBs for chroma samples, and syntax elements for predicting the PBs. In a monochrome image or an image with three separate color planes, a PU may include a single PB and a syntax structure for predicting the PBs. The video encoder 20 may generate predicted luma blocks, predicted Cb blocks, and predicted Cr blocks for each PU of the CU, representing the luma PB, Cb PB, and Cr PB.
[0069] Video encoder 20 can generate prediction blocks for a PU using intra-frame prediction or inter-frame prediction. If video encoder 20 uses intra-frame prediction to generate prediction blocks for a PU, then video encoder 20 can generate prediction blocks for a PU based on decoded samples of the frame associated with the PU. If video encoder 20 uses inter-frame prediction to generate prediction blocks for a PU, then video encoder 20 can generate prediction blocks for a PU based on decoded samples of one or more frames other than the frame associated with the PU.
[0070] After the video encoder 20 generates predicted luminance blocks, predicted Cb blocks, and predicted Cr blocks for one or more PUs of the CU, the video encoder 20 can generate luminance residual blocks for the CU by subtracting the predicted luminance blocks of the CU from the original luminance coding blocks of the CU, such that each sample in the luminance residual block of the CU indicates the difference between a luminance sample in one of the predicted luminance blocks of the CU and a corresponding sample in the original luminance coding block of the CU. Similarly, the video encoder 20 can generate Cb residual blocks and Cr residual blocks for the CU, respectively, such that each sample in the Cb residual block of the CU indicates the difference between a Cb sample in one of the predicted Cb blocks of the CU and a corresponding sample in the original Cb coding block of the CU, and each sample in the Cr residual block of the CU indicates the difference between a Cr sample in one of the predicted Cr blocks of the CU and a corresponding sample in the original Cr coding block of the CU.
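The residual generation described above amounts to a sample-wise subtraction per color component. A minimal sketch follows, in which the function names and the dictionary keys "Y", "Cb", and "Cr" are hypothetical and each block is represented as a list of rows.

```python
def residual_block(original, predicted):
    """Return original minus predicted, sample by sample, for one component."""
    return [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, predicted)
    ]


def cu_residuals(original, predicted):
    """Compute the luminance, Cb, and Cr residual blocks of a CU."""
    return {name: residual_block(original[name], predicted[name])
            for name in ("Y", "Cb", "Cr")}


original = {"Y": [[120, 130], [125, 140]], "Cb": [[128]], "Cr": [[90]]}
predicted = {"Y": [[118, 133], [125, 135]], "Cb": [[127]], "Cr": [[95]]}
residuals = cu_residuals(original, predicted)
```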
[0071] In addition, as shown in Figure 4C, the video encoder 20 can use quadtree partitioning to decompose the luminance residual block, Cb residual block, and Cr residual block of the CU into one or more luminance transform blocks, Cb transform blocks, and Cr transform blocks, respectively. A transform block is a rectangular (square or non-square) sample block to which the same transform is applied. A TU of the CU can include a transform block of the luminance samples, two corresponding transform blocks of the chrominance samples, and syntax elements for transforming the transform block samples. Therefore, each TU of the CU can be associated with a luminance transform block, a Cb transform block, and a Cr transform block. In some examples, the luminance transform block associated with a TU can be a sub-block of the CU's luminance residual block. A Cb transform block can be a sub-block of the CU's Cb residual block. A Cr transform block can be a sub-block of the CU's Cr residual block. In a monochrome image or an image with three separate color planes, a TU can include a single transform block and syntax structures for transforming the samples of that transform block.
[0072] The video encoder 20 can apply one or more transforms to the luminance transform block of the TU to generate a luminance coefficient block for the TU. The coefficient block can be a two-dimensional array of transform coefficients. The transform coefficients can be scalars. The video encoder 20 can apply one or more transforms to the Cb transform block of the TU to generate a Cb coefficient block for the TU. The video encoder 20 can apply one or more transforms to the Cr transform block of the TU to generate a Cr coefficient block for the TU.
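How a transform turns a transform block into a coefficient block can be illustrated with a floating-point, orthonormal two-dimensional DCT-II. This is a sketch only: the function names are hypothetical, and practical codecs use integer approximations and may select among several transform types, which this sketch does not model.

```python
import math


def dct_1d(vec):
    """Orthonormal DCT-II of a one-dimensional list of samples."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(vec))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major order


flat = [[10] * 4 for _ in range(4)]
coeffs = dct_2d(flat)  # a flat block puts all of its energy in the top-left (DC) coefficient
```

Concentrating the energy of a smooth residual into a few low-frequency coefficients is what makes the subsequent quantization and entropy coding effective.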
[0073] After generating coefficient blocks (e.g., luminance coefficient blocks, Cb coefficient blocks, or Cr coefficient blocks), video encoder 20 can quantize the coefficient blocks. Quantization typically refers to the process of quantizing transform coefficients to potentially reduce the amount of data used to represent the transform coefficients, thereby providing further compression. After quantizing the coefficient blocks, video encoder 20 can entropy encode the syntax elements indicating the quantized transform coefficients. For example, video encoder 20 can perform CABAC on the syntax elements indicating the quantized transform coefficients. Finally, video encoder 20 can output a bitstream comprising a bit sequence that forms a representation of coded frames and associated data; the bitstream is stored in storage device 32 or transmitted to target device 14.
[0074] After receiving the bitstream generated by the video encoder 20, the video decoder 30 can parse the bitstream to obtain syntax elements. The video decoder 30 can reconstruct frames of video data, at least in part, based on the syntax elements obtained from the bitstream. The process of reconstructing the video data is generally the inverse of the encoding process performed by the video encoder 20. For example, the video decoder 30 can perform an inverse transform on the coefficient blocks associated with the TUs of the current CU to reconstruct the residual blocks associated with the TUs of the current CU. The video decoder 30 also reconstructs the coding blocks of the current CU by adding samples of the prediction blocks of the PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. After reconstructing the coding blocks for each CU of the frame, the video decoder 30 can reconstruct the frame.
[0075] As mentioned above, video coding primarily uses two modes (i.e., intra-frame prediction and inter-frame prediction) to achieve video compression. Palette-based coding is another coding scheme adopted by many video coding standards. In palette-based coding, which may be particularly suitable for encoding screen-generated content, the video codec (such as video encoder 20 or video decoder 30) forms a color palette table representing a given block of video data. The palette table includes the most important (e.g., frequently used) pixel values in the given block. Pixel values that are not frequently represented in the given block of video data are either not included in the palette table or are included as escape colors.
[0076] Each entry in the palette table includes an index to the corresponding pixel value in the palette table. The palette index of a sample in a block can be encoded to indicate which entry in the palette table is used to predict or reconstruct that sample. This palette mode begins with the process of generating a palette predictor for the first block of a group of pictures, strips, tiles, or other such grouping of video blocks. As will be explained below, palette predictors for subsequent video blocks are typically generated by updating the previously used palette predictor. For illustrative purposes, it is assumed that the palette predictor is defined at the picture level. In other words, a picture may include multiple coding blocks, each with its own palette table, but only one palette predictor for the entire picture.
[0077] To reduce the number of bits required to signal palette entries in the video bitstream, the video decoder may utilize a palette predictor to determine new palette entries in the palette table for reconstructing video blocks. For example, the palette predictor may include palette entries from previously used palette tables, or even initialize with the most recently used palette table by including all entries from the most recently used palette table. In some implementations, the palette predictor may include fewer than all entries from the most recently used palette table, then incorporate some entries from other previously used palette tables. The palette predictor may have the same size as the palette table used to encode different blocks, or it may be larger or smaller than the palette table used to encode different blocks. In one example, the palette predictor is implemented as a first-in-first-out (FIFO) table comprising 64 palette entries.
[0078] To generate a palette table for video data blocks from the palette predictor, the video decoder receives a one-bit flag for each entry of the palette predictor from the encoded video bitstream. This one-bit flag may have a first value (e.g., binary 1) indicating that the associated entry from the palette predictor will be included in the palette table, or a second value (e.g., binary 0) indicating that the associated entry from the palette predictor is not included in the palette table. If the size of the palette predictor is larger than the palette table used for the video data blocks, the video decoder may stop receiving more flags once the maximum size of the palette table is reached.
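The flag-driven derivation of a palette table from the palette predictor can be sketched as follows. The function name `build_palette_table` and the representation of entries as (luma, chroma, chroma) tuples are hypothetical; the flags are passed in as an already-parsed list rather than read from a real bitstream.

```python
def build_palette_table(predictor, reuse_flags, max_table_size):
    """Copy predictor entries whose one-bit flag is 1 into a new palette table.

    Flag reading stops once the table reaches max_table_size.
    Returns the table and the number of flags actually consumed.
    """
    table = []
    flags_read = 0
    for entry, flag in zip(predictor, reuse_flags):
        if len(table) >= max_table_size:
            break  # table is full: no further flags are received
        flags_read += 1
        if flag == 1:
            table.append(entry)
    return table, flags_read


predictor = [(255, 128, 128), (0, 128, 128), (76, 85, 255), (150, 44, 21)]
table, flags_read = build_palette_table(predictor, [1, 0, 1, 1], max_table_size=2)
```

In this example the table is full after the third flag, so the fourth flag is never consumed.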
[0079] In some implementations, some entries in the palette table can be directly signaled in the encoded video bitstream, rather than determined using a palette predictor. For such entries, the video decoder can receive three separate m-bit values from the encoded video bitstream, indicating the pixel values of the luma component and the two chroma components associated with the entry, where m represents the bit depth of the video data. Compared to the multiple m-bit values required for directly signaled palette entries, palette entries derived from the palette predictor require only a one-bit flag. Therefore, using a palette predictor to signal some or all palette entries can significantly reduce the number of bits required to signal the entries of a new palette table, thereby improving the overall coding efficiency of palette mode coding.
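The bit saving can be made concrete with a small calculation. The function below is a hypothetical, simplified cost model: it counts one flag bit per entry taken from the predictor and three m-bit values per directly signaled entry, and it ignores the flags spent on predictor entries that are not reused as well as any entropy coding of these values.

```python
def palette_signaling_bits(num_predicted, num_direct, bit_depth, num_components=3):
    """Simplified bit count for signaling the entries of a new palette table."""
    flag_bits = num_predicted * 1                         # one flag per reused entry
    direct_bits = num_direct * num_components * bit_depth  # three m-bit values per entry
    return flag_bits + direct_bits


all_direct = palette_signaling_bits(num_predicted=0, num_direct=8, bit_depth=8)
mostly_predicted = palette_signaling_bits(num_predicted=6, num_direct=2, bit_depth=8)
```

Under this model, an eight-entry table at 8-bit depth costs 192 bits when fully signaled directly, and 54 bits when six of the entries come from the predictor.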
[0080] In many cases, the palette predictor for a block is determined based on the palette table used to encode one or more previously encoded blocks. However, when encoding the first coding tree unit in a picture, strip, or tile, the palette table of a previously encoded block may be unavailable. Therefore, entries from previously used palette tables cannot be used to generate a palette predictor. In this case, a sequence of palette predictor initializers can be signaled in the Sequence Parameter Set (SPS) and / or Picture Parameter Set (PPS); these are the values used to generate the palette predictor when a previously used palette table is unavailable. The SPS typically refers to the syntax structure applied to a series of consecutive encoded video pictures called a Coded Video Sequence (CVS), as determined by the content of a syntax element found in the PPS, which in turn is referred to by a syntax element found in the header of each strip segment. The PPS typically refers to the syntax structure applied to one or more individual pictures in the CVS, as determined by a syntax element found in the header of each strip segment. Therefore, the SPS is generally considered a higher-level syntax structure than the PPS, meaning that the syntax elements included in the SPS generally change less frequently and apply to a larger portion of the video data than those included in the PPS.
[0081] Figure 5 is a block diagram illustrating an example 500 of transform coefficient coding using context coding and bypass coding according to some embodiments. Transform coefficient coding in VVC is similar to that in HEVC in that both use non-overlapping coefficient groups (also called CGs or sub-blocks). However, there are some differences between the two schemes. In HEVC, the size of each CG of coefficients is fixed at 4×4. In VVC draft 6, the size of the CG depends on the size of the TB. Therefore, various CG sizes (1×16, 2×8, 8×2, 2×4, 4×2, and 16×1) are available in VVC. The CGs within a coding block, and the transform coefficients within a CG, are coded according to a predefined scan order.
[0082] To limit the maximum number of context-coded bins (CCB) per pixel, the area of the TB and the type of video component (i.e., luma component versus chroma component) are used to derive the maximum number of context-coded bins for the TB. In some embodiments, the maximum number of context-coded bins is equal to TB_zosize * 1.75. Here, TB_zosize represents the number of samples within the TB after coefficient zeroing. Note that the flag coded_sub_block_flag, which indicates whether a CG contains non-zero coefficients, is not counted toward the CCB count.
[0083] Coefficient zeroing is an operation performed on a transform block to force coefficients located in a specific region of the transform block to be set to zero. For example, in the current VVC, a 64×64 TB has an associated zeroing operation. Therefore, transform coefficients located outside the top-left 32×32 region of a 64×64 TB are forced to zero. In fact, in the current VVC, for any transform block with a size greater than 32 along a certain dimension, a coefficient zeroing operation is performed along that dimension to force coefficients located outside the top-left 32×32 region to be zero.
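The relationship between coefficient zeroing and the context-coded budget TB_zosize * 1.75 can be sketched as follows. The function names are hypothetical, and the sketch implements only the single embodiment mentioned above; it does not model any dependence on the video component type.

```python
def zeroed_out_size(width, height, limit=32):
    """Number of samples kept after zeroing: each dimension is capped at `limit`."""
    return min(width, limit) * min(height, limit)


def max_context_coded_bins(width, height):
    """Context-coded budget for one TB: TB_zosize * 1.75, computed in integers."""
    return (zeroed_out_size(width, height) * 7) >> 2  # 1.75 == 7 / 4


def zero_out(coeffs, limit=32):
    """Force coefficients outside the top-left limit x limit region to zero."""
    return [
        [c if (x < limit and y < limit) else 0 for x, c in enumerate(row)]
        for y, row in enumerate(coeffs)
    ]


budget_64x64 = max_context_coded_bins(64, 64)  # only the 32x32 region counts
budget_4x4 = max_context_coded_bins(4, 4)
```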
[0084] In VVC transform coefficient coding, the variable remBinsPass1 is first set to the maximum number of context-coded bins (MCCB) allowed. During coding, this variable is decremented by one each time a context-coded bin is signaled. While remBinsPass1 is greater than or equal to 4, a coefficient is first signaled using the syntax elements sig_coeff_flag, abs_level_gt1_flag, par_level_flag, and abs_level_gt3_flag, all of which use context-coded bins in the first pass. In the second pass, the remaining level information of the coefficient is coded using the syntax element abs_remainder with a Golomb-Rice code and bypass-coded bins. When remBinsPass1 becomes less than 4 during the first pass, the current coefficient is not coded in the first pass but is directly coded in the second pass using the syntax element dec_abs_level with a Golomb-Rice code and bypass-coded bins. After all of the level coding described above, the signs (sign_flag) for all scan positions with sig_coeff_flag equal to 1 are finally coded as bypass bins. Figure 5 illustrates this process. remBinsPass1 is reset for each TB. The transition from using context-coded bins for sig_coeff_flag, abs_level_gt1_flag, par_level_flag, and abs_level_gt3_flag to using bypass-coded bins for the remaining coefficients occurs at most once per TB. For a coefficient sub-block, if remBinsPass1 is less than 4 before its first coefficient is coded, the entire coefficient sub-block is coded using bypass-coded bins.
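The budgeting role of remBinsPass1 can be sketched with a simplified model that only counts context-coded bins. The function names are hypothetical, and the model assumes that every sig_coeff_flag is explicitly coded and that the flags are conditioned on each other as commented; it produces no actual bitstream.

```python
def context_bins_for_level(abs_level):
    """Context-coded bins spent on one coefficient in the first pass (simplified)."""
    bins = 1                  # sig_coeff_flag
    if abs_level >= 1:
        bins += 1             # abs_level_gt1_flag
    if abs_level >= 2:
        bins += 2             # par_level_flag and abs_level_gt3_flag
    return bins


def split_passes(abs_levels, max_ccb):
    """Decide, in scan order, which coefficients are context coded in the first pass.

    Returns (first_pass_positions, bypass_only_positions, remaining_budget).
    """
    rem_bins_pass1 = max_ccb
    first_pass, bypass_only = [], []
    for pos, level in enumerate(abs_levels):
        if rem_bins_pass1 >= 4:
            rem_bins_pass1 -= context_bins_for_level(level)
            first_pass.append(pos)
        else:
            bypass_only.append(pos)   # coded directly via dec_abs_level
    return first_pass, bypass_only, rem_bins_pass1


first_pass, bypass_only, left = split_passes([3, 0, 1, 2, 5], max_ccb=8)
```

Because the budget never increases within a TB, once a coefficient falls back to bypass coding all later coefficients do as well, which matches the statement that the transition occurs at most once per TB.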
[0085] Unlike HEVC, where a single residual encoding / decoding scheme is designed to encode both the transform coefficients and the transform skip coefficients, in VVC, two separate residual encoding / decoding schemes are used for the transform coefficients and the transform skip coefficients (i.e., the residuals), respectively.
[0086] For example, it has been observed that the statistical characteristics of the residuals in transform skip mode differ from those of transform coefficients, and that there is no energy compaction around the low-frequency components. The residual coding is therefore modified to account for the different signal characteristics of the (spatial) transform skip residuals, including:
[0087] (1) The last x / y position is not signaled;
[0088] (2) coded_sub_block_flag is coded for every sub-block except the DC sub-block when all previous flags are equal to 0;
[0089] (3) sig_coeff_flag uses context modeling with two neighboring coefficients;
[0090] (4) par_level_flag uses only one context model;
[0091] (5) Additional greater-than-5, greater-than-7, and greater-than-9 flags are signaled;
[0092] (6) The Rice parameter derivation for the remainder binarization is modified;
[0093] (7) The context modeling for the sign flag is determined based on the left and above neighboring coefficient values, and the sign flag is parsed after sig_coeff_flag to keep all context-coded bins together.
[0094] Figure 6 is a block diagram illustrating an exemplary Context Adaptive Binary Arithmetic Coding (CABAC) engine 600 according to some embodiments. CABAC is a form of entropy coding used in the H.264 / MPEG-4 AVC and High Efficiency Video Coding (HEVC) standards, as well as in VVC. CABAC is based on arithmetic coding, with several improvements and changes that adapt it to the needs of video coding standards. For example, CABAC encodes binary symbols, which keeps complexity low and allows probability modeling for the more frequently used bits of any symbol. The probability models are selected adaptively based on local context, which allows better modeling of probabilities because coding modes are usually well correlated locally. Finally, CABAC uses multiplication-free range division through the use of quantized probability ranges and probability states.
[0095] CABAC has multiple probability models for different contexts. It first converts all non-binary symbols to binary. Then, for each binary digit (or "bin"), the codec selects which probability model to use and uses information from neighboring elements to optimize the probability estimate. Finally, arithmetic coding is applied to compress the data.
[0096] Context modeling provides estimates of the conditional probabilities of encoded and decoded symbols. By utilizing an appropriate context model and switching between different probability models based on already encoded and decoded symbols in the neighborhood of the current symbol to be encoded, given inter-symbol redundancy can be taken advantage of. Encoding and decoding data symbols involves the following stages:
[0097] Binarization (602): CABAC uses binary arithmetic encoding and decoding, which means that only binary decisions (1 or 0) are encoded. Non-binary values (such as transform coefficients or motion vectors) are "binarized" or converted into binary code before arithmetic encoding and decoding. This process is similar to converting data symbols into variable-length code, but the binary code is further encoded (by the arithmetic codec) before transmission. The stages are repeated for each binary bit (or "bit") of the binarized symbol.
[0098] Context model selection (604): A "context model" is a probabilistic model for one or more binary bits of a binarized symbol. This model is selected from available model selections based on statistics of recently encoded / decoded data symbols. The context model stores the probability that each binary bit is "1" or "0".
[0099] Arithmetic coding (606): The arithmetic codec encodes each bit according to the chosen probability model. Note that there are only two subranges for each bit (corresponding to "0" and "1").
[0100] Probability update: Update the selected context model based on the actual codec value (e.g., increment the frequency count of "1" if the binary bit value is "1").
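As a non-limiting illustration, the stages above can be sketched with a toy model. This is not the CABAC engine 600: the unary binarization, the count-based probability estimate, and the names `binarize_unary`, `ContextModel`, and `estimate_bits` are assumptions made for illustration, and the arithmetic coder is replaced by the ideal code length of -log2(p) bits per bin.

```python
import math

def binarize_unary(value):
    """Binarization: map a non-negative integer to a unary bin string (assumed scheme)."""
    return [1] * value + [0]

class ContextModel:
    """Toy context model that estimates bin probabilities from frequency counts."""
    def __init__(self):
        # Counts of observed 0s and 1s, initialised to 1 to avoid zero probability.
        self.counts = [1, 1]

    def prob(self, bin_value):
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        # Probability update: increase the frequency count of the coded bin value.
        self.counts[bin_value] += 1

def estimate_bits(values, num_contexts=2):
    """Return the ideal code length, in bits, of the binarized values."""
    contexts = [ContextModel() for _ in range(num_contexts)]
    total_bits = 0.0
    for value in values:
        for bin_idx, b in enumerate(binarize_unary(value)):
            # Context model selection: fixed choice by bin index, clipped to the last model.
            ctx = contexts[min(bin_idx, num_contexts - 1)]
            # Stand-in for arithmetic coding: an ideal coder spends -log2(p) bits.
            total_bits += -math.log2(ctx.prob(b))
            ctx.update(b)
    return total_bits
```

In this sketch, a long run of identical symbols costs far fewer bits than one bit per bin, because the selected context model adapts to the observed statistics.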
[0101] By decomposing each non-binary syntax element value into a sequence of bins, further processing of each bin value in CABAC depends on the associated coding mode decision, which can be chosen as either the regular mode or the bypass mode. The latter is chosen for bins that are assumed to be uniformly distributed and for which, consequently, the whole regular binary arithmetic encoding (and decoding) process is simply bypassed. In the regular coding mode, each bin value is encoded using the regular binary arithmetic coding engine, where the associated probability model is either determined by a fixed choice based on the type of syntax element and the bin position or bin index (binIdx) in the binarized representation of the syntax element, or adaptively chosen from two or more probability models depending on related side information (e.g., spatial neighbors, component, depth or size of the CU / PU / TU, or position within the TU). Selection of the probability model is referred to as context modeling. As an important design decision, the latter case is generally applied only to the most frequently observed bins, whereas the other, usually less frequently observed bins are treated using a joint, typically zero-order, probability model. In this way, CABAC enables selective adaptive probability modeling at the sub-symbol level, and hence provides an efficient instrument for exploiting inter-symbol redundancies at significantly reduced overall modeling or learning costs. Note that, for both the fixed and the adaptive case, a switch from one probability model to another can in principle occur between any two consecutive regular coded bins. In general, the design of context models in CABAC reflects a compromise between the conflicting objectives of avoiding unnecessary modeling cost overhead and exploiting the statistical dependencies to a large extent.
[0102] The parameters of the probability models in CABAC are adaptive, which means that the adaptation of the model probabilities to the statistical variations of the source of bins is performed on a bin-by-bin basis in a backward-adaptive and synchronized fashion in both the encoder and the decoder; this process is called probability estimation. For this purpose, each probability model in CABAC can take one of 126 different states, with associated model probability values p ranging in the interval [0.01875, 0.98125]. The two parameters of each probability model are stored as 7-bit entries in a context memory: 6 bits for each of the 63 probability states representing the model probability pLPS of the least probable symbol (LPS), and 1 bit for nMPS, the value of the most probable symbol (MPS).
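The 7-bit context entry described above can be illustrated with a short sketch. The bit layout (state index in the upper 6 bits, MPS value in the lowest bit) and the function names are assumptions for illustration only. The geometric spacing of LPS probabilities in `lps_probability` is likewise an assumption, borrowed from the H.264 / AVC CABAC design, and is not stated in this disclosure.

```python
def pack_context(state_idx, mps):
    """Pack a 6-bit probability state index and the 1-bit MPS value into 7 bits."""
    if not 0 <= state_idx < 64 or mps not in (0, 1):
        raise ValueError("state index must fit in 6 bits and MPS must be 0 or 1")
    return (state_idx << 1) | mps  # assumed layout: [state (6 bits) | MPS (1 bit)]

def unpack_context(entry):
    """Recover (state_idx, mps) from a 7-bit context entry."""
    return entry >> 1, entry & 1

def lps_probability(state_idx, p_max=0.5, p_min=0.01875, num_states=64):
    """LPS probability for a state, assuming geometric spacing from p_max down to p_min."""
    alpha = (p_min / p_max) ** (1.0 / (num_states - 1))
    return p_max * alpha ** state_idx
```

Under these assumptions, an entry with state 0 corresponds to an LPS probability of 0.5, and higher state indices correspond to increasingly skewed models.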
[0103] Figure 7 is a block diagram showing an exemplary low-frequency non-separable transform (LFNST) process 700 according to some embodiments, which is a secondary transform for compacting the energy of the transform coefficients of an intra-coded block after a primary transform. As shown, LFNST is applied between the forward primary transform and quantization in the video encoder 20 and between de-quantization and the inverse primary transform in the video decoder 30. In some embodiments, a non-separable transform with varying transform sizes is applied based on the size of a coding block, which can be described as the following matrix multiplication process. Assume LFNST is applied to a 4×4 block; then the samples within the 4×4 block, i.e.,
[0104] $$X = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} \\ X_{10} & X_{11} & X_{12} & X_{13} \\ X_{20} & X_{21} & X_{22} & X_{23} \\ X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}$$
[0105] are first serialized into a vector, i.e.,
[0106] $$\vec{X} = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} & X_{10} & X_{11} & X_{12} & X_{13} & X_{20} & X_{21} & X_{22} & X_{23} & X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}^{T}$$
[0107] Then, LFNST is applied as $\vec{F} = T \cdot \vec{X}$, where $\vec{F}$ denotes the transform coefficients after LFNST and T is the transform kernel. In this example, T is a 16×16 matrix. Subsequently, the 16×1 vector $\vec{F}$ is reorganized into a 4×4 block according to a predefined scan order, where the coefficients at the beginning of the vector are associated with smaller scan indices in the 4×4 block.
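The matrix multiplication process described above can be sketched as follows. This is a minimal illustration, not the kernel or scan order of any particular embodiment: the function names, the row-by-row serialization, and the up-right diagonal scan produced by `diagonal_scan_4x4` are assumptions, and any 16×16 matrix can be passed as the kernel.

```python
def forward_lfnst_4x4(block, kernel, scan):
    """Apply a 16x16 non-separable kernel to a 4x4 block.

    block  : 4x4 list of lists of primary transform coefficients
    kernel : 16x16 list of lists (the transform kernel T)
    scan   : list of 16 (row, col) positions; scan[i] receives output coefficient i
    """
    # Serialize the 4x4 block row by row into a 16-element vector.
    x = [block[r][c] for r in range(4) for c in range(4)]
    # Matrix-vector product of the kernel with the serialized vector.
    f = [sum(kernel[i][j] * x[j] for j in range(16)) for i in range(16)]
    # Reorganize the output vector into a 4x4 block following the scan order.
    out = [[0] * 4 for _ in range(4)]
    for i, (r, c) in enumerate(scan):
        out[r][c] = f[i]
    return out

def diagonal_scan_4x4():
    """One possible up-right diagonal scan of a 4x4 block (assumed for illustration)."""
    order = []
    for d in range(7):  # anti-diagonals where row + col == d
        for r in range(min(d, 3), -1, -1):
            c = d - r
            if c <= 3:
                order.append((r, c))
    return order
```

The direct product needs 16 × 16 = 256 multiplications for a single 4×4 block, which is the cost that motivates the reduced kernel discussed next.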
[0108] As can be seen from the above example, LFNST is based on direct matrix multiplication, which is quite expensive in terms of computational operations and the memory needed to store the transform coefficients. In some embodiments, a simplified non-separable transform kernel is used to reduce the implementation cost of LFNST. The main idea of this method is to map an N-dimensional vector to an R-dimensional vector in a different space, where R < N. Therefore, instead of an N×N matrix, the forward LFNST uses an R×N matrix, as follows:
[0109] $$T_{R \times N} = \begin{bmatrix} t_{11} & t_{12} & t_{13} & \cdots & t_{1N} \\ t_{21} & t_{22} & t_{23} & \cdots & t_{2N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{R1} & t_{R2} & t_{R3} & \cdots & t_{RN} \end{bmatrix}$$
[0110] where the R basis vectors in $T_{R \times N}$ are generated by selecting the first R bases of the original N-dimensional transform kernel (i.e., N×N).
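A minimal sketch of the reduced transform, assuming the full N×N kernel is available as a list of basis rows; the function name and argument names are hypothetical. Only the first R rows are used, so the cost drops from N × N to R × N multiplications.

```python
def reduced_forward_transform(full_kernel, x, r):
    """Map an N-dimensional vector x to r coefficients using the first r basis rows."""
    n = len(x)
    if not 0 < r <= n:
        raise ValueError("r must satisfy 0 < r <= N")
    # The reduced kernel consists of the first r rows of the N x N kernel.
    reduced = full_kernel[:r]
    return [sum(row[j] * x[j] for j in range(n)) for row in reduced]
```

For example, with N = 16 and R = 8, the forward transform produces 8 output coefficients using 128 multiplications instead of 256.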
[0111] After applying LFNST, all transform coefficients outside the potentially non-zero LFNST coefficient region in the upper left corner are forced to zero. For transform blocks of sizes 4×4, 8×8, 4×M, and / or M×4, the potentially non-zero LFNST coefficient region in the upper left corner includes the first 8 coefficient positions in coefficient scan order. For all other transform block sizes, the potentially non-zero LFNST coefficient region in the upper left corner includes the coefficient positions in the upper left 4x4 sub-block. In the following description of this disclosure, for simplicity, this potentially non-zero LFNST coefficient region is referred to as the "non-zero LFNST region".
[0112] In some embodiments, there are a total of four transform sets, and two non-separable transform kernels are enabled for each transform set. The transform set is selected based on the intra-prediction mode of the intra block. The mapping from the intra-prediction mode to the transform set is predefined, as shown in Table 1 below. For each transform set, the selected non-separable secondary transform candidate is indicated by signaling an LFNST index in the video bitstream.
[0113]
[0114]
[0115] Table 1. Mapping between intra-frame modes and LFNST transform sets.
[0116] In some embodiments, the LFNST index can be parsed at the video decoder only if all transform coefficients outside the first 4×4 sub-block of a given transform block are zero. The signaling of the LFNST index depends on the position of the last significant coefficient, which bounds the number of non-zero coefficients in the transform block. For example, for 4×4 and 8×8 coded blocks, the LFNST index is signaled only if the position of the last significant (i.e., non-zero) transform coefficient is less than 8; for other coded block sizes, the LFNST index is signaled only if the position of the last significant transform coefficient is less than 16; otherwise, the LFNST index is not signaled and is always inferred to be zero, i.e., LFNST is disabled. In some other embodiments, a minimum threshold (e.g., 1) is set such that the LFNST index is not signaled when the total number of non-zero transform coefficients is equal to or less than this minimum threshold.
[0117] Furthermore, to reduce the buffer size for transform coefficients, LFNST is disabled when the width or height of the current coded block exceeds the maximum transform size (e.g., 64) signaled in the sequence parameter set (SPS). In some embodiments, LFNST is applied only when the primary transform is DCT2. LFNST is applied to intra-coded blocks in both intra and inter slices, and to both luma and chroma components. If dual-tree / local-tree (i.e., split-tree) partitioning is enabled (where the partitions of the luma and chroma components are not aligned), LFNST indices are signaled separately for the luma and chroma components (i.e., different LFNST transforms can be applied to the luma and chroma components). Otherwise, when a single tree is applied (where the partitions of the luma and chroma components are aligned), a single LFNST index is signaled, and the luma and chroma components share the same LFNST transform.
[0118] Figure 8 is a block diagram illustrating an exemplary transform block 800 with non-zero transform coefficients according to some embodiments. Transform block 800 includes a first region 802 corresponding to the upper-left grid portion of transform block 800 and a second region 804 indicated by the dashed lines of transform block 800. The first region 802 has a predefined size for transform block 800 (e.g., a 16×16 region at the upper-left corner of transform block 800) and includes one or more non-zero transform coefficients (e.g., first, second, and third non-zero coefficients 806, 808, and 810). The second region 804 is a region outside the first region 802, which may or may not include one or more non-zero transform coefficients.
[0119] In the current VVC, the signaling of the LFNST index depends on the availability of the decoded transform coefficients of all components in the CU. Because all transform coefficients outside the non-zero LFNST region are forced to zero after LFNST is applied, the signaling of the LFNST index is conditioned on the positions of the last non-zero coefficients of the three components in the CU. Specifically, for 4×4 and 8×8 CUs, the LFNST index is signaled only if the position of the last non-zero coefficient of every component to which a transform is applied in residual coding (i.e., every non-transform-skipped component) is less than 8; for other CU sizes, the LFNST index is signaled only if the position of the last non-zero coefficient of every non-transform-skipped component is less than 16. This parsing dependency can cause undesirable latency for hardware encoders and decoders. For example, with this design, decoding of the luminance component in a TU cannot begin until parsing of the chroma residual is complete.
[0120] In some embodiments, a simplified LFNST signaling method is proposed to remove the parsing dependency of the LFNST index on the availability of the transform coefficients of the luminance and chrominance TBs in a CU. Because the parsing dependency is removed, the decoder can promptly determine whether LFNST is applied to the current CU and can therefore calculate the exact CCB limit based on the corresponding number of potentially non-zero coefficients.
[0121] As previously mentioned, the LFNST index is signaled at the end of an intra CU, and its signaling depends on the positions of the last significant coefficients of all coded components. For example, due to the zero-out constraint applied to LFNST, the LFNST index is signaled only if the position of the last non-zero coefficient of each coded component is outside the corresponding zero-out region. To remove this dependency, the LFNST index is signaled based only on the position of the last significant coefficient of the luminance component, as shown in the syntax table below.
[0122]
[0123]
[0124] As shown in the syntax table above, in the single-tree case of the proposed method, the LFNST index is signaled based only on the position of the last significant coefficient of the luminance component. For example, for 4×4 and 8×8 coded blocks, the LFNST index is signaled only when the position of the last significant luminance transform coefficient is less than 8; for other coded block sizes, the LFNST index is signaled only when the position of the last significant luminance transform coefficient is less than 16. In the split-tree case, the LFNST index is signaled separately for the luminance and chrominance components. Furthermore, the original DC-only constraint is applied, so that the LFNST index is signaled only when the position of the last significant luminance coefficient is equal to or greater than 1.
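The difference between the baseline condition and the luma-only condition can be sketched as follows. The function names are hypothetical, positions are scan-order indices of the last significant coefficient, and the sketch covers only the position checks described in the text, not the full syntax conditions of any embodiment.

```python
def lfnst_max_last_pos(width, height):
    """Upper bound on the last significant position: 8 for 4x4 and 8x8 blocks, else 16."""
    return 8 if (width, height) in ((4, 4), (8, 8)) else 16

def signal_lfnst_idx_all_components(width, height, last_pos_per_component):
    """Baseline rule: every non-transform-skipped component must satisfy the bound."""
    limit = lfnst_max_last_pos(width, height)
    return all(pos < limit for pos in last_pos_per_component)

def signal_lfnst_idx_luma_only(width, height, luma_last_pos):
    """Single-tree rule of the proposed method: only luma is checked.

    The lower bound of 1 reflects the DC-only constraint.
    """
    return 1 <= luma_last_pos < lfnst_max_last_pos(width, height)
```

With the luma-only rule, the decision can be made as soon as the luma residual has been parsed; the baseline rule must wait for the last positions of all components.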
[0125] As described above, a single tree or two separate trees can be used to partition the luma and chroma samples of a coded block. This characteristic can affect the signaling of the LFNST index. For example, when the luma and chroma samples of a coded block are partitioned by a single tree, only the transform coefficients of the luma samples are subject to LFNST, while those of the chroma samples are not. In this case, it is not necessary to check the position of the last non-zero coefficient of any chroma sample in the corresponding coded block before receiving the LFNST index. Instead, only the position of the last non-zero coefficient of the luma samples of the coded block is relevant to determining whether LFNST has been enabled for the coded block. However, when the luma and chroma samples of a coded block are partitioned by two separate trees, LFNST is applied to the luma and chroma samples separately, and each component has its own LFNST index.
[0126] In some embodiments, single-tree LFNST signaling depends on the last coefficient position of the luminance component, and additional changes are proposed to simplify LFNST signaling in both the single-tree and dual-tree cases. First, LFNST signaling in a single-tree partition depends on the luminance transform skip mode but is independent of the chrominance transform skip mode. LFNST is enabled (e.g., a non-zero index is signaled) based on whether the luminance transform skip mode is disabled (e.g., the flag is equal to zero), without checking whether the chrominance transform skip mode is enabled or disabled. Second, the signaling of the LFNST and MTS indices is moved from the CU level to the TU level. In the single-tree case, the indices are signaled immediately after the luminance residual samples are parsed. In the separate-tree case, the LFNST index associated with the luminance samples is also signaled immediately after the luminance residual samples are parsed, while the LFNST index associated with the chrominance samples is signaled after the residual samples of the second chrominance component (i.e., the Cr component) are parsed. An example of the corresponding syntax tables is as follows:
[0127] Coding unit syntax:
[0128] LfnstDcOnly = 1
LfnstZeroOutSigCoeffFlag = 1
MtsDcOnly = 1
MtsZeroOutSigCoeffFlag = 1
transform_tree(x0, y0, cbWidth, cbHeight, treeType, chType)
[0129] Transform unit syntax:
[0130]
[0131]
[0132] Residual coding syntax:
[0133]
[0134] In some embodiments relating to the single-tree case, only the position of the last significant luminance coefficient is used to determine whether the DC constraint is satisfied, while the residuals of both the luminance and chrominance components are used to determine the zero-out constraint for LFNST signaling. An example of the corresponding residual coding syntax table is shown below:
[0135]
[0136] In some embodiments, LFNST is disabled for the chroma components in single-tree and dual-tree partitions. The luma-chroma dependency in the LFNST order is removed. The LFNST index is moved from the CU level to the TU level and obtained after the luma residual is decoded, for example, before the chroma transform sample is received. The corresponding syntax is as follows:
[0137] Coding unit syntax:
[0138] LfnstDcOnly = 1
LfnstZeroOutSigCoeffFlag = 1
MtsDcOnly = 1
MtsZeroOutSigCoeffFlag = 1
transform_tree(x0, y0, cbWidth, cbHeight, treeType, chType)
[0139] Transform unit syntax:
[0140]
[0141]
[0142]
[0143] Residual coding syntax:
[0144]
[0145] The coding tree syntax includes the variable ApplyLfnstFlag, which is derived as: ApplyLfnstFlag = (lfnst_idx > 0 && cIdx == 0) ? 1 : 0.
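The derivation above can be restated as a one-line function. The sketch assumes that cIdx equal to 0 denotes the luma component, which is consistent with LFNST being disabled for chroma in this embodiment; the function name is hypothetical.

```python
def apply_lfnst_flag(lfnst_idx, c_idx):
    """ApplyLfnstFlag = (lfnst_idx > 0 && cIdx == 0) ? 1 : 0."""
    # Assumption: c_idx == 0 is luma; any other value is a chroma component.
    return 1 if (lfnst_idx > 0 and c_idx == 0) else 0
```

Under this assumption, the flag is 1 only for a luma transform block with a non-zero LFNST index.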
[0146] Figure 9 shows a table 900 illustrating an exemplary multiple transform selection (MTS) scheme for transforming the residuals of inter-coded and intra-coded blocks according to some embodiments. For example, during encoding, video encoder 20 uses the transform processing unit 52 of Figure 2 to perform MTS. During decoding, video decoder 30 uses the inverse transform processing unit 88 of Figure 3 to perform the inverse transform using the corresponding inverse transform method.
[0147] The current VVC specification uses the MTS scheme to transform the residuals of inter-coded and intra-coded blocks. If MTS is used, during encoding the video encoder selects one of several transform methods and applies it to the residuals of the coded block. For example, the video encoder may apply a DCT2 transform (e.g., MTS disabled), a DCT8 transform, or a DST7 transform to the residuals of the coded block. A set of syntax elements (also called flags, e.g., MTS_CU_flag, MTS_Hor_flag, MTS_Ver_flag) is used to signal the specific transform method used for the coded block.
[0148] In some embodiments, two syntax elements are specified at the sequence level (e.g., included in the Sequence Parameter Set (SPS)) to enable MTS for intra-frame and inter-frame modes, respectively. When MTS is enabled at the sequence level, another CU-level syntax element (e.g., MTS_CU_flag in Table 900) is further signaled to indicate whether MTS is applied to a specific CU.
[0149] In some embodiments, MTS is used only when several criteria related to the characteristics of the coded block are met, including: 1) the width and height of the coded block are both less than or equal to a predefined value (e.g., 32); 2) the coded block is a luma coded block (e.g., the luma CBF flag == 1, because MTS is used only in luma residual coding); 3) the horizontal and vertical coordinates of the last non-zero coefficient are both less than a predefined value (e.g., 16) (e.g., the last non-zero coefficient is restricted to a predefined upper-left region of the transform block). If any of the above conditions is not met, the video encoder does not apply MTS, but instead transforms the block residual using a default transform method such as the DCT2 transform, and sets the corresponding syntax elements to indicate the use of the default transform (e.g., MTS_CU_flag == 0, and MTS_Hor_flag and MTS_Ver_flag are not signaled).
[0150] Table 900 shows the syntax element values used in MTS and the corresponding transform methods. If DCT2 is used to transform the transform block residuals, MTS_CU_flag is set to 0, and MTS_Hor_flag and MTS_Ver_flag are not signaled. If MTS_CU_flag is set to 1 (e.g., indicating that DCT8 and / or DST7 is being used), two additional syntax elements (e.g., MTS_Hor_flag, MTS_Ver_flag) are signaled to indicate the transform types in the horizontal and vertical directions, respectively. When MTS_Hor_flag == 1 or MTS_Ver_flag == 1, the corresponding horizontal or vertical component is transformed using the DST7 method. When MTS_Hor_flag == 0 or MTS_Ver_flag == 0, the corresponding horizontal or vertical component is transformed using the DCT8 method.
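The flag-to-transform mapping described in this paragraph can be sketched as a small lookup. The flag values follow the text of the paragraph as written (1 selects DST7, 0 selects DCT8); the table of Figure 9 is not reproduced here, so the mapping should be checked against it. The function name is hypothetical.

```python
def mts_transforms(mts_cu_flag, mts_hor_flag=None, mts_ver_flag=None):
    """Return the (horizontal, vertical) transform names for the given flag values."""
    if mts_cu_flag == 0:
        # Direction flags are not signaled; DCT2 is used in both directions.
        return ("DCT2", "DCT2")
    if mts_hor_flag not in (0, 1) or mts_ver_flag not in (0, 1):
        raise ValueError("direction flags must be signaled when MTS_CU_flag == 1")
    pick = {1: "DST7", 0: "DCT8"}  # mapping as stated in the paragraph above
    return (pick[mts_hor_flag], pick[mts_ver_flag])
```

The horizontal and vertical choices are independent, so four DST7 / DCT8 combinations are available in addition to the DCT2 default.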
[0151] In some embodiments, all MTS transform coefficients are encoded with 6-bit precision, the same as the DCT2 core transform. Given that VVC supports all transform sizes used in HEVC, all transform cores used in HEVC remain the same as in VVC, including 4-point, 8-point, 16-point, and 32-point DCT2 transforms and 4-point DST7 transforms. Meanwhile, the VVC transform design also supports other transform cores, including 64-point DCT2, 4-point DCT8, 8-point, 16-point, and 32-point DST7 and DCT8.
[0152] In addition, to reduce the computational complexity of large-size DST7 or DCT8 transforms, when the width or height of the block is equal to 32, the transform coefficients located outside the low-frequency region (e.g., the 16x16 region at the top left corner of the transform block) are set to 0 (e.g., a zeroing operation) for DST7 and DCT8 transform blocks.
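The zeroing operation can be illustrated as follows, assuming the block has already been identified as a DST7 or DCT8 transform block and is stored as a list of rows. The function name and the `keep` parameter are hypothetical.

```python
def zero_out_high_freq(coeffs, keep=16):
    """Zero coefficients outside the top-left keep x keep region when a side equals 32."""
    height, width = len(coeffs), len(coeffs[0])
    if width != 32 and height != 32:
        # Neither dimension is 32: return an unchanged copy.
        return [row[:] for row in coeffs]
    return [[v if (r < keep and c < keep) else 0 for c, v in enumerate(row)]
            for r, row in enumerate(coeffs)]
```

For a 32×32 block, at most 256 of the 1024 coefficients can remain non-zero after this operation.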
[0153] In some embodiments, non-overlapping coefficient groups (CGs) are used to encode the transform coefficients of the transform block. The size of the CG is determined based on the size of the transform block. The CGs within the transform block and the transform coefficients within each CG are encoded based on a predefined scan order (e.g., diagonal scan order).
[0154] Figure 10 is a flowchart 1000 illustrating an exemplary process by which a video decoder (such as video decoder 30) implements a technique for conditionally signaling LFNST based on different components of a transform block, according to some embodiments. Video decoder 30 receives (1010) a control flag associated with one or more coded blocks. This control flag indicates whether the luminance and chrominance samples of the coded block in the video data are partitioned based on a single tree or two separate trees. The video decoder also receives (1020) a bitstream corresponding to the coded block, which may include transform coefficients associated with different components of the coded block.
[0155] Then, the video decoder 30 determines the partition tree type of the coded block based on the control flag. When the control flag indicates that the luma and chroma samples are partitioned by a single tree (1030-1), the video decoder 30 determines (1040-1) the scan order index of the last non-zero transform coefficient for the luma samples of the coded block. As mentioned above, single-tree partitioning means that only the luma samples of the coded block are suitable for LFNST. When the scan order index of the last non-zero transform coefficient meets the predefined criterion (1050-1), the video decoder then receives (1060-1) the LFNST index from the bitstream and applies the inverse LFNST transform to (1070-1) the transform coefficients of the luma samples of the coded block based on that LFNST index.
[0156] When the control flag indicates that the luma and chroma samples are split by two separate trees (1030-2), the video decoder 30 determines (1040-2) the scan order indices of the last non-zero transform coefficients for the luma and chroma samples of the coding block, respectively. As described above, LFNST processes the luma and chroma components separately. For example, when a corresponding one of the scan order indices of the last non-zero transform coefficients of the luma or chroma sample satisfies a predefined criterion (1050-2), the video decoder receives (1060-2) the LFNST index corresponding to that component from the bitstream and applies the corresponding inverse LFNST transform to the transform coefficients of the corresponding component of the coding block based on the corresponding LFNST index (1070-2).
[0157] In some embodiments, before applying the inverse LFNST transform to the transform coefficients of the luminance or chrominance samples of the coded block, the video decoder 30 first determines the value of the LFNST index, and then identifies the LFNST transform kernel based on the corresponding LFNST index when the corresponding LFNST index is not zero. As described above, the video encoder 20 can access multiple LFNST transform kernels, and selects one of them for LFNST transformation of the coded block and transmits the index of the selected LFNST transform kernel in the video data via a signal. The video decoder 30 then receives the LFNST index from the video data and subsequently performs an inverse transform on the transform coefficients of the corresponding samples of the coded block using the identified LFNST transform kernel.
[0158] In some embodiments, the predefined criteria described above are satisfied when the scan order index of the last non-zero transform coefficient is not less than a minimum threshold and is less than a maximum threshold associated with the coding block. For example, the minimum threshold is 1, while the maximum threshold depends on the size of the coding block; for example, the maximum threshold is 8 for 4×4 or 8×8 coding blocks, or 16 for other coding block sizes. Similar to MTS, the inverse LFNST transform is applied to the non-zero transform coefficients in the upper left region of the transform block corresponding to the coding block, and the scan order is diagonal.
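The branch structure of flowchart 1000 together with the criterion above can be sketched as follows. The function names, the component labels "luma" and "chroma", and the dictionary of scan order indices are assumptions for illustration; the maximum threshold is passed in rather than derived from the block size.

```python
def lfnst_criterion(last_idx, min_threshold, max_threshold):
    """True when the last non-zero scan order index lies in [min_threshold, max_threshold)."""
    return min_threshold <= last_idx < max_threshold

def components_receiving_lfnst_idx(single_tree, last_idx, max_threshold, min_threshold=1):
    """Return the components for which an LFNST index is read from the bitstream.

    last_idx maps a component name ("luma", "chroma") to the scan order index of
    its last non-zero transform coefficient.
    """
    # Single tree: only the luma samples are checked (1040-1, 1050-1).
    # Separate trees: luma and chroma are checked independently (1040-2, 1050-2).
    candidates = ["luma"] if single_tree else ["luma", "chroma"]
    return [c for c in candidates
            if lfnst_criterion(last_idx[c], min_threshold, max_threshold)]
```

In the single-tree branch, the chroma entry of `last_idx` is never read, which mirrors the removal of the chroma dependency.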
[0159] Figure 11A shows an exemplary single-tree data structure 1100 for encoding a bitstream of a transform unit (TU) according to some embodiments. Figure 11B shows an exemplary split-tree data structure 1150 for encoding a bitstream of video data for a TU according to some embodiments. In some embodiments, the TU includes a transform block of luma samples 1102, two corresponding transform blocks of chroma samples 1104, and syntax elements used to transform the luma transform samples 1102 and the chroma transform samples 1104. The TU is associated with a luma transform block, a Cb transform block, and a Cr transform block. In some examples, the TU is part of a coding unit (CU). The luma transform block of the TU includes the luma transform samples 1102 and is associated with a sub-block of the luma residual block of the CU. The Cb transform block of the TU includes Cb transform samples and is associated with a sub-block of the Cb residual block of the CU. The Cr transform block includes Cr transform samples and is associated with a sub-block of the Cr residual block of the CU. The Cb and Cr transform samples form the chroma transform samples 1104. In some embodiments, the LFNST index 1110 or the MTS index 1116 is signaled at the TU level. That is, the LFNST index and / or the MTS index are transmitted between the video encoder 20 and the video decoder 30 together with the luminance transform samples 1102 and chrominance transform samples 1104 of each individual TU, rather than with the transform samples of each CU at the CU level.
[0160] Referring to Figure 11A, in a monochrome image or an image having three separate color planes, the transform unit (TU) includes a single transform block and a syntax structure used to transform the samples of that transform block. In some embodiments, the single-tree data structure 1100 is applied to encode the luminance and chrominance components of these images together with syntax elements in the bitstream of video data associated with the TU. The single-tree data structure 1100 includes at least the luminance transform samples 1102 and the chrominance transform samples 1104. In the video encoder 20, the prediction processing unit 41 divides the video data into video blocks (e.g., individual TUs) and provides intra-frame or inter-frame prediction blocks to generate residual blocks of these video blocks. The residual blocks are optionally included in one or more transform units (TUs). The transform processing unit 52 transforms the residual video data of these residual blocks into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform, and the quantization unit 54 quantizes these transform coefficients into the luminance transform samples 1102 and the chrominance transform samples 1104.
[0161] The luminance transform sample 1102 of the TU is associated with a luminance transform skip flag (LTSF) 1106 indicating whether the luminance component of the residual video data has been transformed, and the chrominance transform sample 1104 of the TU is associated with a chrominance transform skip flag (CTSF) 1108 indicating whether the chrominance component of the residual video data has been transformed. LTSF 1106, luminance transform sample 1102, CTSF 1108, and chrominance transform sample 1104 are arranged in an ordered sequence in a single-tree data structure 1100 used to encode the bitstream of video data for the TU.
[0162] In some embodiments, the LFNST index 1110 is signaled in the bitstream to enable a secondary transform (i.e., an LFNST operation) that compacts the energy of the transform coefficients of an intra-coded block after the primary transform. This LFNST operation is applied between the forward primary transform and quantization in the video encoder 20, and between dequantization and the inverse primary transform in the video decoder 30. The LFNST index 1110 is signaled from the video encoder 20 to the video decoder 30 when LTSF 1106 is zero (1112), i.e., the transform skip mode is disabled for the luminance component of the TU, and when the number of non-zero luminance transform samples generated by the encoder 20 is within a predefined range (1114). Upon receiving the bitstream, the video decoder 30 determines whether LTSF 1106 is zero and whether the LFNST index is non-zero. Based on a determination that LTSF 1106 is zero and the LFNST index 1110 is non-zero, the video decoder 30 applies the inverse LFNST to the luminance transform samples 1102 to generate first decoded luminance samples for the TU. More specifically, the luminance transform samples are dequantized and then processed by the inverse LFNST to generate the first decoded luminance samples for the subsequent inverse primary transform (e.g., DCT2, DCT8, or DST7).
[0163] In some embodiments, the inverse LFNST is applied only to the luminance transform samples 1102 and not to the chrominance transform samples 1104. The LFNST index 1110 is signaled, and the inverse LFNST is applied, based on the number of non-zero luminance samples in the luminance transform samples 1102, independently of CTSF 1108, the number of non-zero Cb samples in the chrominance transform samples 1104, and the number of non-zero Cr samples in the chrominance transform samples 1104. That is, the inverse LFNST is signaled and applied without checking CTSF 1108, the number of non-zero Cb samples, or the number of non-zero Cr samples, for example, without checking whether CTSF 1108 is non-zero, whether the number of non-zero Cb samples is within a predefined range, or whether the number of non-zero Cr samples is within a predefined range.
[0164] In some embodiments, the Multiple Transform Selection (MTS) index 1116 is transmitted via signaling along with the bitstream of the TU. The MTS index 1116 is applied to select the primary transform used to transform the residuals of inter-frame and intra-frame coded blocks. During encoding, the video encoder 20 utilizes... Figure 2 The transformation processing unit 52 performs MTS, while during decoding, the video decoder 30 utilizes... Figure 3The inverse transform processing unit 88 performs an inverse transform using the corresponding inverse transform method. The current VVC specification uses the MTS scheme to transform residuals in inter- and intra-frame coded blocks. If MTS is used, during encoding, the video encoder 20 selects one of several transform methods to transform the residuals of the coded block. For example, the video encoder can apply a DCT2 transform (e.g., MTS disabled), a DCT8 transform, or a DST7 transform to the residuals of the coded block. Optionally, a set of syntax elements (e.g., MTS_CU_flag, MTS_Hor_flag, MTS_Ver_flag) (also called flags) are used in the MTS index 1116 to signal a specific transform method for that coded block.
[0165] Specifically, in some embodiments, the MTS index 1116 is not always applied and signaled from the video encoder 20 to the video decoder 30. Instead, the MTS index 1116 is signaled based on the determination that the LFNST index 1110 is zero (i.e., the inverse LFNST operation is disabled). When the LFNST index 1110 is zero, the video decoder 30 applies (1118) one of the DCT2 transform, DCT8 transform, or DST7 transform to these first decoded luminance samples based on the value of the MTS index 1116. Conversely, when the LFNST index 1110 is non-zero, the MTS index 1116 is not used or signaled to the video decoder 30, and the video decoder 30 applies a predefined inverse transform (e.g., DCT2 transform) to these first decoded luminance samples by default after the inverse LFNST. The predefined inverse transform is applied in the horizontal and vertical directions of these first decoded luminance samples.
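The conditional signaling described in this paragraph can be illustrated with a minimal Python sketch. The function name `read_mts_index` and the use of a plain iterator of already-decoded integer symbols in place of a real entropy decoder are assumptions made for illustration; they do not come from the specification.

```python
from typing import Iterator, Optional


def read_mts_index(symbols: Iterator[int], lfnst_index: int) -> Optional[int]:
    """Read the MTS index only when the LFNST index is zero.

    `symbols` is a hypothetical stand-in for the parsed bitstream.
    Returns None when the MTS index is not signaled, in which case the
    decoder falls back to the predefined inverse transform (e.g., DCT2).
    """
    if lfnst_index != 0:
        # Inverse LFNST is enabled: no MTS index is present in the bitstream.
        return None
    return next(symbols)


if __name__ == "__main__":
    stream = iter([2, 9])
    print(read_mts_index(stream, lfnst_index=0))  # MTS index is consumed
    print(read_mts_index(stream, lfnst_index=1))  # nothing is consumed
```

In this sketch, a non-zero LFNST index leaves the symbol stream untouched, which mirrors the statement that the MTS index is neither used nor signaled in that case.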
[0166] Alternatively, in some embodiments, the MTS index 1116 is signaled along with the bitstream of the TU, independent of the value of the LFNST index 1110. However, the video decoder 30 applies DCT2, DST7, and DCT8 transforms based on both the LFNST index 1110 and the MTS index 1116. Based on the determination that the LFNST index 1110 is enabled (e.g., non-zero), the video decoder 30 applies (1120) a DCT2 transform (i.e., a predefined inverse transform) to the luminance transform samples 1102, independent of the value of the MTS index 1116. Based on the determination that the LFNST index 1110 is disabled (e.g., zero), the video decoder 30 selects (1118) one of the DCT2, DST7, and DCT8 transforms based on the value of the MTS index 1116. For example, the video decoder 30 selects the corresponding one of the DCT2, DST7, and DCT8 transforms in each of the horizontal and vertical directions of the first decoded luminance samples based on the value of the MTS index 1116.
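The selection rule of this alternative can be sketched as follows. The table mapping MTS index values to horizontal and vertical kernels is an illustrative assumption (the text above does not define the mapping), and the function name `select_inverse_primary` is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mapping from MTS index to (horizontal, vertical) kernels.
# The description does not specify this table; it is assumed here.
MTS_TABLE: Dict[int, Tuple[str, str]] = {
    0: ("DCT2", "DCT2"),
    1: ("DST7", "DST7"),
    2: ("DCT8", "DST7"),
    3: ("DST7", "DCT8"),
    4: ("DCT8", "DCT8"),
}


def select_inverse_primary(lfnst_index: int, mts_index: int) -> Tuple[str, str]:
    """Pick the inverse primary transform for the luma samples.

    The MTS index is always present in this variant, but it is ignored
    whenever the LFNST index is non-zero.
    """
    if lfnst_index != 0:
        # Predefined inverse transform in both directions.
        return ("DCT2", "DCT2")
    return MTS_TABLE[mts_index]


if __name__ == "__main__":
    print(select_inverse_primary(lfnst_index=1, mts_index=3))
    print(select_inverse_primary(lfnst_index=0, mts_index=3))
```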
[0167] In some embodiments, the LTSF 1106, luminance transform samples 1102, LFNST index 1110, MTS index 1116, chrominance transform skip flag 1108, and chrominance transform samples 1104 are arranged in an ordered sequence based on the single-tree data structure 1100. That is, LTSF 1106 is followed by the luminance transform samples 1102, then the LFNST index 1110, then the MTS index 1116, then the chrominance transform skip flag 1108, and finally the chrominance transform samples 1104. After the LFNST index 1110 is received, the corresponding inverse LFNST operation is applied to the luminance transform samples 1102 that have already been received and dequantized, regardless of whether the chrominance transform samples 1104 have been received. The LFNST index 1110 is received by the decoder 30 before CTSF 1108 or the chroma residual samples 1104. In some embodiments, the TU is partially or entirely dequantized, and the inverse LFNST operation can be initiated and applied to the TU while CTSF 1108 or the chroma residual samples 1104 are being received. Conversely, in some cases, the LFNST index 1110 is received before CTSF 1108 or the chroma residual samples 1104, but the inverse LFNST operation is applied to the TU after CTSF 1108 or the chroma residual samples 1104 are received.
[0168] Similarly, after the MTS index 1116 is received, the corresponding inverse primary transform (e.g., DCT2, DST7, or DCT8) is applied to the luminance transform samples 1102, which have already been received, dequantized, and optionally processed by the inverse LFNST operation, regardless of whether the chroma transform samples 1104 have been received. The MTS index 1116 is received by the decoder 30 before CTSF 1108 or the chroma residual samples 1104. In some embodiments, dequantization and inverse LFNST operations are performed on some or all of the TU, and the inverse primary transform is applied to the TU while CTSF 1108 or the chroma residual samples 1104 are being received. In these ways, the inverse LFNST operation or the inverse primary transform applied to the luminance transform samples 1102 can be initiated after the LFNST index 1110 or the MTS index 1116 is received but before the chroma transform samples 1104 are received, without waiting for the chroma transform samples 1104, thereby speeding up the TU decoding process. Conversely, in some cases, the MTS index 1116 is received before CTSF 1108 or the chroma residual samples 1104; however, the inverse primary transform corresponding to the MTS index 1116 is applied to the TU after CTSF 1108 or the chroma residual samples 1104 are received.
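The effect of this ordering on decoding latency can be illustrated with a small event-log simulation in Python. The element names, the `decode_single_tree` function, and the log strings are assumptions made for illustration. The sketch follows the variant in which the MTS index is always present and only records when each luma operation could start; it does not perform any actual transform.

```python
from typing import Any, Dict, List, Tuple


def decode_single_tree(elements: List[Tuple[str, Any]]) -> List[str]:
    """Return an event log showing when luma processing could begin.

    `elements` lists (name, value) pairs in bitstream order. The names
    used here are hypothetical labels for LTSF 1106, luma samples 1102,
    LFNST index 1110, MTS index 1116, CTSF 1108 and chroma samples 1104.
    """
    state: Dict[str, Any] = {}
    log: List[str] = []
    for name, value in elements:
        state[name] = value
        log.append(f"recv {name}")
        transform_skip_off = state.get("LTSF") == 0
        if name == "LFNST_index" and value != 0 and transform_skip_off:
            # Luma samples are already available, so inverse LFNST can start.
            log.append("start inverse LFNST on luma")
        if name == "MTS_index" and transform_skip_off:
            # The inverse primary transform no longer waits for chroma data.
            log.append("start inverse primary transform on luma")
    return log


if __name__ == "__main__":
    tu = [
        ("LTSF", 0),
        ("luma_samples", [3, 0, 1]),
        ("LFNST_index", 1),
        ("MTS_index", 0),
        ("CTSF", 0),
        ("chroma_samples", [2, 2]),
    ]
    for event in decode_single_tree(tu):
        print(event)
```

In the resulting log, both luma operations appear before the chroma transform skip flag and chroma samples are received, which is the behavior the paragraph above attributes to this arrangement.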
[0169] Alternatively, in some embodiments, each of the LFNST index 1110 and the MTS index 1116 is arranged in one of several alternative positions preceding the chroma transform samples 1104 in the single-tree data structure 1100. For example, the LFNST index 1110 may be received (1122) before the LTSF 1106 and the luma transform samples 1102, while the LTSF 1106 is received before or after the luma transform samples 1102. In another example, the LFNST index 1110 is received (1124) between the LTSF 1106 and the luma transform samples 1102, regardless of whether the LTSF 1106 or the luma transform samples 1102 are received first in the single-tree data structure 1100. Furthermore, in some cases, the LFNST index 1110 is received (1126) after the CTSF 1108 and before the chroma transform samples 1104. Similarly, the MTS index 1116 may optionally be received before the LTSF 1106 and the luma transform samples 1102, or between the LTSF 1106 and the luma transform samples 1102, regardless of whether the LTSF 1106 is received before or after the luma transform samples 1102 in the single-tree data structure 1100. The MTS index 1116 may also be received after the CTSF 1108 and before the chroma transform samples 1104. Furthermore, in some embodiments, the MTS index 1116 may be received (1128) before the LFNST index 1110. The MTS index 1116 may be adjacent to or separate from the LFNST index 1110 in the single-tree data structure 1100.
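All of the alternative arrangements above share one constraint: both indices precede the chroma transform samples. A minimal Python check of that constraint is sketched below; the element names and the function `indices_precede_chroma` are hypothetical labels introduced for illustration.

```python
from typing import List


def indices_precede_chroma(order: List[str]) -> bool:
    """Check that the LFNST and MTS indices come before the chroma samples."""
    pos = {name: i for i, name in enumerate(order)}
    chroma = pos["chroma_samples"]
    return pos["LFNST_index"] < chroma and pos["MTS_index"] < chroma


if __name__ == "__main__":
    # Alternative position 1122: LFNST index ahead of LTSF and luma samples.
    print(indices_precede_chroma(
        ["LFNST_index", "LTSF", "luma_samples", "MTS_index", "CTSF", "chroma_samples"]))
    # Alternative position 1126: LFNST index between CTSF and chroma samples.
    print(indices_precede_chroma(
        ["LTSF", "luma_samples", "MTS_index", "CTSF", "LFNST_index", "chroma_samples"]))
```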
[0170] In some embodiments, the chroma residual of the TU is encoded and decoded in Joint Coding of Chroma Residuals (JCCR) mode, and for the TU, the chroma transform skip flag 1108 is non-zero. The chroma transform samples 1104 are decoded based on the JCCR mode, for example, without using the inverse LFNST operation.
[0171] Referring to Figure 11B, the split-tree data structure 1150 includes a luma tree data portion 1150A and a chroma tree data portion 1150B. Luma and chroma data portions 1150A and 1150B are used to encode the luma and chroma components of these images, respectively, using the image syntax elements, in the bitstream of video data associated with the TU. If this dual-tree / local-tree (i.e., split-tree) configuration is enabled, LFNST indices 1110 and 1160 are signaled for the luma and chroma components (e.g., corresponding to luma transform samples 1102 and chroma transform samples 1104), respectively. The partitioning of the luma and chroma components may be misaligned, and different LFNST operations can be applied to the luma and chroma components of the TU separately. The LFNST index 1110 associated with the luma transform samples 1102 is signaled after the video encoder 20 parses, transforms, and / or quantizes the luma residual samples into the luma transform samples 1102. The chroma LFNST index 1160 associated with the chroma transform samples 1104 is signaled after the video encoder 20 parses, transforms, and / or quantizes the second chroma residual samples (i.e., the Cr component) into a subset of the chroma transform samples 1104.
[0172] According to the luminance tree data portion 1150A of the split tree data structure 1150, LTSF 1106, the luminance transform samples 1102, the luminance LFNST index 1110, and the luminance MTS index 1116 (if any) are arranged in a first ordered sequence in the bitstream of the TU's video data. According to the chroma tree data portion 1150B of the split tree data structure 1150, CTSF 1108, the chroma transform samples 1104, the chroma LFNST index 1160, and the chroma MTS index 1166 (if any) are arranged in a second ordered sequence in the bitstream of the TU's video data. The second ordered sequence follows the first ordered sequence in the bitstream. After the luminance LFNST index 1110 is received, the corresponding inverse LFNST operation is applied to the luminance transform samples 1102, which have already been received and dequantized in the video decoder 30, regardless of whether the chroma transform samples 1104 have been received. In some cases, some or all of the luminance transform samples 1102 of the TU are dequantized, and the inverse LFNST operation can be applied to some or all of these luminance transform samples 1102 while the CTSF 1108 or the chroma residual samples 1104 of the chroma tree data portion 1150B are being received. Similarly, after the luminance MTS index 1116 is received, the corresponding inverse primary transform (e.g., DCT2, DST7, or DCT8) is applied to the luminance transform samples 1102 that have been received, dequantized, and optionally processed by the inverse LFNST operation, regardless of whether the chroma transform samples 1104 have been received. In some cases, some or all of the luminance transform samples 1102 of the TU are dequantized and subjected to the inverse LFNST operation, and the inverse primary transform is applied to some or all of the luminance transform samples 1102 of the TU while the CTSF 1108 or the chroma residual samples 1104 are being received.
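The two ordered sequences of the split tree can be modeled with a short Python sketch. The `TreePortion` dataclass, its field names, and the element labels are assumptions made for illustration; the optional MTS index reflects the "(if any)" qualifier above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreePortion:
    """Hypothetical model of one portion (1150A or 1150B) of the split tree."""
    prefix: str                       # "luma" or "chroma"
    ts_flag: int                      # transform skip flag (LTSF or CTSF)
    samples: List[int]                # quantized transform samples
    lfnst_index: int
    mts_index: Optional[int] = None   # None when no MTS index is signaled

    def elements(self) -> List[str]:
        """Names of the syntax elements in their bitstream order."""
        names = [
            f"{self.prefix}_TSF",
            f"{self.prefix}_samples",
            f"{self.prefix}_LFNST_index",
        ]
        if self.mts_index is not None:
            names.append(f"{self.prefix}_MTS_index")
        return names


def split_tree_order(luma: TreePortion, chroma: TreePortion) -> List[str]:
    """The chroma sequence follows the luma sequence in the bitstream."""
    return luma.elements() + chroma.elements()


if __name__ == "__main__":
    luma = TreePortion("luma", ts_flag=0, samples=[1, 0], lfnst_index=1)
    chroma = TreePortion("chroma", ts_flag=0, samples=[1], lfnst_index=0, mts_index=2)
    print(split_tree_order(luma, chroma))
```

Because every luma element precedes every chroma element, the luma inverse transforms can begin while the chroma portion is still being received.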
[0173] In some embodiments, a luminance LFNST index 1110 is signaled between the video encoder 20 and the decoder 30 when LTSF 1106 is (1112) zero, i.e., the transform skip mode is disabled for the luminance component of the TU, and when the number of non-zero luminance transform samples generated by the encoder 20 is within a predefined range (1114). Upon receiving the luminance LFNST index 1110, the video decoder 30 determines whether LTSF 1106 is zero and whether the luminance LFNST index 1110 is non-zero. Based on the determination that LTSF 1106 is zero and the luminance LFNST index 1110 is non-zero, the video decoder 30 applies the inverse LFNST to the luminance transform samples 1102 to generate first decoded luminance samples for the TU. More specifically, the luminance transform samples 1102 are dequantized and subsequently processed by the inverse LFNST to generate the first decoded luminance samples for a subsequent inverse primary transform (e.g., DCT2, DCT8, or DST7). Note that the luminance LFNST index 1110 is signaled and the inverse LFNST is applied without checking CTSF 1108 or the number of non-zero Cb or Cr chroma samples.
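The two luma-only conditions in this paragraph can be sketched in Python as follows. The function names and the `min_nonzero` / `max_nonzero` parameters are hypothetical: the text only states that the count lies within "a predefined range" and gives no bounds.

```python
from typing import Sequence


def luma_lfnst_index_signaled(ltsf: int, luma_samples: Sequence[int],
                              min_nonzero: int, max_nonzero: int) -> bool:
    """Decide whether the luma LFNST index is present in the bitstream.

    Only luma information is inspected; CTSF and the Cb / Cr sample
    counts are deliberately not consulted. The range bounds are
    assumed parameters, not values taken from the description.
    """
    nonzero = sum(1 for s in luma_samples if s != 0)
    return ltsf == 0 and min_nonzero <= nonzero <= max_nonzero


def apply_inverse_lfnst_to_luma(ltsf: int, lfnst_index: int) -> bool:
    """Inverse LFNST runs only if transform skip is off and the index is non-zero."""
    return ltsf == 0 and lfnst_index != 0


if __name__ == "__main__":
    print(luma_lfnst_index_signaled(0, [1, 0, 2], min_nonzero=1, max_nonzero=16))
    print(apply_inverse_lfnst_to_luma(ltsf=0, lfnst_index=1))
```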
[0174] Furthermore, as described above, in some embodiments, the luminance MTS index 1116 is not always used and signaled from the video encoder 20 to the video decoder 30. The luminance MTS index 1116 is signaled based on the determination that the luminance LFNST index 1110 is zero (i.e., the inverse LFNST operation is disabled). When the luminance LFNST index 1110 is zero, the video decoder 30 applies (1118) one of the DCT2 transform, DCT8 transform, or DST7 transform to the luminance transform samples 1102 based on the value of the luminance MTS index 1116. Conversely, when the luminance LFNST index 1110 is non-zero, the luminance MTS index 1116 is not used or signaled, and the video decoder 30 applies a predefined inverse transform (e.g., DCT2 transform) to the first decoded luminance samples by default after the inverse LFNST. The predefined inverse transform is applied in the horizontal and vertical directions of the first decoded luminance samples.
[0175] Alternatively, in some embodiments, the luminance MTS index 1116 is signaled along with the bitstream of the TU, independent of the value of the luminance LFNST index 1110; however, the video decoder 30 applies DCT2, DST7, and DCT8 transforms based on both the luminance LFNST index 1110 and the luminance MTS index 1116. Based on the determination that the luminance LFNST index 1110 is enabled (e.g., non-zero), the video decoder 30 applies (1120) a DCT2 transform (i.e., a predefined inverse transform, independent of the luminance MTS index 1116) to the first decoded luminance samples. Based on the determination that the luminance LFNST index 1110 is disabled (e.g., zero), the video decoder 30 selects (1118) one of the DCT2, DST7, and DCT8 transforms based on the value of the luminance MTS index 1116, and applies the selected transform to these first decoded luminance samples. For example, the video decoder 30 selects the corresponding one of the DCT2, DST7, and DCT8 transforms in each of the horizontal and vertical directions of the first decoded luminance samples based on the value of the luminance MTS index 1116.
[0176] In the luminance tree data section 1150A, the luminance LFNST index 1110 may optionally precede, be between, or follow the LTSF 1106 and the luminance transformation sample 1102, regardless of the order of the LTSF 1106 and the luminance transformation sample 1102. The luminance MTS index 1116 may optionally precede, be between, or follow the LTSF 1106 and the luminance transformation sample 1102, regardless of the order of the LTSF 1106 and the luminance transformation sample 1102. The luminance LFNST index 1110 and the luminance MTS index 1116 may optionally be adjacent to or separate from each other. An example of the luminance tree data section 1150A includes an ordered sequence of luminance LFNST index 1110, LTSF 1106, luminance MTS index 1116, and luminance transformation sample 1102.
[0177] The chroma transform samples 1104 are decoded based on the chroma tree data portion 1150B of the split tree data structure 1150, similar to the decoding of the luminance transform samples 1102 based on the luminance tree data portion 1150A. In some embodiments, a chroma LFNST index 1160 is signaled between the video encoder 20 and the video decoder 30 when CTSF 1108 is (1162) zero, i.e., the transform skip mode is disabled for the chroma components of the TU, and when the number of non-zero Cr transform samples and non-zero Cb transform samples generated by the encoder 20 is within a predefined range (1164). Upon receiving the chroma LFNST index 1160, the video decoder 30 determines whether CTSF 1108 is zero and whether the chroma LFNST index 1160 is non-zero. Based on the determination that CTSF 1108 is zero and the chroma LFNST index 1160 is non-zero, the video decoder 30 applies the inverse LFNST to the chroma transform samples 1104 to generate first decoded chroma samples for the TU. More specifically, the chroma transform samples 1104 are dequantized and then processed by the inverse LFNST to generate the first decoded chroma samples for a subsequent inverse primary transform (e.g., DCT2, DCT8, or DST7). Note that the chroma LFNST index 1160 is signaled and the inverse LFNST is applied to the chroma transform samples 1104 without checking LTSF 1106 or the number of non-zero luminance samples.
[0178] Furthermore, in some embodiments, the chroma MTS index 1166 is not always used and signaled from the video encoder 20 to the video decoder 30. The chroma MTS index 1166 is signaled based on the determination that the chroma LFNST index 1160 is zero (i.e., the inverse LFNST operation is disabled). When the chroma LFNST index 1160 is zero, the video decoder 30 applies (1168) one of the DCT2 transform, DCT8 transform, or DST7 transform to the chroma transform samples 1104 based on the value of the chroma MTS index 1166. Conversely, when the chroma LFNST index 1160 is non-zero, the chroma MTS index 1166 is not used or signaled, and the video decoder 30 applies a predefined inverse transform (e.g., DCT2 transform) by default to these first decoded chroma samples after the inverse LFNST. The predefined inverse transform is applied in the horizontal and vertical directions of these first decoded chroma samples.
[0179] Alternatively, in some embodiments, the chroma MTS index 1166 is signaled along with the bitstream of the TU, independent of the value of the chroma LFNST index 1160. However, the video decoder 30 applies DCT2, DST7, and DCT8 transforms based on both the chroma LFNST index 1160 and the chroma MTS index 1166. Based on the determination that the chroma LFNST index 1160 is enabled (e.g., non-zero), the video decoder 30 applies (1170) a DCT2 transform (i.e., a predefined inverse transform) to the first decoded chroma samples, independent of the chroma MTS index 1166. Based on the determination that the chroma LFNST index 1160 is disabled (e.g., zero), the video decoder 30 selects (1168) one of the DCT2, DST7, and DCT8 transforms based on the value of the chroma MTS index 1166 and applies the selected transform to these first decoded chroma samples. For example, the video decoder 30 selects one of the DCT2, DST7, and DCT8 transforms in each of the horizontal and vertical directions of the first decoded chroma samples based on the value of the chroma MTS index 1166.
[0180] In the chroma tree data section 1150B, the chroma LFNST index 1160 may optionally precede, be between, or follow the CTSF 1108 and the chroma transform sample 1104, regardless of their order. The chroma MTS index 1166 may optionally precede, be between, or follow the CTSF 1108 and the chroma transform sample 1104, regardless of their order. The chroma LFNST index 1160 and the chroma MTS index 1166 may optionally be adjacent to or separate from each other. An example of the chroma tree data section 1150B includes an ordered sequence of the chroma LFNST index 1160, CTSF 1108, chroma MTS index 1166, and chroma transform sample 1104.
[0181] Figure 12 is a flowchart illustrating a method 1200 for decoding video data according to some embodiments. Method 1200 is optionally controlled by instructions stored in a non-transitory computer-readable storage medium and executed by one or more processors of an electronic device (e.g., destination device 14). Each operation shown in Figure 12 may correspond to instructions stored in a computer memory or computer-readable storage medium of an electronic device. The computer-readable storage medium may include disk or optical disc storage devices, solid-state storage devices such as flash memory, or other non-volatile storage devices or equipment. Computer-readable instructions stored on the computer-readable storage medium may include one or more of the following: source code, assembly language code, object code, or other instruction formats interpreted by one or more processors. Some operations in method 1200 may be combined and / or the order of some operations may be changed.
[0182] The electronic device receives (1202) the luminance transform skip flag 1106 and a plurality of luminance transform samples 1102 of the transform unit via a bitstream in which the transform unit is encoded. The electronic device receives (1204) the low-frequency non-separable transform (LFNST) index 1110 associated with the transform unit via the bitstream. In some embodiments, the LFNST index 1110 is received (1206) based on a determination that the luminance transform skip flag 1106 is zero and the number of non-zero luminance samples in the luminance transform samples 1102 is within a predefined range. After receiving the LFNST index 1110, the electronic device receives (1208) the chroma transform skip flag 1108 and the chroma transform samples 1104 associated with the transform unit via the bitstream.
[0183] Based on the determination that the LFNST index 1110 is not zero and the luminance transform skip flag 1106 is zero, the electronic device applies (1210) the inverse LFNST to the luminance transform samples 1102 to generate first decoded luminance samples for the transform unit. In some embodiments, the inverse LFNST is applied (1212) to the luminance transform samples 1102 simultaneously with receiving (1208) the chroma transform skip flag 1108 and at least a portion of the chroma transform samples 1104. In some embodiments, the inverse LFNST is applied (1214) to the luminance transform samples 1102 after receiving (1208) the chroma transform skip flag 1108 and the chroma transform samples 1104.
[0184] In some embodiments, based on the determination that the LFNST index is not zero, the electronic device applies a predefined inverse primary transform (1216) in both the horizontal and vertical directions to these first decoded luminance samples of the transform unit after the inverse LFNST. Moreover, this predefined inverse primary transform is optionally performed, at least partially simultaneously with or after receiving the chroma transform skip flag 1108 and the chroma transform sample 1104. Furthermore, in some embodiments, the predefined inverse primary transform includes an inverse DCT2 transform, which is applied to these first decoded luminance samples associated with the TU.
[0185] In some embodiments, the electronic device receives (1218) a Multiple Transform Selection (MTS) index 1116 via the bitstream in which the transform unit is encoded. Based on a determination that the LFNST index 1110 is zero, one of the inverse DCT2, DCT8, and DST7 transforms is applied (1220) to the first decoded luminance samples based on the MTS index 1116. Furthermore, in some embodiments, the MTS index 1116 is received before receiving the chroma transform skip flag 1108 and the chroma transform samples 1104 associated with the transform unit. In some embodiments, the MTS index 1116 is received only based on a determination that the LFNST index 1110 is zero. In some embodiments, the luminance transform skip flag 1106, luminance transform samples 1102, LFNST index 1110, MTS index 1116, chroma transform skip flag 1108, and chroma transform samples 1104 are arranged in an ordered sequence in the bitstream.
[0186] Furthermore, in some embodiments, based on the determination that the MTS index 1116 has a first value, the electronic device receives values for the MTS horizontal flag and the MTS vertical flag from the bitstream, applies a horizontal transform to the luminance transform samples 1102 of the transform unit based on the value of the MTS horizontal flag, and applies a vertical transform to the luminance transform samples 1102 of the transform unit based on the value of the MTS vertical flag. Based on the determination that the MTS index 1116 has a second value different from the first value, the luminance transform samples 1102 of the transform unit are transformed in the horizontal and vertical directions using a predefined default transform (e.g., DCT2). Furthermore, in some embodiments, the predefined default transform is an inverse DCT2 transform, and each of the horizontal and vertical transforms is an inverse DST7 transform or an inverse DCT8 transform.
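The flag-based kernel selection in this paragraph can be sketched in Python. The concrete values chosen for the "first value" (1), and the flag-to-kernel mapping (0 selects DST7, 1 selects DCT8) are assumptions made for illustration, as is the `read_flag` callback that stands in for bitstream parsing.

```python
from typing import Callable, Dict, Tuple

FIRST_VALUE = 1  # assumed value of the MTS index that enables the flags
FLAG_TO_KERNEL: Dict[int, str] = {0: "DST7", 1: "DCT8"}  # assumed mapping


def choose_kernels(mts_index: int, read_flag: Callable[[], int]) -> Tuple[str, str]:
    """Return (horizontal, vertical) inverse transform kernels.

    The horizontal and vertical flags are read from the bitstream only
    when the MTS index has the first value; otherwise the predefined
    default transform (DCT2) is used in both directions.
    """
    if mts_index == FIRST_VALUE:
        horizontal = FLAG_TO_KERNEL[read_flag()]  # MTS horizontal flag
        vertical = FLAG_TO_KERNEL[read_flag()]    # MTS vertical flag
        return horizontal, vertical
    return "DCT2", "DCT2"


if __name__ == "__main__":
    flags = iter([1, 0])
    print(choose_kernels(1, lambda: next(flags)))
```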
[0187] In some embodiments, the TU is encoded using a single tree data structure 1100, and the inverse LFNST is applied regardless of the chroma transform skip flag 1108, the number of non-zero Cb samples in the chroma transform samples 1104, and the number of non-zero Cr samples in the chroma transform samples 1104. That is, the chroma transform skip flag 1108, the number of non-zero Cb samples, and the number of non-zero Cr samples are not checked before applying the inverse LFNST to the luminance transform samples 1102.
[0188] In some embodiments, the luminance transform samples 1102 and chroma transform samples 1104 of the transform unit are encoded using a split tree data structure 1150. After receiving the chroma transform skip flag 1108 and the chroma transform samples 1104, the electronic device further receives (1218) the chroma LFNST index 1160. A second inverse LFNST is applied to the chroma transform samples 1104, regardless of the luminance transform skip flag 1106 and the number of non-zero luminance samples in the luminance transform samples 1102. Furthermore, in some embodiments, the chroma LFNST index 1160 is received based on the chroma transform skip flag 1108 being zero and the number of non-zero Cb samples and non-zero Cr samples in the chroma transform samples 1104 being within a predefined range. Additionally, in some embodiments, based on the determination that the chroma LFNST index 1160 is non-zero and the chroma transform skip flag 1108 is zero, the electronic device applies the second inverse LFNST to the chroma transform samples 1104 to generate decoded chroma samples for the transform unit.
[0189] In some embodiments, the chroma residual of the transform unit is encoded in a Joint Coding of Chroma Residuals (JCCR) mode, and the chroma transform skip flag 1108 is non-zero for the TU.
[0190] It should be understood that the specific order of operations described in Figure 12 is merely exemplary and is not intended to indicate that the described order is the only possible order in which these operations can be performed. Those skilled in the art will recognize various ways to reorder the operations described herein. Additionally, it should be noted that the details of other processes described herein with respect to data structures 1100 and 1150 (e.g., Figures 11A and 11B) also apply in a similar manner to method 1200 described above with respect to Figure 12. For the sake of brevity, these details will not be repeated here.
[0191] In one or more examples, the described functionality can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, these functions can be stored as one or more instructions or code on a computer-readable medium or transmitted through a computer-readable medium and executed by a hardware-based processing unit. A computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium that includes any medium facilitating the transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium accessible by one or more computers or one or more processors to retrieve instructions, code, and / or data structures to implement the embodiments described in this application. Computer program products may include computer-readable media.
[0192] The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the claims. The singular forms “a” and “the” used in the description of the embodiments and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used herein refers to and covers any and all possible combinations of one or more of the associated listed items. It will be further understood that, when used in this specification, the term “comprising” specifies the presence of the stated features, elements, and / or components, but does not exclude the presence or addition of one or more other features, elements, components, and / or groups thereof.
[0193] It should also be understood that while the terms first, second, etc., may be used herein to describe various elements, these elements should not be limited to these terms. These terms are used only to distinguish one element from another. For example, without departing from the scope of the embodiments, a first electrode may be referred to as a second electrode, and similarly, a second electrode may be referred to as a first electrode. Both the first electrode and the second electrode are electrodes, but they are not the same electrode.
[0194] The description in this application is presented for illustrative and descriptive purposes and is not intended to be exhaustive or limiting of the invention in the disclosed form. Many modifications, variations, and alternative implementations will be apparent to those skilled in the art from the teachings presented in the foregoing description and the accompanying drawings. The embodiments were chosen and described in order to best explain the principles of the invention, its practical application, and to enable others skilled in the art to understand the various implementations of the invention and to best utilize the basic principles and various implementations with various modifications, as appropriate for the particular intended use. Therefore, it should be understood that the scope of the claims is not limited to the specific examples of the disclosed embodiments, and that modifications and other implementations are intended to be included within the scope of the appended claims.
Claims
1. A method for decoding video data, the method comprising: receiving, via a bitstream in which a transform unit is encoded, a luminance transform skip flag and a plurality of luminance transform samples of the transform unit; receiving, via the bitstream, a low-frequency non-separable transform (LFNST) index associated with the transform unit; after receiving the LFNST index, receiving, via the bitstream, a chroma transform skip flag and a plurality of chroma transform samples associated with the transform unit; and based on a determination that the LFNST index is not zero and the luminance transform skip flag is zero, applying an inverse LFNST to the plurality of luminance transform samples to generate a plurality of first decoded luminance samples for the transform unit.
2. The method of claim 1, wherein the LFNST index is received based on a determination that the luminance transform skip flag is zero and the number of non-zero luminance samples among the plurality of luminance transform samples is within a predefined range.
3. The method of claim 1 or 2, further comprising: Based on the determination that the LFNST index is not zero, a predefined inverse primary transform is applied to the plurality of first decoded luminance samples of the transform unit in the horizontal and vertical directions after the inverse LFNST.
4. The method of claim 3, wherein, The predefined inverse primary transform includes an inverse DCT2 transform, which is applied to the plurality of first decoded luminance samples associated with the transform unit.
5. The method of claim 1, further comprising: receiving, via the bitstream in which the transform unit is encoded, a multiple transform selection (MTS) index; and based on a determination that the LFNST index is zero, applying one of an inverse DCT2 transform, an inverse DCT8 transform, and an inverse DST-7 transform to the plurality of first decoded luminance samples based on the MTS index.
6. The method of claim 5, wherein the MTS index is received before receiving the chroma transform skip flag and the plurality of chroma transform samples associated with the transform unit.
7. The method of claim 5, wherein the MTS index is received only based on a determination that the LFNST index is zero.
8. The method of claim 5, wherein the luminance transform skip flag, the plurality of luminance transform samples, the LFNST index, the MTS index, the chroma transform skip flag, and the plurality of chroma transform samples are arranged in an ordered sequence in the bitstream.
9. The method of claim 5, wherein applying one of the inverse DCT2, DCT8, and DST-7 transforms to the plurality of first decoded luminance samples based on the MTS index comprises: based on a determination that the MTS index has a first value: receiving a value of an MTS horizontal flag and a value of an MTS vertical flag from the bitstream; applying, based on the value of the MTS horizontal flag, a horizontal transform to the plurality of luminance transform samples of the transform unit in the horizontal direction; and applying, based on the value of the MTS vertical flag, a vertical transform to the plurality of luminance transform samples of the transform unit in the vertical direction after the horizontal transform; and based on a determination that the MTS index has a second value different from the first value: transforming the plurality of luminance transform samples of the transform unit in the horizontal and vertical directions using a predefined default transform.
10. The method of claim 9, wherein the predefined default transform is the inverse DCT2 transform, and each of the horizontal transform and the vertical transform is either the inverse DST-7 transform or the inverse DCT8 transform.
11. The method of claim 1, wherein applying the inverse LFNST further includes: applying the inverse LFNST to the plurality of luminance transform samples at least partially simultaneously with receiving the chroma transform skip flag and the plurality of chroma transform samples.
12. The method of claim 1, wherein applying the inverse LFNST further includes: applying the inverse LFNST to the plurality of luminance transform samples upon receiving the chroma transform skip flag and the plurality of chroma transform samples.
13. The method of claim 1, wherein the transform unit is encoded using a single-tree data structure, and the inverse LFNST is applied independently of the chroma transform skip flag, the number of non-zero Cb samples in the plurality of chroma transform samples, and the number of non-zero Cr samples in the plurality of chroma transform samples.
14. The method of claim 1, wherein the plurality of luminance transform samples and the plurality of chroma transform samples of the transform unit are encoded using a split-tree data structure, the method further comprising: receiving a chroma LFNST index after receiving the chroma transform skip flag and the plurality of chroma transform samples; and applying a second inverse LFNST to the plurality of chroma transform samples independently of the luminance transform skip flag and the number of non-zero luminance samples in the plurality of luminance transform samples.
15. The method of claim 14, wherein the chroma LFNST index is received based on a determination that the chroma transform skip flag is zero and that the number of non-zero Cb samples and the number of non-zero Cr samples in the plurality of chroma transform samples are within the predefined range.
16. The method of claim 14, further comprising: based on a determination that the chroma LFNST index is non-zero and the chroma transform skip flag is zero, applying the second inverse LFNST to the plurality of chroma transform samples to generate decoded chroma samples for the transform unit.
17. The method of claim 1, wherein the chroma residual of the transform unit is coded in joint coding of chroma residuals (JCCR) mode, and the chroma transform skip flag for the transform unit is non-zero.
18. An electronic device comprising: one or more processors; and a memory that stores instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1-17.
19. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform the method of any one of claims 1-17.
20. A computer program product comprising a plurality of programs for execution by an electronic device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the electronic device to perform the method of any one of claims 1-17.
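Illustrative note (not part of the claims). The signaling order recited in claims 5 to 10 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation and not the VVC syntax. The names `SymbolReader`, `select_mts_kernels`, and `parse_transform_unit` are hypothetical. The sketch assumes that the syntax elements have already been entropy-decoded into a flat list of integers, that the LFNST index is always present, that an MTS index of 0 is the "second value" (default DCT2) and any other value is the "first value", that the MTS horizontal and vertical flags directly follow the MTS index, and that a flag value of 0 selects DST-7 while 1 selects DCT8. None of these mappings is stated in the claims.

```python
from typing import Dict, List, Tuple


class SymbolReader:
    """Hypothetical reader over already entropy-decoded syntax element values."""

    def __init__(self, symbols: List[int]) -> None:
        self._symbols = list(symbols)
        self.pos = 0  # number of symbols consumed so far

    def read(self) -> int:
        value = self._symbols[self.pos]
        self.pos += 1
        return value


def select_mts_kernels(mts_index: int, reader: SymbolReader) -> Tuple[str, str]:
    """Return (horizontal, vertical) inverse transform names (claims 9 and 10)."""
    if mts_index == 0:
        # Assumed "second value": predefined default, inverse DCT2 in both directions.
        return ("DCT2", "DCT2")
    # Assumed "first value": explicit per-direction flags follow the MTS index.
    horizontal_flag = reader.read()
    vertical_flag = reader.read()
    # Assumed mapping: flag 0 selects DST-7, flag 1 selects DCT8.
    names = ("DST-7", "DCT8")
    return (names[horizontal_flag], names[vertical_flag])


def parse_transform_unit(symbols: List[int], n_luma: int, n_chroma: int) -> Dict[str, object]:
    """Consume one transform unit in the ordered sequence of claim 8."""
    reader = SymbolReader(symbols)
    tu: Dict[str, object] = {}
    tu["luma_ts_flag"] = reader.read()
    tu["luma_samples"] = [reader.read() for _ in range(n_luma)]
    tu["lfnst_index"] = reader.read()
    if tu["lfnst_index"] == 0:
        # Claim 7: the MTS index is received only when the LFNST index is zero.
        tu["mts_index"] = reader.read()
        tu["kernels"] = select_mts_kernels(tu["mts_index"], reader)
    else:
        tu["mts_index"] = None
        tu["kernels"] = None
    # Everything the luminance inverse transform needs has been read by now,
    # before any chroma element (claim 6), so luminance work could start here.
    tu["luma_ready_pos"] = reader.pos
    tu["chroma_ts_flag"] = reader.read()
    tu["chroma_samples"] = [reader.read() for _ in range(n_chroma)]
    return tu
```

For example, `parse_transform_unit([0, 5, -3, 0, 1, 0, 1, 0, 7, 2], 2, 2)` reads two luminance samples, an LFNST index of 0, an MTS index of 1, and the two direction flags before it touches any chroma element. The `luma_ready_pos` field records how many symbols had been consumed at that point, which is the position from which a decoder could overlap luminance inverse transforms with chroma parsing.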