Residual coding supporting both lossy and lossless coding
By using a single transform skip mode residual coding technique for both lossy and lossless coding, the incompatibility between lossy and lossless residual coding is resolved, the performance of video encoders and decoders is improved, and processing power consumption and coding latency are reduced.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patent (China)
- Current Assignee / Owner
- QUALCOMM INC
- Filing Date
- 2020-12-23
- Publication Date
- 2026-06-23
Figure CN114846803B_ABST
Abstract
Description
[0001] This application claims the benefit of U.S. Application No. 17/131,046, filed December 22, 2020, and U.S. Provisional Patent Application No. 62/953,872, filed December 26, 2019, the entire contents of each of which are incorporated herein by reference.
Technical Field
[0002] This disclosure relates to video encoding and video decoding.
Background
[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones (so-called "smartphones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. By implementing such video coding techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.
[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Summary of the Invention
[0005] In general, this disclosure describes techniques for video coding and, more specifically, techniques for residual coding in transform skip mode that are applicable to both lossy and lossless coding. This disclosure relates to an entropy decoding process that converts a binary representation into a series of non-binary-valued quantized coefficients. The corresponding entropy encoding process, which is the inverse of entropy decoding, is implicitly specified and is therefore also part of this disclosure, even where it is not explicitly described herein. The examples of this disclosure may be applied to any existing video codec (such as extensions of High Efficiency Video Coding (HEVC) or Versatile Video Coding (VVC)), may be proposed as coding tools for standards currently under development, and/or may be used with other future video coding standards.
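The entropy decoding direction described above (bins in, non-negative integer levels out) can be illustrated with a Golomb-Rice code, the binarization family whose Rice parameter Figure 6 refers to. This is a simplified sketch under stated assumptions: the bins are treated as a plain bit string, CABAC context modelling is omitted, and the Exp-Golomb escape that VVC appends for large remainders is left out. The names `rice_encode`, `rice_decode`, and `decode_levels` are hypothetical.

```python
def rice_encode(value: int, k: int) -> str:
    """Binarize a non-negative integer with a Golomb-Rice code of parameter k."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k                   # sent in unary: '1' * quotient + '0'
    remainder = value & ((1 << k) - 1)      # sent as a k-bit fixed-length suffix
    suffix = format(remainder, f"0{k}b") if k > 0 else ""
    return "1" * quotient + "0" + suffix


def rice_decode(bits: str, k: int) -> tuple[int, int]:
    """Parse one codeword from the start of `bits`; return (value, bits_used)."""
    quotient = 0
    while bits[quotient] == "1":            # count the unary prefix
        quotient += 1
    pos = quotient + 1                      # skip the terminating '0'
    remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (quotient << k) | remainder, pos + k


def decode_levels(bits: str, k: int, count: int) -> list[int]:
    """Convert a bin string back into `count` non-binary level values."""
    levels, pos = [], 0
    for _ in range(count):
        value, used = rice_decode(bits[pos:], k)
        levels.append(value)
        pos += used
    return levels


# Usage: three levels coded with k = 1 and parsed back.
stream = rice_encode(5, 1) + rice_encode(0, 1) + rice_encode(7, 1)
print(stream, decode_levels(stream, 1, 3))
```

A larger k shortens the unary prefix for large values at the cost of a longer fixed suffix for small ones, which is why codecs adapt the Rice parameter to the local coefficient statistics.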
[0006] In one example, a method includes: determining whether transform skip mode is used for a current block of video data; disabling level mapping for residual coding based on transform skip mode being used for the current block; and coding the current block without applying level mapping.
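The method above can be sketched in code. The sketch rests on labelled assumptions: the "level mapping" being disabled is modelled on the transform-skip level mapping used in VVC drafts as we understand it, where each absolute level is remapped against a predictor equal to the larger of the absolute levels of its left and above neighbours. Coefficients are visited in raster order, and conditions a real codec checks, such as BDPCM or the context-coded bin budget, are omitted. All function names are hypothetical.

```python
def forward_level_map(abs_level: int, pred: int) -> int:
    """Encoder-side mapping of an absolute level against predictor `pred`."""
    if pred == 0 or abs_level == 0:
        return abs_level
    if abs_level == pred:
        return 1                    # the most likely value gets the cheapest code
    if abs_level < pred:
        return abs_level + 1
    return abs_level


def inverse_level_map(coded: int, pred: int) -> int:
    """Decoder-side inverse of forward_level_map."""
    if pred == 0 or coded == 0:
        return coded
    if coded == 1:
        return pred
    if coded <= pred:
        return coded - 1
    return coded


def _predictor(levels, x, y):
    left = levels[y][x - 1] if x > 0 else 0
    above = levels[y - 1][x] if y > 0 else 0
    return max(left, above)


def map_ts_levels(levels, level_mapping_enabled):
    """Encoder: turn absolute levels into the values that get binarized."""
    return [[forward_level_map(v, _predictor(levels, x, y)) if level_mapping_enabled else v
             for x, v in enumerate(row)] for y, row in enumerate(levels)]


def reconstruct_ts_levels(coded, level_mapping_enabled):
    """Decoder: rebuild absolute levels; predictors use already rebuilt values."""
    levels = [[0] * len(coded[0]) for _ in coded]
    for y, row in enumerate(coded):
        for x, value in enumerate(row):
            if level_mapping_enabled:
                value = inverse_level_map(value, _predictor(levels, x, y))
            levels[y][x] = value
    return levels


def decode_block_levels(coded, transform_skip, disable_ts_level_mapping=True):
    """Hypothetical flow: transform skip in use => level mapping disabled."""
    if transform_skip:
        return reconstruct_ts_levels(coded, not disable_ts_level_mapping)
    return [row[:] for row in coded]    # non-TS residual path is not modelled here


parsed = [[1, 1], [3, 2]]
print(decode_block_levels(parsed, transform_skip=True))          # mapping disabled
print(decode_block_levels(parsed, True, disable_ts_level_mapping=False))
```

With mapping disabled, the parsed values are used directly, so one residual path serves both lossy and lossless transform skip blocks without a neighbour-dependent remapping step.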
[0007] In another example, a device includes: a memory configured to store video data; and one or more processors implemented in circuitry and coupled to the memory, the one or more processors being configured to: determine whether transform skip mode is used for a current block of the video data; disable level mapping for residual coding based on transform skip mode being used for the current block; and code the current block without applying level mapping.
[0008] In another example, an apparatus includes: means for determining whether transform skip mode is used for a current block of video data; means for disabling level mapping for residual coding based on transform skip mode being used for the current block; and means for coding the current block without applying level mapping.
[0009] In another example, a non-transitory computer-readable storage medium is encoded with instructions that, when executed, cause one or more processors to: determine whether transform skip mode is used for a current block of video data; disable level mapping for residual coding based on transform skip mode being used for the current block; and code the current block without applying level mapping.
[0010] The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
Brief Description of the Drawings
[0011] Figure 1 is a block diagram illustrating an example video encoding and decoding system that may perform the techniques of this disclosure.
[0012] Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU).
[0013] Figure 3 is a block diagram illustrating an example video encoder that may perform the techniques of this disclosure.
[0014] Figure 4 is a block diagram illustrating an example video decoder that may perform the techniques of this disclosure.
[0015] Figure 5 is a conceptual diagram illustrating the interleaved coding of coefficient group (CG) symbols and coefficients in VVC Draft 7.
[0016] Figure 6 is a conceptual diagram illustrating a template of neighboring coefficients used to derive the Rice parameter.
[0017] Figure 7 is a conceptual diagram illustrating an example local template with five neighboring coefficients.
[0018] Figure 8 is a conceptual diagram illustrating another example local template with five neighboring coefficients.
[0019] Figure 9 is a flowchart illustrating an example transform skip mode coding technique in accordance with this disclosure.
[0020] Figure 10 is a flowchart illustrating an example method for encoding video data.
[0021] Figure 11 is a flowchart illustrating an example method for decoding video data.
Detailed Description
[0022] In some video coding standards, one residual coding technique is used for lossy coding and a different one is used for lossless coding. Therefore, if a video encoder uses a combination of lossy and lossless coding, the encoder may be unable to support both, or its performance may be degraded, because it would have to implement two different residual coding techniques.
[0023] In accordance with the techniques of this disclosure, the transform skip residual coding technique may be the same for both lossy and lossless coding. These techniques may improve the performance of video encoders and decoders (e.g., by reducing processing power consumption) and/or reduce coding latency.
[0024] Figure 1 is a block diagram illustrating an example video encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure are generally directed to coding (encoding and/or decoding) video data. In general, video data includes any data for processing a video. Thus, video data may include raw, unencoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata, such as signaling data.
[0025] As shown in Figure 1, in this example, video encoding and decoding system 100 includes a source device 102 that provides encoded video data to be decoded and displayed by a destination device 116. In particular, source device 102 provides the video data to destination device 116 via a computer-readable medium 110. Source device 102 and destination device 116 may comprise any of a wide range of devices, including desktop computers, laptop computers, tablet computers, set-top boxes, mobile phones such as smartphones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and the like. In some cases, source device 102 and destination device 116 may be equipped for wireless communication and thus may be referred to as wireless communication devices.
[0026] In the example of Figure 1, source device 102 includes video source 104, memory 106, video encoder 200, and output interface 108. Destination device 116 includes input interface 122, video decoder 300, memory 120, and display device 118. In accordance with this disclosure, video encoder 200 of source device 102 and video decoder 300 of destination device 116 may be configured to apply the techniques for transform skip mode residual coding that are applicable to both lossy and lossless coding. Thus, source device 102 represents an example of a video encoding device, while destination device 116 represents an example of a video decoding device. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 102 may receive video data from an external video source, such as an external camera. Likewise, destination device 116 may interface with an external display device rather than include an integrated display device.
[0027] The video encoding and decoding system 100 shown in Figure 1 is merely one example. In general, any digital video encoding and/or decoding device may perform the techniques for transform skip mode residual coding that are applicable to both lossy and lossless coding. Source device 102 and destination device 116 are merely examples of such coding devices, in which source device 102 generates coded video data for transmission to destination device 116. In this disclosure, a "coding device" refers to a device that performs coding (encoding and/or decoding) of data. Thus, video encoder 200 and video decoder 300 represent examples of coding devices, in particular a video encoder and a video decoder, respectively. In some examples, source device 102 and destination device 116 may operate in a substantially symmetrical manner, such that each of source device 102 and destination device 116 includes video encoding and decoding components. Hence, video encoding and decoding system 100 may support one-way or two-way video transmission between source device 102 and destination device 116, e.g., for video streaming, video playback, video broadcasting, or video telephony.
[0028] In general, video source 104 represents a source of video data (i.e., raw, unencoded video data) and provides a sequential series of pictures (also referred to as "frames") of the video data to video encoder 200, which encodes data for the pictures. Video source 104 of source device 102 may include a video capture device, such as a video camera, a video archive containing previously captured raw video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 104 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In each case, video encoder 200 encodes the captured, pre-captured, or computer-generated video data. Video encoder 200 may rearrange the pictures from the received order (sometimes referred to as "display order") into a coding order for coding. Video encoder 200 may generate a bitstream including the encoded video data. Source device 102 may then output the encoded video data via output interface 108 onto computer-readable medium 110 for reception and/or retrieval by, e.g., input interface 122 of destination device 116.
[0029] Memory 106 of source device 102 and memory 120 of destination device 116 represent general-purpose memories. In some examples, memory 106 and memory 120 may store raw video data, e.g., raw video from video source 104 and raw, decoded video data from video decoder 300. Additionally or alternatively, memory 106 and memory 120 may store software instructions executable by, e.g., video encoder 200 and video decoder 300, respectively. Although memory 106 and memory 120 are shown separately from video encoder 200 and video decoder 300 in this example, it should be understood that video encoder 200 and video decoder 300 may also include internal memories for functionally similar or equivalent purposes. Furthermore, memory 106 and memory 120 may store encoded video data, e.g., output from video encoder 200 and input to video decoder 300. In some examples, portions of memory 106 and memory 120 may be allocated as one or more video buffers, e.g., to store raw, decoded, and/or encoded video data.
[0030] Computer-readable medium 110 may represent any type of medium or device capable of transporting the encoded video data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to transmit encoded video data directly to destination device 116 in real time, e.g., via a radio frequency network or a computer-based network. Output interface 108 may modulate a transmission signal including the encoded video data, and input interface 122 may demodulate the received transmission signal, according to a communication standard such as a wireless communication protocol. The communication medium may include any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful for facilitating communication from source device 102 to destination device 116.
[0031] In some examples, source device 102 can output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 can access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a wide variety of distributed or locally accessible data storage media, such as hard disk drives, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
[0032] In some examples, source device 102 may output encoded video data to file server 114 or another intermediate storage device that may store the encoded video generated by source device 102. Destination device 116 may access the stored video data from file server 114 via streaming or downloading. File server 114 may be any type of server device capable of storing and sending encoded video data to destination device 116. File server 114 may represent a web server (e.g., for a website), a file transfer protocol (FTP) server, a content delivery network device, or a network attached storage (NAS) device. Destination device 116 may access the encoded video data from file server 114 via any standard data connection (including an Internet connection). This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of both, suitable for accessing the encoded video data stored on file server 114. File server 114 and input interface 122 may be configured to operate according to a streaming protocol, a download protocol, or a combination thereof.
[0033] Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components that operate according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G, or the like. In some examples where output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee™), a Bluetooth™ standard, or the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include an SoC device to perform the functionality attributed to video encoder 200 and/or output interface 108, and destination device 116 may include an SoC device to perform the functionality attributed to video decoder 300 and/or input interface 122.
[0034] The techniques of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions such as Dynamic Adaptive Streaming over HTTP (DASH), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.
[0035] Input interface 122 of destination device 116 receives an encoded video bitstream from computer-readable medium 110 (e.g., a communication medium, storage device 112, file server 114, or the like). The encoded video bitstream may include signaling information defined by video encoder 200, which is also used by video decoder 300, such as syntax elements having values that describe characteristics and/or processing of video blocks or other coded units (e.g., slices, pictures, groups of pictures, sequences, or the like). Display device 118 displays decoded pictures of the decoded video data to a user. Display device 118 may represent any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
[0036] Although not shown in Figure 1, in some examples, video encoder 200 and video decoder 300 may each be integrated with an audio encoder and/or audio decoder, and may include appropriate MUX-DEMUX units, or other hardware and/or software, to handle multiplexed streams including both audio and video in a common data stream. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
[0037] Video encoder 200 and video decoder 300 each may be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 200 and video decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including video encoder 200 and/or video decoder 300 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
[0038] Video encoder 200 and video decoder 300 may operate according to a video coding standard, such as ITU-T H.265, also referred to as High Efficiency Video Coding (HEVC), or extensions thereto, such as the multi-view and/or scalable video coding extensions. Alternatively, video encoder 200 and video decoder 300 may operate according to other proprietary or industry standards, such as ITU-T H.266, also referred to as Versatile Video Coding (VVC). A draft of the VVC standard is described in Bross et al., "Versatile Video Coding (Draft 7)," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 16th Meeting, Geneva, Switzerland, October 1-11, 2019, JVET-P2001-v14 (hereinafter "VVC Draft 7"). A more recent draft of the VVC standard is described in Bross et al., "Versatile Video Coding (Draft 10)," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 19th Meeting, by teleconference, 22 June–1 July 2020, JVET-S2001-v17 (hereinafter "VVC Draft 10"). The techniques of this disclosure, however, are not limited to any particular coding standard.
[0039] In general, video encoder 200 and video decoder 300 may perform block-based coding of pictures. The term "block" generally refers to a structure including data to be processed (e.g., encoded, decoded, or otherwise used in the encoding and/or decoding process). For example, a block may include a two-dimensional matrix of samples of luminance and/or chrominance data. In general, video encoder 200 and video decoder 300 may code video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for samples of a picture, video encoder 200 and video decoder 300 may code luminance and chrominance components, where the chrominance components may include both red-hue and blue-hue chrominance components. In some examples, video encoder 200 converts received RGB-formatted data to a YUV representation prior to encoding, and video decoder 300 converts the YUV representation to the RGB format. Alternatively, pre-processing and post-processing units (not shown) may perform these conversions.
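The RGB-to-YUV conversion mentioned above can be sketched for 8-bit samples. The disclosure does not specify a conversion matrix; this sketch assumes the full-range BT.601 (JFIF-style) coefficients purely as one common choice, and the function names are hypothetical.

```python
def _clip8(v: float) -> int:
    """Round and clamp to the 8-bit sample range."""
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Assumed full-range BT.601 forward conversion (pre-processing step)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                  # luma
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b       # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b       # red-difference chroma
    return _clip8(y), _clip8(cb), _clip8(cr)


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    """Matching inverse conversion (post-processing step)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clip8(r), _clip8(g), _clip8(b)


print(rgb_to_ycbcr(10, 200, 30), ycbcr_to_rgb(*rgb_to_ycbcr(10, 200, 30)))
```

Grey levels map to Cb = Cr = 128, which is why most of the signal energy of natural content ends up in the luma component. Because of rounding, the round trip may differ from the input by one code value.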
[0040] This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data of the picture. Similarly, this disclosure may refer to coding of blocks of a picture to include the process of encoding or decoding data for the blocks, e.g., prediction and/or residual coding. An encoded video bitstream generally includes a series of values for syntax elements representing coding decisions (e.g., coding modes) and the partitioning of pictures into blocks. Thus, references to coding a picture or a block should generally be understood as coding values for the syntax elements forming the picture or block.
[0041] HEVC defines various blocks, including coding units (CUs), prediction units (PUs), and transform units (TUs). According to HEVC, a video coder (such as video encoder 200) partitions a coding tree unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, non-overlapping squares, and each node of the quadtree has either zero or four child nodes. Nodes without child nodes may be referred to as "leaf nodes," and CUs of such leaf nodes may include one or more PUs and/or one or more TUs. The video coder may further partition PUs and TUs. For example, in HEVC, a residual quadtree (RQT) represents the partitioning of TUs. In HEVC, PUs represent inter-frame prediction data, while TUs represent residual data. Intra-predicted CUs include intra-frame prediction information, such as an intra-mode indication.
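The quadtree rule above (each node has zero or four equal, non-overlapping square children) can be sketched as a recursion over a CTU. The split decision is supplied by a hypothetical `should_split` callback that stands in for the encoder's rate-distortion choice or the decoder's parsed split flags, and the 8x8 minimum CU size is an assumption matching HEVC's smallest CU.

```python
from typing import Callable

Leaf = tuple[int, int, int]          # (x, y, size) of a leaf CU, in luma samples


def quadtree_leaves(x: int, y: int, size: int,
                    should_split: Callable[[int, int, int], bool],
                    min_size: int = 8) -> list[Leaf]:
    """Return the leaf CUs of a quadtree rooted at a size x size CTU."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: list[Leaf] = []
        for dy in (0, half):          # children visited in z-order:
            for dx in (0, half):      # top-left, top-right, bottom-left, bottom-right
                leaves.extend(quadtree_leaves(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]            # leaf node: no children


# Usage: split the 64x64 CTU once, then split only its top-left 32x32 quadrant again.
decision = lambda x, y, s: s > 32 or (x == 0 and y == 0 and s > 16)
print(quadtree_leaves(0, 0, 64, decision))
```

Because every split produces four equal squares, the leaves always tile the CTU exactly, which a decoder can rely on when it walks the tree in the same z-order as the encoder.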
[0042] As another example, video encoder 200 and video decoder 300 may be configured to operate according to VVC. According to VVC, a video coder (such as video encoder 200) partitions a picture into a plurality of coding tree units (CTUs). Video encoder 200 may partition a CTU according to a tree structure, such as a quadtree-binary tree (QTBT) structure or a multi-type tree (MTT) structure. The QTBT structure removes the concept of multiple partition types, such as the separation between the CUs, PUs, and TUs of HEVC. A QTBT structure includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. The root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs).
[0043] In an MTT partitioning structure, blocks may be partitioned using a quadtree (QT) partition, a binary tree (BT) partition, and one or more types of triple tree (TT) partitions (also called ternary tree (TT) partitions). A triple or ternary tree partition is a partition in which a block is split into three sub-blocks. In some examples, a triple or ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partitioning types in MTT (e.g., QT, BT, and TT) may be symmetrical or asymmetrical.
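The QT, BT, and TT partition types above can be compared by the sub-block rectangles each produces. This sketch assumes the 1:2:1 ratio that VVC uses for its ternary splits, which shows concretely how a TT split creates three sub-blocks without a boundary through the centre; the mode names are hypothetical labels, not VVC syntax element values.

```python
Rect = tuple[int, int, int, int]     # (x, y, width, height)


def split(x: int, y: int, w: int, h: int, mode: str) -> list[Rect]:
    """Return the sub-blocks produced by one split of the block (x, y, w, h)."""
    if mode == "QT":                 # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_HOR":             # horizontal split line: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_VER":             # vertical split line: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_HOR":             # 1:2:1 along the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_VER":             # 1:2:1 along the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")


# Usage: a vertical ternary split of a 32x32 block puts boundaries at x = 8 and x = 24.
print(split(0, 0, 32, 32, "TT_VER"))
```

The middle TT sub-block spans the centre line, so no boundary falls at x = 16; a subsequent BT split of that middle part can still place one there if the encoder needs it.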
[0044] In some examples, the video encoder 200 and the video decoder 300 may use a single QTBT or MTT structure to represent each of the luma and chroma components, while in other examples, the video encoder 200 and the video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT / MTT structure for the luma component and another QTBT / MTT structure for the two chroma components (or two QTBT / MTT structures for the respective chroma components).
[0045] Video encoder 200 and video decoder 300 may be configured to use quadtree partitioning per HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures. For purposes of explanation, the techniques of this disclosure are described with respect to QTBT partitioning. However, it should be understood that the techniques of this disclosure may also be applied to video coders configured to use quadtree partitioning or other types of partitioning.
[0046] Blocks (e.g., CTUs or CUs) may be grouped in various ways within a picture. As one example, a brick may refer to a rectangular region of CTU rows within a particular tile in a picture. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column refers to a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements (e.g., in a picture parameter set). A tile row refers to a rectangular region of CTUs having a height specified by syntax elements (e.g., in a picture parameter set) and a width equal to the width of the picture.
[0047] In some examples, a tile may be partitioned into multiple bricks, each of which may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. However, a brick that is a true subset of a tile may not be referred to as a tile.
[0048] The bricks in a picture may also be arranged in slices. A slice may be an integer number of bricks of a picture that may be exclusively contained in a single Network Abstraction Layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a consecutive sequence of complete bricks of one tile.
[0049] This disclosure uses "NxN" and "N by N" interchangeably to refer to the sample dimensions of a block (such as a CU or other video block) in terms of vertical and horizontal dimensions, e.g., 16x16 samples or 16 by 16 samples. In general, a 16x16 CU has 16 samples in the vertical direction (y = 16) and 16 samples in the horizontal direction (x = 16). Likewise, an NxN CU generally has N samples in the vertical direction and N samples in the horizontal direction, where N represents a non-negative integer value. The samples in a CU may be arranged in rows and columns. Moreover, a CU need not have the same number of samples in the horizontal direction as in the vertical direction. For example, a CU may comprise NxM samples, where M is not necessarily equal to N.
[0050] Video encoder 200 encodes video data for CUs representing prediction and/or residual information, and other information. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information generally represents sample-by-sample differences between samples of the CU prior to encoding and the prediction block.
[0051] To predict a CU, video encoder 200 generally forms a prediction block for the CU through inter-frame prediction or intra-frame prediction. Inter-frame prediction generally refers to predicting the CU from data of a previously coded picture, whereas intra-frame prediction generally refers to predicting the CU from previously coded data of the same picture. To perform inter-frame prediction, video encoder 200 may generate the prediction block using one or more motion vectors. Video encoder 200 may generally perform a motion search to identify a reference block that closely matches the CU, e.g., in terms of differences between the CU and the reference block. Video encoder 200 may calculate a difference metric using a sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), or other such difference calculations to determine whether a reference block closely matches the current CU. In some examples, video encoder 200 may predict the current CU using uni-directional or bi-directional prediction.
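The four difference metrics named above, and the motion search that uses them, can be sketched as follows. The exhaustive integer-position search is only an illustrative assumption; practical encoders use fast search patterns and fractional-sample refinement. The helper names are hypothetical.

```python
def _diffs(cur, ref):
    """Flattened sample differences between two equally sized blocks."""
    return [c - r for row_c, row_r in zip(cur, ref) for c, r in zip(row_c, row_r)]


def sad(cur, ref):
    return sum(abs(d) for d in _diffs(cur, ref))


def ssd(cur, ref):
    return sum(d * d for d in _diffs(cur, ref))


def mad(cur, ref):
    diffs = _diffs(cur, ref)
    return sum(abs(d) for d in diffs) / len(diffs)


def msd(cur, ref):
    diffs = _diffs(cur, ref)
    return sum(d * d for d in diffs) / len(diffs)


def full_search(cur, ref_pic, metric=sad):
    """Test every integer position of `cur` inside `ref_pic`; return (x, y, cost)."""
    h, w = len(cur), len(cur[0])
    best = None
    for y in range(len(ref_pic) - h + 1):
        for x in range(len(ref_pic[0]) - w + 1):
            candidate = [row[x:x + w] for row in ref_pic[y:y + h]]
            cost = metric(cur, candidate)
            if best is None or cost < best[2]:     # keep the first minimum found
                best = (x, y, cost)
    return best


cur = [[1, 2], [3, 4]]
ref_pic = [[9, 9, 9, 9], [9, 9, 9, 9], [9, 1, 2, 9], [9, 3, 4, 9]]
print(full_search(cur, ref_pic))
```

The motion vector is the offset between the best position and the current block's own position. SAD is cheap to compute in hardware, while SSD correlates better with squared-error distortion.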
[0052] Some examples of VVC also provide an affine motion compensation mode, which can be considered an inter-frame prediction mode. In affine motion compensation mode, the video encoder 200 can determine two or more motion vectors representing non-translational motion (such as zooming in or out, rotation, perspective motion, or other irregular types of motion).
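As an illustration of how two motion vectors can describe non-translational motion, the sketch below evaluates a 4-parameter affine model from control-point vectors v0 (top-left corner) and v1 (top-right corner) of a block of width w. It assumes the 4-parameter form used in VVC's affine mode and uses floating point for clarity, whereas the standard uses fixed-point arithmetic and also offers a 6-parameter model. Per-subblock vectors are evaluated at 4x4 subblock centres, which is likewise an assumption of this sketch.

```python
def affine_mv_4param(v0, v1, width, x, y):
    """Motion vector at position (x, y) under a 4-parameter affine model."""
    a = (v1[0] - v0[0]) / width      # zoom component
    b = (v1[1] - v0[1]) / width      # rotation component
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])


def subblock_mvs(v0, v1, width, height, sub=4):
    """One motion vector per sub x sub subblock, sampled at the subblock centre."""
    return [[affine_mv_4param(v0, v1, width, sx + sub / 2, sy + sub / 2)
             for sx in range(0, width, sub)]
            for sy in range(0, height, sub)]


# Usage: v0 == v1 is pure translation; v1 - v0 = (4, 0) over width 4 is a zoom.
print(affine_mv_4param((0, 0), (4, 0), 4, 2, 3))
print(subblock_mvs((3, -2), (3, -2), 8, 4))
```

When v0 equals v1 every subblock receives the same vector, so the affine model reduces to ordinary translational motion compensation.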
[0053] To perform intra-frame prediction, video encoder 200 may select an intra-frame prediction mode to generate the prediction block. Some examples of VVC provide sixty-seven intra-frame prediction modes, including various directional modes as well as planar mode and DC mode. In general, video encoder 200 selects an intra-frame prediction mode that describes the samples neighboring a current block (e.g., a block of a CU) from which to predict the samples of the current block. Assuming video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom), such samples generally lie above, above and to the left of, or to the left of the current block in the same picture.
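DC mode, one of the non-directional modes listed above, is the simplest illustration of predicting a block from its neighbouring samples. This sketch follows the HEVC-style DC average for a square NxN block and omits the boundary smoothing HEVC applies to small luma blocks. As a known difference, VVC averages only the longer side for non-square blocks, which this square-only sketch does not model. The function name is hypothetical.

```python
def dc_predict(top: list[int], left: list[int]) -> list[list[int]]:
    """DC intra prediction for an NxN block from its top and left neighbours."""
    n = len(top)
    if n == 0 or len(left) != n or n & (n - 1):
        raise ValueError("top and left must have the same power-of-two length")
    shift = n.bit_length()                         # equals log2(n) + 1 for a power of two
    dc = (sum(top) + sum(left) + n) >> shift       # rounded mean of the 2n neighbours
    return [[dc] * n for _ in range(n)]


# Usage: top row of 10s, left column of 20s -> every sample predicted as 15.
print(dc_predict([10, 10, 10, 10], [20, 20, 20, 20]))
```

Using a power-of-two count of neighbours lets the average be computed with an add and a right shift instead of a division.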
[0054] Video encoder 200 encodes data representing the prediction mode for the current block. For example, for inter-frame prediction modes, video encoder 200 may encode data indicating which of the various available inter-frame prediction modes is used, as well as motion information for the corresponding mode. For uni-directional or bi-directional inter-frame prediction, for example, video encoder 200 may encode motion vectors using advanced motion vector prediction (AMVP) or merge mode. Video encoder 200 may use similar modes to encode motion vectors for affine motion compensation mode.
[0055] Following prediction of a block, such as intra-frame or inter-frame prediction, the video encoder 200 can compute residual data for the block. The residual data, such as a residual block, represents the sample-by-sample differences between the block and a prediction block for the block formed using the corresponding prediction mode. The video encoder 200 can apply one or more transforms to the residual block to produce transformed data in a transform domain instead of the sample domain. For example, the video encoder 200 can apply a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video data. Additionally, the video encoder 200 can apply a secondary transform following the first transform. Examples include a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, or a Karhunen-Loeve transform (KLT). The video encoder 200 produces transform coefficients following application of the one or more transforms.
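The residual computation and a separable two-dimensional DCT can be sketched as follows. For clarity, the sketch uses a floating-point, orthonormal DCT-II on square blocks. This is an illustrative assumption: practical codecs use integer approximations of the transform.

```python
# Illustrative sketch; all names are hypothetical. Floating-point DCT-II, square blocks only.
import math
from typing import List

Matrix = List[List[float]]


def residual(cur: Matrix, pred: Matrix) -> Matrix:
    """Sample-by-sample difference between the original block and its prediction."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis; row k holds basis function k."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def dct2d(block: Matrix) -> Matrix:
    """Separable 2-D transform C * X * C^T: transform the columns, then the rows."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


cur = [[10] * 4 for _ in range(4)]
pred = [[7] * 4 for _ in range(4)]
coeffs = dct2d(residual(cur, pred))
print(round(coeffs[0][0], 6))  # 12.0: a flat residual puts all of its energy in the DC coefficient
```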
[0056] As described above, after any transformation to produce transform coefficients (or a skipped transformation, such as in the case of a transform skip mode), the video encoder 200 can perform quantization on the transform coefficients (or non-transform coefficients). Quantization generally refers to the process in which transform coefficients (or non-transform coefficients) are quantized to potentially reduce the amount of data used to represent the transform coefficients (or non-transform coefficients), thereby providing further compression. By performing the quantization process, the video encoder 200 can reduce the bit depth associated with some or all of the transform coefficients. For example, the video encoder 200 can round down an n-bit value to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, the video encoder 200 can perform a bitwise right shift of the value to be quantized.
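The bit-depth reduction by right shift described above can be sketched as follows. Two details are illustrative assumptions: the sign is handled separately, and an optional rounding offset is provided. The text states only that an n-bit value is rounded down to an m-bit value, which corresponds to the plain shift.

```python
# Illustrative sketch; all names are hypothetical.
def quantize_shift(value: int, shift: int, rounding: bool = False) -> int:
    """Reduce a magnitude by `shift` bits (n-bit -> m-bit with m = n - shift).

    The sign is handled separately so negative values are treated symmetrically.
    With rounding=False the magnitude is truncated (rounded down).
    """
    sign = -1 if value < 0 else 1
    offset = (1 << (shift - 1)) if (rounding and shift > 0) else 0
    return sign * ((abs(value) + offset) >> shift)


def dequantize_shift(level: int, shift: int) -> int:
    """Scale a quantized level back up; the discarded low-order bits are lost."""
    return level << shift


v = 1000                   # a 10-bit value
q = quantize_shift(v, 4)   # a 6-bit result
print(q, dequantize_shift(q, 4), quantize_shift(v, 4, rounding=True))  # 62 992 63
```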
[0057] Following quantization, the video encoder 200 can scan the transform coefficients to produce a one-dimensional vector from the two-dimensional matrix of quantized transform coefficients. The scan can be designed to place higher-energy (and therefore lower-frequency) coefficients at the front of the vector and lower-energy (and therefore higher-frequency) coefficients at the back of the vector. In some examples, the video encoder 200 can use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector, and then entropy-encode the quantized transform coefficients of the vector. In other examples, the video encoder 200 can perform an adaptive scan. After scanning the quantized transform coefficients to form the one-dimensional vector, the video encoder 200 can entropy-encode the one-dimensional vector, for example according to context-adaptive binary arithmetic coding (CABAC). The video encoder 200 can also entropy-encode values of syntax elements that describe metadata associated with the encoded video data. The video decoder 300 uses this metadata when decoding the video data.
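One example of a predefined scan order is the up-right diagonal scan used in HEVC, sketched below. Each anti-diagonal is traversed from bottom-left to top-right, starting at the DC position. As a simplification for illustration, the sketch treats the whole block as one diagonal scan rather than scanning within 4x4 subblocks.

```python
# Illustrative sketch; all names are hypothetical.
from typing import List, Tuple


def diagonal_scan(n: int) -> List[Tuple[int, int]]:
    """Up-right diagonal scan of an n x n block as a list of (x, y) positions."""
    order = []
    for d in range(2 * n - 1):        # d = x + y indexes the anti-diagonal
        for y in range(d, -1, -1):    # bottom-left to top-right
            x = d - y
            if x < n and y < n:
                order.append((x, y))
    return order


def serialize(block: List[List[int]]) -> List[int]:
    """Flatten a 2-D coefficient block into a 1-D vector in scan order."""
    return [block[y][x] for x, y in diagonal_scan(len(block))]


block = [
    [9, 4, 1, 0],
    [5, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(serialize(block))  # [9, 5, 4, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The nonzero low-frequency values land at the front, and the tail becomes a run of zeros. HEVC and VVC actually code the coefficients in the reverse of this order, starting from the last significant position.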
[0058] To perform CABAC, the video encoder 200 can assign a context within a context model to a symbol to be transmitted. The context may relate, for example, to whether neighboring values of the symbol are zero-valued. The probability determination can be based on the context assigned to the symbol.
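The link between a context and its probability estimate can be sketched with a simplified adaptive model. Each context keeps its own estimate of the probability that a bin equals 1. The context for a bin is chosen according to whether its neighbors are zero, as in the example above. The 15-bit state, the single adaptation rate, and the neighbor rule are assumptions for illustration. This is not the exact probability-estimation procedure of HEVC or VVC.

```python
# Illustrative sketch; all names are hypothetical. Simplified single-rate estimator.
from typing import List

PROB_BITS = 15
ONE = 1 << PROB_BITS  # fixed-point representation of probability 1.0


class ContextModel:
    """Adaptive estimate of P(bin == 1) for one context."""

    def __init__(self, rate: int = 4) -> None:
        self.p1 = ONE // 2  # start at probability 0.5
        self.rate = rate    # larger rate -> slower adaptation

    def probability(self) -> float:
        return self.p1 / ONE

    def update(self, bin_val: int) -> None:
        """Move the estimate roughly 1 / 2**rate of the way toward the observed bin."""
        target = ONE if bin_val else 0
        self.p1 += (target - self.p1) >> self.rate


def select_context(left: int, above: int) -> int:
    """Hypothetical rule: context index = number of nonzero neighbors (0, 1, or 2)."""
    return int(left != 0) + int(above != 0)


contexts: List[ContextModel] = [ContextModel() for _ in range(3)]
# In this toy data, bins whose neighbors are both nonzero are always 1,
# so context 2 learns a high probability of 1 while context 0 stays at 0.5.
for _ in range(50):
    contexts[select_context(3, 1)].update(1)
print(round(contexts[2].probability(), 2), contexts[0].probability())
```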
[0059] The video encoder 200 can further generate syntax data for the video decoder 300, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data. This syntax data can be carried, for example, in a picture header, a block header, or a slice header. The video encoder 200 can also generate other syntax data, such as a sequence parameter set (SPS), picture parameter set (PPS), or video parameter set (VPS). The video decoder 300 can likewise decode such syntax data to determine how to decode the corresponding video data.
[0060] In this manner, the video encoder 200 can generate a bitstream including encoded video data, such as syntax elements describing the partitioning of a picture into blocks (e.g., CUs) and prediction and / or residual information for those blocks. Ultimately, the video decoder 300 can receive the bitstream and decode the encoded video data.
[0061] In general, the video decoder 300 performs a process reciprocal to that performed by the video encoder 200 to decode the encoded video data of the bitstream. For example, the video decoder 300 may use CABAC to decode values of syntax elements of the bitstream. It does so in a manner substantially similar to, albeit reciprocal to, the CABAC encoding process of the video encoder 200. The syntax elements may define information for partitioning a picture into CTUs. They may also define how each CTU is partitioned according to a corresponding partition structure (such as a QTBT structure) to define the CUs of the CTU. The syntax elements may further define prediction and residual information for blocks (e.g., CUs) of the video data.
[0062] The residual information can be represented, for example, by quantized transform coefficients. The video decoder 300 can inverse quantize and inverse transform the quantized transform coefficients of a block to reproduce a residual block for the block. The video decoder 300 forms a prediction block for the block using a signaled prediction mode (intra-frame or inter-frame prediction) and related prediction information (e.g., motion information for inter-frame prediction). The video decoder 300 can then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block. The video decoder 300 can perform additional processing, such as a deblocking process to reduce visual artifacts along block boundaries.
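The final combination step can be sketched as a sample-by-sample addition. Clipping the result to the valid sample range for a given bit depth is an added assumption. The text does not mention clipping, but it reflects what practical decoders do.

```python
# Illustrative sketch; all names are hypothetical.
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Add the residual to the prediction and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(pred, resid)
    ]


pred = [[250, 3], [100, 128]]
resid = [[10, -5], [-20, 0]]
print(reconstruct(pred, resid))  # [[255, 0], [80, 128]]
```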
[0063] According to the techniques of this disclosure, a method includes: determining whether a transform skip mode is used for a current block of video data; disabling level mapping for residual decoding based on the transform skip mode being used for the current block; and decoding the current block without applying level mapping.
[0064] According to the techniques of this disclosure, an apparatus includes: a memory configured to store video data; and one or more processors implemented in circuitry and coupled to the memory, the one or more processors being configured to: determine whether a transform skip mode is used for a current block of video data; disable level mapping for residual decoding based on the transform skip mode being used for the current block; and decode the current block without applying level mapping.
[0065] According to the techniques of this disclosure, an apparatus includes: means for determining whether a transform skip mode is used for a current block of video data; means for disabling level mapping for residual decoding based on the transform skip mode being used for the current block; and means for decoding the current block without applying level mapping.
[0066] According to the techniques of this disclosure, a non-transitory computer-readable storage medium is encoded with instructions that, when executed by one or more processors, cause the one or more processors to: determine whether a transform skip mode is used for a current block of video data; disable level mapping for residual decoding based on the transform skip mode being used for the current block; and decode the current block without applying level mapping.
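The claimed steps can be sketched as follows. The level-mapping rule itself is an assumption, modeled on the neighbor-based level mapping of VVC transform-skip residual coding. Under that rule, the predictor is the larger absolute level of the left and above neighbors. A level equal to the predictor is coded as 1, and smaller nonzero levels are shifted up by one. The input `mapping_enabled` is a hypothetical stand-in for any other conditions that would allow mapping. As the described method requires, the mapping is bypassed whenever the transform skip mode is used for the block. Levels are absolute values processed in raster order.

```python
# Illustrative sketch; all names and the mapping rule are hypothetical.
from typing import List

Levels = List[List[int]]


def _predictor(levels: Levels, y: int, x: int) -> int:
    """Assumed predictor: the larger absolute level of the left and above neighbors."""
    left = levels[y][x - 1] if x > 0 else 0
    above = levels[y - 1][x] if y > 0 else 0
    return max(abs(left), abs(above))


def map_level(abs_level: int, pred: int) -> int:
    """Encoder-side mapping: the level equal to pred gets the shortest code (1)."""
    if abs_level == 0 or pred == 0:
        return abs_level
    if abs_level == pred:
        return 1
    return abs_level + 1 if abs_level < pred else abs_level


def unmap_level(coded: int, pred: int) -> int:
    """Inverse of map_level."""
    if coded == 1 and pred > 0:
        return pred
    if 0 < coded <= pred:
        return coded - 1
    return coded


def encode_levels(levels: Levels, transform_skip: bool, mapping_enabled: bool = True) -> Levels:
    if transform_skip or not mapping_enabled:  # level mapping disabled
        return [row[:] for row in levels]
    return [[map_level(levels[y][x], _predictor(levels, y, x))
             for x in range(len(levels[0]))] for y in range(len(levels))]


def decode_levels(coded: Levels, transform_skip: bool, mapping_enabled: bool = True) -> Levels:
    out = [row[:] for row in coded]
    if transform_skip or not mapping_enabled:  # decode without applying level mapping
        return out
    for y in range(len(coded)):
        for x in range(len(coded[0])):
            # Left and above entries of `out` are already reconstructed (raster order).
            out[y][x] = unmap_level(coded[y][x], _predictor(out, y, x))
    return out


levels = [[3, 3, 0], [1, 3, 5], [2, 0, 7]]
print(encode_levels(levels, transform_skip=False))  # [[3, 1, 0], [2, 1, 5], [2, 0, 7]]
print(encode_levels(levels, transform_skip=True))   # unchanged: mapping disabled
```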
[0067] This disclosure may generally refer to "signaling" certain information, such as syntax elements. The term "signaling" can generally refer to the communication of values for syntax elements and / or other data used to decode encoded video data. That is, the video encoder 200 can signal values for syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As noted above, source device 102 can transport the bitstream to destination device 116 substantially in real time. It can also transport the bitstream not in real time, as might occur when syntax elements are stored to storage device 112 for later retrieval by destination device 116.
[0068] Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure 130 and a corresponding coding tree unit (CTU) 132. Solid lines represent quadtree splitting, and dashed lines indicate binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used. In this example, 0 indicates horizontal splitting and 1 indicates vertical splitting. For quadtree splitting, there is no need to indicate the splitting type, because a quadtree node splits a block horizontally and vertically into four sub-blocks of equal size. Accordingly, the video encoder 200 can encode, and the video decoder 300 can decode, two kinds of syntax elements (such as splitting information). The first kind is for a region tree level of QTBT structure 130 (i.e., the solid lines). The second kind is for a prediction tree level of QTBT structure 130 (i.e., the dashed lines). The video encoder 200 can encode, and the video decoder 300 can decode, video data (such as prediction and transform data) for CUs represented by terminal leaf nodes of QTBT structure 130.
[0069] In general, CTU 132 of Figure 2B can be associated with parameters defining the sizes of blocks corresponding to nodes of QTBT structure 130 at the first and second levels. These parameters can include:
- a CTU size, representing the size of CTU 132 in samples;
- a minimum quadtree size (MinQTSize), representing the minimum allowed quadtree leaf node size;
- a maximum binary tree size (MaxBTSize), representing the maximum allowed binary tree root node size;
- a maximum binary tree depth (MaxBTDepth), representing the maximum allowed binary tree depth; and
- a minimum binary tree size (MinBTSize), representing the minimum allowed binary tree leaf node size.
[0070] The root node of a QTBT structure corresponding to a CTU can have four child nodes at the first level of the QTBT structure, each of which can be partitioned according to quadtree partitioning. That is, a node of the first level is either a leaf node (having no child nodes) or has four child nodes. The example of QTBT structure 130 represents such nodes as including the parent node and child nodes with solid lines for branches. If a node of the first level is not larger than the maximum allowed binary tree root node size (MaxBTSize), the node can be further partitioned by a respective binary tree. The binary tree splitting of one node can be iterated until the nodes resulting from the split reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes as having dashed lines for branches. A binary tree leaf node is referred to as a coding unit (CU). CUs are used for prediction (e.g., intra-picture or inter-picture prediction) and transform without any further partitioning. As discussed above, CUs can also be referred to as "video blocks" or "blocks".
[0071] In one example of the QTBT partitioning structure, the parameters are set as follows. The CTU size is 128x128 (luma samples and two corresponding 64x64 blocks of chroma samples). MinQTSize is 16x16, MaxBTSize is 64x64, MinBTSize (for both width and height) is 4, and MaxBTDepth is 4. Quadtree partitioning is first applied to the CTU to generate quadtree leaf nodes, which can have sizes from 16x16 (i.e., MinQTSize) to 128x128 (i.e., the CTU size). A 128x128 quadtree leaf node is not further split by the binary tree, because its size exceeds MaxBTSize (i.e., 64x64 in this example). Otherwise, the quadtree leaf node can be further partitioned by the binary tree. Thus, the quadtree leaf node is also the root node of the binary tree, with a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4 in this example), no further splitting is permitted. Similarly, a binary tree node having a width equal to MinBTSize (4 in this example) implies that no further horizontal splitting is permitted. A binary tree node having a height equal to MinBTSize implies that no further vertical splitting is permitted for that node. As noted above, the leaf nodes of the binary tree are referred to as CUs and are further processed according to prediction and transform without further partitioning.
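The split constraints in this example can be sketched as a function that returns the splits still permitted for a node. The parameter defaults follow the example values above. The function and split names are hypothetical. Splits are named by the dimension they halve, which avoids the differing conventions for the words "horizontal" and "vertical".

```python
# Illustrative sketch; all names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QtbtParams:
    ctu_size: int = 128
    min_qt_size: int = 16   # MinQTSize
    max_bt_size: int = 64   # MaxBTSize (largest allowed binary tree root)
    max_bt_depth: int = 4   # MaxBTDepth
    min_bt_size: int = 4    # MinBTSize (width and height)


def allowed_splits(width: int, height: int, bt_depth: int, in_binary_tree: bool,
                   p: QtbtParams = QtbtParams()) -> List[str]:
    splits = []
    # Quadtree splits happen only above the binary tree and may not
    # produce leaves smaller than MinQTSize.
    if not in_binary_tree and width == height and width // 2 >= p.min_qt_size:
        splits.append("QT")
    # Binary splits need a node no larger than MaxBTSize and a depth below MaxBTDepth.
    if max(width, height) <= p.max_bt_size and bt_depth < p.max_bt_depth:
        if width > p.min_bt_size:
            splits.append("BT_HALVE_WIDTH")
        if height > p.min_bt_size:
            splits.append("BT_HALVE_HEIGHT")
    return splits


print(allowed_splits(128, 128, 0, False))  # ['QT']: 128x128 exceeds MaxBTSize
print(allowed_splits(16, 16, 0, False))    # ['BT_HALVE_WIDTH', 'BT_HALVE_HEIGHT']
print(allowed_splits(4, 16, 2, True))      # ['BT_HALVE_HEIGHT']: width is at MinBTSize
print(allowed_splits(8, 8, 4, True))       # []: MaxBTDepth reached
```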
[0072] Figure 3 is a block diagram illustrating an example video encoder 200 that can perform the techniques of this disclosure. Figure 3 is provided for purposes of explanation. It should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes the video encoder 200 in the context of video coding standards such as VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure can be performed by video encoding devices configured for other video coding standards.
[0073] In the example of Figure 3, the video encoder 200 includes a video data memory 230, a mode selection unit 202, a residual generation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a filter unit 216, a decoded picture buffer (DPB) 218, and an entropy coding unit 220. Any or all of these units can be implemented in one or more processors or in processing circuitry. For example, the units of the video encoder 200 can be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, the video encoder 200 may include additional or alternative processors or processing circuitry to perform these and other functions.
[0074] The video data memory 230 can store video data to be encoded by the components of the video encoder 200. The video encoder 200 can receive the video data stored in the video data memory 230 from, for example, video source 104 (Figure 1). The DPB 218 can act as a reference picture memory that stores reference video data for use by the video encoder 200 in predicting subsequent video data. The video data memory 230 and DPB 218 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) (including synchronous DRAM (SDRAM)), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory 230 and DPB 218 can be provided by the same memory device or by separate memory devices. In various examples, the video data memory 230 can be on-chip with other components of the video encoder 200, as illustrated, or off-chip relative to those components.
[0075] In this disclosure, reference to the video data memory 230 should not be interpreted as being limited to memory internal to the video encoder 200, unless specifically described as such. Nor should it be interpreted as being limited to memory external to the video encoder 200, unless specifically described as such. Rather, reference to the video data memory 230 should be understood as reference to memory that stores video data that the video encoder 200 receives for encoding (e.g., video data for a current block that is to be encoded). Memory 106 of Figure 1 can also provide temporary storage of outputs from the various units of the video encoder 200.
[0076] The various units of Figure 3 are illustrated to assist with understanding the operations performed by the video encoder 200. The units can be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits are circuits that provide particular functionality and are preset in the operations they can perform. Programmable circuits are circuits that can be programmed to perform various tasks, providing flexible functionality in the operations they can perform. For instance, programmable circuits can execute software or firmware that causes them to operate in the manner defined by the instructions of that software or firmware. Fixed-function circuits can execute software instructions (e.g., to receive or output parameters), but the types of operations they perform are generally immutable. In some examples, one or more of the units can be distinct circuit blocks (fixed-function or programmable). In some examples, one or more of the units can be integrated circuits.
[0077] The video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits, and / or programmable cores formed from programmable circuits. In some examples, the operations of the video encoder 200 are performed using software executed by the programmable circuits. In such examples, memory 106 (Figure 1) may store the instructions (e.g., object code) of the software that the video encoder 200 receives and executes, or another memory within the video encoder 200 (not shown) may store such instructions.
[0078] The video data memory 230 is configured to store received video data. The video encoder 200 can retrieve a picture of the video data from the video data memory 230 and provide the video data to the residual generation unit 204 and the mode selection unit 202. The video data in the video data memory 230 can be raw video data that is to be encoded.
[0079] The mode selection unit 202 includes a motion estimation unit 222, a motion compensation unit 224, and an intra-frame prediction unit 226. The mode selection unit 202 may include additional functional units that perform video prediction based on other prediction modes. As an example, the mode selection unit 202 may include a palette unit, an intra-block copy unit (which may be part of the motion estimation unit 222 and / or the motion compensation unit 224), an affine unit, a linear model (LM) unit, etc.
[0080] Mode selection unit 202 generally coordinates multiple encoding passes to test combinations of encoding parameters and the resulting rate-distortion values for such combinations. The encoding parameters may include the partitioning of CTUs into CUs, prediction modes for the CUs, transform types for residual data of the CUs, and quantization parameters for residual data of the CUs. Mode selection unit 202 can ultimately select the combination of encoding parameters whose rate-distortion values are better than those of the other tested combinations.
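One common way to compare tested combinations is a Lagrangian rate-distortion cost, J = D + lambda * R. Here D is the distortion, R is the bit count, and lambda trades one against the other. The text does not specify how rate-distortion values are computed. This formulation, the candidate names, and the numbers below are therefore illustrative assumptions.

```python
# Illustrative sketch; all names and values are hypothetical.
from typing import Dict, Tuple


def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * bits


def select_best(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the candidate whose (distortion, bits) pair has the lowest cost."""
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


# Hypothetical measurements from three encoding passes: (distortion, bits).
tested = {
    "intra_dc": (1200.0, 40.0),
    "inter_mv": (900.0, 60.0),
    "skip": (2000.0, 2.0),
}
print(select_best(tested, lam=10.0))  # inter_mv  (costs 1600, 1500, 2020)
print(select_best(tested, lam=30.0))  # skip      (costs 2400, 2700, 2060)
```

A larger lambda favors choices that are cheap to signal. A smaller lambda favors fidelity.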
[0081] The video encoder 200 can partition a picture retrieved from the video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. The mode selection unit 202 can partition a CTU of the picture in accordance with a tree structure, such as the QTBT structure described above or the quadtree structure of HEVC. As described above, the video encoder 200 can form one or more CUs by partitioning a CTU according to the tree structure. Such a CU can also be referred to as a "video block" or "block".
[0082] In general, mode selection unit 202 also controls its components (e.g., motion estimation unit 222, motion compensation unit 224, and intra-prediction unit 226) to generate a prediction block for the current block. The current block may be, for example, the current CU, or in HEVC, the overlapping portion of a PU and a TU. For inter-frame prediction of the current block, motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously decoded pictures stored in DPB 218). In particular, motion estimation unit 222 may calculate a value representing how similar a potential reference block is to the current block. The calculation may use, for example, the sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), or the like. Motion estimation unit 222 may generally perform these calculations using sample-by-sample differences between the current block and the reference block being considered. Motion estimation unit 222 may identify the reference block having the lowest value resulting from these calculations, which is the reference block that most closely matches the current block.
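The motion search described above can be sketched as an exhaustive (full) search over a square window using SAD. Three choices here are simplifying assumptions: the exhaustive search itself, the window size, and skipping candidates that fall outside the reference picture. Practical encoders typically use faster search patterns and fractional-sample refinement.

```python
# Illustrative sketch; all names are hypothetical.
from typing import List, Optional, Tuple

Frame = List[List[int]]


def block_at(frame: Frame, x: int, y: int, w: int, h: int) -> Frame:
    """Extract the w x h block whose top-left sample is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def sad(a: Frame, b: Frame) -> int:
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def full_search(cur_block: Frame, ref: Frame, x0: int, y0: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Return ((dx, dy), sad) of the best match within +/- search_range of (x0, y0)."""
    h, w = len(cur_block), len(cur_block[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > len(ref[0]) or y + h > len(ref):
                continue  # candidate falls outside the reference picture
            cost = sad(cur_block, block_at(ref, x, y, w, h))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("no candidate fits inside the reference picture")
    return best[1], best[0]


ref = [[8 * y + x for x in range(8)] for y in range(8)]  # every sample is distinct
cur = block_at(ref, 3, 2, 2, 2)        # current block at (2, 2); its best match is at (3, 2)
print(full_search(cur, ref, 2, 2, 2))  # ((1, 0), 0)
```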
[0083] Motion estimation unit 222 can generate one or more motion vectors (MVs) that define the position of a reference block in a reference image relative to the position of the current block in the current image. Motion estimation unit 222 can then provide the motion vectors to motion compensation unit 224. For example, for unidirectional inter-frame prediction, motion estimation unit 222 can provide a single motion vector, while for bidirectional inter-frame prediction, motion estimation unit 222 can provide two motion vectors. Motion compensation unit 224 can then use the motion vectors to generate prediction blocks. For example, motion compensation unit 224 can use the motion vectors to retrieve data from the reference blocks. As another example, if the motion vectors have fractional-sample precision, motion compensation unit 224 can interpolate the values used for the prediction blocks according to one or more interpolation filters. Furthermore, for bidirectional inter-frame prediction, motion compensation unit 224 can retrieve data for two reference blocks identified by corresponding motion vectors and combine the retrieved data, for example, by per-sample averaging or weighted averaging.
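The combination step for bidirectional inter-frame prediction can be sketched as a rounded per-sample average, with a weighted variant. The weights are expressed in units of 1 / 2**shift. Both the rounding offsets and the example weights are assumptions for illustration.

```python
# Illustrative sketch; all names and weights are hypothetical.
from typing import List

Block = List[List[int]]


def bi_average(p0: Block, p1: Block) -> Block:
    """Per-sample average of two prediction blocks, rounded to nearest."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]


def weighted_average(p0: Block, p1: Block, w0: int, w1: int, shift: int) -> Block:
    """Weighted average with integer weights where w0 + w1 == 1 << shift (shift >= 1)."""
    if w0 + w1 != 1 << shift:
        raise ValueError("weights must sum to 1 << shift")
    offset = 1 << (shift - 1)
    return [[(w0 * a + w1 * b + offset) >> shift for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]


print(bi_average([[10, 11]], [[20, 20]]))               # [[15, 16]]
print(weighted_average([[10]], [[20]], 3, 5, shift=3))  # [[16]]  (3/8 * 10 + 5/8 * 20 = 16.25)
```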
[0084] As another example, for intra prediction, or intra-prediction coding, intra-prediction unit 226 can generate the prediction block from samples neighboring the current block. For example, for directional modes, intra-prediction unit 226 can generally combine the values of neighboring samples mathematically. It then populates these calculated values across the current block in the defined direction to produce the prediction block. As another example, for DC mode, intra-prediction unit 226 can calculate an average of the neighboring samples of the current block. It then generates the prediction block so that each sample of the prediction block has this resulting average.
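The DC mode described above can be sketched for a square block, together with a simple directional mode that copies the row above straight down. Two details are assumptions for illustration: the average uses only the above row and left column, and it is rounded to nearest. Non-square blocks are not handled.

```python
# Illustrative sketch; all names are hypothetical. Square blocks only.
from typing import List

Block = List[List[int]]


def dc_predict(above: List[int], left: List[int], size: int) -> Block:
    """Fill a size x size block with the rounded mean of its above and left neighbors."""
    total = sum(above[:size]) + sum(left[:size])
    count = 2 * size
    dc = (total + count // 2) // count
    return [[dc] * size for _ in range(size)]


def vertical_predict(above: List[int], size: int) -> Block:
    """Directional example: propagate the above neighbors straight down."""
    return [list(above[:size]) for _ in range(size)]


above = [10, 10, 12, 12]
left = [20, 20, 18, 18]
print(dc_predict(above, left, 4)[0])  # [15, 15, 15, 15]
print(vertical_predict(above, 4)[3])  # [10, 10, 12, 12]
```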
[0085] Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives a raw, unencoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block. The resulting sample-by-sample differences define a residual block for the current block. In some examples, residual generation unit 204 may also determine differences between sample values in the residual block to generate a residual block using residual differential pulse code modulation (RDPCM). In some examples, residual generation unit 204 may be formed using one or more subtractor circuits that perform binary subtraction.
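The residual computation can be sketched below, followed by an RDPCM step that replaces each residual sample with its difference from the sample to its left. The horizontal direction is an assumption for illustration. A vertical variant would take differences against the sample above instead.

```python
# Illustrative sketch; all names are hypothetical.
from itertools import accumulate
from typing import List

Block = List[List[int]]


def residual_block(cur: Block, pred: Block) -> Block:
    """Sample-by-sample difference: current block minus prediction block."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def rdpcm_horizontal(res: Block) -> Block:
    """Keep the first sample of each row; code the rest as left-neighbor differences."""
    return [[row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))] for row in res]


def inverse_rdpcm_horizontal(diff: Block) -> Block:
    """Undo rdpcm_horizontal with a running sum along each row."""
    return [list(accumulate(row)) for row in diff]


res = residual_block([[12, 14, 17]], [[10, 10, 10]])
print(res, rdpcm_horizontal(res))  # [[2, 4, 7]] [[2, 2, 3]]
```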
[0086] In examples where mode selection unit 202 partitions a CU into PUs, each PU can be associated with a luma prediction unit and corresponding chroma prediction units. Video encoder 200 and video decoder 300 can support PUs of various sizes. As noted above, the size of a CU can refer to the size of the luma coding block of the CU, and the size of a PU can refer to the size of the luma prediction unit of the PU. Assume that the size of a particular CU is 2Nx2N. For intra-frame prediction, video encoder 200 can then support PU sizes of 2Nx2N or NxN. For inter-frame prediction, it can support symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, NxN, or similar. Video encoder 200 and video decoder 300 can also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N for inter-frame prediction.
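The PU shapes listed above can be tabulated as (width, height) pairs for a 2Nx2N CU. As in HEVC, the asymmetric modes split one dimension in a 1:3 or 3:1 ratio. The function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical.
from typing import Dict, List, Tuple


def pu_partitions(n: int) -> Dict[str, List[Tuple[int, int]]]:
    """Return the (width, height) of each PU for every partition mode of a 2Nx2N CU."""
    s, q = 2 * n, n // 2  # CU side length, and a quarter of the CU side
    return {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n)] * 2,
        "Nx2N": [(n, s)] * 2,
        "NxN": [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],  # upper PU is a quarter of the height
        "2NxnD": [(s, s - q), (s, q)],  # lower PU is a quarter of the height
        "nLx2N": [(q, s), (s - q, s)],  # left PU is a quarter of the width
        "nRx2N": [(s - q, s), (q, s)],  # right PU is a quarter of the width
    }


modes = pu_partitions(8)  # a 16x16 CU
print(modes["2NxnU"])     # [(16, 4), (16, 12)]
```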
[0087] In examples where the mode selection unit 202 does not further partition a CU into PUs, each CU can be associated with a luma coding block and corresponding chroma coding blocks. As above, the size of a CU can refer to the size of the luma coding block of the CU. The video encoder 200 and the video decoder 300 can support CU sizes of 2Nx2N, 2NxN, or Nx2N.
[0088] Other video coding techniques include intra-block copy mode coding, affine mode coding, and linear model (LM) mode coding. For such techniques, the mode selection unit 202 generates a prediction block for the current block being encoded, via the respective unit associated with the coding technique. In some examples, such as when encoding a block using a transform skip mode, the mode selection unit 202 may disable level mapping, as discussed in more detail below. In some examples, such as palette mode coding, the mode selection unit 202 may not generate a prediction block. Instead, it may generate syntax elements that indicate how to reconstruct the block based on a selected palette. In such modes, the mode selection unit 202 may provide these syntax elements to the entropy coding unit 220 to be encoded.
[0089] As described above, the residual generation unit 204 receives video data for the current block and the corresponding prediction block. Then, the residual generation unit 204 generates a residual block for the current block. To generate the residual block, the residual generation unit 204 calculates the sample-by-sample difference between the prediction block and the current block.
[0090] Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 can apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 can apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loeve transform (KLT), or a conceptually similar transform to a residual block. In some examples, transform processing unit 206 can perform multiple transforms on a residual block, such as a primary transform and a secondary transform (e.g., a rotational transform). In some examples, transform processing unit 206 does not apply transforms to the residual block (or skips applying them), such as when the block is coded using a transform skip mode. This skipped application of transforms is indicated by dashed line 207.
[0091] Quantization unit 208 can quantize the transform coefficients in the transform coefficient block to produce a quantized transform coefficient block. Quantization unit 208 can quantize the transform coefficients of the transform coefficient block based on the quantization parameter (QP) value associated with the current block. Video encoder 200 (e.g., via mode selection unit 202) can adjust the degree of quantization applied to the transform coefficient block associated with the current block by adjusting the QP value associated with the CU. Quantization may cause information loss, and therefore, the quantized transform coefficients may have lower accuracy compared to the original transform coefficients produced by transform processing unit 206.
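The effect of the QP can be sketched with the approximate relation used in HEVC and VVC, in which the quantization step size doubles for every increase of 6 in QP: Qstep ~= 2^((QP - 4) / 6). The floating-point arithmetic and round-to-nearest rule are simplifications. The standards specify integer scaling and shifting instead.

```python
# Illustrative sketch; all names are hypothetical. Floating-point approximation.
import math
from typing import List


def qstep(qp: int) -> float:
    """Approximate quantization step size: 1.0 at QP 4, doubling every 6 QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: List[float], qp: int) -> List[int]:
    """Round each |coefficient| / step to nearest, keeping the sign."""
    step = qstep(qp)
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(levels: List[int], qp: int) -> List[float]:
    step = qstep(qp)
    return [lvl * step for lvl in levels]


levels = quantize([100, -37, 5], qp=28)   # step size 16
print(levels, dequantize(levels, qp=28))  # [6, -2, 0] [96.0, -32.0, 0.0]
```

The dequantized values differ from the inputs. This is the information loss described above, and a larger QP increases it.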
[0092] The inverse quantization unit 210 and the inverse transform processing unit 212 can apply inverse quantization and inverse transform, respectively, to the quantized transform coefficient block to reconstruct the residual block based on the transform coefficient block. In the case of a block using transform skip mode coding, the inverse transform processing unit 212 can skip performing the inverse transform on that block. This skipping of the inverse transform on the block is indicated by dashed line 211. The reconstruction unit 214 can generate a reconstructed block corresponding to the current block based on the reconstructed residual block and the prediction block generated by the mode selection unit 202 (although potentially with some degree of distortion). For example, the reconstruction unit 214 can add samples from the reconstructed residual block to corresponding samples from the prediction block generated by the mode selection unit 202 to generate the reconstructed block.
[0093] Filter unit 216 can perform one or more filtering operations on the reconstructed block. For example, filter unit 216 can perform deblocking to reduce block artifacts along the edges of the CU. In some examples, the operations of filter unit 216 can be skipped.
[0094] The video encoder 200 stores the reconstructed blocks in the DPB 218. For example, in an example where the operation of the filter unit 216 is not required, the reconstruction unit 214 can store the reconstructed blocks in the DPB 218. In an example where the operation of the filter unit 216 is required, the filter unit 216 can store the filtered reconstructed blocks in the DPB 218. The motion estimation unit 222 and the motion compensation unit 224 can retrieve a reference picture formed from the reconstructed (and potentially filtered) blocks from the DPB 218 to perform inter-frame prediction of blocks in subsequently encoded pictures. Additionally, the intra-frame prediction unit 226 can use the reconstructed blocks of the current picture in the DPB 218 to perform intra-frame prediction of other blocks in the current picture.
[0095] In general, entropy coding unit 220 can entropy-encode syntax elements received from other functional components of video encoder 200. For example, entropy coding unit 220 can entropy-encode quantized transform coefficient blocks from quantization unit 208. As another example, entropy coding unit 220 can entropy-encode prediction syntax elements from mode selection unit 202 (e.g., motion information for inter-frame prediction or intra-mode information for intra-frame prediction). Entropy coding unit 220 can perform one or more entropy coding operations on the syntax elements, which are another example of video data, to generate entropy-encoded data. The available operations include:
- a context-adaptive variable-length coding (CAVLC) operation;
- a CABAC operation;
- a variable-to-variable (V2V) length coding operation;
- a syntax-based context-adaptive binary arithmetic coding (SBAC) operation;
- a probability interval partitioning entropy (PIPE) coding operation;
- an exponential-Golomb encoding operation; or
- another type of entropy coding operation on the data.
In some examples, entropy coding unit 220 can operate in a bypass mode in which syntax elements are not entropy-encoded.
[0096] For example, entropy coding unit 220 can determine whether a transform skip mode is used for a current block of video data. When the transform skip mode is used for the current block, entropy coding unit 220 can disable level mapping for residual coding and encode the current block without applying level mapping.
[0097] In some examples, entropy coding unit 220 may encode the following in a first pass:
- a flag indicating whether a transform coefficient of the current block is non-zero;
- two flags indicating whether the absolute value of the transform coefficient is greater than (j << 1) + 1; and
- a flag indicating the parity of the transform coefficient.

Entropy coding unit 220 may encode the following in a second pass:
- a flag indicating the sign of the transform coefficient; and
- three flags indicating whether the absolute value of the transform coefficient is greater than (j << 1) + 1.

In a third pass, entropy coding unit 220 may encode a syntax element indicating the remaining absolute value of the transform coefficient.
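The three-pass ordering can be sketched as follows. The document does not give the value of j for each greater-than flag, so this sketch makes several assumptions. The first pass uses j = 0 and 1 (thresholds 1 and 3), and the second pass uses j = 2, 3, and 4 (thresholds 5, 7, and 9). Each greater-than flag is coded only when the previous one was 1. After the last flag, a remainder is coded in units of 2, loosely mirroring how VVC reconstructs transform-skip levels. The syntax element names ("sig", "gt0", "par", and so on) are hypothetical shorthand.

```python
# Illustrative sketch; all names and the j-to-pass assignment are hypothetical.
from typing import List, Tuple

Bin = Tuple[int, str, int]  # (coefficient index, syntax element name, value)


def gt_threshold(j: int) -> int:
    """Threshold tested by greater-than flag j: (j << 1) + 1, i.e. 1, 3, 5, 7, 9."""
    return (j << 1) + 1


def encode_passes(coeffs: List[int]) -> List[List[Bin]]:
    pass1: List[Bin] = []
    pass2: List[Bin] = []
    pass3: List[Bin] = []
    for n, c in enumerate(coeffs):
        a = abs(c)
        pass1.append((n, "sig", int(a > 0)))
        if a == 0:
            continue
        pass2.append((n, "sign", int(c < 0)))
        gt = [int(a > gt_threshold(j)) for j in range(5)]
        pass1.append((n, "gt0", gt[0]))
        if not gt[0]:
            continue
        par = (a - 2) & 1
        pass1.append((n, "par", par))
        pass1.append((n, "gt1", gt[1]))
        for j in (2, 3, 4):  # each flag is coded only if the previous one was 1
            if not gt[j - 1]:
                break
            pass2.append((n, f"gt{j}", gt[j]))
        if gt[4]:
            covered = 2 + par + 2 * 4  # sig + gt0 + par + 2 * (gt1 + ... + gt4)
            pass3.append((n, "rem", (a - covered) >> 1))
    return [pass1, pass2, pass3]


def decode_passes(passes: List[List[Bin]], num_coeffs: int) -> List[int]:
    abs_level = [0] * num_coeffs
    negative = [False] * num_coeffs
    for bins in passes:
        for n, name, value in bins:
            if name == "sign":
                negative[n] = bool(value)
            elif name in ("sig", "gt0", "par"):
                abs_level[n] += value      # each contributes 1
            else:                          # gt1..gt4 and rem contribute 2 per unit
                abs_level[n] += 2 * value
    return [-a if neg else a for a, neg in zip(abs_level, negative)]


coeffs = [0, 1, -2, 3, 4, -5, 9, 12, -25]
passes = encode_passes(coeffs)
print(passes[0][:4])  # [(0, 'sig', 0), (1, 'sig', 1), (1, 'gt0', 0), (2, 'sig', 1)]
print(decode_passes(passes, len(coeffs)) == coeffs)  # True
```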
[0098] In some examples, entropy coding unit 220 can determine neighboring coefficient values adjacent to a current coefficient value of the current block. Entropy coding unit 220 can determine a Rice parameter based on the neighboring coefficient values and encode the current block based on the Rice parameter.
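One way to realize this is sketched below in three steps. First, sum the absolute values of already-coded neighbors. Second, map the sum to a Rice parameter k. Third, binarize the value with a Golomb-Rice code, which is a unary prefix of value >> k followed by the k low-order bits. Several details are assumptions for illustration: the neighbor template (left and above), the thresholds, and the absence of an escape code. This is not the mapping specified by the disclosure or by VVC.

```python
# Illustrative sketch; all names, thresholds, and the neighbor template are hypothetical.
from typing import Sequence, Tuple


def rice_param(neighbors: Sequence[int], thresholds: Sequence[int] = (3, 6, 12)) -> int:
    """Hypothetical mapping: larger neighboring magnitudes -> larger Rice parameter."""
    local_sum = sum(abs(v) for v in neighbors)
    return sum(1 for t in thresholds if local_sum >= t)


def rice_encode(value: int, k: int) -> str:
    """Golomb-Rice code: unary prefix of (value >> k), then k suffix bits."""
    prefix = "1" * (value >> k) + "0"
    suffix = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return prefix + suffix


def rice_decode(bits: str, k: int) -> Tuple[int, int]:
    """Return (value, number of bits consumed)."""
    q = 0
    while bits[q] == "1":
        q += 1
    pos = q + 1  # skip the terminating 0 of the prefix
    r = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (q << k) | r, pos + k


left, above = 2, 5
k = rice_param([left, above])  # local sum 7 -> k = 2
code = rice_encode(9, k)
print(k, code, rice_decode(code, k))  # 2 11001 (9, 5)
```

For comparison, coding the same value with k = 0 takes ten bits. Adapting k to large neighboring levels shortens the codes for large values.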
[0099] In some examples, entropy coding unit 220 can determine information associated with neighboring coefficients adjacent to the current coefficient of the current block. Based on the information associated with neighboring coefficients, entropy coding unit 220 can determine the context of the current coefficient, and also encode the current block based on the context.
[0100] The video encoder 200 can output a bitstream that includes entropy-encoded syntax elements required for reconstructing slices or blocks of images. Specifically, the entropy coding unit 220 can output a bitstream.
[0101] The operations above are described with respect to a block. Such descriptions should be understood as operations for luma coding blocks and / or chroma coding blocks. As mentioned above, in some examples, the luma coding block and chroma coding blocks are the luma and chroma components of a CU. In some examples, the luma coding block and chroma coding blocks are the luma and chroma components of a PU.
[0102] In some examples, operations performed for the luma coding block need not be repeated for the chroma coding blocks. As one example, the operations used to identify a motion vector (MV) and reference picture for the luma coding block need not be repeated to identify an MV and reference picture for the chroma blocks. Instead, the MV for the luma coding block can be scaled to determine the MV for the chroma blocks, and the reference picture can be the same. As another example, the intra-frame prediction process can be the same for the luma and chroma coding blocks.
[0103] Video encoder 200 represents an example of a device configured to encode video data, the device including: a memory configured to store the video data; and one or more processing units implemented in circuitry, coupled to the memory, and configured to: determine whether a transform skip mode is used for the current block of video data; disable level mapping for residual coding based on the transform skip mode being used for the current block; and encode the current block without applying level mapping.
[0104] Figure 4 is a block diagram illustrating an example video decoder 300 that may perform the techniques of this disclosure. Figure 4 is provided for purposes of explanation and does not limit the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 300 in terms of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265) techniques. However, the techniques of this disclosure can be performed by video coding devices configured for other video coding standards.
[0105] In the example of Figure 4, video decoder 300 includes coded picture buffer (CPB) memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and decoded picture buffer (DPB) 314. Any or all of CPB memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and DPB 314 can be implemented in one or more processors or in processing circuitry. For example, the units of video decoder 300 can be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Furthermore, video decoder 300 may include additional or alternative processors or processing circuitry to perform these and other functions.
[0106] The prediction processing unit 304 includes a motion compensation unit 316 and an intra-frame prediction unit 318. The prediction processing unit 304 may include additional units that perform predictions based on other prediction modes. As an example, the prediction processing unit 304 may include a palette unit, an intra-block copy unit (which may form part of the motion compensation unit 316), an affine unit, a linear model (LM) unit, etc. In other examples, the video decoder 300 may include more, fewer, or different functional components.
[0107] CPB memory 320 can store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 300. The video data stored in CPB memory 320 can be obtained, for example, from computer-readable medium 110 (Figure 1). CPB memory 320 may include a CPB that stores encoded video data (e.g., syntax elements) from an encoded video bitstream. Additionally, CPB memory 320 may store video data other than syntax elements of a coded picture, such as temporary data representing outputs from the various units of video decoder 300. DPB 314 typically stores decoded pictures, which video decoder 300 may output and / or use as reference video data when decoding subsequent data or pictures of the encoded video bitstream. CPB memory 320 and DPB 314 may be formed by any of a variety of memory devices, such as DRAM (including SDRAM), MRAM, RRAM, or other types of memory devices. CPB memory 320 and DPB 314 may be provided by the same memory device or by separate memory devices. In various examples, CPB memory 320 may be on-chip with other components of video decoder 300, or off-chip relative to those components.
[0108] Additionally or alternatively, in some examples, video decoder 300 can retrieve coded video data from memory 120 (Figure 1). That is, memory 120 can store data as discussed above with respect to CPB memory 320. Likewise, memory 120 can store instructions to be executed by video decoder 300 when some or all of the functionality of video decoder 300 is implemented in software to be executed by the processing circuitry of video decoder 300.
[0109] The various units shown in Figure 4 are illustrated to assist with understanding the operations performed by video decoder 300. The units can be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Similar to Figure 3, fixed-function circuits refer to circuits that provide particular functionality and are preset in the operations they can perform. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations they can perform. For example, programmable circuits can execute software or firmware that causes the programmable circuits to operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits can execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.
[0110] Video decoder 300 may include ALUs, EFUs, digital circuits, analog circuits, and / or programmable cores formed from programmable circuits. In examples where the operations of video decoder 300 are performed by software executing on the programmable circuits, on-chip or off-chip memory may store the instructions (e.g., object code) of the software that video decoder 300 receives and executes.
[0111] Entropy decoding unit 302 can receive encoded video data from the CPB and perform entropy decoding on the video data to regenerate syntax elements. Prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, and filter unit 312 can generate decoded video data based on syntax elements extracted from the bitstream.
[0112] Typically, the video decoder 300 reconstructs the image on a block-by-block basis. The video decoder 300 can perform the reconstruction operation on each block individually (where the block currently being reconstructed (i.e., decoded) can be referred to as the "current block").
[0113] Entropy decoding unit 302 can entropy decode syntax elements defining the quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as a quantization parameter (QP) and / or transform mode indications. In some examples, entropy decoding unit 302 can determine whether a transform skip mode is used for the current block of video data. When the transform skip mode is used for the current block, entropy decoding unit 302 can disable level mapping for residual decoding, and video decoder 300 can decode the current block without applying level mapping.
[0114] In some examples, entropy decoding unit 302 can decode the following in the first pass: a flag indicating whether a transform coefficient of the current block is non-zero, two flags indicating whether the absolute value of the transform coefficient is greater than (j << 1) + 1, and a flag indicating the parity of the transform coefficient. Entropy decoding unit 302 can decode the following in the second pass: a flag indicating the sign of the transform coefficient, and three flags indicating whether the absolute value of the transform coefficient is greater than (j << 1) + 1. Entropy decoding unit 302 can decode, in the third pass, a syntax element indicating the remaining absolute value of the transform coefficient.
[0115] In some examples, the entropy decoding unit 302 can determine the neighboring coefficient values adjacent to the current coefficient value of the current block. The entropy decoding unit 302 can determine the Rice parameter based on the neighboring coefficient values, and also decode the current block based on the Rice parameter.
[0116] In some examples, the entropy decoding unit 302 can determine information associated with neighboring coefficients adjacent to the current coefficient of the current block. Based on the information associated with neighboring coefficients, the entropy decoding unit 302 can determine the context of the current coefficient, and the video decoder 300 can also decode the current block based on the context.
[0117] Inverse quantization unit 306 can use the QP associated with the quantized transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for inverse quantization unit 306 to apply. Inverse quantization unit 306 can, for example, perform a bitwise left-shift operation to inverse quantize the quantized transform coefficients. Inverse quantization unit 306 can thereby form a transform coefficient block including transform coefficients.
[0118] After inverse quantization unit 306 forms the transform coefficient block, inverse transform processing unit 308 can apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, inverse transform processing unit 308 can apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the transform coefficient block. In some examples, when a block is coded using transform skip mode, inverse transform processing unit 308 may not apply (that is, may skip) the inverse transform, as indicated by dashed line 307.
[0119] Furthermore, prediction processing unit 304 generates a prediction block according to prediction information syntax elements that were entropy decoded by entropy decoding unit 302. For example, if the prediction information syntax elements indicate that the current block is inter-frame predicted, motion compensation unit 316 can generate the prediction block. In this case, the prediction information syntax elements may indicate a reference picture in DPB 314 from which to retrieve a reference block, as well as a motion vector identifying the location of the reference block in the reference picture relative to the location of the current block in the current picture. Motion compensation unit 316 can generally perform the inter-frame prediction process in a manner that is substantially similar to that described with respect to motion compensation unit 224 (Figure 3). In some examples, such as when a block is coded using transform skip mode, prediction processing unit 304 may disable level mapping.
[0120] As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, intra-prediction unit 318 can generate the prediction block according to an intra-prediction mode indicated by the prediction information syntax elements. Again, intra-prediction unit 318 can generally perform the intra-prediction process in a manner that is substantially similar to that described with respect to intra-prediction unit 226 (Figure 3). Intra-prediction unit 318 can retrieve data of neighboring samples of the current block from DPB 314.
[0121] Reconstruction unit 310 can reconstruct the current block using the prediction block and the residual block. For example, reconstruction unit 310 can reconstruct the current block by adding the samples of the residual block to the corresponding samples of the prediction block.
[0122] Filter unit 312 can perform one or more filter operations on the reconstructed block. For example, filter unit 312 can perform a deblocking operation to reduce block artifacts along the edges of the reconstructed block. It is not necessary to perform the operations of filter unit 312 in all examples.
[0123] Video decoder 300 can store the reconstructed blocks in DPB 314. For example, in examples where the operations of filter unit 312 are not performed, reconstruction unit 310 can store the reconstructed blocks in DPB 314. In examples where the operations of filter unit 312 are performed, filter unit 312 can store the filtered reconstructed blocks in DPB 314. As discussed above, DPB 314 can provide reference information, such as samples of the current picture for intra-frame prediction and samples of previously decoded pictures for subsequent motion compensation, to prediction processing unit 304. Moreover, video decoder 300 can output decoded pictures (e.g., decoded video) from DPB 314 for subsequent presentation on a display device, such as display device 118 of Figure 1.
[0124] In this manner, video decoder 300 represents an example of a video decoding device, which includes: a memory configured to store video data; and one or more processors implemented in circuitry and coupled to the memory, the one or more processors being configured to: determine whether a transform skip mode is used for the current block of video data; disable level mapping for residual decoding based on the transform skip mode being used for the current block; and decode the current block without applying level mapping.
[0125] Video coding can be lossy or lossless. Compared to lossless video coding, lossy video coding may reproduce the original video stream less accurately after decoding. However, lossy video coding can be more bandwidth-efficient than lossless video coding. For example, lossy video coding may be desirable in video streaming applications, while lossless video coding may be desirable in medical applications, where a highly accurate reproduction of the original video stream is needed. In some examples, where a video stream contains a region of interest, both lossy and lossless techniques can be used. For instance, lossless techniques can be used to code the region of interest, while lossy techniques can be used to code the rest of the video stream. In this way, the region of interest can be reproduced very accurately without consuming the full bandwidth that would be required if lossless techniques were used to encode the entire video stream.
[0126] As mentioned above, in some video coding standards, one residual coding technique is used for lossy coding and another residual coding technique is used for lossless coding. Accordingly, if a video encoder decides to use a combination of lossy and lossless coding, the video encoder may not be able to support both techniques, or its performance may suffer because it must implement two different techniques.
[0127] According to the techniques of this disclosure, the same transform skip residual coding technique can be used for both lossy and lossless coding. These techniques can improve coder performance (e.g., by reducing processing power consumption) and / or reduce coding latency.
[0128] In VVC Draft 7, the residual block of a block coded in transform skip mode can be split into multiple coefficient groups (CGs). In transform skip mode, the coefficients are in the spatial domain rather than in the frequency domain, where they would be if a transform were applied. For each CG, video decoder 300 parses (or, in some cases, infers) the flag coded_sub_block_flag (also referred to as the CG flag). If the CG flag is 0, all coefficients within the CG are 0. Otherwise, video decoder 300 further decodes the values of the coefficients within the CG.
[0129] Figure 5 is a conceptual diagram illustrating the interleaved coding of CG flags and coefficients in VVC Draft 7. In VVC Draft 7, CG flags and coefficients are coded in an interleaved manner, as shown by bitstream 400 of Figure 5. For example, the CG flag of the first CG is shown as 1, indicating the presence of non-zero coefficients in the first CG. The first CG flag is followed by the coefficients of the first CG. The CG flag of the second CG is shown as 0. Because the second CG flag is 0, all coefficients in the second CG are 0, and the individual coefficients of the second CG need not be signaled or parsed. Therefore, the coefficients of the second CG are not included in bitstream 400. The CG flag of the third CG is shown as 1, followed by the coefficients of the third CG.
[0130] The scan order of coefficients within a CG of a transform skip mode block can be from top left to bottom right. Accordingly, when video decoder 300 decodes a syntax element of a particular coefficient, the same syntax elements of the left, top, and top-left neighbors of that coefficient have already been decoded.
[0131] The order of three-pass residual coding for transform skip mode in VVC Draft 7 is now discussed. In VVC Draft 7, up to nine syntax elements can be coded for each coefficient: sig_coeff_flag (which specifies whether the transform coefficient is non-zero), coeff_sign_flag (which specifies the sign of the transform coefficient level), abs_level_gt1_flag (which specifies whether the absolute value of the transform coefficient is greater than (j << 1) + 1), par_level_flag (which specifies the parity of the transform coefficient), abs_level_gtX_flag (X = 2, 3, 4, 5) (which specifies whether the absolute value of the transform coefficient is greater than (j << 1) + 1), and abs_remainder (which specifies the remaining absolute value of the transform coefficient, coded using Golomb-Rice codes).
[0132] For example, video decoder 300 can reconstruct the coefficient level as follows:
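The interleaving can be illustrated with a small Python sketch. It assumes a hypothetical, already entropy-decoded stream of integers (one CG flag per CG, followed by that CG's coefficients only when the flag is 1) and a fixed CG size. It is not the VVC parsing process, which operates on CABAC bins and may infer some CG flags; the function name parse_interleaved_cgs is hypothetical.

```python
from typing import List


def parse_interleaved_cgs(symbols: List[int], num_cgs: int, cg_size: int = 16) -> List[List[int]]:
    """Split a simplified interleaved stream into per-CG coefficient lists.

    Hypothetical format: for each CG, one coded_sub_block_flag value, followed by
    cg_size coefficient values only when the flag is 1.
    """
    pos = 0
    cgs = []
    for _ in range(num_cgs):
        cg_flag = symbols[pos]
        pos += 1
        if cg_flag:
            # Non-zero CG: its coefficients follow the flag directly.
            cgs.append(symbols[pos:pos + cg_size])
            pos += cg_size
        else:
            # Zero CG: nothing else is signaled, so every coefficient is 0.
            cgs.append([0] * cg_size)
    return cgs
```

With cg_size=4, a stream laid out like bitstream 400 ([1, 3, 0, -1, 2, 0, 1, 0, 0, 5, 0]) yields the coefficients of the first and third CGs and an all-zero second CG that consumed only its flag.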
[0133] absCoeffLevel=sig_coeff_flag+abs_level_gt1_flag+par_level_flag+2*(abs_level_gt2_flag+abs_level_gt3_flag+…+abs_level_gt5_flag)+abs_remainder
[0134] CoeffLevel=(coeff_sign_flag==1?-1:1)*absCoeffLevel
[0135] Where absCoeffLevel is the absolute value of the coefficient level, and CoeffLevel is the coefficient level. If a syntax element is not present in the bitstream, the video decoder 300 can infer that element as 0.
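As a worked illustration of [0133] to [0135], the following Python sketch combines the syntax elements into a signed level using the formula above. Parameter names mirror the syntax elements, and absent elements default to 0 following the inference rule; the function name coeff_level and the packing of abs_level_gt2_flag to abs_level_gt5_flag into one tuple are choices made for this sketch.

```python
def coeff_level(sig_coeff_flag=0, coeff_sign_flag=0, abs_level_gt1_flag=0,
                par_level_flag=0, abs_level_gtx_flags=(0, 0, 0, 0), abs_remainder=0):
    """Return CoeffLevel from the syntax elements of [0133] and [0134].

    abs_level_gtx_flags holds abs_level_gt2_flag .. abs_level_gt5_flag in order.
    Syntax elements that are not present are passed as (or default to) 0.
    """
    abs_coeff_level = (sig_coeff_flag + abs_level_gt1_flag + par_level_flag
                       + 2 * sum(abs_level_gtx_flags) + abs_remainder)
    # coeff_sign_flag == 1 selects a negative level.
    return -abs_coeff_level if coeff_sign_flag == 1 else abs_coeff_level
```

For example, coeff_level(sig_coeff_flag=1, coeff_sign_flag=1, abs_level_gt1_flag=1, abs_level_gtx_flags=(1, 1, 0, 0)) gives 1 + 1 + 0 + 2 * 2 = 6 with a negative sign, that is, -6.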
[0136] Video decoder 300 can split the decoding of the nine possible syntax elements into three passes. For example, instead of decoding all nine syntax elements of one coefficient before moving to the next, video decoder 300 can decode some syntax elements of multiple coefficients before decoding the remaining syntax elements of the first coefficient.
[0137] First pass: sig_coeff_flag, coeff_sign_flag, abs_level_gt1_flag, and par_level_flag are decoded.
[0138] Second pass: abs_level_gtX_flag (X = 2, 3, 4, 5) is decoded.
[0139] Third pass: abs_remainder is decoded.
[0140] In the first and second passes, video decoder 300 can use context-coded bins to decode the syntax elements, and in the third pass, video decoder 300 can use bypass bins (e.g., equal-probability bins coded by the CABAC engine without a context) to decode the syntax elements. For each TU, there is a limit on the total number of context-coded bins that can be used. In VVC Draft 7, this limit is set to 1.75 * TUSize, where TUSize is the size (area) of the TU in samples. As used herein, “context coding” means context-based entropy coding using a context model or probability model, and “context” means a context model or probability model.
[0141] For the first or second pass of each coefficient, if fewer than four context-coded bins remain for the TU, coding that pass could cause video decoder 300 to exceed the context-coded bin limit. To avoid exceeding the limit, once the number of remaining context-coded bins is less than four, video decoder 300 may skip all subsequent first-pass and second-pass decoding and may adjust the corresponding abs_remainder values to account for the missing syntax flags. More specifically, for each coefficient:
[0142] If both the first-pass and second-pass decoding are performed, video decoder 300 can adjust the value of abs_remainder (if present) to absCoeffLevel - 10;
[0143] If the first-pass decoding is performed but the second-pass decoding is skipped, video decoder 300 can adjust the value of abs_remainder (if present) to absCoeffLevel - 2;
[0144] If both the first-pass and second-pass decoding are skipped, video decoder 300 can adjust the value of abs_remainder (if present) to absCoeffLevel - 0.
[0145] The number subtracted from absCoeffLevel to obtain abs_remainder is called the "base level" of abs_remainder. Basically, the base level is the value that has been decoded in the first and second paths of coefficient decoding.
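A minimal Python sketch of the bookkeeping in [0140] to [0145]: the per-TU context-coded bin limit, the fewer-than-four check, and the base level subtracted from absCoeffLevel to obtain abs_remainder. The integer form (7 * area) >> 2 of 1.75 * TUSize is an assumption that equals the real-valued limit whenever the TU area is a multiple of 4; all function names are hypothetical.

```python
def max_context_coded_bins(tu_width, tu_height):
    """Per-TU limit of 1.75 * TUSize, computed as (7 * area) >> 2 (assumed integer form)."""
    return (7 * tu_width * tu_height) >> 2


def can_code_context_pass(remaining_bins):
    """First/second-pass coding continues only while at least four bins remain."""
    return remaining_bins >= 4


def base_level(first_pass_done, second_pass_done):
    """Value already represented by the context-coded passes ([0142] to [0144])."""
    if first_pass_done and second_pass_done:
        return 10
    if first_pass_done:
        return 2
    return 0


def abs_remainder_value(abs_coeff_level, first_pass_done, second_pass_done):
    """abs_remainder = absCoeffLevel - baseLevel, for coefficients where it is present."""
    return abs_coeff_level - base_level(first_pass_done, second_pass_done)
```

For a 4x4 TU the budget is 28 context-coded bins, and a coefficient with absCoeffLevel 13 whose second pass was skipped carries abs_remainder 11.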
[0146] The Rice parameter is now discussed. As mentioned in the previous section, abs_remainder is coded using Golomb-Rice coding, and one parameter used in the Golomb-Rice coding process is the "Rice parameter".
[0147] The derivation of the Rice parameter for the bypass-coded portion of the coefficient level, in both transform coefficient coding and transform skip residual coding, should account for the different local statistics encountered in video coding. Larger Rice parameter values represent coefficient residuals more efficiently when the residuals tend to be large, and smaller Rice parameter values are preferred when the residuals tend to be small.
[0148] The derivation of the Rice parameter for transform skip mode is now discussed. In VVC Draft 7, the Rice parameter for transform skip mode is always set to 1. In VVC Draft 6 (see B. Bross et al., “Versatile Video Coding (Draft 6)”, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29 / WG 11, 15th Meeting: Gothenburg, Sweden, July 3-12, 2019, JVET-O2001-v14 (hereinafter “VVC Draft 6”)), the Rice parameter for transform skip mode is derived as follows:
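To make the trade-off concrete, the sketch below implements plain Golomb-Rice coding with Rice parameter k: a unary prefix for value >> k followed by the k low-order bits. This is a simplified binarization for illustration only; the actual abs_remainder binarization in VVC additionally limits the prefix length with an escape code, which is omitted here. The function names are hypothetical.

```python
def golomb_rice_encode(value: int, k: int) -> str:
    """Plain Golomb-Rice code: unary quotient (q ones and a zero), then k remainder bits."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    q = value >> k
    prefix = "1" * q + "0"
    suffix = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return prefix + suffix


def golomb_rice_decode(bits: str, k: int) -> int:
    """Inverse of golomb_rice_encode for a single, well-formed codeword."""
    q = 0
    pos = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1  # skip the terminating zero of the unary prefix
    r = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (q << k) + r
```

With k = 2 the value 9 costs 5 bits ("11001") instead of 10 bits with k = 0, while the small value 1 costs 3 bits with k = 2 but only 2 bits with k = 0, which is the behavior described above.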
[0149] Two neighboring coefficients are used to derive the Rice parameter, and since the coefficient scan is forward (from top left to bottom right), the template uses the adjacent coefficients on the left and top to derive the locSumAbs value.
[0150] The locSumAbs for the coefficient at position (x, y) is as follows:
[0151] locSumAbs=abs(coeff(x-1,y))+abs(coeff(x,y-1))
[0152] If no neighbor coefficient exists, the video decoder 300 can infer that the value of the neighbor coefficient is 0.
[0153] The value of locSumAbs is clipped to min(locSumAbs, 31), and the clipped value is used as an index into the following table to derive the Rice parameter:
[0154] riceParTable[32] = {0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2};
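The VVC Draft 6 derivation in [0149] to [0154] can be sketched in Python as follows. The sketch assumes the coefficients are stored row-major as coeffs[y][x]; the names TS_RICE_PAR_TABLE and ts_rice_param_draft6 are hypothetical.

```python
# Same values as riceParTable in [0154]: 12 zeros, 13 ones, 7 twos.
TS_RICE_PAR_TABLE = [0] * 12 + [1] * 13 + [2] * 7
assert len(TS_RICE_PAR_TABLE) == 32


def ts_rice_param_draft6(coeffs, x, y):
    """Rice parameter for a transform skip coefficient at (x, y).

    coeffs is indexed as coeffs[y][x] (an assumption of this sketch).
    Neighbors that do not exist are inferred to be 0.
    """
    def abs_at(cx, cy):
        if cx < 0 or cy < 0:
            return 0
        return abs(coeffs[cy][cx])

    # Left and top neighbors only: the forward scan has already visited them.
    loc_sum_abs = abs_at(x - 1, y) + abs_at(x, y - 1)
    return TS_RICE_PAR_TABLE[min(loc_sum_abs, 31)]
```

For coeffs = [[0, 5], [8, 0]], the coefficient at (1, 1) has locSumAbs = 8 + 5 = 13, which maps to a Rice parameter of 1.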
[0155] The derivation of the Rice parameter for regular transform coefficients is now discussed. Figure 6 is a conceptual diagram showing a template of neighboring coefficients used in the derivation of the Rice parameter. Figure 6 shows current coefficient 406 and five gray-shaded neighboring coefficients whose levels are used for Rice parameter derivation. For example, video decoder 300 can determine locSumAbs for the coefficient at position (x, y) using the following formula: locSumAbs = abs(coeff(x+1,y)) + abs(coeff(x+2,y)) + abs(coeff(x,y+1)) + abs(coeff(x+1,y+1)) + abs(coeff(x,y+2))
[0156] If any of these neighboring positions lies outside the TU, video decoder 300 can disregard that value in the locSumAbs calculation. The final locSumAbs can be clipped using locSumAbs = max(min(locSumAbs - 5*baseLevel, 31), 0).
[0157] where baseLevel is the base level represented by the context-coded part of the coefficient level. The final clipped locSumAbs value is used as an index into the following table to derive the Rice parameter:
[0158] riceParTable[32] = {0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3};
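A Python sketch of the regular-coefficient derivation in [0155] to [0158], assuming the TU's coefficients are stored row-major as coeffs[y][x]. Template positions beyond the right or bottom edge of the TU are skipped, which is how this sketch reads "disregard"; the names RICE_PAR_TABLE and regular_rice_param are hypothetical.

```python
# Same values as riceParTable in [0158]: 7 zeros, 7 ones, 14 twos, 4 threes.
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4

# Template of Figure 6 as (dx, dy): two to the right, two below, one below-right.
TEMPLATE = ((1, 0), (2, 0), (0, 1), (1, 1), (0, 2))


def regular_rice_param(coeffs, x, y, base_level):
    """Rice parameter for the regular coefficient at (x, y) given its base level."""
    height, width = len(coeffs), len(coeffs[0])
    loc_sum_abs = 0
    for dx, dy in TEMPLATE:
        nx, ny = x + dx, y + dy
        if nx < width and ny < height:  # positions outside the TU are disregarded
            loc_sum_abs += abs(coeffs[ny][nx])
    # Remove what the context-coded base level already represents, then clip to [0, 31].
    loc_sum_abs = max(min(loc_sum_abs - 5 * base_level, 31), 0)
    return RICE_PAR_TABLE[loc_sum_abs]
```

In a 3x3 TU filled with the value 4, the coefficient at (0, 0) sees locSumAbs = 20, giving a Rice parameter of 2 with baseLevel 0 and 1 with baseLevel 2 (20 - 10 = 10).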
[0159] We now discuss level-mapped residual decoding for transform skip modes. In VVC Draft 7, level mapping is performed on transform skip residuals. For each coefficient, a predictor is first computed based on adjacent coefficients (left and top). For example, video encoder 200 or video decoder 300 can compute a predictor for each coefficient. Video encoder 200 can adjust the absolute value of the current coefficient based on the predictor before encoding. This conversion from coefficient value to adjusted coefficient value is called "level mapping".
[0160] Level mapping is motivated by two observations. First, coding large values may require more bits than coding small values. Second, if a neighbor has the value "a", the current absCoeff is more likely to have the same value "a" than any other value. Based on these two observations, a predictor value "pred" can be found for each sample, and the actual value absCoeff to be coded is most likely equal to pred. Accordingly, video encoder 200 can save bits as follows: if absCoeff = pred, code 1. Otherwise, code a value other than 1 to represent absCoeff, because 1 is reserved for the case absCoeff = pred; for example, an absCoeff of 1 that is less than pred is coded as 1 + 1 = 2. In general, if absCoeff is less than pred, the video encoder can code absCoeff + 1.
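The pseudocode for [0162] to [0165] is not reproduced in this text. The following Python sketch shows one mapping consistent with the description above, under loudly stated assumptions: the predictor is taken to be max(X0, X1), where X0 and X1 are the absolute values of the left and above neighbors, and the mapping is applied only to non-zero coefficients (zero is signaled by sig_coeff_flag) and only when pred is non-zero. These choices are illustrative and are not claimed to be the normative VVC Draft 7 process; level_map and level_unmap are hypothetical names.

```python
def level_map(abs_coeff, x0, x1):
    """Encoder side: absCoeff -> absCoeffMod (illustrative; the predictor is an assumption)."""
    pred = max(x0, x1)  # ASSUMPTION: predictor from left (x0) and above (x1) neighbors
    if abs_coeff == 0 or pred == 0:
        return abs_coeff          # zeros and predictor-less coefficients are unchanged
    if abs_coeff == pred:
        return 1                  # the most likely value gets the smallest code value
    if abs_coeff < pred:
        return abs_coeff + 1      # shift up by one to free the value 1
    return abs_coeff              # values above pred are not remapped


def level_unmap(abs_coeff_mod, x0, x1):
    """Decoder side: inverse of level_map under the same assumptions."""
    pred = max(x0, x1)
    if abs_coeff_mod == 0 or pred == 0:
        return abs_coeff_mod
    if abs_coeff_mod == 1:
        return pred
    if abs_coeff_mod <= pred:
        return abs_coeff_mod - 1
    return abs_coeff_mod
```

Because values below pred are shifted up by one and pred itself takes the freed value 1, the mapping is a bijection on non-negative integers, so the decoder recovers absCoeff exactly.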
[0161] The video encoder 200 or video decoder 300 can perform the following operations as shown in the pseudocode.
[0162] Video Encoder 200:
[0163]
[0164] Video Decoder 300:
[0165]
[0166] where X0 and X1 represent the absolute coefficient values located to the left of and above the current coefficient, respectively. The value absCoeffMod represents the absolute coefficient value after level mapping, and absCoeff is the absolute value of the coefficient being encoded or decoded. This disclosure describes several techniques related to the residual coding scheme in VVC Draft 7. These techniques can be used individually or in any combination.
[0167] The removal of level mapping is now discussed. According to this technique, the level mapping used for residual coding in VVC Draft 7 can be disabled for transform skip mode. For example, video encoder 200 and video decoder 300 can perform residual coding in transform skip mode without level mapping. Disabling level mapping in transform skip mode can reduce processing power consumption and latency.
[0168] An alternative three-pass residual coding technique is now discussed. In this technique, the three-pass encoding and decoding for transform skip residual coding in VVC Draft 7 is modified as follows:
[0169] First pass: sig_coeff_flag, abs_level_gt1_flag, abs_level_gt2_flag, and par_level_flag are coded.
[0170] Second pass: coeff_sign_flag and abs_level_gtX_flag (X = 3, 4, 5) are coded.
Third pass: abs_remainder is coded.
[0171] Compared to the three-pass scheme in VVC Draft 7, the three-pass scheme described above makes abs_level_gt2_flag more likely to be coded with a CABAC context (as opposed to bypass coding). Context coding of abs_level_gt2_flag may be desirable, for example, when abs_level_gt2_flag is statistically more significant than coeff_sign_flag.
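The effect described in [0171] can be illustrated with a simplified Python model of the context-coded passes under the per-TU bin budget and the fewer-than-four rule of [0141]. The model is an assumption for illustration: every listed syntax element is coded for every coefficient (conditional presence is ignored) and each costs one context-coded bin. The names DRAFT7_PASSES, ALT_PASSES, and context_coded_elements are hypothetical.

```python
# Context-coded passes (first, second) of the two orderings.
DRAFT7_PASSES = (
    ("sig_coeff_flag", "coeff_sign_flag", "abs_level_gt1_flag", "par_level_flag"),
    ("abs_level_gt2_flag", "abs_level_gt3_flag", "abs_level_gt4_flag", "abs_level_gt5_flag"),
)
ALT_PASSES = (
    ("sig_coeff_flag", "abs_level_gt1_flag", "abs_level_gt2_flag", "par_level_flag"),
    ("coeff_sign_flag", "abs_level_gt3_flag", "abs_level_gt4_flag", "abs_level_gt5_flag"),
)


def context_coded_elements(passes, num_coeffs, budget):
    """Return (coefficient index, syntax element) pairs that get context coding.

    Each pass visits all coefficients before the next pass starts. Once fewer than
    four bins remain, all later first- and second-pass coding is skipped.
    """
    coded = []
    remaining = budget
    for pass_elements in passes:
        for n in range(num_coeffs):
            if remaining < 4:
                return coded
            for name in pass_elements:
                coded.append((n, name))
                remaining -= 1
    return coded
```

With two coefficients and a budget of eight bins, the Draft 7 ordering spends the whole budget in the first pass and never context-codes abs_level_gt2_flag, whereas the alternative ordering context-codes it for both coefficients.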
[0172] The technique for deriving the Rice parameter will now be discussed. As described in this disclosure, the following techniques can be used to derive the Rice parameter.
[0173] In one example, video decoder 300 can use neighboring coefficient values to derive the Rice parameter. In another example, video decoder 300 can use multiple available neighboring coefficients when deriving the Rice parameter. In some examples, available neighboring coefficients can be coefficients that are adjacent to the current coefficient and located within the TU. For example, the left neighbor of a coefficient located on the left boundary of the TU can be considered unavailable, and video decoder 300 may not use unavailable coefficients when deriving the Rice parameter. In some examples, video decoder 300 can use the base level when deriving the Rice parameter. Deriving the Rice parameter as discussed herein can reduce processing power consumption and latency.
[0174] The following lists some additional examples of the above examples, and any of these examples can be used alone or in any combination.
[0175] Figure 7 is a conceptual diagram illustrating an example local template with five neighboring coefficients. In some examples, video decoder 300 can use a local template with five neighboring coefficients to derive the Rice parameter. For example, as shown in Figure 7, box 410 (black shading) indicates the current coefficient, and neighboring coefficients 411-415 (gray shading) indicate the positions of the five coefficients in the local template.
[0176] To derive the Rice parameter, the video decoder 300 can determine the value locSumAbs as follows:
[0177] locSumAbs=abs(coeff(x-1,y))+abs(coeff(x-2,y))+abs(coeff(x,y-1))+abs(coeff(x-1,y-1))+abs(coeff(x,y-2)),
[0178] Where abs(coeff(x,y)) represents the absolute value of the coefficient at (x,y). If coeff(x,y) does not exist, the video decoder 300 can infer that the value of coeff(x,y) is 0.
[0179] Video decoder 300 can derive the Rice parameter (cRiceParam) as follows:
[0180] cRiceParam=riceParTable[min(31,locSumAbs)]+(locSumAbs>128?1:0)
[0181] riceParTable is defined as follows:
[0182] riceParTable={0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3};
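The computation in [0176] to [0182] can be sketched in Python as follows. The sketch assumes the TU's coefficients are stored row-major as coeffs[y][x] with (x, y) inside the TU; because the template only reaches to the left and above, only negative positions can fall outside the TU, and those are inferred to be 0 as described in [0178]. Function names are hypothetical.

```python
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4  # riceParTable of [0182]

# Local template of Figure 7 as (dx, dy): (x-1,y), (x-2,y), (x,y-1), (x-1,y-1), (x,y-2).
TEMPLATE = ((-1, 0), (-2, 0), (0, -1), (-1, -1), (0, -2))


def loc_sum_abs(coeffs, x, y):
    """Sum of absolute neighbor levels over the five-position local template."""
    total = 0
    for dx, dy in TEMPLATE:
        nx, ny = x + dx, y + dy
        if nx >= 0 and ny >= 0:  # a missing coeff(x, y) is inferred to be 0
            total += abs(coeffs[ny][nx])
    return total


def rice_param(coeffs, x, y):
    """cRiceParam = riceParTable[min(31, locSumAbs)] + (locSumAbs > 128 ? 1 : 0)."""
    s = loc_sum_abs(coeffs, x, y)
    return RICE_PAR_TABLE[min(31, s)] + (1 if s > 128 else 0)
```

For coeffs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the template around (2, 2) sums 8 + 7 + 6 + 5 + 3 = 29, giving riceParTable[29] = 3.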
[0183] In another example, after calculating the value of locSumAbs as described above, the video decoder 300 can derive the Rice parameter (cRiceParam) as follows:
[0184] cRiceParam=riceParTable[min(31,locSumAbs)]+(locSumAbs>256?2:locSumAbs>128?1:0)
[0185] riceParTable is defined as described above.
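A Python sketch of the [0184] variant, which adds 2 to the table value when locSumAbs exceeds 256 and 1 when it exceeds 128. locSumAbs is assumed to have been computed with the five-neighbor template of [0177]; the function name is hypothetical.

```python
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4  # riceParTable of [0182]


def rice_param_two_thresholds(loc_sum_abs):
    """Table lookup plus an extra increment for very large neighborhoods."""
    if loc_sum_abs > 256:
        extra = 2
    elif loc_sum_abs > 128:
        extra = 1
    else:
        extra = 0
    return RICE_PAR_TABLE[min(31, loc_sum_abs)] + extra
```

Compared with the single-threshold form in [0180], this lets the Rice parameter grow by two steps beyond the table maximum for very large residual neighborhoods.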
[0186] In another example, after the video decoder 300 calculates the value of locSumAbs as described above, the video decoder 300 can normalize locSumAbs as follows:
[0187] locSumAbs = (noPos == 0 || noPos == 5) ? locSumAbs:((5 * locSumAbs) >> (noPos >> 1)), where noPos represents the number of adjacent coefficients available within the local template. The video decoder 300 can use locSumAbs as an index to the lookup table to derive the Rice parameter (cRiceParam):
[0188] cRiceParam = riceParTable[min(31, locSumAbs)], where riceParTable is defined as discussed above.
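The normalization of [0187] compensates for missing template positions before the lookup of [0188]. A minimal sketch, assuming noPos lies in [0, 5]; the function names are hypothetical:

```python
# Hypothetical sketch of [0186]-[0188]: scale locSumAbs by the number of
# available template neighbors (noPos) before indexing riceParTable.
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4


def normalize_scale(loc_sum, no_pos):
    """Keep locSumAbs when noPos is 0 or 5, else (5 * locSumAbs) >> (noPos >> 1)."""
    if no_pos in (0, 5):
        return loc_sum
    return (5 * loc_sum) >> (no_pos >> 1)


def rice_param_scaled(loc_sum, no_pos):
    """riceParTable[min(31, normalized locSumAbs)]."""
    return RICE_PAR_TABLE[min(31, normalize_scale(loc_sum, no_pos))]


# The same raw sum of 10 with different numbers of available neighbors.
for no_pos in range(1, 6):
    print(no_pos, normalize_scale(10, no_pos), rice_param_scaled(10, no_pos))
```

As written, noPos = 2 and noPos = 3 receive the same scale factor (5/2), because noPos >> 1 equals 1 in both cases.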
[0189] In another example, video decoder 300 can derive the Rice parameter as discussed above, except that a different normalization is used, as follows:
[0190] locSumAbs = locSumAbs << ((5 - noPos) >> 1)
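Unlike the multiply-and-shift form of [0187], the [0190] normalization scales only by powers of two. A small sketch (hypothetical name, noPos assumed to be in [0, 5]) tabulates the effect on a raw sum of 10:

```python
# Hypothetical sketch of the [0190] normalization.
def normalize_shift(loc_sum, no_pos):
    """locSumAbs << ((5 - noPos) >> 1); noPos is assumed to be in [0, 5]."""
    return loc_sum << ((5 - no_pos) >> 1)


for no_pos in range(6):
    print(no_pos, normalize_shift(10, no_pos))
```

The sum is multiplied by 4 when noPos is 0 or 1, by 2 when noPos is 2 or 3, and left unchanged when noPos is 4 or 5. When noPos is 0, every neighbor is inferred as 0, so the normalized sum is 0 anyway.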
[0191] In another example, the video decoder 300 can derive the value locSumAbs as described above, and can derive the Rice parameter based on locSumAbs and the base level of the residual, as follows:
[0192] offset=baseLevel==10? -30:-20
[0193] cRiceParam=riceParTable[max(min(31,locSumAbs+offset),0)]
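A minimal sketch of the base-level offset of [0192]-[0193]. The disclosure only distinguishes baseLevel == 10 from other values; the value 4 used in the example calls is an arbitrary "not 10" input chosen for illustration, and the function names are hypothetical.

```python
# Hypothetical sketch of [0191]-[0193].
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4


def clamp_index(value):
    """max(min(31, value), 0): keep the index inside riceParTable."""
    return max(min(31, value), 0)


def rice_param_base_offset(loc_sum, base_level):
    """Offset is -30 when baseLevel == 10, otherwise -20."""
    offset = -30 if base_level == 10 else -20
    return RICE_PAR_TABLE[clamp_index(loc_sum + offset)]


print(rice_param_base_offset(40, 10), rice_param_base_offset(40, 4))
```

The lower clamp matters here: a negative offset can push small sums below index 0, and the clamp maps them to the first table entry.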
[0194] In another example, video decoder 300 can derive the value locSumAbs in the same manner as described above. Video decoder 300 can derive the Rice parameter based on locSumAbs and the base level of the residual, as follows:
[0195] cRiceParam = riceParTable[max(min(31, locSumAbs - (baseLevel >> 2) * 15), 0)]
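In the [0195] variant the offset grows with the base level instead of taking one of two fixed values. A minimal sketch with a hypothetical function name:

```python
# Hypothetical sketch of [0194]-[0195].
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4


def rice_param_base_scaled(loc_sum, base_level):
    """riceParTable[max(min(31, locSumAbs - (baseLevel >> 2) * 15), 0)]."""
    return RICE_PAR_TABLE[max(min(31, loc_sum - (base_level >> 2) * 15), 0)]


print(rice_param_base_scaled(40, 10), rice_param_base_scaled(40, 4))
```

For baseLevel = 10 the subtracted amount is (10 >> 2) * 15 = 30, the same as the -30 offset of the previous variant; for baseLevel = 4 it is 15 rather than 20.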
[0196] In another example, the video decoder 300 can derive the value of locSumAbs in the same manner as described above. The video decoder 300 can derive the Rice parameter based on locSumAbs and the base level of the residual, as follows:
[0197] locSumAbs=baseLevel==10? locSumAbs:locSumAbs / 2-4
[0198] cRiceParam = riceParTable[max(min(31, locSumAbs), 0)]
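A minimal sketch of the halving variant of [0197]. Because locSumAbs / 2 - 4 can be negative for small sums, this sketch assumes the table index is clamped to [0, 31], the same clamp used by the other base-level variants; that clamp and the function name are assumptions.

```python
# Hypothetical sketch of [0196]-[0198].
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4


def rice_param_halved(loc_sum, base_level):
    # Keep locSumAbs when baseLevel == 10, else locSumAbs / 2 - 4.
    # For non-negative sums, C integer division matches Python's //.
    if base_level != 10:
        loc_sum = loc_sum // 2 - 4
    # Assumption: index clamped to [0, 31] as in the neighboring variants.
    return RICE_PAR_TABLE[max(min(31, loc_sum), 0)]


print(rice_param_halved(20, 10), rice_param_halved(20, 4))
```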
[0199] In another example, after the video decoder 300 calculates the value of locSumAbs as described above, the video decoder 300 can derive the Rice parameter (cRiceParam) as follows:
[0200] offset=baseLevel==10? -30:-20
[0201] if(baseLevel==0)
[0202] riceOffset=(locSumAbs>256?2:locSumAbs>128?1:0)
[0203] else
[0204] riceOffset=(locSumAbs>256?1:0)
[0205] cRiceParam = riceParTable[max(min(31,locSumAbs+offset),0)] + riceOffset, where riceParTable is defined as discussed above.
[0206] In another example, after video decoder 300 calculates the value of locSumAbs as described above, video decoder 300 can derive the Rice parameter (cRiceParam) as follows:
[0207] offset=baseLevel==10? -30:-20
[0208] riceOffset=(locSumAbs>128?1:0)
[0209] cRiceParam=riceParTable[max(min(31,locSumAbs+offset),0)]+riceOffset
[0210] riceParTable is defined as discussed above.
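The two riceOffset variants of [0199]-[0210] differ only in how the extra term is chosen. A side-by-side sketch, with hypothetical function names:

```python
# Hypothetical sketch of [0199]-[0205] and [0206]-[0210].
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4


def clamp_index(value):
    return max(min(31, value), 0)


def rice_param_offset_by_base(loc_sum, base_level):
    """[0200]-[0205]: the riceOffset rule depends on whether baseLevel == 0."""
    offset = -30 if base_level == 10 else -20
    if base_level == 0:
        rice_offset = 2 if loc_sum > 256 else (1 if loc_sum > 128 else 0)
    else:
        rice_offset = 1 if loc_sum > 256 else 0
    return RICE_PAR_TABLE[clamp_index(loc_sum + offset)] + rice_offset


def rice_param_offset_single(loc_sum, base_level):
    """[0207]-[0209]: riceOffset is 1 whenever locSumAbs > 128."""
    offset = -30 if base_level == 10 else -20
    rice_offset = 1 if loc_sum > 128 else 0
    return RICE_PAR_TABLE[clamp_index(loc_sum + offset)] + rice_offset


print(rice_param_offset_by_base(200, 10), rice_param_offset_single(200, 10))
```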
[0211] In another example, video decoder 300 can calculate locSumAbs as described above, and video decoder 300 can normalize the value of locSumAbs as follows:
[0212] locSumAbs = locSumAbs << ((5 - noPos) >> 1)
[0213] Here, noPos represents the number of available adjacent coefficients within the local template. The video decoder 300 can derive the Rice parameter based on locSumAbs and the base level of the residual, as shown below:
[0214] offset=baseLevel==10? -30:-20
[0215] cRiceParam=riceParTable[max(min(31,locSumAbs+offset),0)]
[0216] In another example, the video decoder 300 can calculate locSumAbs as described above, and the video decoder 300 can normalize the value of locSumAbs as follows:
[0217] locSumAbs = locSumAbs << ((5 - noPos) >> 1)
[0218] Here, noPos represents the number of available adjacent coefficients within the local template. The video decoder 300 can derive the Rice parameter based on locSumAbs and the base level of the residual, as shown below:
[0219] cRiceParam = riceParTable[max(min(31, locSumAbs - (baseLevel >> 2) * 15), 0)]
[0220] In another example, the video decoder 300 can calculate locSumAbs as described above, and the video decoder 300 can normalize the value of locSumAbs as follows:
[0221] locSumAbs = locSumAbs << ((5 - noPos) >> 1)
[0222] Here, noPos represents the number of available adjacent coefficients within the local template. The Rice parameter is derived based on locSumAbs and the base level of the residuals, as shown below:
[0223] offset = baseLevel > 0? -20:0
[0224] cRiceParam=riceParTable[max(min(31,locSumAbs+offset),0)]
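The three examples of [0211]-[0224] share the same shift normalization and differ only in the base-level offset. A combined sketch, assuming noPos is in [0, 5]; all function names are hypothetical:

```python
# Hypothetical sketch of [0211]-[0224]: shift normalization + base-level offset.
RICE_PAR_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4


def normalize_shift(loc_sum, no_pos):
    """locSumAbs << ((5 - noPos) >> 1)."""
    return loc_sum << ((5 - no_pos) >> 1)


def lookup(index):
    """riceParTable[max(min(31, index), 0)]."""
    return RICE_PAR_TABLE[max(min(31, index), 0)]


def rice_norm_fixed_offset(loc_sum, no_pos, base_level):
    """[0211]-[0215]: offset is -30 for baseLevel == 10, else -20."""
    s = normalize_shift(loc_sum, no_pos)
    return lookup(s + (-30 if base_level == 10 else -20))


def rice_norm_scaled_offset(loc_sum, no_pos, base_level):
    """[0216]-[0219]: subtract (baseLevel >> 2) * 15."""
    s = normalize_shift(loc_sum, no_pos)
    return lookup(s - (base_level >> 2) * 15)


def rice_norm_positive_base(loc_sum, no_pos, base_level):
    """[0220]-[0224]: offset is -20 when baseLevel > 0, else 0."""
    s = normalize_shift(loc_sum, no_pos)
    return lookup(s + (-20 if base_level > 0 else 0))
```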
[0225] Figure 8 is a conceptual diagram illustrating another example local template with five adjacent coefficients. The examples discussed above use a local template containing the coefficients to the left of, above, and to the upper left of the current coefficient, as shown in Figure 7. The techniques of this disclosure can be modified to use different neighbors. For example, if the scan order of coefficient coding is from bottom right to top left, video decoder 300 can use, in each of the examples above, a template that includes the right, bottom, and bottom-right neighbors. As shown in Figure 8, block 420 (black shading) represents the current coefficient, and adjacent coefficients 421-425 (gray shading) represent the adjacent coefficients in the template.
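For a reverse scan, the template can be obtained by mirroring every offset of the left/top template, so the same summation routine can be reused. A sketch under the same assumptions as before (coeffs[y][x] layout, availability meaning "inside the block"); the mapping of these offsets to the reference numerals 421-425 is not specified here:

```python
# Hypothetical sketch: mirrored (right/bottom) template for a reverse scan.
TEMPLATE_LEFT_TOP = [(-1, 0), (-2, 0), (0, -1), (-1, -1), (0, -2)]
# Negating each offset gives the right/bottom neighbors.
TEMPLATE_RIGHT_BOTTOM = [(-dx, -dy) for dx, dy in TEMPLATE_LEFT_TOP]


def loc_sum_abs(coeffs, x, y, template):
    """Return (sum of |neighbor|, number of neighbors inside the block)."""
    height, width = len(coeffs), len(coeffs[0])
    total, no_pos = 0, 0
    for dx, dy in template:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            total += abs(coeffs[ny][nx])
            no_pos += 1
    return total, no_pos


block = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
    [0, 0, -1, 2],
]
print(loc_sum_abs(block, 2, 2, TEMPLATE_RIGHT_BOTTOM))
```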
[0226] Context derivation for syntax elements will now be discussed. In some examples, video decoder 300 can use information from neighboring coefficients to derive the context for the current coefficient. For example, the information about neighboring coefficients may include neighboring syntax values, the number of available neighboring coefficients, and so on. Using information from neighboring coefficients may reduce processing power consumption and improve latency.
[0227] In some examples, video decoder 300 may selectively share the contexts used for residual coding between the luma and chroma components. For instance, for some syntax elements of residual coding, the luma and chroma components share the same set of contexts, while for other syntax elements, luma and chroma use different sets of contexts.
[0228] In some examples, for each syntax element to which the techniques of this disclosure are applied, video decoder 300 can select a context from five candidates (e.g., denoted as contexts 0, 1, 2, 3, and 4 in Table 1 below). The selection of a context can be based on adjacent syntax values (e.g., of the left, top, and top-left neighbors). For example, top_flag can be the value of the same syntax element for the top neighbor coefficient (e.g., adjacent coefficient 411 in Figure 7), left_flag can be the value of the same syntax element for the left neighbor coefficient (e.g., adjacent coefficient 415 in Figure 7), and top_left_flag can be the value of the same syntax element for the top-left neighbor coefficient (e.g., adjacent coefficient 413 in Figure 7). If an adjacent coefficient does not exist, video decoder 300 can infer the corresponding flag to be 0. Let noPos be the number of available adjacent coefficients. Video decoder 300 can derive the context as follows:
[0229] if noPos == 0
[0230] selectedContext=0
[0231] else
[0232] The value of selectedContext (e.g., the context used) is assigned according to Table 1.
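[0233]

| top_flag | left_flag | top_left_flag | Context used |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 2 |
| 0 | 1 | 1 | 4 |
| 1 | 0 | 0 | 2 |
| 1 | 0 | 1 | 4 |
| 1 | 1 | 0 | 3 |
| 1 | 1 | 1 | 3 |

[0234] Table 1 - Context derivation when noPos > 0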
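The selection rule of [0229]-[0232] with Table 1 can be sketched as a dictionary lookup. The function name is hypothetical, and the caller is assumed to have already inferred the flags of non-existent neighbors as 0.

```python
# Hypothetical sketch of [0228]-[0234]: five-context selection with Table 1.
# Keys are (top_flag, left_flag, top_left_flag).
TABLE_1 = {
    (0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 1): 4,
    (1, 0, 0): 2, (1, 0, 1): 4, (1, 1, 0): 3, (1, 1, 1): 3,
}


def select_context_table1(top_flag, left_flag, top_left_flag, no_pos):
    """Context 0 when no neighbor is available, otherwise the Table 1 entry."""
    if no_pos == 0:
        return 0
    return TABLE_1[(top_flag, left_flag, top_left_flag)]


print(select_context_table1(0, 1, 1, no_pos=3))
```

Context 0 never appears in Table 1, so it is reserved for the case where no neighbor is available.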
[0235] The above techniques can be applied to different syntax elements. For example, video decoder 300 can use this technique to select contexts for sig_coeff_flag, abs_level_gt1_flag, and/or abs_level_gt2_flag.
[0236] The contexts 0, 1, 2, 3, and 4 in the above description are example ways of naming different contexts. Different ways of naming contexts may be used, and are still within the scope of this disclosure.
[0237] In some examples, for each syntax element to which the techniques of this disclosure are applied, video decoder 300 may select a context from four candidates (e.g., denoted as contexts 0, 1, 2, and 3) instead of five. Video decoder 300 may select a context based on adjacent syntax values (e.g., of the left, top, and top-left neighbors). For example, top_flag may be the value of the same syntax element for the top neighbor coefficient (e.g., adjacent coefficient 411), left_flag may be the value of the same syntax element for the left neighbor coefficient (e.g., adjacent coefficient 415), and top_left_flag may be the value of the same syntax element for the top-left neighbor coefficient (e.g., adjacent coefficient 413). If an adjacent coefficient does not exist, video decoder 300 may infer the corresponding flag to be 0. Video decoder 300 may determine the value of selectedContext (e.g., "Context used") according to Table 2.
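[0238]

| top_flag | left_flag | top_left_flag | Context used |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 3 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 3 |
| 1 | 1 | 0 | 2 |
| 1 | 1 | 1 | 2 |

[0239] Table 2 - Context derivation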
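Reading the rows of Table 2 (and comparing them with Table 1, whose every entry is the corresponding Table 2 entry plus one) suggests a closed form: top-left matters only when exactly one of top and left is set. The sketch below checks that observation against the table; the closed form is our reading of the table, not a formula stated in the disclosure.

```python
# Hypothetical sketch: Table 2 as a lookup and as an equivalent closed form.
TABLE_2 = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 3,
    (1, 0, 0): 1, (1, 0, 1): 3, (1, 1, 0): 2, (1, 1, 1): 2,
}


def context_table2_arith(top_flag, left_flag, top_left_flag):
    """Closed form read off Table 2 (an observation, not part of the text)."""
    if top_flag and left_flag:
        return 2                      # top and left both set: top-left ignored
    if top_flag or left_flag:
        return 1 + 2 * top_left_flag  # exactly one set: 1, or 3 with top-left
    return 0                          # neither set: top-left ignored


# Verify the closed form against every row of the table.
for key, ctx in TABLE_2.items():
    assert context_table2_arith(*key) == ctx
```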
[0240] The techniques described above can be applied to different syntax elements. For example, contexts for sig_coeff_flag, abs_level_gt1_flag, and abs_level_gt2_flag can be determined using these techniques.
[0241] The contexts 0, 1, 2, and 3 in the above description are an example of how to name different contexts. Different ways of naming contexts may be used, and are still within the scope of this disclosure.
[0242] In some examples, for each syntax element in which the techniques of this disclosure are applied, the video decoder 300 can select a context from four candidates (e.g., denoted as context 0, 1, 2, and 3). The video decoder 300 can select a context based on adjacent syntax values (e.g., left, top, and top-left neighbors). For example, top_flag is the value of the same syntax element for the top neighbor coefficient (e.g., adjacent coefficient 411), left_flag can be the value of the same syntax element for the left neighbor coefficient (e.g., adjacent coefficient 415), and top_left_flag can be the value of the same syntax element for the top-left neighbor coefficient (e.g., adjacent coefficient 413). If no adjacent coefficient exists, the video decoder 300 can infer the corresponding flag as 0.
[0243] For example, let noPos be the number of available adjacent coefficients. Video decoder 300 can derive the context as follows, where a predefined context N (for example, N = 2) is assigned when noPos == 0:
[0244] if noPos == 0
[0245] selectedContext=N
[0246] else
[0247] The video decoder 300 can determine the value of selectedContext (e.g., "Context used") based on Table 3.
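[0248]

| top_flag | left_flag | top_left_flag | Context used |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 3 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 3 |
| 1 | 1 | 0 | 2 |
| 1 | 1 | 1 | 2 |

[0249] Table 3 - Context derivation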
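Table 3 has the same entries as Table 2; what [0243]-[0247] add is the fallback to a predefined context N when no neighbor is available. In this sketch, None marks an unavailable neighbor and noPos is assumed to count the available neighbors among the three used for context selection; both conventions and the function name are hypothetical.

```python
# Hypothetical sketch of [0242]-[0249]: Table 3 with a predefined fallback N.
TABLE_3 = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 3,
    (1, 0, 0): 1, (1, 0, 1): 3, (1, 1, 0): 2, (1, 1, 1): 2,
}


def select_context_table3(top=None, left=None, top_left=None, n_default=2):
    """Return selectedContext; None marks an unavailable neighbor."""
    neighbors = (top, left, top_left)
    no_pos = sum(flag is not None for flag in neighbors)
    if no_pos == 0:
        return n_default  # predefined context N, e.g. N = 2
    # Flags of unavailable neighbors are inferred as 0.
    key = tuple(0 if flag is None else flag for flag in neighbors)
    return TABLE_3[key]


print(select_context_table3(), select_context_table3(top=1))
```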
[0250] The techniques described above can be applied to different syntax elements. For example, video decoder 300 can use these techniques to select contexts for sig_coeff_flag, abs_level_gt1_flag, and abs_level_gt2_flag.
[0251] The contexts 0, 1, 2, and 3 in the above description are an example of how to name different contexts. Different ways of naming contexts may be used, and are still within the scope of this disclosure.
[0252] The above technique can be modified to use different neighbors. For example, if the scan order of coefficient coding is from bottom right to top left, video decoder 300 can use the right, bottom, and bottom-right neighbors (adjacent coefficients 421, 425, and 423 in Figure 8, respectively) instead of the left, top, and top-left neighbors.
[0253] In some examples, for the transform skip residual syntax elements sig_coeff_flag, abs_level_gt1_flag, and abs_level_gt2_flag, video decoder 300 can use separate contexts for the luma and chroma components. For other syntax elements, the luma and chroma components can share the same set of contexts.
[0254] In some examples, for each of the three syntax elements: sig_coeff_flag, abs_level_gt1_flag, and abs_level_gt2_flag, the video decoder 300 can use four contexts for luma and four contexts for chroma. For each specific coefficient, the video decoder 300 can select a context from the four contexts for the corresponding color component, as described herein.
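One way to realize separate luma and chroma context sets is to offset the chroma context index for the three separated syntax elements. The layout below (luma contexts at indices 0-3, chroma at 4-7, one shared set for other elements) is a hypothetical arrangement for illustration only.

```python
# Hypothetical sketch of [0253]-[0254]: per-component context indexing.
SEPARATE_LUMA_CHROMA = {"sig_coeff_flag", "abs_level_gt1_flag", "abs_level_gt2_flag"}
CONTEXTS_PER_COMPONENT = 4  # four contexts each for luma and chroma


def context_index(syntax_element, is_chroma, selected_context):
    """Map a selected context (0-3) to an index in an assumed context array.

    Separated elements: luma uses indices 0-3, chroma uses 4-7.
    Other elements: luma and chroma share one set of indices.
    """
    if syntax_element in SEPARATE_LUMA_CHROMA and is_chroma:
        return CONTEXTS_PER_COMPONENT + selected_context
    return selected_context


print(context_index("sig_coeff_flag", True, 2))
```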
[0255] Conditional skipping of the second pass will now be described. In some examples, video decoder 300 may conditionally skip one or more coding passes based on information about the coefficient groups (CGs) within the transform unit (TU). As an example, the second pass of transform skip residual coding in VVC Draft 7 may be skipped based on this CG information.
[0256] In some examples, numRemNonZeroCGs can be the number of remaining non-zero CGs in the TU (the current CG may or may not be included in numRemNonZeroCGs). numFlagsPass2 can be the number of flags to be coded in the second pass (e.g., in VVC Draft 7, numFlagsPass2 = 4). remainingCtxBin can be the number of remaining context-coded bins for the current TU. n can be a multiplier, for example, n = 1.75.
[0257] CGSize can be the size of the coefficient group. For example, in VVC Draft 7, the value of CGSize is 16.
[0258] In some examples, in addition to any other conditions required for performing second-pass coding, the following condition may need to be true for second-pass coding to be performed. In some examples, according to the techniques of this disclosure, ">" in the following condition may be replaced with ">=".
[0259] remainingCtxBin>(n*CGSize*numRemNonZeroCGs)
[0260] In some examples, video decoder 300 performs second-pass decoding under the following condition: if (remainingCtxBin >= numFlagsPass2 && remainingCtxBin > (n * CGSize * numRemNonZeroCGs))
[0261] perform second-pass decoding.
[0262] In some examples, video decoder 300 can obtain the value of numRemNonZeroCGs for each coefficient. To do so, the coded_sub_block_flag of every CG in the TU can be coded (encoded by video encoder 200 and decoded by video decoder 300) at the beginning of the TU, before any coefficient in the same TU is coded.
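The second-pass check of [0256]-[0262] can be sketched as below. The defaults follow the example values in the text (n = 1.75, CGSize = 16, numFlagsPass2 = 4); the function names, the list-of-flags input, and the inclusive switch (for the ">=" alternative) are hypothetical.

```python
# Hypothetical sketch of [0256]-[0262]: deciding whether to run the second pass.
def num_rem_nonzero_cgs(coded_sub_block_flags, current_cg, include_current=True):
    """Count CGs with coded_sub_block_flag == 1 from the current CG onward.

    coded_sub_block_flags lists the flags of all CGs in the TU in coding
    order, assumed to have been coded at the start of the TU.
    """
    start = current_cg if include_current else current_cg + 1
    return sum(1 for flag in coded_sub_block_flags[start:] if flag)


def second_pass_allowed(remaining_ctx_bins, num_rem_cgs, n=1.75, cg_size=16,
                        num_flags_pass2=4, inclusive=False):
    """remainingCtxBin >= numFlagsPass2 && remainingCtxBin > n*CGSize*numRemCGs.

    inclusive=True replaces ">" with ">=" in the budget comparison.
    """
    budget = n * cg_size * num_rem_cgs
    enough = remaining_ctx_bins >= budget if inclusive else remaining_ctx_bins > budget
    return remaining_ctx_bins >= num_flags_pass2 and enough


flags = [1, 0, 1, 1]
remaining = num_rem_nonzero_cgs(flags, current_cg=1)  # CGs 1..3 -> 2 non-zero
print(remaining, second_pass_allowed(57, remaining))
```

With two remaining non-zero CGs the budget is 1.75 * 16 * 2 = 56 bins, so 57 remaining bins allow the second pass while 56 do not (unless the ">=" form is used).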
[0263] In some examples, numRemCGs can be the number of remaining CGs in the TU (the current CG may or may not be included in numRemCGs). numFlagsPass2 can be the number of flags to be coded in the second pass (e.g., in VVC Draft 7, numFlagsPass2 = 4). remainingCtxBin can be the number of remaining context-coded bins for the current TU. n can be a multiplier, for example, n = 1.75. CGSize is the size of the coefficient group; for example, in VVC Draft 7, CGSize is 16.
[0264] In addition to any other conditions required for second-pass decoding to be performed, the following condition may need to be true for video decoder 300 to perform second-pass decoding. In some examples, according to the techniques of this disclosure, ">" in the condition may be replaced with ">=".
[0265] remainingCtxBin>(n*CGSize*numRemCGs)
[0266] For example, video decoder 300 could perform second-pass decoding under the following condition:
[0267] if (remainingCtxBin >= numFlagsPass2 && remainingCtxBin > n * CGSize * numRemCGs)
[0268] perform second-pass decoding.
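The numRemCGs variant of [0263]-[0268] counts every remaining CG by position, so the count itself does not depend on coded_sub_block_flag values. Since numRemCGs is never smaller than numRemNonZeroCGs, this check is the more conservative of the two. A sketch with hypothetical names, assuming CGs are indexed 0 to numCGs - 1 in coding order:

```python
# Hypothetical sketch of [0263]-[0268]: second-pass check using numRemCGs.
def num_rem_cgs(num_cgs_in_tu, current_cg, include_current=True):
    """Remaining CGs in the TU, counted by position only."""
    return num_cgs_in_tu - current_cg - (0 if include_current else 1)


def second_pass_allowed_all_cgs(remaining_ctx_bins, num_cgs_in_tu, current_cg,
                                n=1.75, cg_size=16, num_flags_pass2=4):
    """remainingCtxBin >= numFlagsPass2 && remainingCtxBin > n*CGSize*numRemCGs."""
    remaining = num_rem_cgs(num_cgs_in_tu, current_cg)
    return (remaining_ctx_bins >= num_flags_pass2
            and remaining_ctx_bins > n * cg_size * remaining)


print(second_pass_allowed_all_cgs(85, num_cgs_in_tu=4, current_cg=1))
```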
[0269] Figure 9 is a flowchart illustrating an example transform skip mode coding technique of this disclosure. Video encoder 200 or video decoder 300 can determine whether a transform skip mode is used for a current block of video data (430). For example, mode selection unit 202 of video encoder 200 can run multiple encoding passes to test different encoding parameters and determine, based on the resulting rate-distortion values, that the current block should be encoded using a transform skip mode. Video encoder 200 can signal a flag indicating that a transform skip mode is used for the current block, and video decoder 300 can parse this flag to determine that the current block is encoded using a transform skip mode.
[0270] Based on the transform skip mode being used for the current block, video encoder 200 or video decoder 300 can disable level mapping for residual coding (432). That is, video encoder 200 or video decoder 300 can choose not to apply level mapping to the residual coefficients of the current block, and can code the current block without applying level mapping. For example, video encoder 200 can encode the current block using transform skip mode without applying level mapping to the residual coefficients, and video decoder 300 can decode the current block using transform skip mode without applying level mapping to the residual coefficients.
[0271] In some examples, video encoder 200 or video decoder 300 may code the following in a first pass: a flag indicating whether a transform coefficient of the current block is non-zero, two flags indicating whether the absolute value of the transform coefficient is greater than (j << 1) + 1, and a flag indicating the parity of the transform coefficient. Video encoder 200 or video decoder 300 may code the following in a second pass: a flag indicating the sign of the transform coefficient, and three flags indicating whether the absolute value of the transform coefficient is greater than (j << 1) + 1. Video encoder 200 or video decoder 300 may code a syntax element indicating the remaining absolute value of the transform coefficient in a third pass. Here, j is the index of a flag: the j-th flag indicates whether the absolute value of the transform coefficient is greater than (j << 1) + 1.
[0272] In some examples, video encoder 200 or video decoder 300 can determine neighboring coefficient values adjacent to the current coefficient value of the current block, determine the Rice parameter based on the neighboring coefficient values, and code the current block based on the Rice parameter. In some examples, the neighboring coefficient values include two left coefficient values, two top coefficient values, and one top-left coefficient value. In some examples, the neighboring coefficient values include two right coefficient values, two bottom coefficient values, and one bottom-right coefficient value.
[0273] The video encoder 200 or video decoder 300 can determine information associated with neighboring coefficients adjacent to the current coefficient of the current block of video data, and determine the context used for the current coefficient based on the information associated with the neighboring coefficients. In some examples, the video encoder 200 or video decoder 300 can also decode the current block based on the context. In some examples, this information includes the syntax values of the neighboring coefficients. In some examples, this information includes the number of available neighboring coefficients.
[0274] In some examples, video encoder 200 or video decoder 300 may determine the syntax elements associated with residual decoding for the current block, and determine the context sets for the luma component and the chroma component for the current block. In some examples, video encoder 200 or video decoder 300 may also decode the current block based on the context sets for the luma component and the chroma component. In some examples, if the syntax element is a first syntax element, the context sets for the luma component and the chroma component are shared. In other words, if the syntax element is a first syntax element, video encoder 200 or video decoder 300 may use the same context set for both the luma and chroma components of the current block. In some examples, if the syntax element is a second syntax element, the context sets for the luma component and the chroma component are different. In other words, if the syntax element is a second syntax element, video encoder 200 or video decoder 300 may use a different context set for the luma component compared to the chroma component of the current block.
[0275] In some examples, video encoder 200 or video decoder 300 may determine whether the number of remaining context-coded bins for the current transform unit is greater than a multiplier multiplied by the coefficient group size multiplied by the number of remaining coefficient groups in the transform unit, and may skip a coding pass when coding the current block based on the number of remaining context-coded bins not being greater than that product. In some examples, the multiplier may be 1.75. In some examples, skipping a coding pass includes skipping a second coding pass.
[0276] In some examples, video encoder 200 includes a camera configured to capture video data. In some examples, video decoder 300 includes a display device configured to display the video data. In some examples, video encoder 200 or video decoder 300 is part of a mobile phone. In some examples, determining whether a transform skip mode is used for the current block of video data is based on video data from an encoded video bitstream, and decoding the current block without applying level mapping includes decoding the current block without level mapping. In some examples, determining whether a transform skip mode is used for the current block of video data is based on rate-distortion values, and decoding the current block without applying level mapping includes encoding the current block without level mapping.
[0277] Figure 10 is a flowchart illustrating an example method for encoding a current block. The current block may include a current CU. Although described with respect to video encoder 200 (Figures 1 and 3), it should be understood that other devices may be configured to perform a method similar to that of Figure 10.
[0278] In this example, video encoder 200 initially predicts the current block (350). For example, video encoder 200 may form a prediction block for the current block. Video encoder 200 may then calculate a residual block for the current block (352). To calculate the residual block, video encoder 200 may calculate a difference between the original, unencoded block and the prediction block for the current block. In some examples, when in transform skip mode, video encoder 200 may disable level mapping for coefficient coding. Video encoder 200 may then transform the residual block and quantize the transform coefficients of the residual block (354); in some examples, when in transform skip mode, video encoder 200 may not transform the residual block. Next, video encoder 200 may scan the quantized transform coefficients of the residual block (356). During the scan, or following the scan, video encoder 200 may entropy encode the transform coefficients (358), for example using CAVLC or CABAC. In some examples, video encoder 200 can determine whether a transform skip mode is used for the current block of video data and, based on the transform skip mode being used for the current block, disable level mapping for residual coding and encode the current block without applying level mapping. Video encoder 200 can then output the entropy-encoded data of the block (360).
[0279] Figure 11 is a flowchart illustrating an example method for decoding a current block of video data. The current block may include a current CU. Although described with respect to video decoder 300 (Figures 1 and 4), it should be understood that other devices may be configured to perform a method similar to that of Figure 11.
[0280] Video decoder 300 can receive entropy-encoded data for the current block, such as entropy-encoded prediction information and entropy-encoded data for coefficients of a residual block corresponding to the current block (370). Video decoder 300 can entropy decode the entropy-encoded data to determine prediction information for the current block and to reproduce coefficients of the residual block (372). In some examples, video decoder 300 can determine whether a transform skip mode is used for the current block of video data and, based on the transform skip mode being used for the current block, disable level mapping for residual decoding and decode the current block without applying level mapping. Video decoder 300 can predict the current block (374), e.g., using an intra- or inter-prediction mode as indicated by the prediction information for the current block, to calculate a prediction block for the current block. Video decoder 300 can then inverse scan the reproduced transform coefficients (376) to create a block of quantized transform coefficients. Video decoder 300 can then inverse quantize and inverse transform the transform coefficients to produce a residual block (378); in some examples, when in transform skip mode, video decoder 300 may skip the inverse transform. Finally, video decoder 300 can decode the current block by combining the prediction block and the residual block (380).
[0281] According to this disclosure, by using a unified transform skip residual coding technique for both lossy and lossless coding, decoder performance can be improved and decoding latency can be reduced.
[0282] This disclosure includes the following examples.
[0283] Clause 1. A method of coding video data, the method comprising:
[0284] determining whether a transform skip mode is used for a current block of the video data; disabling level mapping for residual coding based on the transform skip mode being used; and coding the current block based on coefficient coding.
[0285] Clause 2. A method of coding video data, the method comprising: coding sig_coeff_flag, abs_level_gt1_flag, abs_level_gt2_flag, and par_level_flag for a current block of the video data in a first pass; coding coeff_sign_flag and abs_level_gtX_flag (X = 3, 4, 5) for the current block in a second pass; and coding abs_remainder in a third pass.
[0286] Clause 3. A method of coding video data, the method comprising: determining adjacent coefficient values that are adjacent to a current coefficient value of a current block of the video data; determining a Rice parameter based on the adjacent coefficient values; and coding the current block based on the Rice parameter.
[0287] Clause 4. The method according to Clause 3, wherein the adjacent coefficient values include two left coefficient values, two upper coefficient values, and one upper-left coefficient value.
[0288] Clause 5. The method according to Clause 3, wherein the adjacent coefficient values include two right coefficient values, two lower coefficient values, and one lower-right coefficient value.
[0289] Clause 6. A method of coding video data, the method comprising: determining information associated with neighboring coefficients adjacent to a current coefficient of a current block of the video data; determining a context for the current coefficient based on the information associated with the neighboring coefficients; and coding the current block based on the context.
[0290] Clause 7. The method according to Clause 6, wherein the information includes adjacent syntax values.
[0291] Clause 8. The method according to Clause 6, wherein the information includes the number of available adjacent coefficients.
[0292] Clause 9. A method of coding video data, the method comprising: determining a syntax element for residual coding of a current block of the video data; determining a context set for the luma component and a context set for the chroma component of the current block; and coding the current block based on the context set for the luma component and the context set for the chroma component, wherein if the syntax element is a first syntax element, the context set for the luma component and the context set for the chroma component are the same, and if the syntax element is a second syntax element, the context set for the luma component and the context set for the chroma component are different.
[0293] Clause 10. A method of coding video data, the method comprising: determining information associated with the remaining coefficient groups within a transform unit of a current block of the video data; and, based on the information, skipping a coding pass when coding the current block.
[0294] Clause 11. The method according to Clause 10, wherein skipping the coding pass includes skipping a second coding pass.
[0295] Clause 12. The method according to any one of Clauses 1-11, wherein coding includes decoding.
[0296] Clause 13. The method according to any one of Clauses 1-12, wherein coding includes encoding.
[0297] Clause 14. The method described in any combination of Clauses 1-13.
[0298] Clause 15. An apparatus for coding video data, the apparatus comprising one or more units for performing the method according to any one of Clauses 1-14.
[0299] Clause 16. The device according to Clause 15, wherein the one or more units include one or more processors implemented in a circuit.
[0300] Clause 17. The device according to any one of Clauses 15 and 16, further comprising a memory configured to store the video data.
[0301] Clause 18. The device according to any one of Clauses 15-17, further comprising a display configured to display decoded video data.
[0302] Clause 19. The device pursuant to any one of Clauses 15-18, wherein the device comprises one or more of the following: a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.
[0303] Clause 20. The device according to any one of Clauses 15-19, wherein the device includes a video decoder.
[0304] Clause 21. The device according to any one of Clauses 15-20, wherein the device includes a video encoder.
[0305] Clause 22. A computer-readable storage medium having instructions stored thereon, which, when executed, cause one or more processors to perform the method according to any one of Clauses 1-14.
[0306] Clause 23. An apparatus for encoding video data, the apparatus comprising a unit for performing the method according to any one of Clauses 1-14.
[0307] It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different order, may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently rather than sequentially, e.g., through multithreaded processing, interrupt processing, or multiple processors.
[0308] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[0309] By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0310] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the terms "processor" and "processing circuitry," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[0311] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
[0312] Various examples have been described. These and other examples are within the scope of the following claims.
Claims
1. A method of coding video data, the method comprising: determining whether transform skip mode is used for a current block of the video data; based on the transform skip mode being used for the current block, disabling level mapping for residual coding; and coding the current block without applying level mapping; the method further comprising: determining neighboring coefficient values that neighbor a current coefficient value of the current block; and determining a Rice parameter based on the neighboring coefficient values, wherein coding the current block is further based on the Rice parameter.
2. The method of claim 1, wherein the neighboring coefficient values comprise two left coefficient values, two above coefficient values, and one above-left coefficient value.
3. The method of claim 1, wherein the neighboring coefficient values comprise two right coefficient values, two below coefficient values, and one below-right coefficient value.
4. The method of claim 1, further comprising: determining information associated with neighboring coefficients that neighbor a current coefficient of the current block; and determining a context for the current coefficient based on the information associated with the neighboring coefficients, wherein coding the current block is further based on the context.
5. The method of claim 4, wherein the information comprises syntax values of the neighboring coefficients.
6. The method of claim 4, wherein the information comprises a number of available neighboring coefficients.
7. The method of claim 1, further comprising: determining a syntax element associated with residual coding for the current block; and determining a context set for a luma component and a context set for a chroma component of the current block, wherein coding the current block is based on the context set for the luma component and the context set for the chroma component, wherein, if the syntax element is a first syntax element, the context set for the luma component and the context set for the chroma component are shared, and, if the syntax element is a second syntax element, the context set for the luma component and the context set for the chroma component are different.
8. The method of claim 1, further comprising: determining whether a number of remaining context-coded bins for a current transform unit is greater than a multiplier multiplied by a coefficient group size multiplied by a number of remaining coefficient groups in the transform unit; and based on the number of remaining context-coded bins for the current transform unit not being greater than the multiplier multiplied by the coefficient group size multiplied by the number of remaining coefficient groups in the transform unit, skipping a coding pass when coding the current block.
9. The method of claim 8, wherein skipping the coding pass comprises skipping a second coding pass.
10. The method of claim 1, wherein determining whether transform skip mode is used for the current block of the video data is based on video data from an encoded video bitstream, and wherein coding the current block without applying level mapping comprises decoding the current block without applying level mapping.
11. The method of claim 1, wherein determining whether transform skip mode is used for the current block of the video data is based on rate-distortion values, and wherein coding the current block without applying level mapping comprises encoding the current block without applying level mapping.
12. An apparatus for coding video data, the apparatus comprising: a memory configured to store the video data; and one or more processors implemented in circuitry and coupled to the memory, the one or more processors being configured to: determine whether transform skip mode is used for a current block of the video data; based on the transform skip mode being used for the current block, disable level mapping for residual coding; and code the current block without applying level mapping; wherein the one or more processors are further configured to: determine neighboring coefficient values that neighbor a current coefficient value of the current block; and determine a Rice parameter based on the neighboring coefficient values, wherein the one or more processors code the current block further based on the Rice parameter.
13. The apparatus of claim 12, wherein the neighboring coefficient values comprise two left coefficient values, two above coefficient values, and one above-left coefficient value.
14. The apparatus of claim 12, wherein the neighboring coefficient values comprise two right coefficient values, two below coefficient values, and one below-right coefficient value.
15. The apparatus of claim 12, wherein the one or more processors are further configured to: determine information associated with neighboring coefficients that neighbor a current coefficient of the current block; and determine a context for the current coefficient based on the information associated with the neighboring coefficients, wherein the one or more processors code the current block further based on the context.
16. The apparatus of claim 15, wherein the information comprises syntax values of the neighboring coefficients.
17. The apparatus of claim 15, wherein the information comprises a number of available neighboring coefficients.
18. The apparatus of claim 12, wherein the one or more processors are further configured to: determine a syntax element associated with residual coding for the current block; and determine a context set for a luma component and a context set for a chroma component of the current block, wherein the one or more processors code the current block further based on the context set for the luma component and the context set for the chroma component, wherein, if the syntax element is a first syntax element, the context set for the luma component and the context set for the chroma component are shared, and, if the syntax element is a second syntax element, the context set for the luma component and the context set for the chroma component are different.
19. The apparatus of claim 12, wherein the one or more processors are further configured to: determine whether a number of remaining context-coded bins for a current transform unit is greater than a multiplier multiplied by a coefficient group size multiplied by a number of remaining coefficient groups in the transform unit; and based on the number of remaining context-coded bins for the current transform unit not being greater than the multiplier multiplied by the coefficient group size multiplied by the number of remaining coefficient groups in the transform unit, skip a coding pass when coding the current block.
20. The apparatus of claim 19, wherein skipping the coding pass comprises skipping a second coding pass.
21. The apparatus of claim 12, further comprising a camera configured to capture the video data.
22. The apparatus of claim 12, further comprising a display device configured to display the video data.
23. The apparatus of claim 12, wherein the apparatus comprises a mobile phone.
24. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to: determine whether transform skip mode is used for a current block of video data; based on the transform skip mode being used for the current block, disable level mapping for residual coding; and code the current block without applying level mapping; wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: determine neighboring coefficient values that neighbor a current coefficient value of the current block; and determine a Rice parameter based on the neighboring coefficient values, wherein coding the current block is further based on the Rice parameter.
25. An apparatus for coding video data, the apparatus comprising: means for determining whether transform skip mode is used for a current block of the video data; means for disabling level mapping for residual coding based on the transform skip mode being used for the current block; means for coding the current block without applying level mapping; means for determining neighboring coefficient values that neighbor a current coefficient value of the current block; and means for determining a Rice parameter based on the neighboring coefficient values, wherein coding the current block is further based on the Rice parameter.
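The neighbor-based Rice parameter of claims 1 to 3 can be illustrated with a small sketch. It assumes a VVC-style derivation in which the absolute levels of five already-coded neighbors are summed, reduced by 5 × a base level, clipped to 0..31, and mapped through a lookup table; the table values, the `base_level` offset, and all names (`RICE_TABLE`, `rice_parameter`, `level_mapping_enabled`, the two templates) are illustrative assumptions, not the claimed implementation. Claim 2's template (two left, two above, one above-left) and claim 3's template (two right, two below, one below-right) are both shown.

```python
"""Hypothetical sketch of claims 1-3: neighbor-based Rice parameter for
transform-skip residual coding, with level mapping switched off."""

# Assumed VVC-style lookup from the clipped neighbor sum (0..31) to the Rice
# parameter: 0..6 -> 0, 7..13 -> 1, 14..27 -> 2, 28..31 -> 3.
RICE_TABLE = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4

# Neighbor templates as (dx, dy) offsets from the current coefficient.
LEFT_ABOVE_TEMPLATE = [(-1, 0), (-2, 0), (0, -1), (0, -2), (-1, -1)]  # claim 2
RIGHT_BELOW_TEMPLATE = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]       # claim 3


def level_mapping_enabled(transform_skip: bool) -> bool:
    """Claim 1: level mapping is disabled whenever transform skip is used."""
    return not transform_skip


def rice_parameter(levels, x, y, template, base_level=0):
    """Derive a Rice parameter from already-coded absolute levels.

    levels[y][x] holds absolute coefficient levels; neighbors that fall
    outside the block contribute nothing. base_level is a hypothetical
    offset (5 * base_level is subtracted before clipping).
    """
    height, width = len(levels), len(levels[0])
    loc_sum = 0
    for dx, dy in template:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            loc_sum += levels[ny][nx]
    # Clip the offset sum to the table range before the lookup.
    loc_sum = min(max(loc_sum - 5 * base_level, 0), 31)
    return RICE_TABLE[loc_sum]


if __name__ == "__main__":
    block = [[0, 3, 3, 0],
             [2, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    # Right/below neighbors of (0, 0): 3 + 3 + 2 + 1 + 1 = 10 -> parameter 1.
    print(rice_parameter(block, 0, 0, RIGHT_BELOW_TEMPLATE))
    print(level_mapping_enabled(transform_skip=True))
```

Running the script prints `1` and then `False`. Which template applies would depend on the scan direction used for the transform-skip block; the sketch simply takes the template as a parameter rather than guessing that rule.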
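The budget test of claims 8 and 9 compares the remaining context-coded bins of a transform unit against multiplier × coefficient group size × remaining coefficient groups. A minimal sketch of that comparison follows. The claims do not fix the multiplier or the group size, so the values in the usage lines (multiplier 2, 16-coefficient groups) are hypothetical, and the function name `skip_coding_pass` is illustrative only.

```python
def skip_coding_pass(remaining_ccb: int, multiplier: float, cg_size: int,
                     remaining_cgs: int) -> bool:
    """Return True when the coding pass (e.g., the second pass) should be skipped.

    The threshold reserves multiplier * cg_size context-coded bins for each
    coefficient group still to be coded in the transform unit. Per claim 8,
    the pass is skipped when the remaining budget is NOT greater than it.
    """
    threshold = multiplier * cg_size * remaining_cgs
    return remaining_ccb <= threshold


if __name__ == "__main__":
    # Hypothetical numbers: 16-coefficient groups, multiplier 2, two groups left,
    # so the threshold is 2 * 16 * 2 = 64 bins.
    print(skip_coding_pass(40, 2, 16, 2))   # 40 <= 64: skip the pass
    print(skip_coding_pass(100, 2, 16, 2))  # 100 > 64: code the pass
```

The two calls print `True` and `False`. Because the comparison is "not greater than", a budget exactly equal to the threshold also triggers the skip.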