Signaling of dynamic range adjustment parameters for decoded picture buffer management and dynamic range
By signaling a unique identifier for the dynamic range adjustment adaptive parameter set in the bitstream, the problem of the dynamic range adjustment parameter set being overwritten during decoding is solved, thus improving the accuracy and output quality of video decoding.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- QUALCOMM INC
- Filing Date
- 2021-02-19
- Publication Date
- 2026-06-19
AI Technical Summary
When processing high dynamic range and wide color gamut video data, existing video encoding and decoding technologies may have their dynamic range adjustment parameter sets overwritten during the decoding process, leading to a decrease in decoding accuracy.
By signaling a unique identifier for the dynamic range adjustment adaptive parameter set in the bitstream, it is ensured that the parameter set is not overwritten during decoding until the corresponding parameter set is applied during output, thus ensuring the accurate application of the dynamic range adjustment parameters.
It improves the quality of video output to the display, ensures that the dynamic range adjustment parameter set is not overwritten during decoding, and improves decoding accuracy.
Figure CN115152234B_ABST
Description
[0001] Cross-references to related applications
[0002] This application claims priority to U.S. Application No. 17/179,145, filed February 18, 2021, and U.S. Provisional Application No. 62/980,062, filed February 21, 2020, the entire contents of each of which are incorporated herein by reference. U.S. Application No. 17/179,145 claims the benefit of U.S. Provisional Application No. 62/980,062, filed February 21, 2020.
Technical Field
[0003] This disclosure relates to video encoding and video decoding.
Background
[0004] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video conferencing equipment, video streaming devices, and more. Digital video devices implement video coding techniques such as those specified in MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of these standards. By implementing such video coding techniques, video devices can more efficiently transmit, receive, encode, decode, and/or store digital video information.
[0005] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) can be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction relative to reference samples in neighboring blocks within the same picture. Video blocks in an inter-coded (P or B) slice of a picture can use spatial prediction relative to reference samples in neighboring blocks within the same picture, or temporal prediction relative to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Summary of the Invention
[0006] Generally, this disclosure describes techniques for encoding and decoding video signals with high dynamic range (HDR) and wide color gamut (WCG) representations. More specifically, this disclosure describes signaling and operations applied to video data in certain color spaces to enable more accurate reproduction of HDR and WCG video data. The techniques of this disclosure define encoding and decoding operations that can improve the decoding accuracy of hybrid video codec systems used for encoding and decoding HDR and WCG video data by preventing data in the Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) from being overwritten by different data.
[0007] In one example, a method includes determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of video data; assigning a first DRA APS ID to the first DRA APS; determining a second DRA APS for a second image of video data; assigning a second DRA APS ID to the second DRA APS; signaling the first DRA APS in the bitstream; processing the first image according to the first DRA APS; determining whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, processing the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signaling the second DRA APS in the bitstream and processing the second image according to the second DRA APS.
[0008] In another example, a device includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of the video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second image of the video data; assign a second DRA APS ID to the second DRA APS; signal the first DRA APS in a bitstream; process the first image according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, process the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in a bitstream and process the second image according to the second DRA APS.
[0009] In another example, a computer-readable storage medium is encoded with instructions that, when executed, cause one or more processors to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first picture of video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second picture of video data; assign a second DRA APS ID to the second DRA APS; signal the first DRA APS in a bitstream; process the first picture according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, process the second picture according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in a bitstream and process the second picture according to the second DRA APS.
[0010] In another example, an apparatus includes: components for determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of video data; components for assigning a first DRA APS ID to the first DRA APS; components for determining a second DRA APS for a second image of video data; components for assigning a second DRA APS ID to the second DRA APS; components for signaling the first DRA APS in a bitstream; components for processing the first image according to the first DRA APS; components for determining whether the first DRA APS ID is equal to the second DRA APS ID; components for processing the second image according to the first DRA APS if the first DRA APS ID is equal to the second DRA APS ID; and components for signaling the second DRA APS in a bitstream and processing the second image according to the second DRA APS if the first DRA APS ID is not equal to the second DRA APS ID.
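The encoder-side examples above share one decision: signal the second DRA APS only when its ID differs from the first. The following is a minimal Python sketch of that decision, not the claimed implementation. The `DraAps` container, the `encode_two_pictures` function, and the event tuples are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraAps:
    """Hypothetical container for one DRA APS: an ID plus opaque parameters."""
    aps_id: int
    params: tuple


def encode_two_pictures(first: DraAps, second: DraAps) -> list:
    """Return the ordered events a sketch encoder would emit for two pictures."""
    events = []
    # The first DRA APS is signaled and used to process the first picture.
    events.append(("signal_aps", first.aps_id))
    events.append(("process_picture", 0, first.aps_id))
    if second.aps_id == first.aps_id:
        # Equal IDs: reuse the first DRA APS, nothing new is signaled.
        events.append(("process_picture", 1, first.aps_id))
    else:
        # Different IDs: signal the second DRA APS, then process with it.
        events.append(("signal_aps", second.aps_id))
        events.append(("process_picture", 1, second.aps_id))
    return events
```

With equal IDs the sketch emits a single `signal_aps` event. With different IDs it emits two, one before each picture that uses the corresponding APS.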
[0011] In another example, a method includes: determining a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of video data; determining the DRA APS for the first image; storing the DRA APS in an APS buffer; determining a second DRA APS ID for a second image of video data; preventing the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and processing the first image and the second image according to the DRA APS.
[0012] In another example, a device includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of the video data; determine the DRA APS for the first image; store the DRA APS in an APS buffer; determine a second DRA APS ID for a second image of the video data; prevent the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and process the first image and the second image according to the DRA APS.
[0013] In another example, a non-transitory computer-readable storage medium storing instructions, when executed, causes one or more processors to: determine a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first picture of video data; determine the DRA APS for the first picture; store the DRA APS in an APS buffer; determine a second DRA APS ID for a second picture of video data; prevent the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and process the first picture and the second picture according to the DRA APS.
[0014] In another example, a device includes: components for determining a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of video data; components for determining DRA APS for the first image; components for storing the DRA APS in an APS buffer; components for determining a second DRA APS ID for a second image of video data; components for preventing the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and components for processing the first image and the second image according to the DRA APS.
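The decoder-side examples above can be pictured as an APS buffer that keeps a stored DRA APS intact when a later picture uses the same ID. The Python sketch below is one possible reading, under the assumption that "preventing overwriting" means rejecting a write whose content differs from the stored entry. The `ApsBuffer` class and its methods are hypothetical and are not taken from any codec API.

```python
class ApsBuffer:
    """Hypothetical APS buffer keyed by DRA APS ID."""

    def __init__(self):
        self._entries = {}

    def store(self, aps_id: int, params: tuple) -> bool:
        """Store a DRA APS; return False if it would overwrite different data."""
        existing = self._entries.get(aps_id)
        if existing is not None and existing != params:
            # Same ID but different content: leave the stored entry untouched.
            return False
        self._entries[aps_id] = params
        return True

    def get(self, aps_id: int) -> tuple:
        """Look up the DRA APS for a picture, e.g. at output time."""
        return self._entries[aps_id]
```

In this sketch both the first and the second picture resolve the shared ID to the same stored parameters, so either can be processed according to the one DRA APS.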
[0015] One or more examples are set forth in detail in the accompanying drawings and the following description. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
Brief Description of the Drawings
[0016] Figure 1 is a block diagram illustrating an example video encoding and decoding system that can perform the techniques of this disclosure.
[0017] Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU).
[0018] Figure 3 is a block diagram illustrating an example video encoder that can perform the techniques of this disclosure.
[0019] Figure 4 is a block diagram illustrating an example video decoder that can perform the techniques of this disclosure.
[0020] Figure 5 is a conceptual diagram illustrating human vision and display capabilities.
[0021] Figure 6 is a conceptual diagram illustrating color gamuts.
[0022] Figure 7 is a block diagram illustrating an example of HDR/WCG conversion.
[0023] Figure 8 is a block diagram illustrating an example of inverse HDR/WCG conversion.
[0024] Figure 9 is a conceptual diagram of an example electro-optical transfer function (EOTF).
[0025] Figure 10 is a conceptual diagram of a visual example of the perceptual quantizer (PQ) transfer function (TF) (ST2084 EOTF).
[0026] Figure 11 is a conceptual diagram of an example of the Luminance Driven Chromaticity Scaling (LCS) function.
[0027] Figure 12 is a conceptual diagram illustrating Table 8-10 of the HEVC specification.
[0028] Figure 13 is a conceptual diagram of the HDR buffer model.
[0029] Figure 14 is a block diagram of a video encoder and video decoder system including DRA units.
[0030] Figure 15 is a flowchart illustrating an example DRA APS encoding technique according to this disclosure.
[0031] Figure 16 is a flowchart illustrating an example DRA APS decoding technique according to this disclosure.
[0032] Figure 17 is a flowchart illustrating an example of video encoding.
[0033] Figure 18 is a flowchart illustrating an example of video decoding.
Detailed Description
[0034] A video encoder can signal dynamic range adjustment (DRA) data as a separate network abstraction layer (NAL) unit with a specific adaptive parameter set (APS) identifier (ID), referenced in a picture parameter set (PPS), that applies to all pictures referring to that PPS. The video decoder can apply an inverse DRA process during the output process, which can be decoupled in time from the decoding process, for example, in random access (RA) coding scenarios. This potential decoupling between the decoding and output processes can lead to situations in which a DRA APS in the APS buffer, identified by its ID, has been overwritten by a new DRA APS during decoding before it is applied at output.
[0035] To ensure that DRA APS data in the APS buffer is not overwritten during decoding until DRA is applied during output (based on the corresponding APS ID), the techniques disclosed herein prevent APS buffer entries from being overwritten by different data during decoding. These techniques constrain the bitstream so that every DRA APS with a given ID number includes the same content. In this way, DRA can be applied appropriately, which can improve the quality of video output to the display.
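The decoupling problem described above can be illustrated with a small Python sketch. It makes a deliberate simplification, which is an assumption of this example and not a statement about any real decoder: all pictures are decoded before any picture is output, and a naive buffer lets a later DRA APS overwrite an earlier one with the same ID. The function name `mismatches_at_output` and the tuple layout are hypothetical.

```python
def mismatches_at_output(decode_order):
    """Find pictures whose DRA parameters were overwritten before output.

    decode_order: list of (output_index, aps_id, params) in decoding order.
    Returns the output indices for which the buffer no longer holds the
    parameters that were intended for that picture.
    """
    buffer = {}    # naive APS buffer: aps_id -> most recently stored params
    intended = {}  # output_index -> (aps_id, params) intended for the picture
    for output_index, aps_id, params in decode_order:
        buffer[aps_id] = params  # a later APS silently overwrites an earlier one
        intended[output_index] = (aps_id, params)
    # Simplification: output runs after decoding, so lookups see the final buffer.
    return [idx for idx in sorted(intended)
            if buffer[intended[idx][0]] != intended[idx][1]]
```

When the same ID carries two different payloads, the earlier picture is output with the wrong parameters. When every payload with a given ID is identical, as the constraint above requires, the list of mismatches is empty.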
[0036] Figure 1 is a block diagram illustrating an example video encoding and decoding system 100 that can perform the techniques of this disclosure. The techniques of this disclosure are generally directed to coding (encoding and/or decoding) video data. In general, video data includes any data used for processing video. Thus, video data can include raw, unencoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata (such as signaling data).
[0037] As shown in Figure 1, in this example, the video encoding and decoding system 100 includes a source device 102 that provides encoded video data to be decoded and displayed by a destination device 116. Specifically, the source device 102 provides the video data to the destination device 116 via a computer-readable medium 110. The source device 102 and the destination device 116 can include any of a wide range of devices, including desktop computers, laptop computers, tablet computers, set-top boxes, handsets (mobile devices) such as smartphones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, broadcast receivers, etc. In some cases, the source device 102 and the destination device 116 may be equipped for wireless communication and may therefore be referred to as wireless communication devices.
[0038] In the example of Figure 1, source device 102 includes a video source 104, memory 106, video encoder 200, and output interface 108. Destination device 116 includes an input interface 122, video decoder 300, memory 120, and display device 118. According to this disclosure, the video encoder 200 of source device 102 and the video decoder 300 of destination device 116 can be configured to apply the techniques for signaling and operations applied to video data in certain color spaces. Therefore, source device 102 represents an example of a video encoding device, while destination device 116 represents an example of a video decoding device. In other examples, the source device and destination device may include other components or arrangements. For example, source device 102 may receive video data from an external video source such as an external camera. Similarly, destination device 116 may be connected to an external display device, rather than including an integrated display device.
[0039] The video encoding and decoding system 100 shown in Figure 1 is merely one example. In general, any digital video encoding and/or decoding device can perform the techniques for signaling and operations applied to video data in certain color spaces. Source device 102 and destination device 116 are merely examples of such coding devices, in which source device 102 generates encoded video data for transmission to destination device 116. In this disclosure, the term "coding device" refers to a device that performs coding (encoding and/or decoding) of data. Therefore, video encoder 200 and video decoder 300 represent examples of coding devices, specifically a video encoder and a video decoder, respectively. In some examples, source device 102 and destination device 116 can operate in a substantially symmetrical manner, such that each of source device 102 and destination device 116 includes video encoding and decoding components. Therefore, video encoding and decoding system 100 can support one-way or two-way video transmission between source device 102 and destination device 116, for example, for video streaming, video playback, video broadcasting, or video telephony.
[0040] In general, video source 104 represents a source of video data (i.e., raw, unencoded video data) and provides a sequence of pictures (also called "frames") of the video data to video encoder 200, which encodes the data for the pictures. Video source 104 of source device 102 may include a video capture device such as a camera, a video archive containing previously captured raw video, and/or a video feed interface that receives video from a video content provider. Alternatively, video source 104 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In each case, video encoder 200 encodes the captured, pre-captured, or computer-generated video data. Video encoder 200 may rearrange the pictures from the received order (sometimes referred to as the "display order") into a coding order for coding. Video encoder 200 may generate a bitstream comprising the encoded video data. Source device 102 can then output the encoded video data to computer-readable medium 110 via output interface 108 for reception and/or retrieval by, for example, input interface 122 of destination device 116.
[0041] The memory 106 of source device 102 and the memory 120 of destination device 116 represent general-purpose memory. In some examples, memory 106 and memory 120 may store raw video data, such as raw video from video source 104 and raw, decoded video data from video decoder 300. Additionally or alternatively, memory 106 and memory 120 may store software instructions that can be executed by, for example, video encoder 200 and video decoder 300. Although memory 106 and memory 120 are shown as separate from video encoder 200 and video decoder 300 in this example, it should be understood that video encoder 200 and video decoder 300 may also include internal memory for functionally similar or equivalent purposes. Furthermore, memory 106 and memory 120 may store, for example, encoded video data output from video encoder 200 and input to video decoder 300. In some examples, portions of memory 106 and memory 120 may be allocated as one or more video buffers, for example, to store raw, decoded, and / or encoded video data.
[0042] Computer-readable medium 110 can represent any type of medium or device capable of transporting encoded video data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to transmit encoded video data directly to destination device 116 in real time, for example, via a radio frequency network or a computer-based network. According to a communication standard such as a wireless communication protocol, output interface 108 can modulate a transmission signal that includes the encoded video data, and input interface 122 can demodulate the received transmission signal. The communication medium can include any wireless or wired communication medium, such as radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium can include routers, switches, base stations, or any other equipment that can facilitate communication from source device 102 to destination device 116.
[0043] In some examples, source device 102 can output encoded data to storage device 112 from output interface 108. Similarly, destination device 116 can access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessible data storage media, such as hard disk drives, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
[0044] In some examples, source device 102 may output encoded video to file server 114 or another intermediate storage device that may store the encoded video generated by source device 102. Destination device 116 may access the stored video data from file server 114 via streaming or downloading. File server 114 may be any type of server device capable of storing encoded video data and sending the encoded video data to destination device 116. File server 114 may represent a web server (e.g., for a website), a file transfer protocol (FTP) server, a content delivery network device, or a network attached storage (NAS) device. Destination device 116 may access the encoded video data from file server 114 via any standard data connection including an internet connection. This may include wireless channels (e.g., Wi-Fi connections), wired connections (e.g., digital subscriber line (DSL), cable modems, etc.), or combinations thereof, suitable for accessing encoded video data stored on file server 114. File server 114 and input interface 122 may be configured to operate according to streaming protocols, download protocols, or combinations thereof.
[0045] Output interface 108 and input interface 122 can represent a wireless transmitter/receiver, a modem, a wired networking component (e.g., an Ethernet card), a wireless communication component operating according to any of the various IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 include wireless components, output interface 108 and input interface 122 can be configured to transfer data such as encoded video data according to cellular communication standards such as 4G, 4G-LTE (Long Term Evolution), LTE Advanced, 5G, etc. In some examples where output interface 108 includes a wireless transmitter, output interface 108 and input interface 122 can be configured to transfer data such as encoded video data according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee™), or a Bluetooth™ standard. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include an SoC device that performs the functions attributed to video encoder 200 and/or output interface 108, and destination device 116 may include an SoC device that performs the functions attributed to video decoder 300 and/or input interface 122.
[0046] The techniques of this disclosure can be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, internet streaming video transmission such as dynamic adaptive streaming over HTTP (DASH), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.
[0047] The input interface 122 of the destination device 116 receives an encoded video bitstream from a computer-readable medium 110 (e.g., a communication medium, storage device 112, file server 114, etc.). The encoded video bitstream may include signaling information defined by the video encoder 200, which is also used by the video decoder 300. This signaling information may include syntax elements having values that describe characteristics and/or processing of video blocks or other coded units (e.g., slices, pictures, groups of pictures, sequences, etc.). The display device 118 displays decoded pictures of the decoded video data to the user. The display device 118 may represent any of a variety of display devices, such as a cathode ray tube (CRT), liquid crystal display (LCD), plasma display, organic light-emitting diode (OLED) display, or another type of display device.
[0048] Although not shown in Figure 1, in some examples, the video encoder 200 and video decoder 300 may each be integrated with an audio encoder and/or audio decoder, and may include appropriate MUX-DEMUX units or other hardware and/or software to handle multiplexed streams that include both audio and video in a common data stream. Where applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols, such as the User Datagram Protocol (UDP).
[0049] The video encoder 200 and video decoder 300 can be implemented as any of a variety of suitable encoder and / or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When these techniques are partially implemented in software, the device may store software instructions in a suitable non-transitory computer-readable medium and use one or more processors to execute these instructions in hardware to perform the techniques of this disclosure. Each of the video encoder 200 and video decoder 300 may be included in one or more encoders or decoders, and either encoder or decoder may be integrated as part of a combined encoder / decoder (CODEC) in the respective device. Devices including the video encoder 200 and / or video decoder 300 may include integrated circuits, microprocessors, and / or wireless communication devices such as cellular phones.
[0050] The video encoder 200 and video decoder 300 can operate according to a video coding standard such as ITU-T H.265, also known as High Efficiency Video Coding (HEVC), or extensions thereof, such as the multi-view and/or scalable video coding extensions. Alternatively, the video encoder 200 and video decoder 300 can operate according to other proprietary or industry standards, such as ITU-T H.266, also known as Versatile Video Coding (VVC). A recent draft of the VVC standard is described in Bross et al., "Versatile Video Coding (Draft 8)," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 17th Meeting: Brussels, BE, 7-17 January 2020, JVET-Q2001-vC (hereinafter "VVC Draft 8"). However, the techniques of this disclosure are not limited to any particular coding standard.
[0051] In general, video encoder 200 and video decoder 300 can perform block-based coding of pictures. The term "block" generally refers to a structure that includes data to be processed (e.g., encoded, decoded, or otherwise used in the encoding and/or decoding process). For example, a block may include a two-dimensional matrix of samples of luminance and/or chrominance data. In general, video encoder 200 and video decoder 300 can code video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for the samples of a picture, video encoder 200 and video decoder 300 can code luminance and chrominance components, where the chrominance components may include both red-hue and blue-hue chrominance components. In some examples, video encoder 200 converts received RGB-formatted data to a YUV representation before encoding, and video decoder 300 converts the YUV representation back to the RGB format. Alternatively, pre-processing and post-processing units (not shown) can perform these conversions.
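As an illustration of the RGB-to-YUV conversion mentioned above, the following Python sketch converts normalized, non-linear R'G'B' values to Y'CbCr and back. The disclosure does not specify a conversion matrix at this point, so the luma coefficients used here (those of ITU-R BT.2020, non-constant luminance, which is common for WCG content) are an assumption of this example, as are the function names.

```python
# Assumed luma coefficients (ITU-R BT.2020); not specified by the text above.
KR, KB = 0.2627, 0.0593
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    """Convert normalized R'G'B' in [0, 1] to Y' in [0, 1], Cb/Cr in [-0.5, 0.5]."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))  # blue-difference chroma
    cr = (r - y) / (2.0 * (1.0 - KR))  # red-difference chroma
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr for the same assumed coefficients."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b
```

A neutral white input maps to full luma with zero chroma, and the two functions invert each other up to floating-point rounding. Quantization to integer code values is deliberately left out of this sketch.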
[0052] This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data of the picture. Similarly, this disclosure may refer to coding of blocks of a picture to include the process of encoding or decoding data for the blocks, such as prediction and/or residual coding. An encoded video bitstream generally includes a series of values for syntax elements that represent coding decisions (e.g., coding modes) and the partitioning of pictures into blocks. Therefore, references to coding a picture or a block should generally be understood as coding values for the syntax elements that form the picture or block.
[0053] HEVC defines various blocks, including coding units (CUs), prediction units (PUs), and transform units (TUs). According to HEVC, a video coder (such as video encoder 200) partitions a coding tree unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, non-overlapping squares, and each node of the quadtree has either zero or four child nodes. Nodes without child nodes may be referred to as "leaf nodes," and the CUs of such leaf nodes may include one or more PUs and/or one or more TUs. The video coder can further partition PUs and TUs. For example, in HEVC, a residual quadtree (RQT) represents the partitioning of TUs. In HEVC, PUs represent inter-prediction data, while TUs represent residual data. CUs that are intra-predicted include intra-prediction information, such as an intra-mode indication.
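The quadtree rule described above (each node has zero or four children, and each split yields four equal, non-overlapping squares) can be sketched recursively in Python. The function `quadtree_leaves`, the `should_split` callback, and the `min_size` limit are hypothetical names for this illustration. A real encoder would base the split decision on rate-distortion cost, which is not modeled here.

```python
def quadtree_leaves(x, y, size, min_size, should_split):
    """Return leaf blocks (x, y, size) of a quadtree rooted at a square block.

    should_split(x, y, size) is a caller-supplied decision function.
    A block is never split below min_size.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Four equal, non-overlapping squares: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_leaves(x + dx, y + dy, half, min_size, should_split))
        return leaves
    # Leaf node: zero children.
    return [(x, y, size)]
```

For example, splitting a 64×64 root at every level down to a minimum size of 16 yields sixteen 16×16 leaves, while splitting only the root yields four 32×32 leaves.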
[0054] As another example, video encoder 200 and video decoder 300 can be configured to operate according to VVC. According to VVC, a video coder (such as video encoder 200) partitions a picture into multiple coding tree units (CTUs). Video encoder 200 can partition a CTU according to a tree structure, such as a quadtree-binary tree (QTBT) structure or a multi-type tree (MTT) structure. The QTBT structure removes the concept of multiple partition types, such as the separation between CUs, PUs, and TUs in HEVC. A QTBT structure includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. The root node of the QTBT structure corresponds to a CTU. The leaf nodes of the binary trees correspond to coding units (CUs).
[0055] In an MTT partitioning structure, blocks can be partitioned using a quadtree (QT) partition, a binary tree (BT) partition, and one or more types of ternary tree (TT) partitions (also called triple tree partitions). A ternary tree partition splits a block into three sub-blocks. In some examples, a ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partition types in MTT (e.g., QT, BT, and TT) can be symmetric or asymmetric.
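The three MTT partition types can be compared by the sub-block sizes they produce. The Python sketch below assumes a 1:2:1 ratio for the ternary split, which is how VVC defines it but is not stated in the paragraph above. With that ratio the centre line of the block falls inside the middle sub-block, so the block is not divided through its centre. The function name and mode strings are hypothetical.

```python
def split_block(width, height, mode):
    """Return the (width, height) of each sub-block for one MTT split mode."""
    if mode == "QT":
        # Quadtree: four equal quadrants.
        return [(width // 2, height // 2)] * 4
    if mode == "BT_VER":
        # Binary split with a vertical boundary: two side-by-side halves.
        return [(width // 2, height)] * 2
    if mode == "BT_HOR":
        # Binary split with a horizontal boundary: two stacked halves.
        return [(width, height // 2)] * 2
    if mode == "TT_VER":
        # Ternary split, assumed 1:2:1 across the width.
        quarter = width // 4
        return [(quarter, height), (width - 2 * quarter, height), (quarter, height)]
    if mode == "TT_HOR":
        # Ternary split, assumed 1:2:1 across the height.
        quarter = height // 4
        return [(width, quarter), (width, height - 2 * quarter), (width, quarter)]
    raise ValueError(f"unknown split mode: {mode}")
```

For a 32×32 block, `TT_VER` gives sub-blocks of widths 8, 16, and 8, whereas `BT_VER` gives two sub-blocks of width 16 whose shared boundary passes through the centre.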
[0056] In some examples, the video encoder 200 and the video decoder 300 may use a single QTBT or MTT structure to represent each of the luma and chroma components, while in other examples, the video encoder 200 and the video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT / MTT structure for the luma component and another QTBT / MTT structure for the two chroma components (or two QTBT / MTT structures for the respective chroma components).
[0057] The video encoder 200 and video decoder 300 can be configured to use quadtree partitioning according to HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures. For illustrative purposes, the description of the technology in this disclosure is presented with respect to QTBT partitioning. However, it should be understood that the technology in this disclosure can also be applied to video coders configured to use quadtree partitioning or other types of partitioning.
[0058] Blocks (e.g., CTUs or CUs) can be grouped in various ways within an image. As one example, a brick can refer to a rectangular region of CTU rows within a particular tile in an image. A tile can be a rectangular region of CTUs within a particular tile column and a particular tile row in an image. A tile column refers to a rectangular region of CTUs with a height equal to the height of the image and a width specified by a syntax element (e.g., in a picture parameter set). A tile row refers to a rectangular region of CTUs with a height specified by a syntax element (e.g., in a picture parameter set) and a width equal to the width of the image.
[0059] In some examples, a tile can be partitioned into multiple bricks, where each brick may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks can also be referred to as a brick. However, a brick that is a true subset of a tile cannot be referred to as a tile.
[0060] The bricks in an image can also be arranged in slices. A slice can be an integer number of bricks of an image that can be exclusively contained in a single Network Abstraction Layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a consecutive sequence of complete bricks of one tile.
[0061] This disclosure uses "N×N" and "N by N" interchangeably to refer to the sample dimensions of a block (such as a CU or other video block) in the vertical and horizontal dimensions, for example, 16×16 samples or 16 by 16 samples. Typically, a 16×16 CU will have 16 samples in the vertical direction (y = 16) and 16 samples in the horizontal direction (x = 16). Similarly, an N×N CU typically has N samples in the vertical direction and N samples in the horizontal direction, where N represents a non-negative integer value. Samples in a CU can be arranged in rows and columns. Furthermore, a CU does not necessarily need to have the same number of samples in the horizontal and vertical directions. For example, a CU can include N×M samples, where M is not necessarily equal to N.
[0062] Video encoder 200 encodes video data of a CU representing prediction and / or residual information, as well as other information. Prediction information indicates how to predict the CU to form a prediction block for that CU. Residual information typically represents the sample-by-sample difference between samples of the CU before encoding and the prediction block.
[0063] To predict a CU, the video encoder 200 typically forms a prediction block of the CU through inter-frame prediction or intra-frame prediction. Inter-frame prediction generally refers to predicting the CU based on data from previously encoded images, while intra-frame prediction generally refers to predicting the CU based on previously encoded data of the same image. To perform inter-frame prediction, the video encoder 200 can use one or more motion vectors to generate prediction blocks. The video encoder 200 can typically perform a motion search to identify a reference block that closely matches the CU (e.g., in terms of the difference between the CU and the reference block). The video encoder 200 can use sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), or other such difference calculations to compute a difference metric to determine whether the reference block closely matches the current CU. In some examples, the video encoder 200 can use unidirectional or bidirectional prediction to predict the current CU.
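As a non-limiting illustration of the difference metrics named above, the following sketch computes SAD, SSD, MAD, and MSD for two equally sized blocks. The function names and the sample values are hypothetical and are chosen only for illustration; they are not taken from this disclosure or from any standard.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def ssd(a, b):
    """Sum of squared differences between two equally sized 2-D blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def mad(a, b):
    """Mean absolute difference: SAD divided by the number of samples."""
    return sad(a, b) / (len(a) * len(a[0]))


def msd(a, b):
    """Mean squared difference: SSD divided by the number of samples."""
    return ssd(a, b) / (len(a) * len(a[0]))


# Hypothetical 2x2 current block and candidate reference block.
current = [[10, 12], [14, 16]]
reference = [[11, 12], [12, 20]]
print(sad(current, reference), ssd(current, reference))
```

A lower value of any of these metrics indicates a closer match between the candidate reference block and the current block.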
[0064] Some examples of VVC also offer an affine motion compensation mode, which can be considered an inter-frame prediction mode. In affine motion compensation mode, the video encoder 200 can determine two or more motion vectors representing non-translational motion, such as zooming in or out, rotation, perspective motion, or other irregular motion types.
[0065] To perform intra-frame prediction, the video encoder 200 can select an intra-frame prediction mode to generate prediction blocks. Some examples of VVC provide sixty-seven intra-frame prediction modes, including various directional modes, as well as planar and DC modes. Typically, the video encoder 200 selects an intra-frame prediction mode that describes neighboring samples of the current block (e.g., a block of a CU) from which to predict samples of the current block. Assuming the video encoder 200 codes CTUs and CUs in raster scan order (from left to right, from top to bottom), such samples are typically located above, to the upper left, or to the left of the current block within the same image.
[0066] The video encoder 200 encodes data representing the prediction mode of the current block. For example, for inter-frame prediction modes, the video encoder 200 may encode data indicating which of the various available inter-frame prediction modes is used, and the motion information of the corresponding mode. For unidirectional or bidirectional inter-frame prediction, for example, the video encoder 200 may use Advanced Motion Vector Prediction (AMVP) or merge mode to encode motion vectors. The video encoder 200 may use similar modes to encode motion vectors for affine motion compensation modes.
[0067] After prediction (such as intra-frame or inter-frame prediction of a block), the video encoder 200 can compute residual data for the block. The residual data (such as a residual block) represents the sample-by-sample difference between the block and the prediction block formed using the corresponding prediction mode. The video encoder 200 can apply one or more transforms to the residual block to produce transformed data in the transform domain rather than the sample domain. For example, the video encoder 200 can apply a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video data. Additionally, the video encoder 200 can apply a secondary transform after the first transform, such as a Mode-Dependent Non-Separable Secondary Transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve Transform (KLT), etc. The video encoder 200 produces transform coefficients after applying the one or more transforms.
[0068] As described above, after any transforms that produce transform coefficients, the video encoder 200 can quantize the transform coefficients. Quantization typically refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent them, thereby providing further compression. By performing quantization, the video encoder 200 can reduce the bit depth associated with some or all of the transform coefficients. For example, the video encoder 200 can round an n-bit value down to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, the video encoder 200 can perform a bitwise right shift of the value to be quantized.
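The bitwise right shift mentioned above can be sketched as follows. This is a simplified illustration only: the function names are hypothetical, the sign handling is an assumption, and practical codecs typically also involve a quantization parameter, scaling, and rounding offsets that are not modeled here.

```python
def quantize_shift(coeff, shift):
    """Drop the `shift` least significant bits of the magnitude, keeping the sign."""
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) >> shift)


def dequantize_shift(level, shift):
    """Approximate inverse: scale the level back up by 2**shift."""
    return level * (1 << shift)


# Hypothetical transform coefficients and a 3-bit shift.
coeffs = [100, -37, 5, 0]
levels = [quantize_shift(c, 3) for c in coeffs]
restored = [dequantize_shift(q, 3) for q in levels]
print(levels, restored)
```

The restored values differ from the original coefficients, which reflects the information discarded by quantization.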
[0069] After quantization, the video encoder 200 can scan the transform coefficients to generate a one-dimensional vector from a two-dimensional matrix including the quantized transform coefficients. The scan can be designed to place higher-energy (and therefore lower-frequency) transform coefficients at the beginning of the vector and lower-energy (and therefore higher-frequency) transform coefficients at the end. In some examples, the video encoder 200 can utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector, and then entropy-encode the quantized transform coefficients of the vector. In other examples, the video encoder 200 can perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, the video encoder 200 can entropy-encode the one-dimensional vector, for example, according to context-adaptive binary arithmetic coding (CABAC). The video encoder 200 can also entropy-encode the values of syntax elements that describe metadata associated with the encoded video data for use by the video decoder 300 when decoding the video data.
[0070] To perform CABAC, the video encoder 200 can assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are zero-valued. The probability determination can be based on the context assigned to the symbol.
[0071] The video encoder 200 can also generate syntax data for the video decoder 300, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, for example in a picture header, a block header, or a slice header, or other syntax data, such as a sequence parameter set (SPS), a picture parameter set (PPS), or a video parameter set (VPS). Similarly, the video decoder 300 can decode such syntax data to determine how to decode the corresponding video data.
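As a non-limiting illustration of serializing a two-dimensional coefficient matrix, the following sketch walks the anti-diagonals starting from the top-left (lowest-frequency) position. The scan order and the sample block are assumptions made for illustration; they are not asserted to be the scan order of HEVC, VVC, or this disclosure.

```python
def diagonal_scan(block):
    """Serialize a 2-D block by visiting anti-diagonals from the top-left corner."""
    height, width = len(block), len(block[0])
    out = []
    for s in range(height + width - 1):  # s = row + col identifies one anti-diagonal
        for row in range(height):
            col = s - row
            if 0 <= col < width:
                out.append(block[row][col])
    return out


# Hypothetical quantized coefficients: energy concentrated at the top-left.
quantized = [
    [9, 5, 1, 0],
    [4, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
vector = diagonal_scan(quantized)
print(vector)
```

In this example the nonzero coefficients land at the beginning of the vector and the trailing positions are all zero, which is the property the scan is intended to provide for entropy encoding.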
[0072] In this way, the video encoder 200 can generate a bitstream that includes encoded video data (e.g., syntax elements describing the segmentation of an image into blocks (e.g., CUs) and prediction and / or residual information for those blocks). Finally, the video decoder 300 can receive the bitstream and decode the encoded video data.
[0073] Typically, the video decoder 300 performs the opposite processing to that performed by the video encoder 200 to decode the encoded video data of the bitstream. For example, the video decoder 300 can use CABAC to decode the values of the syntax elements of the bitstream in a manner substantially similar to but opposite to the CABAC encoding process of the video encoder 200. The syntax elements can define segmentation information for dividing the image into CTUs and for segmenting each CTU according to a corresponding segmentation structure (such as a QTBT structure) to define the CUs of the CTUs. The syntax elements can further define prediction and residual information for blocks (e.g., CUs) of the video data.
[0074] The residual information can be represented, for example, by quantized transform coefficients. The video decoder 300 can inversely quantize and inversely transform the quantized transform coefficients of a block to reconstruct the residual block of that block. The video decoder 300 uses the signaled prediction mode (intra-frame or inter-frame prediction) and associated prediction information (e.g., motion information for inter-frame prediction) to form a prediction block for that block. The video decoder 300 can then combine the prediction block and the residual block (on a sample-by-sample basis) to reconstruct the original block. The video decoder 300 can perform additional processing, such as deblocking, to reduce visual artifacts along the block boundaries.
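The sample-by-sample combination of a prediction block and a residual block can be sketched as follows. The function name is hypothetical, and clipping the result to the valid sample range for an assumed bit depth is an assumption of this sketch rather than a statement of this disclosure.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add residual to prediction per sample, clipping to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# Hypothetical 2x2 prediction and residual blocks.
prediction = [[100, 250], [3, 128]]
residual = [[5, 10], [-7, 0]]
print(reconstruct(prediction, residual))
```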
[0075] According to the technology disclosed herein, a method includes: determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of video data; assigning a first DRA APS ID to the first DRA APS; determining a second DRA APS for a second image of video data; assigning a second DRA APS ID to the second DRA APS; signaling the first DRA APS in a bitstream; processing the first image according to the first DRA APS; determining whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, processing the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signaling the second DRA APS in a bitstream and processing the second image according to the second DRA APS.
[0076] According to the technology of this disclosure, an apparatus includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of the video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second image of the video data; assign a second DRA APS ID to the second DRA APS; signal the first DRA APS in a bitstream; process the first image according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, process the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in a bitstream and process the second image according to the second DRA APS.
[0077] According to the technology disclosed herein, an apparatus includes: components for determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of video data; components for assigning a first DRA APS ID to the first DRA APS; components for determining a second DRA APS for a second image of video data; components for assigning a second DRA APS ID to the second DRA APS; components for signaling the first DRA APS in a bitstream; components for processing the first image according to the first DRA APS; components for determining whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, components for processing the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, components for signaling the second DRA APS in a bitstream and processing the second image according to the second DRA APS.
[0078] According to the technology of this disclosure, a computer-readable storage medium is encoded with instructions that, when executed, cause one or more processors to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first picture of video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second picture of video data; assign a second DRA APS ID to the second DRA APS; signal the first DRA APS in a bitstream; process the first picture according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, process the second picture according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in a bitstream and process the second picture according to the second DRA APS.
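The encoder-side behavior described in the preceding paragraphs can be sketched, in a non-limiting way, as follows. The `DraAps` class, its `scales` payload, the `encode_two_pictures` function, and the tuple-based representation of the bitstream are all hypothetical stand-ins chosen for illustration; they are not syntax defined by this disclosure or by any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraAps:
    aps_id: int    # DRA APS ID assigned to this parameter set
    scales: tuple  # hypothetical DRA payload, for illustration only


def encode_two_pictures(first_aps, second_aps):
    """Return (signaled units, APS applied to each picture) for two pictures."""
    bitstream = [("DRA_APS", first_aps.aps_id)]  # signal the first DRA APS
    applied = [("picture_1", first_aps)]         # process the first picture
    if second_aps.aps_id == first_aps.aps_id:
        # Equal IDs: process the second picture according to the first DRA APS.
        applied.append(("picture_2", first_aps))
    else:
        # Different IDs: signal the second DRA APS, then process with it.
        bitstream.append(("DRA_APS", second_aps.aps_id))
        applied.append(("picture_2", second_aps))
    return bitstream, applied


aps_a = DraAps(aps_id=1, scales=(1.0, 0.5))
aps_b = DraAps(aps_id=2, scales=(0.8, 0.6))
print(encode_two_pictures(aps_a, aps_a)[0])
print(encode_two_pictures(aps_a, aps_b)[0])
```

In this sketch, a second parameter set is placed in the bitstream only when its ID differs from the first ID.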
[0079] According to the technology disclosed herein, a method includes: determining a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of video data; determining a DRA APS for the first image; storing the DRA APS in an APS buffer; determining a second DRA APS ID for a second image of video data; preventing the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and processing the first image and the second image according to the DRA APS.
[0080] According to the technology disclosed herein, an apparatus includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of the video data; determine a DRA APS for the first image; store the DRA APS in an APS buffer; determine a second DRA APS ID for a second image of the video data; prevent the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and process the first image and the second image according to the DRA APS.
[0081] According to the technology disclosed herein, a non-transitory computer-readable storage medium stores instructions that, when executed, cause one or more processors to: determine a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first picture of video data; determine the DRA APS for the first picture; store the DRA APS in an APS buffer; determine a second DRA APS ID for a second picture of video data; prevent the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and process the first picture and the second picture according to the DRA APS.
[0082] According to the technology disclosed herein, an apparatus includes: components for determining a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of video data; components for determining a DRA APS for the first image; components for storing the DRA APS in an APS buffer; components for determining a second DRA APS ID for a second image of video data; components for preventing the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and components for processing the first image and the second image according to the DRA APS.
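The decoder-side behavior described in the preceding paragraphs, in which a stored DRA APS is protected from being overwritten by different data carrying the same ID, can be sketched as follows. The `ApsBuffer` class, its method names, and the policy of rejecting the conflicting data are assumptions made for illustration; this disclosure states only that the overwrite is prevented.

```python
class ApsBuffer:
    """Hypothetical APS buffer keyed by DRA APS ID."""

    def __init__(self):
        self._slots = {}

    def store(self, aps_id, payload):
        """Store a payload; refuse to replace an entry with different data."""
        existing = self._slots.get(aps_id)
        if existing is not None and existing != payload:
            return False  # same ID, different data: keep the stored DRA APS
        self._slots[aps_id] = payload
        return True

    def get(self, aps_id):
        return self._slots[aps_id]


buf = ApsBuffer()
buf.store(3, (1.0, 0.5))              # DRA APS determined for the first picture
rejected = not buf.store(3, (9.9,))   # second picture reuses ID 3 with other data
first_params = buf.get(3)             # parameters applied to the first picture
second_params = buf.get(3)            # the same parameters apply to the second
print(rejected, first_params == second_params)
```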
[0083] This disclosure can generally refer to "signaling" certain information, such as syntax elements. The term "signaling" can generally refer to the communication of values for syntax elements and / or other data used to decode encoded video data. That is, the video encoder 200 can signal the values of syntax elements in the bitstream. Typically, signaling involves generating values in the bitstream. As described above, the source device 102 can transmit the bitstream to the destination device 116 substantially in real-time or non-real-time, such as when syntax elements are stored in the storage device 112 for later retrieval by the destination device 116.
[0084] Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure 130 and a corresponding coding tree unit (CTU) 132. Solid lines represent quadtree splitting, and dashed lines indicate binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, a flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where in this example, 0 indicates a horizontal split and 1 indicates a vertical split. For quadtree splitting, it is not necessary to indicate the splitting type because a quadtree node splits the block horizontally and vertically into four sub-blocks of equal size. Therefore, the video encoder 200 can encode and the video decoder 300 can decode the syntax elements (such as splitting information) at the region tree level of the QTBT structure 130 (i.e., solid lines) and the syntax elements (such as splitting information) at the prediction tree level of the QTBT structure 130 (i.e., dashed lines). The video encoder 200 can encode and the video decoder 300 can decode video data, such as prediction and transform data, for the CUs represented by the terminal leaf nodes of the QTBT structure 130.
[0085] In general, the CTU 132 of Figure 2B can be associated with parameters that define the sizes of the blocks corresponding to the nodes of the first and second levels of the QTBT structure 130. These parameters may include the CTU size (representing the size of the CTU 132 in samples), the minimum quadtree size (MinQTSize, representing the minimum allowed size of the leaf nodes of the quadtree), the maximum binary tree size (MaxBTSize, representing the maximum allowed size of the root node of the binary tree), the maximum binary tree depth (MaxBTDepth, representing the maximum allowed depth of the binary tree), and the minimum binary tree size (MinBTSize, representing the minimum allowed size of the leaf nodes of the binary tree).
[0086] The root node of a QTBT structure corresponding to a CTU can have four child nodes at the first level of the QTBT structure, where each child node can be partitioned according to quadtree partitioning. That is, a node of the first level is either a leaf node (with no child nodes) or has four child nodes. The example of QTBT structure 130 represents such nodes as including a parent node and child nodes with solid lines for branches. If a node of the first level is not larger than the maximum allowed binary tree root node size (MaxBTSize), the node can be further partitioned by a corresponding binary tree. The binary tree partitioning of a node can be iterated until the partitioning produces nodes that reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes as having dashed lines for branches. The binary tree leaf nodes are called coding units (CUs), which are used for prediction (e.g., intra-picture or inter-picture prediction) and transformation without any further partitioning. As mentioned above, CUs can also be referred to as "video blocks" or "blocks".
[0087] In one example of a QTBT partitioning structure, the CTU size is set to 128×128 (luminance samples and two corresponding 64×64 chrominance samples), MinQTSize is set to 16×16, MaxBTSize is set to 64×64, MinBTSize (both width and height) is set to 4, and MaxBTDepth is set to 4. First, quadtree partitioning is applied to the CTU to generate quadtree leaf nodes. Quadtree leaf nodes can have sizes ranging from 16×16 (i.e., MinQTSize) to 128×128 (i.e., the CTU size). If a quadtree leaf node is 128×128, it will not be further partitioned by the binary tree because its size exceeds MaxBTSize (i.e., 64×64 in this example). Otherwise, the quadtree leaf node will be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node of the binary tree, and the binary tree depth is 0. When the depth of the binary tree reaches MaxBTDepth (4 in this example), further partitioning is not allowed. Similarly, when the width of a binary tree node equals MinBTSize (4 in this example), further vertical partitioning is not allowed. Likewise, a binary tree node with a height equal to MinBTSize means that further horizontal partitioning of the binary tree node is not allowed. As described above, the leaf nodes of the binary tree are called CUs and are further processed according to prediction and transformation without further partitioning.
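The constraints in the example above can be sketched as a small rule check. The constants restate the example values given above; the function names and the string labels for split directions are hypothetical and chosen only for illustration.

```python
CTU_SIZE = 128
MIN_QT_SIZE = 16
MAX_BT_SIZE = 64
MIN_BT_SIZE = 4
MAX_BT_DEPTH = 4


def can_quadtree_split(size):
    """A square quadtree node may split further only while above MinQTSize."""
    return size > MIN_QT_SIZE


def allowed_bt_splits(width, height, bt_depth):
    """List the binary tree splits permitted for a node in this example."""
    if max(width, height) > MAX_BT_SIZE:
        return []  # e.g. a 128x128 quadtree leaf exceeds MaxBTSize
    if bt_depth >= MAX_BT_DEPTH:
        return []  # maximum binary tree depth reached
    splits = []
    if width > MIN_BT_SIZE:
        splits.append("vertical")    # a vertical split halves the width
    if height > MIN_BT_SIZE:
        splits.append("horizontal")  # a horizontal split halves the height
    return splits


print(allowed_bt_splits(128, 128, 0))
print(allowed_bt_splits(64, 64, 0))
print(allowed_bt_splits(4, 8, 2))
```

A node for which the returned list is empty is a binary tree leaf node, that is, a CU in this example.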
[0088] Figure 3 is a block diagram of an example video encoder 200 capable of implementing the techniques of this disclosure. Figure 3 is provided for illustrative purposes and should not be construed as limiting the techniques as broadly exemplified and described in this disclosure. For illustrative purposes, this disclosure describes the video encoder 200 based on the techniques of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure can be implemented by video encoding devices configured for other video coding standards.
[0089] In the example of Figure 3, the video encoder 200 includes a video data memory 230, a mode selection unit 202, a residual generation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a filter unit 216, a decoded picture buffer (DPB) 218, and an entropy coding unit 220. Any one or all of the video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, DPB 218, and entropy coding unit 220 can be implemented in one or more processors or processing circuits. For example, the units of the video encoder 200 can be implemented as one or more circuit or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Furthermore, the video encoder 200 may include additional or alternative processors or processing circuitry to perform these and other functions.
[0090] The video data memory 230 can store video data to be encoded by the components of the video encoder 200. The video encoder 200 can receive the video data stored in the video data memory 230 from, for example, a video source 104 (Figure 1). DPB 218 can act as a reference picture memory, storing reference video data used in the prediction of subsequent video data by the video encoder 200. Video data memory 230 and DPB 218 can be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 230 and DPB 218 can be provided by the same memory device or separate memory devices. In various examples, video data memory 230 can be on-chip with other components of the video encoder 200, as shown, or off-chip relative to these components.
[0091] In this disclosure, references to video data memory 230 should not be construed as being limited to memory internal to video encoder 200, unless specifically described as such, or to memory external to video encoder 200, unless specifically described as such. Rather, references to video data memory 230 should be understood as references to memory that stores video data received by video encoder 200 for encoding (e.g., video data of the current block to be encoded). The memory 106 of Figure 1 can also temporarily store the outputs from the various units of the video encoder 200.
[0092] The various units of Figure 3 are illustrated to aid in understanding the operations performed by the video encoder 200. These units can be implemented as fixed-function circuits, programmable circuits, or a combination thereof. A fixed-function circuit is a circuit that provides a specific function and is preset with respect to the operations it can perform. A programmable circuit is a circuit that can be programmed to perform various tasks and provides flexible functionality in the operations it can perform. For example, a programmable circuit can run software or firmware that causes it to operate in a manner defined by instructions of the software or firmware. A fixed-function circuit can execute software instructions (e.g., to receive or output parameters), but the types of operations performed by a fixed-function circuit are typically immutable. In some examples, one or more of these units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of these units may be integrated circuits.
[0093] The video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits, and / or programmable cores formed from programmable circuits. In examples where the operations of the video encoder 200 are performed using software running on programmable circuits, memory 106 (Figure 1) may store the instructions (e.g., object code) of the software that the video encoder 200 receives and executes, or another memory (not shown) within the video encoder 200 may store such instructions.
[0094] The video data memory 230 is configured to store received video data. The video encoder 200 can retrieve images of the video data from the video data memory 230 and provide the video data to the residual generation unit 204 and the mode selection unit 202. The video data in the video data memory 230 can be the raw video data to be encoded.
[0095] The mode selection unit 202 includes a motion estimation unit 222, a motion compensation unit 224, and an intra-frame prediction unit 226. The mode selection unit 202 may include additional functional units to perform video prediction based on other prediction modes. As an example, the mode selection unit 202 may include a palette unit, an intra-frame block copying unit (which may be part of the motion estimation unit 222 and / or the motion compensation unit 224), an affine unit, a linear model (LM) unit, etc.
[0096] The mode selection unit 202 typically coordinates multiple encoding passes to test combinations of encoding parameters and the resulting rate-distortion values. Encoding parameters may include segmenting the CTU into CUs, the prediction mode used for the CUs, the transformation type of the residual data used for the CUs, and the quantization parameters of the residual data used for the CUs. The mode selection unit 202 can ultimately select a combination of encoding parameters that yields a better rate-distortion value than other test combinations.
[0097] The video encoder 200 can partition images retrieved from the video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. The mode selection unit 202 can partition the CTUs of the image according to a tree structure (such as the QTBT structure or the quadtree structure of HEVC described above). As mentioned above, the video encoder 200 can form one or more CUs by partitioning CTUs according to a tree structure. Such CUs can also generally be referred to as "video blocks" or "blocks".
[0098] Typically, mode selection unit 202 also controls its components (e.g., motion estimation unit 222, motion compensation unit 224, and intra-prediction unit 226) to generate prediction blocks for the current block (e.g., the current CU, or the overlapping portion of PU and TU in HEVC). For inter-frame prediction of the current block, motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously encoded pictures stored in DPB 218). Specifically, motion estimation unit 222 may compute values representing how similar a possible reference block is to the current block, such as, based on sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), etc. Motion estimation unit 222 may typically perform these computations using the sample-by-sample difference between the current block and the considered reference blocks. Motion estimation unit 222 may identify reference blocks with the lowest values produced by these computations, indicating the reference block that most closely matches the current block.
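A minimal sketch of such a motion search, assuming an exhaustive search over a small window and SAD as the similarity measure, is shown below. The function names, the search strategy, and the sample data are assumptions for illustration; this disclosure does not prescribe a particular search algorithm.

```python
def sad_at(block, ref, top, left):
    """SAD between `block` and the same-sized region of `ref` at (top, left)."""
    return sum(
        abs(block[r][c] - ref[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def full_search(block, ref, pos, search_range):
    """Return (lowest SAD, (dx, dy)) over all in-bounds offsets in the window."""
    h, w = len(block), len(block[0])
    y0, x0 = pos
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = y0 + dy, x0 + dx
            if top < 0 or left < 0 or top + h > len(ref) or left + w > len(ref[0]):
                continue  # candidate falls outside the reference picture
            cost = sad_at(block, ref, top, left)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Hypothetical 6x6 reference picture containing a 2x2 pattern at row 3, column 2.
ref = [[0] * 6 for _ in range(6)]
ref[3][2], ref[3][3], ref[4][2], ref[4][3] = 50, 60, 70, 80
block = [[50, 60], [70, 80]]  # current block located at row 2, column 2
best_cost, best_mv = full_search(block, ref, (2, 2), 2)
print(best_cost, best_mv)
```

The offset with the lowest cost corresponds to the reference block that most closely matches the current block.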
[0099] Motion estimation unit 222 can generate one or more motion vectors (MVs) that define the position of a reference block in a reference image relative to the position of a current block in the current image. Motion estimation unit 222 can then provide these motion vectors to motion compensation unit 224. For example, for unidirectional inter-frame prediction, motion estimation unit 222 can provide a single motion vector, while for bidirectional inter-frame prediction, it can provide two motion vectors. Motion compensation unit 224 can then use these motion vectors to generate prediction blocks. For example, motion compensation unit 224 can use the motion vectors to retrieve data from reference blocks. As another example, if the motion vectors have fractional sampling precision, motion compensation unit 224 can interpolate the values of the prediction blocks according to one or more interpolation filters. Furthermore, for bidirectional inter-frame prediction, motion compensation unit 224 can retrieve data from two reference blocks identified by corresponding motion vectors and combine the retrieved data by, for example, per-sample averaging or weighted averaging.
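The per-sample averaging or weighted averaging of two retrieved reference blocks can be sketched as follows. The function name, the integer weights, and the rounding rule are assumptions made for illustration only.

```python
def bi_predict(block0, block1, w0=1, w1=1):
    """Combine two reference blocks per sample using integer weights with rounding."""
    total = w0 + w1
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(block0, block1)
    ]


# Hypothetical 2x2 blocks retrieved using two motion vectors.
ref_block0 = [[100, 104], [96, 90]]
ref_block1 = [[102, 100], [98, 95]]
print(bi_predict(ref_block0, ref_block1))        # equal-weight average
print(bi_predict(ref_block0, ref_block1, 3, 1))  # weighted toward the first block
```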
[0100] As another example, for intra-prediction or intra-prediction coding, intra-prediction unit 226 can generate a prediction block from samples adjacent to the current block. For example, in a directional mode, intra-prediction unit 226 can typically mathematically combine the values of adjacent samples and fill these calculated values across the current block along a defined direction to generate the prediction block. As another example, in DC mode, intra-prediction unit 226 can calculate the average of the samples adjacent to the current block and generate the prediction block so that each of its samples is set to this average.
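The DC mode described above can be sketched as follows. The function name `dc_predict` and the choice of using the row above and the column to the left as the adjacent samples are assumptions for illustration only.

```python
from typing import List


def dc_predict(above: List[int], left: List[int],
               height: int, width: int) -> List[List[int]]:
    """Fill a height x width prediction block with the rounded neighbor mean."""
    neighbors = list(above) + list(left)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * width for _ in range(height)]
```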
[0101] Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives the raw, uncoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates the sample-by-sample difference between the current block and the prediction block. The resulting sample-by-sample difference defines the residual block for the current block. In some examples, residual generation unit 204 can also determine the differences between sample values in the residual block to generate the residual block using Residual Differential Pulse Code Modulation (RDPCM). In some examples, one or more subtractor circuits performing binary subtraction can be used to form residual generation unit 204.
[0102] In the example where mode selection unit 202 divides a CU into PUs, each PU can be associated with a luma prediction unit and a corresponding chroma prediction unit. Video encoder 200 and video decoder 300 can support PUs of various sizes. As mentioned above, the size of a CU can refer to the size of the luma coding block of the CU, and the size of a PU can refer to the size of the luma prediction unit of the PU. Assuming the size of a particular CU is 2N×2N, video encoder 200 can support PU sizes of 2N×2N or N×N for intra-frame prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N, or similar sizes for inter-frame prediction. Video encoder 200 and video decoder 300 can also support asymmetric partitioning into PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter-frame prediction.
[0103] In the example where mode selection unit 202 does not further divide the CU into PUs, each CU can be associated with a luma coding block and a corresponding chroma coding block. As mentioned above, the size of the CU can refer to the size of the luma coding block of the CU. The video encoder 200 and the video decoder 300 can support CU sizes of 2N×2N, 2N×N, or N×2N.
[0104] For other video coding techniques, such as intra-block copy mode coding, affine mode coding, and linear model (LM) mode coding (as a few examples), mode selection unit 202 generates a prediction block for the current block being encoded via a corresponding unit associated with the coding technique. In some examples, such as palette mode coding, mode selection unit 202 may not generate a prediction block, but instead generate syntax elements indicating how the block should be reconstructed based on the selected palette. In this mode, mode selection unit 202 can provide these syntax elements to entropy coding unit 220 for encoding.
[0105] As described above, the residual generation unit 204 receives video data of the current block and the corresponding prediction block. Then, the residual generation unit 204 generates a residual block for the current block. To generate the residual block, the residual generation unit 204 calculates the sample-by-sample difference between the prediction block and the current block.
[0106] Transform processing unit 206 applies one or more transformations to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 may apply various transformations to the residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a discrete cosine transform (DCT), direction transformation, Karhunen-Loeve transform (KLT), or conceptually similar transformations to the residual block. In some examples, transform processing unit 206 may perform multiple transformations on the residual block, such as primary and secondary transformations, such as rotation transformations. In some examples, transform processing unit 206 does not apply any transformations to the residual block.
[0107] Quantization unit 208 can quantize the transform coefficients in a transform coefficient block to produce a quantized transform coefficient block. Quantization unit 208 can quantize the transform coefficients of the transform coefficient block based on the quantization parameter (QP) value associated with the current block. Video encoder 200 (e.g., via mode selection unit 202) can adjust the degree of quantization applied to the transform coefficient block associated with the current block by adjusting the QP value associated with the CU. Quantization may result in information loss; therefore, the quantized transform coefficients may have lower accuracy than the original transform coefficients generated by transform processing unit 206.
[0108] The inverse quantization unit 210 and the inverse transform processing unit 212 can apply inverse quantization and inverse transform to the quantized transform coefficient block, respectively, to reconstruct the residual block from the transform coefficient block. The reconstruction unit 214 can generate a reconstructed block corresponding to the current block based on the reconstructed residual block and the prediction block generated by the mode selection unit 202 (although it may have some degree of distortion). For example, the reconstruction unit 214 can add samples of the reconstructed residual block to the corresponding samples of the prediction block generated by the mode selection unit 202 to generate the reconstructed block.
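The residual, quantization, and reconstruction loop described in the preceding paragraphs can be sketched in Python. This is a simplified illustration under stated assumptions: the transform is skipped (which the description allows in some examples), the step size follows the H.264/HEVC-style convention of doubling every 6 QP values, and the function names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def qstep(qp: int) -> float:
    """Assumed H.264/HEVC-style step size: doubles every 6 QP values."""
    return 2.0 ** ((qp - 4) / 6.0)


def encode_block(current: Block, prediction: Block, qp: int) -> Block:
    """Residual generation followed by scalar quantization (transform skipped)."""
    step = qstep(qp)
    return [[int(round((c - p) / step)) for c, p in zip(cr, pr)]
            for cr, pr in zip(current, prediction)]


def reconstruct_block(levels: Block, prediction: Block, qp: int) -> Block:
    """Inverse quantization, then add the residual back to the prediction."""
    step = qstep(qp)
    return [[p + int(round(l * step)) for l, p in zip(lr, pr)]
            for lr, pr in zip(levels, prediction)]
```

At a step size of 1 the round trip is exact; at larger QP values the reconstructed block differs from the original by at most half a step per sample, which is the distortion mentioned above.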
[0109] Filter unit 216 can perform one or more filtering operations on the reconstructed block. For example, filter unit 216 can perform a deblocking operation to reduce block artifacts along the edges of the CU. In some examples, the operation of filter unit 216 can be skipped.
[0110] The video encoder 200 stores reconstructed blocks in the DPB 218. For example, in an example where the operation of the filter unit 216 is not required, the reconstruction unit 214 can store the reconstructed blocks in the DPB 218. In an example where the operation of the filter unit 216 is required, the filter unit 216 can store the filtered reconstructed blocks in the DPB 218. The motion estimation unit 222 and the motion compensation unit 224 can retrieve reference images from the DPB 218, which are formed from the reconstructed (and possibly filtered) blocks, to perform inter-frame prediction of blocks in subsequently encoded images. Furthermore, the intra-frame prediction unit 226 can use the reconstructed blocks in the DPB 218 of the current image to perform intra-frame prediction of other blocks in the current image.
[0111] Typically, entropy coding unit 220 can entropy code syntax elements received from other functional components of video encoder 200. For example, entropy coding unit 220 can entropy code quantized transform coefficient blocks from quantization unit 208. As another example, entropy coding unit 220 can entropy code predictive syntax elements (e.g., motion information for inter-frame prediction or intra-frame mode information for intra-frame prediction) from mode selection unit 202. Entropy coding unit 220 can perform one or more entropy coding operations on syntax elements, which are another example of video data, to generate entropy-coded data. For example, entropy coding unit 220 can perform a context-adaptive variable-length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an exponential-Golomb coding operation, or another type of entropy coding operation on the data. In some examples, the entropy coding unit 220 can operate in a bypass mode where syntax elements are not entropy encoded.
[0112] The video encoder 200 can output a bitstream containing the entropy-encoded syntax elements required to reconstruct slices or pictures. Specifically, the entropy coding unit 220 can output the bitstream.
[0113] The operations described above are described at the block level. This description should be understood as operations applied to luma-encoded blocks and / or chroma-encoded blocks. As mentioned above, in some examples, the luma-encoded block and chroma-encoded block are the luma and chroma components of the CU. In some examples, the luma-encoded block and chroma-encoded block are the luma and chroma components of the PU.
[0114] In some examples, the operations performed for the luma coding block do not need to be repeated for the chroma coding block. As an example, the operations of identifying the motion vector (MV) and reference image of the luma coding block do not need to be repeated in order to identify the MV and reference image of the chroma block. Instead, the MV of the luma coding block can be scaled to determine the MV of the chroma block, and the reference image can be the same. As another example, intra-frame prediction processing can be the same for both the luma and chroma coding blocks.
[0115] Video encoder 200 represents an example of a device configured to encode video data, the device including a memory configured to store the video data, and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first frame of the video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second frame of the video data; assign a second DRA APS ID to the second DRA APS; signal the first DRA APS in the bitstream; process the first frame according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, process the second frame according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in the bitstream and process the second frame according to the second DRA APS.
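The encoder-side decision described above can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: the class and field names are hypothetical, the DRA payload is reduced to a tuple of floats, the "bitstream" is a list of strings standing in for real syntax elements, and the policy of giving each distinct parameter set its own ID is an assumption (the disclosure does not state how IDs are assigned).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Params = Tuple[float, ...]  # hypothetical DRA payload (e.g., scales and offsets)


@dataclass(frozen=True)
class DraAps:
    aps_id: int
    params: Params


class DraApsEncoder:
    def __init__(self) -> None:
        # Assumed policy: one ID per distinct parameter set.
        self._ids: Dict[Params, int] = {}
        self._previous: Optional[DraAps] = None
        self.bitstream: List[str] = []  # stand-in for real syntax elements

    def encode_frame(self, frame: int, params: Params) -> DraAps:
        aps = DraAps(self._ids.setdefault(params, len(self._ids)), params)
        if self._previous is not None and aps.aps_id == self._previous.aps_id:
            # Same ID as the previous frame: reuse the APS, signal nothing new.
            aps = self._previous
        else:
            # Different ID: signal the new APS before the frame that uses it.
            self.bitstream.append(f"APS id={aps.aps_id}")
        self.bitstream.append(f"FRAME {frame} uses APS id={aps.aps_id}")
        self._previous = aps
        return aps
```

Encoding two frames with identical parameters followed by a frame with different parameters produces one APS entry for the first two frames and a second APS entry before the third.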
[0116] Figure 4 is a block diagram illustrating an example video decoder 300 that can perform the techniques of this disclosure. Figure 4 is provided for illustrative purposes and does not limit the techniques as broadly exemplified and described in this disclosure. For illustrative purposes, this disclosure describes the video decoder 300 according to the techniques of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure can be performed by video coding devices configured for other video coding standards.
[0117] In the example of Figure 4, the video decoder 300 includes a coded picture buffer (CPB) memory 320, an entropy decoding unit 302, a prediction processing unit 304, an inverse quantization unit 306, an inverse transform processing unit 308, a reconstruction unit 310, a filter unit 312, and a decoded picture buffer (DPB) 314. Any or all of the CPB memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and DPB 314 can be implemented in one or more processors or in processing circuitry. For example, the units of the video decoder 300 can be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Furthermore, the video decoder 300 may include additional or alternative processors or processing circuitry to perform these and other functions.
[0118] The prediction processing unit 304 includes a motion compensation unit 316 and an intra-prediction unit 318. The prediction processing unit 304 may include additional units to perform predictions based on other prediction modes. As an example, the prediction processing unit 304 may include a palette unit, an intra-block copying unit (which may form part of the motion compensation unit 316), an affine unit, a linear model (LM) unit, etc. In other examples, the video decoder 300 may include more, fewer, or different functional components.
[0119] CPB memory 320 can store video data to be decoded by components of video decoder 300, such as encoded video bitstreams. The video data stored in CPB memory 320 can, for example, be obtained from computer-readable medium 110 (Figure 1). The CPB memory 320 may include a CPB that stores encoded video data (e.g., syntax elements) from the encoded video bitstream. Additionally, the CPB memory 320 may store video data other than the syntax elements of the coded picture, such as temporary data representing the output from the various units of the video decoder 300. The DPB 314 typically stores decoded pictures that the video decoder 300 may output and / or use as reference video data while decoding subsequent data or pictures from the encoded video bitstream. The CPB memory 320 and DPB 314 may be formed of any of a variety of memory devices, such as DRAM, including SDRAM, MRAM, RRAM, or other types of memory devices. The CPB memory 320 and DPB 314 may be provided by the same memory device or by separate memory devices. In various examples, the CPB memory 320 may be on-chip with other components of the video decoder 300, or off-chip relative to those components.
[0120] Additionally or alternatively, in some examples, the video decoder 300 can retrieve coded video data from the memory 120 (Figure 1). That is, the memory 120 can store data as described above with respect to the CPB memory 320. Similarly, when some or all of the functions of the video decoder 300 are implemented in software to be executed by the processing circuitry of the video decoder 300, the memory 120 can store instructions to be executed by the video decoder 300.
[0121] The various units shown in Figure 4 are illustrated to aid in understanding the operations performed by the video decoder 300. These units can be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Similar to Figure 3, fixed-function circuits are circuits that provide a specific function and are preset in the operations they can perform. Programmable circuits are circuits that can be programmed to perform various tasks and provide flexible functionality in the operations they can perform. For example, a programmable circuit can execute software or firmware that causes it to operate in a manner defined by the software or firmware instructions. Fixed-function circuits can execute software instructions (e.g., to receive or output parameters), but the types of operations that a fixed-function circuit performs are typically immutable. In some examples, one or more of these units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of these units may be integrated circuits.
[0122] The video decoder 300 may include an ALU, an EFU, digital circuitry, analog circuitry, and / or a programmable core formed by programmable circuitry. In an example where the operation of the video decoder 300 is performed by software running on the programmable circuitry, on-chip or off-chip memory may store instructions (e.g., object code) of the software received and executed by the video decoder 300.
[0123] Entropy decoding unit 302 can receive encoded video data from the CPB and perform entropy decoding on the video data to reproduce the syntax elements. Prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, and filter unit 312 can generate decoded video data based on the syntax elements extracted from the bitstream.
[0124] Typically, the video decoder 300 reconstructs the image on a block-by-block basis. The video decoder 300 can perform the reconstruction operation separately for each block (where the block currently being reconstructed (i.e., decoded) can be referred to as the "current block").
[0125] Entropy decoding unit 302 can entropy decode the syntax elements of the quantized transform coefficients defining the quantized transform coefficient block, as well as transform information (such as quantization parameters (QP) and / or (one or more) transform mode indications). Inverse quantization unit 306 can use the QP associated with the quantized transform coefficient block to determine the degree of quantization, and similarly, determine the degree of inverse quantization to be applied by inverse quantization unit 306. Inverse quantization unit 306 can, for example, perform a bit-by-bit left shift operation to inverse quantize the quantized transform coefficients. Inverse quantization unit 306 can thereby form a transform coefficient block including the transform coefficients.
[0126] After the inverse quantization unit 306 forms the transform coefficient block, the inverse transform processing unit 308 can apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, the inverse transform processing unit 308 can apply inverse DCT, inverse integer transform, inverse Karhunen-Loeve (KLT), inverse rotation transform, inverse direction transform, or another inverse transform to the transform coefficient block.
[0127] Furthermore, prediction processing unit 304 generates prediction blocks based on prediction information syntax elements entropy decoded by entropy decoding unit 302. For example, if the prediction information syntax elements indicate that the current block is inter-frame predicted, motion compensation unit 316 can generate the prediction block. In this case, the prediction information syntax elements may indicate a reference image in DPB 314 from which a reference block is retrieved, and a motion vector identifying the position of the reference block in the reference image relative to the position of the current block in the current image. Motion compensation unit 316 can typically perform the inter-frame prediction process in a manner largely similar to that described with respect to motion compensation unit 224 (Figure 3).
[0128] As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, intra-prediction unit 318 can generate a prediction block according to the intra-prediction mode indicated by the prediction information syntax elements. Similarly, intra-prediction unit 318 can typically perform the intra-prediction process in a manner largely similar to that described with respect to intra-prediction unit 226 (Figure 3). Intra-prediction unit 318 can retrieve data of neighboring samples of the current block from DPB 314.
[0129] Reconstruction unit 310 can use the prediction block and the residual block to reconstruct the current block. For example, reconstruction unit 310 can add samples from the residual block to the corresponding samples from the prediction block to reconstruct the current block.
[0130] Filter unit 312 can perform one or more filtering operations on the reconstructed block. For example, filter unit 312 can perform a deblocking operation to reduce block artifacts along the edges of the reconstructed block. The operation of filter unit 312 is not necessarily performed in all examples.
[0131] The video decoder 300 can store the reconstructed blocks in the DPB 314. For example, in an example where the filter unit 312 is not operated, the reconstruction unit 310 can store the reconstructed blocks in the DPB 314. In an example where the filter unit 312 is operated, the filter unit 312 can store the filtered reconstructed blocks in the DPB 314. As described above, the DPB 314 can provide reference information to the prediction processing unit 304, such as samples of the current image for intra-frame prediction and previously decoded images for subsequent motion compensation. Furthermore, the video decoder 300 can output decoded images (e.g., decoded video) from the DPB 314 for subsequent presentation on a display device, such as the display device 118 of Figure 1.
[0132] Video decoder 300 represents a device configured to decode video data, the device including a memory configured to store the video data, and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of the video data; determine the DRA APS for the first image; store the DRA APS in an APS buffer; determine a second DRA APS ID for a second image of the video data; prevent the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and process the first image and the second image according to the DRA APS.
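The overwrite protection described above can be sketched as a small APS buffer. This is a hedged illustration only: the class `DraApsBuffer`, its method names, the per-ID counter of pictures still waiting to use a stored APS, and the choice to reject a conflicting store are all assumptions introduced here. The disclosure states that the stored DRA APS is prevented from being overwritten by different data, not how a particular decoder enforces this.

```python
from typing import Dict, Tuple

Params = Tuple[float, ...]  # hypothetical DRA payload


class DraApsBuffer:
    def __init__(self) -> None:
        self._slots: Dict[int, Params] = {}
        self._pending: Dict[int, int] = {}  # pictures still needing each APS

    def store(self, aps_id: int, params: Params) -> bool:
        """Store an APS; refuse to replace in-use data with different data."""
        held = self._slots.get(aps_id)
        in_use = self._pending.get(aps_id, 0) > 0
        if held is not None and held != params and in_use:
            return False
        self._slots[aps_id] = params
        return True

    def acquire(self, aps_id: int) -> Params:
        """Mark one more picture as depending on this APS and return it."""
        params = self._slots[aps_id]
        self._pending[aps_id] = self._pending.get(aps_id, 0) + 1
        return params

    def release(self, aps_id: int) -> None:
        """Called once a picture has been output with DRA applied."""
        self._pending[aps_id] -= 1
```

In this sketch, two pictures that carry the same DRA APS ID both obtain the same stored parameters, and a later attempt to store different data under that ID fails until both pictures have been released.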
[0133] Next-generation video applications are expected to operate with video data representing captured scenery with High Dynamic Range (HDR) and Wide Color Gamut (WCG). The dynamic range and color gamut are two independent properties of video content, and their specifications for digital television and multimedia services are defined by several international standards. For example, ITU-R Rec.709 defines parameters for High Definition Television (HDTV), such as Standard Dynamic Range (SDR) and Standard Color Gamut (SCG), while ITU-R Rec.2020 specifies parameters for Ultra High Definition Television (UHDTV), such as HDR and WCG. Other Standards Development Organization (SDO) documents specify these properties in other systems; for example, the P3 color gamut is defined in SMPTE-231-2, and some parameters of HDR are defined in SMPTE-2084. The dynamic range and color gamut of video data are briefly described below.
[0134] Dynamic range is typically defined as the ratio between the minimum and maximum brightness of a video signal. Dynamic range can also be measured in f-stops, where one f-stop corresponds to a doubling of the signal's dynamic range. In the MPEG definition, HDR content is content with a brightness variation of more than 16 f-stops. In some definitions, levels between 10 and 16 f-stops are considered intermediate dynamic range, whereas in others they are considered HDR. The human visual system (HVS) can perceive a much larger dynamic range, but it includes an adaptation mechanism that narrows the so-called simultaneous range.
[0135] Video applications and services may be governed by Rec.709 and provide SDR, typically supporting a brightness (or luminance) range of approximately 0.1 to 100 candelas (cd) per square meter (often referred to as "nits"), resulting in a dynamic range of less than 10 f-stops. Next-generation video services are expected to offer a dynamic range of up to 16 f-stops, with some parameters specified in SMPTE-2084 and Rec.2020.
[0136] Figure 5 is a conceptual diagram illustrating human vision and display capabilities. Figure 5 visualizes the dynamic range provided by SDR in HDTV, the expected HDR in UHDTV, and the dynamic range of the HVS, although the exact ranges can vary with the individual and the display.
[0137] Figure 6 is a conceptual diagram showing an example color gamut graph. Besides HDR, another aspect of a more realistic video experience is the color dimension, which is typically defined by the color gamut. The example of Figure 6 shows a visual representation of the SDR color gamut (triangle 400, based on the BT.709 red, green, and blue color primaries) and the wider color gamut for UHDTV (triangle 402, based on the BT.2020 red, green, and blue color primaries). Figure 6 also depicts the so-called spectrum locus (delimited by the tongue-shaped area 404), which represents the limits of natural colors. As shown in Figure 6, moving from the BT.709 (triangle 400) to the BT.2020 (triangle 402) color primaries is intended to provide UHDTV services with approximately 70% more colors or more. D65 specifies the white color for the BT.709 and / or BT.2020 specifications.
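As a worked check of these figures, dynamic range in stops (doublings of luminance) is the base-2 logarithm of the ratio between the maximum and minimum luminance. The helper name `f_stops` below is an assumption for illustration.

```python
import math


def f_stops(min_luminance: float, max_luminance: float) -> float:
    """Dynamic range in stops: each stop is one doubling of luminance."""
    return math.log2(max_luminance / min_luminance)


sdr_range = f_stops(0.1, 100.0)      # SDR range of 0.1 to 100 cd/m2
hdr_ratio = 2 ** 16                  # contrast ratio covered by 16 stops
```

For the SDR range of 0.1 to 100 cd / m2 the ratio is 1000:1, which is just under 10 stops, while 16 stops correspond to a ratio of 65536:1.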
[0138] Table 1 shows some examples of color gamut specifications for the DCI-P3, BT.709, and BT.2020 color spaces.
[0139]
[0140] Table 1 - Colorimetric parameters of the selected color space
[0141] Compression of HDR video data is now discussed. HDR / WCG content is typically acquired and stored with a very high precision per component (it may even be stored with floating-point precision), using a 4:4:4 chroma format and a very wide color space (e.g., XYZ). This representation targets high precision and may be (almost) mathematically lossless. However, this format contains many redundancies and is not optimal for compression purposes. Lower-precision formats based on HVS assumptions are commonly used in current state-of-the-art video applications.
[0142] Figure 7 is a block diagram illustrating an example format conversion technique. The video encoder 200 can perform format conversion techniques to transform linear RGB 410 into HDR' data 418. As shown in Figure 7, these techniques may include three main elements: 1) a non-linear transfer function (TF) 412 for dynamic range compression; 2) a color conversion 414 to a more compact or robust color space; and 3) a floating-point to integer representation conversion unit (quantization 416).
[0143] The technique of Figure 7 can be performed by the source device 12 (which can be an example of video encoder 200). Linear RGB data 410 can be HDR / WCG video data and can be stored in floating-point representation. Linear RGB data 410 can be compressed using TF 412 for dynamic range compression. TF 412 can compress linear RGB data 410 using any number of non-linear transfer functions, such as the perceptual quantizer (PQ) TF defined in SMPTE-2084. In some examples, the color conversion process 414 converts the compressed data into a more compact or robust color space (e.g., the YUV or YCrCb color space) that is more suitable for compression by a hybrid video encoder. A hybrid video encoder uses prediction when encoding video data. This more compact data can be quantized using the floating-point to integer representation quantization unit 416 to produce converted HDR' data 418. In this example, HDR' data 418 is in an integer representation. HDR' data 418 is now in a format more suitable for compression by a hybrid video encoder (e.g., video encoder 200). The order of the processes depicted in Figure 7 is given as an example and may differ in other applications. For example, color conversion may precede the TF process. Furthermore, additional processing, such as spatial subsampling, may be applied to the color components.
[0144] Figure 8 is a block diagram illustrating an example inverse format conversion technique. The video decoder 300 can perform the inverse conversion technique of Figure 8, which includes inverse quantization 422, inverse color conversion process 424, and inverse transfer function 426, to convert HDR' data 420 back into linear RGB 428.
[0145] The technique of Figure 8 can be performed by destination device 14 (which may be an example of video decoder 300). Converted HDR' data 420 can be obtained at destination device 14 by decoding the video data using a hybrid video decoder (e.g., video decoder 300 applying HEVC techniques). A hybrid video decoder uses prediction when decoding video data. Destination device 14 can inverse quantize the HDR' data 420 via an inverse quantization unit. An inverse color conversion process 424 can then be applied to the inverse-quantized HDR' data. The inverse color conversion process 424 can be the inverse of the color conversion process 414. For example, the inverse color conversion process 424 can convert the HDR' data from YCrCb format back to RGB format. An inverse transfer function 426 can then be applied to the data to restore the dynamic range compressed by TF 412 and recreate linear RGB data 428.
[0146] The high dynamic range of input RGB data in linear, floating-point representation can be compressed using a TF (such as the PQ TF defined in SMPTE-2084). After compression, the video encoder 200 can convert the compressed data into a target color space more suitable for compression, such as YCbCr. The video encoder 200 can then quantize the color-converted data to obtain an integer representation. The order of the techniques in Figure 7 and Figure 8 is provided as an example and may differ in real-world applications. For instance, color conversion may precede the TF module, and additional processing, such as spatial subsampling, may be applied to the color components.
[0147] A TF is applied to data to compress its dynamic range so that the data can be represented with a limited number of bits. For example, video encoder 200 can apply a TF to compress the dynamic range of video data. This function is typically a one-dimensional (1D) non-linear function that either reflects the inverse of the electro-optical transfer function (EOTF) of the end-user display, as specified for SDR in Rec.709, or approximates HVS perception of luminance changes, as with the perceptual quantizer (PQ) TF specified for HDR in SMPTE-2084. The inverse process of the OETF (opto-electronic transfer function) is the EOTF, which maps code levels back to luminance. Figure 9 shows several examples of TFs.
[0148] The ST2084 specification defines the EOTF application as follows. Applying TF to normalized linear R, G, B values results in a non-linear representation of R'G'B'. ST2084 defines normalization by NORM = 10000, which is related to a peak luminance of 10000 nits (cd / m2).
[0149]
[0150] where
[0151]
[0152]
[0153]
[0154]
[0155]
[0156] Figure 10 is a graph showing an example of normalized non-linear output values as a function of normalized linear input values. Figure 10 depicts the input values (linear color values) normalized to the range 0..1 and the corresponding normalized output values (non-linear color values) obtained using the PQ EOTF. As shown in Figure 10, 1% of the dynamic range of the input signal (low illumination) is converted to 50% of the dynamic range of the output signal.
[0157] Typically, the EOTF is defined as a function with floating-point precision, so no errors are introduced into a signal having this non-linearity if the inverse TF (e.g., the so-called OETF) is applied. The inverse TF (OETF) specified in ST2084 is defined as the inverse PQ function, as follows:
[0158]
[0159]
[0160] where
[0161]
[0162]
[0163]
[0164]
[0165]
[0166] Utilizing floating-point precision, sequential application of EOTF and OETF can provide perfect reconstruction without errors. However, this representation is not optimal for streaming or broadcast services. The following sections describe a more compact representation of nonlinear R'G'B' data with fixed bit precision.
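The floating-point round trip described above can be illustrated with a Python sketch of the PQ curve. The constants below are the commonly published ST 2084 values and are supplied here as an assumption; they are not recovered from this text. The function names `pq_tf` and `inverse_pq_tf` and the clipping of the normalized input are likewise illustrative choices.

```python
# Commonly published ST 2084 (PQ) constants; assumed, not taken from this text.
M1 = 2610.0 / 16384.0          # 0.1593017578125
M2 = 2523.0 / 4096.0 * 128.0   # 78.84375
C1 = 3424.0 / 4096.0           # 0.8359375
C2 = 2413.0 / 4096.0 * 32.0    # 18.8515625
C3 = 2392.0 / 4096.0 * 32.0    # 18.6875
NORM = 10000.0                 # peak luminance in cd/m2 used for normalization


def pq_tf(linear: float) -> float:
    """Map a normalized linear value in [0, 1] to a non-linear PQ value."""
    l = max(0.0, min(linear, 1.0)) ** M1
    return ((C1 + C2 * l) / (1.0 + C3 * l)) ** M2


def inverse_pq_tf(nonlinear: float) -> float:
    """Map a non-linear PQ value in [0, 1] back to a normalized linear value."""
    n = nonlinear ** (1.0 / M2)
    return (max(n - C1, 0.0) / (C2 - C3 * n)) ** (1.0 / M1)


def nits_to_pq(nits: float) -> float:
    """Normalize an absolute luminance by NORM, then apply the PQ curve."""
    return pq_tf(nits / NORM)
```

Under these assumptions, a linear input of 1% of the range (100 cd / m2) maps to roughly half of the non-linear output range, and applying the two functions in sequence returns the original value to within floating-point rounding.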
[0167] Please note that EOTF and OETF are currently a very active research topic, and the TF used in some HDR video codec systems may be different from ST2084.
[0168] The color conversion technique is now described. RGB data is commonly used as input because it is typically generated by image capture sensors. However, the RGB color space has high redundancy among its components and may not be optimal for a compact representation. To achieve a more compact and robust representation, the RGB components are often converted to a less correlated color space, such as YCbCr, which is better suited for compression. This color space separates brightness, in the form of luminance, from color information, placing them in different, less correlated components.
[0169] For modern video coding systems, the commonly used color space is YCbCr, as described in ITU-R BT.709 or ITU-R BT.2020. The YCbCr color space in the BT.709 standard specifies the following conversion process from R'G'B' to Y'CbCr (non-constant luminance representation):
[0170]
[0171] The above can also be implemented using the following approximate conversion, which avoids the division for the Cb and Cr components:
[0172]
[0173]
[0174] The ITU-R BT.2020 standard specifies the following conversion process from R'G'B' to Y'CbCr (non-constant luminance representation):
[0175]
[0176] The above can also be implemented using the following approximate conversion, which avoids the division for the Cb and Cr components:
[0177]
[0178] It should be noted that both color spaces are normalized. Therefore, for input values normalized to the range 0...1, the resulting values will be mapped to the range 0...1. Typically, color transformations implemented with floating-point precision provide perfect reconstruction, so this process can be lossless.
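The conversion equations are not reproduced in this text, so the following Python sketch is offered only as an illustration. It assumes the luma coefficients published in ITU-R BT.709 and ITU-R BT.2020, and the function names are hypothetical. Cb and Cr are computed here in their zero-centered form (range -0.5..0.5), and the round trip shows that the floating-point conversion is invertible up to rounding error.

```python
# Sketch of R'G'B' <-> Y'CbCr conversion (non-constant luminance).
# (Kr, Kb) luma coefficients are assumed from ITU-R BT.709 and BT.2020.
LUMA_COEFFS = {"bt709": (0.2126, 0.0722), "bt2020": (0.2627, 0.0593)}


def rgb_to_ycbcr(r, g, b, standard="bt709"):
    """Convert normalized R'G'B' (0..1) to Y' (0..1) and zero-centered Cb, Cr."""
    kr, kb = LUMA_COEFFS[standard]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))  # divisor is 1.8556 (BT.709) or 1.8814 (BT.2020)
    cr = (r - y) / (2.0 * (1.0 - kr))  # divisor is 1.5748 (BT.709) or 1.4746 (BT.2020)
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr, standard="bt709"):
    """Invert rgb_to_ycbcr."""
    kr, kb = LUMA_COEFFS[standard]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b
```

The approximate matrix form mentioned above folds the two divisions into fixed multiplier coefficients; the sketch keeps the division form because it makes the inverse easy to read.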
[0179] Quantization (or fixed-point conversion) will now be described in more detail. All the processing stages described above can typically be implemented with floating-point precision and can therefore be considered lossless. However, floating-point precision is expensive for most consumer electronics applications. Therefore, input data in the target color space can be converted to fixed-point precision at a target bit depth, saving bandwidth and memory. Some studies suggest that 10-12 bit precision combined with the PQ TF is sufficient to provide HDR data of 16 f-stops with distortion below the Just-Noticeable Difference (JND). Generally, a JND is the amount by which something (e.g., video data) must be changed for the difference to be noticeable (e.g., by the human visual system (HVS)). Data represented with 10-bit precision can be further coded using most state-of-the-art video codec solutions. This conversion process includes signal quantization, which is an element of lossy coding and a source of inaccuracy introduced into the converted data.
[0180] An example of such quantization applied to codewords in a target color space (e.g., YCbCr) is shown below. An input value YCbCr represented in floating-point precision can be converted to a signal having a fixed bit depth BitDepthY for the Y value and BitDepthC for the chroma values (Cb, Cr). For example, video encoder 200 can convert the input value from floating-point precision to a fixed-bit-depth signal.
[0181]
[0182] where
[0183] Round(x) = Sign(x) * Floor(Abs(x) + 0.5)
[0184] If x < 0, then Sign(x) = -1, if x = 0, then Sign(x) = 0, if x > 0, then Sign(x) = 1
[0185] Floor(x) is the largest integer less than or equal to x
[0186] If x >= 0, then Abs(x) = x, if x < 0, then Abs(x) = -x
[0187] Clip1Y(x) = Clip3(0, (1 << BitDepthY) - 1, x)
[0188] Clip1C(x) = Clip3(0, (1 << BitDepthC) - 1, x)
[0189] If z < x, then Clip3(x, y, z) = x; if z > y, then Clip3(x, y, z) = y;
[0190] otherwise, Clip3(x, y, z) = z
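The helper functions defined above can be sketched in Python as follows. The quantization equation itself is not reproduced in this text, so the scaling used in `quantize_ycbcr` (219 * Y' + 16 for luma and 224 * C + 128 for chroma at 8 bits, shifted up for higher bit depths) is an assumption based on the common narrow-range convention, and the function names are hypothetical. Cb and Cr are assumed to be zero-centered in the range -0.5..0.5.

```python
import math


def sign(x):
    """Sign(x): -1 for x < 0, 0 for x == 0, 1 for x > 0."""
    return (x > 0) - (x < 0)


def round_half_away(x):
    """Round(x) = Sign(x) * Floor(Abs(x) + 0.5)."""
    return sign(x) * math.floor(abs(x) + 0.5)


def clip3(lo, hi, z):
    """Clip3(x, y, z): clamp z to the closed range [lo, hi]."""
    return lo if z < lo else hi if z > hi else z


def quantize_ycbcr(y, cb, cr, bit_depth_y=10, bit_depth_c=10):
    """Convert floating-point Y'CbCr to fixed-bit-depth code values.

    The 219/16 and 224/128 scaling is an assumed narrow-range convention.
    """
    d_y = round_half_away((1 << (bit_depth_y - 8)) * (219.0 * y + 16.0))
    d_cb = round_half_away((1 << (bit_depth_c - 8)) * (224.0 * cb + 128.0))
    d_cr = round_half_away((1 << (bit_depth_c - 8)) * (224.0 * cr + 128.0))
    max_y = (1 << bit_depth_y) - 1
    max_c = (1 << bit_depth_c) - 1
    return clip3(0, max_y, d_y), clip3(0, max_c, d_cb), clip3(0, max_c, d_cr)
```

With these assumptions, black (Y' = 0, Cb = Cr = 0) maps to the 10-bit code values (64, 512, 512). The rounding step is where the loss described in the text is introduced.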
[0191] In D. Rusanovskyy, A. K. Ramasubramonian, D. Bugdayci, S. Lee, J. Sole, M. Karczewicz, "Dynamic Range Adjustment SEI to enable High Dynamic Range video coding with Backward-Compatible Capability," VCEG document COM16-C 1027-E, September 2015, the authors propose implementing DRA as a piecewise linear function f(x) defined over a set of non-overlapping dynamic range partitions (ranges) {R_i} of the input value x, where i is the index of the range from 0 to N-1 (inclusive), and N is the total number of ranges {R_i} used to define the DRA function. Assume that the ranges of the DRA are defined by the minimum and maximum x values belonging to the range R_i, e.g., [x_i, x_(i+1)-1], where x_i and x_(i+1) denote the minimum values of the ranges R_i and R_(i+1), respectively. Applied to the Y color component (luma) of the video, the DRA function S_y is defined by a scale S_(y,i) and an offset O_(y,i), which are applied to every x ∈ [x_i, x_(i+1)-1]; thus S_y = {S_(y,i), O_(y,i)}.
[0192] With this, for any R_i and every x ∈ [x_i, x_(i+1)-1], the output value X is calculated as follows:
[0193] X = S_(y,i) * (x - O_(y,i)) (8)
[0194] For the inverse DRA mapping process for the luma component Y, performed at the decoder (e.g., video decoder 300), the DRA function S_y is defined by the inverse of the scale S_(y,i) and the offset O_(y,i) values, which are applied to every X ∈ [X_i, X_(i+1)-1].
[0195] With this, for any R_i and every X ∈ [X_i, X_(i+1)-1], the reconstructed value x is calculated as follows:
[0196] x = X / S_(y,i) + O_(y,i) (9)
[0197] The forward DRA mapping process for the chroma components Cb and Cr (e.g., performed by video encoder 200) is defined as follows. An example is given in which the term u denotes a sample of the Cb color component belonging to the range R_i, u ∈ [u_i, u_(i+1)-1]; thus S_u = {S_(u,i), O_(u,i)}:
[0198] U = S_(u,i) * (u - O_(y,i)) + Offset (10)
[0199] where Offset is equal to 2^(bitdepth-1) and denotes the offset of the bipolar Cb and Cr signals.
[0200] The inverse DRA mapping process performed at the decoder for the chroma components Cb and Cr (e.g., by the video decoder 300) is defined as follows. An example is given in which the term U denotes a sample of the remapped Cb color component belonging to the range R_i, U ∈ [U_i, U_(i+1)-1]:
[0201] u = (U - Offset) / S_(u,i) + O_(y,i) (11)
[0202] where Offset is equal to 2^(bitdepth-1) and denotes the offset of the bipolar Cb and Cr signals.
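Equations (8) through (11) can be sketched in Python as shown below. This is only an illustration: the class and function names are hypothetical, the range boundaries, scales, and offsets are caller-supplied example values rather than values from any specification, and the mapped range boundaries X_i are assumed to be obtained by applying equation (8) to each x_i (which requires the mapping to be monotonically increasing).

```python
from bisect import bisect_right


class LumaDra:
    """Piecewise linear luma DRA: X = S_i * (x - O_i), x = X / S_i + O_i."""

    def __init__(self, range_starts, scales, offsets):
        self.range_starts = list(range_starts)  # x_i, minimum value of each range R_i
        self.scales = list(scales)              # S_(y,i)
        self.offsets = list(offsets)            # O_(y,i)
        # Assumption: X_i is the forward mapping of x_i within its own range.
        self.mapped_starts = [
            s * (x0 - o)
            for x0, s, o in zip(self.range_starts, self.scales, self.offsets)
        ]

    @staticmethod
    def _index(starts, value):
        """Index of the range whose start is the largest one not exceeding value."""
        return max(bisect_right(starts, value) - 1, 0)

    def forward(self, x):
        i = self._index(self.range_starts, x)
        return self.scales[i] * (x - self.offsets[i])      # equation (8)

    def inverse(self, mapped):
        i = self._index(self.mapped_starts, mapped)
        return mapped / self.scales[i] + self.offsets[i]   # equation (9)


def forward_chroma_dra(u, scale, offset_o, bit_depth):
    """Equation (10): U = S * (u - O) + Offset, with Offset = 2^(bitdepth-1)."""
    return scale * (u - offset_o) + (1 << (bit_depth - 1))


def inverse_chroma_dra(mapped, scale, offset_o, bit_depth):
    """Equation (11): u = (U - Offset) / S + O."""
    return (mapped - (1 << (bit_depth - 1))) / scale + offset_o
```

For example, `LumaDra([0, 256, 512], [1.0, 2.0, 0.5], [0.0, 128.0, -1024.0])` describes three ranges whose offsets were chosen so that the mapping is continuous; `forward(300)` gives 344.0 and `inverse(344.0)` returns 300.0.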
[0203] Luminance-driven chroma scaling (LCS) is now described. LCS was originally proposed in JCTVC-W0101, "HDR CE2: Report on CE2.a-1 LCS," A. K. Ramasubramonian, J. Sole, D. Rusanovskyy, D. Bugdayci, M. Karczewicz. That document discloses a technique for adjusting chroma information (e.g., Cb and Cr) by exploiting the luma information associated with the processed chroma sample. Similar to the DRA approach discussed above, LCS proposes to apply a scale factor S_u to chroma samples of Cb and a scale factor S_(v,i) to chroma samples of Cr. However, instead of defining the DRA function as a piecewise linear function S_u = {S_(u,i), O_(u,i)} over a set of ranges {R_i} accessed by the chroma value u or v, as in equations (8) and (9), the LCS approach proposes to derive the scale factor for a chroma sample using the luma value Y. The video encoder 200 can perform forward LCS mapping of a chroma sample u (or v) using the following formula:
[0204] U = S_(u,i)(Y) * (u - Offset) + Offset (12)
[0205] The video decoder 300 can perform the inverse LCS process using the following formula:
[0206] u = (U - Offset) / S_(u,i)(Y) + Offset (13)
[0207] More specifically, for a given pixel located at (x, y), the chroma sample Cb(x, y) or Cr(x, y) can be scaled by a factor derived from the corresponding LCS function S_Cb (or S_Cr), which is accessed using the luma value Y'(x, y) of the pixel.
[0208] With forward LCS, for a chroma sample, the Cb (or Cr) value and its associated luma value Y' are taken as inputs to the chroma scaling function S_Cb (or S_Cr), and Cb or Cr is converted to Cb' or Cr', as shown in Equation (14). The video decoder 300 can apply the inverse LCS, and the reconstructed Cb' or Cr' can be converted to Cb or Cr, as shown in Equation (15).
[0209] Cb′(x, y) = S_Cb(Y′(x, y)) * Cb(x, y),
[0210] Cr′(x, y) = S_Cr(Y′(x, y)) * Cr(x, y) (14)
[0211] Cb(x, y) = Cb′(x, y) / S_Cb(Y′(x, y)),
[0212] Cr(x, y) = Cr′(x, y) / S_Cr(Y′(x, y)) (15)
[0213] Figure 11 is a diagram illustrating an example of an LCS function. With the LCS function 450 in the example of Figure 11, the chroma components of pixels with smaller luma values are multiplied by smaller scaling factors.
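Equations (12) and (13) can be illustrated with the short Python sketch below. The shape of the LCS function is not given numerically in this text, so `lcs_scale` is a purely hypothetical function that increases linearly from 0.5 at the darkest luma code value to 1.0 at the brightest; it only mimics the qualitative behavior described for Figure 11 (smaller scale factors for smaller luma values).

```python
def lcs_scale(luma, bit_depth=10):
    """Hypothetical LCS function: scale rises linearly from 0.5 to 1.0 with luma."""
    max_code = (1 << bit_depth) - 1
    return 0.5 + 0.5 * (luma / max_code)


def forward_lcs(u, luma, bit_depth=10):
    """Equation (12): U = S(Y) * (u - Offset) + Offset."""
    offset = 1 << (bit_depth - 1)
    return lcs_scale(luma, bit_depth) * (u - offset) + offset


def inverse_lcs(mapped, luma, bit_depth=10):
    """Equation (13): u = (U - Offset) / S(Y) + Offset."""
    offset = 1 << (bit_depth - 1)
    return (mapped - offset) / lcs_scale(luma, bit_depth) + offset
```

Because the decoder derives the scale from the co-located luma value, the inverse only recovers the chroma sample exactly when the same luma value is available on both sides, which is why the sketch passes `luma` to both functions.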
[0214] The relationship between DRA sample scaling and quantization parameters in video codecs will now be discussed. To adjust the compression ratio at the encoder (e.g., video encoder 200), block transform-based video codec schemes such as HEVC utilize scalar quantizers applied to the block transform coefficients.
[0215] Xq = X / scalerQP
[0216] Where Xq is the quantized code value of the transform coefficient X, which is generated by applying the scalar scalerQP derived from the QP parameters. In most codecs, the quantized code value is approximated as an integer value (e.g., through rounding). In some codecs, quantization may be a different function that depends not only on QP but also on other parameters of the codec.
[0217] The scalar value scalerQP is controlled by QP, and the relationship between QP and the scalar quantizer is defined as follows, where k is a known constant:
[0218] scalerQP=k*2^(QP / 6) (16)
[0219] The inverse function (which can be applied by the video decoder 300) defines the relationship between the scalar quantizer applied to the transform coefficients and the QP of HEVC as follows:
[0220] QP=ln(scalerQP / k)*6 / ln(2); (17)
[0221] In turn, an additive change in the QP value (e.g., deltaQP) will result in a multiplicative change in the scalerQP value applied to the transform coefficients.
[0222] DRA effectively applies the scaleDRA value to the pixel sample value, and taking into account the transformation properties, it can be combined with the scalerQP value as follows:
[0223] Xq = T(scaleDRA * x) / scalerQP
[0224] Here, Xq is the quantized transform coefficient produced by applying the transform T to the scaled sample values x and scaling the result by scalerQP in the transform domain. Therefore, applying the multiplier scaleDRA in the pixel domain results in an effective change of the scalar quantizer scalerQP applied in the transform domain. This can in turn be interpreted as an additive change of the QP parameter applied to the currently processed block of data:
[0225] dQP = log2(scaleDRA) * 6 (18)
[0226] where dQP is the approximate QP offset introduced in HEVC by deploying DRA on the input data.
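The relationships in equations (16), (17), and (18) can be checked numerically with a small Python sketch. The constant k is described only as "a known constant", so the value 1.0 used below is a placeholder assumption, and the function names are hypothetical. Equation (18) is implemented exactly as written, including its sign convention.

```python
import math

K = 1.0  # placeholder for the known constant k; the actual value is not given here


def scaler_from_qp(qp, k=K):
    """Equation (16): scalerQP = k * 2^(QP / 6)."""
    return k * 2.0 ** (qp / 6.0)


def qp_from_scaler(scaler, k=K):
    """Equation (17): QP = ln(scalerQP / k) * 6 / ln(2)."""
    return math.log(scaler / k) * 6.0 / math.log(2.0)


def dqp_from_dra_scale(scale_dra):
    """Equation (18): dQP = log2(scaleDRA) * 6."""
    return math.log2(scale_dra) * 6.0
```

The sketch makes the additive/multiplicative correspondence concrete: increasing QP by 6 doubles the scalar quantizer, and a DRA scale of 2 corresponds to a QP offset of magnitude 6.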
[0227] The dependency of the chroma QP on the luma QP value is now discussed. Some state-of-the-art video codec designs, such as HEVC and newer designs, can utilize a predefined dependency between the luma and chroma QP values that are effectively applied to process the currently coded block Cb. This dependency can be used to achieve an optimal (or relatively optimal) bitrate allocation between the luma and chroma components.
[0228] An example of this dependency is represented by Table 8-10 of the HEVC specification, referenced in the aforementioned paper entitled "Dynamic Range Adjustment SEI to enable High Dynamic Range video coding with Backward-Compatible Capability," where the QP values applied to decoding chroma samples are derived from the QP values used to decode luma samples. The relevant portion of the HEVC specification, in which the chroma QP value is derived from the QP value of the corresponding luma sample (e.g., the QP value applied to the block or TU to which the corresponding luma sample belongs) and the chroma QP offsets, is reproduced as follows:
[0229] When ChromaArrayType is not equal to 0, the following conditions apply:
[0230] – The variables qPCb and qPCr are derived as follows:
[0231] – If tu_residual_act_flag[xTbY][yTbY] is equal to 0, the following applies:
[0232] qPiCb = Clip3(-QpBdOffsetC, 57, QpY + pps_cb_qp_offset + slice_cb_qp_offset + CuQpOffsetCb) (8-287)
[0233] qPiCr = Clip3(-QpBdOffsetC, 57, QpY + pps_cr_qp_offset + slice_cr_qp_offset + CuQpOffsetCr) (8-288)
[0234] – Otherwise (tu_residual_act_flag[xTbY][yTbY] is equal to 1), the following applies:
[0235] qPiCb = Clip3(-QpBdOffsetC, 57, QpY + PpsActQpOffsetCb + slice_act_cb_qp_offset + CuQpOffsetCb) (8-289)
[0236] qPiCr = Clip3(-QpBdOffsetC, 57, QpY + PpsActQpOffsetCr + slice_act_cr_qp_offset + CuQpOffsetCr) (8-290)
[0237] – If ChromaArrayType is equal to 1, the variables qPCb and qPCr are set equal to the value of QpC as specified in Table 8-10, based on the index qPi equal to qPiCb and qPiCr, respectively.
[0238] – Otherwise, the variables qPCb and qPCr are set equal to Min(qPi, 51), based on the index qPi equal to qPiCb and qPiCr, respectively.
[0239] – The chroma quantization parameters for the Cb and Cr components, Qp′Cb and Qp′Cr, are derived as follows:
[0240] Qp′Cb = qPCb + QpBdOffsetC (8-291)
[0241] Qp′Cr = qPCr + QpBdOffsetC (8-292)
[0242] Figure 12 is a conceptual diagram illustrating Table 8-10 of the HEVC specification. Table 8-10 specifies QpC as a function of qPi for ChromaArrayType equal to 1.
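The derivation quoted above (for the case in which tu_residual_act_flag is equal to 0) can be sketched in Python as follows. The contents of Table 8-10 are not reproduced in this text, so the mapping in `QPC_TABLE` is an assumption based on the published HEVC table and should be checked against the specification; the function names are hypothetical.

```python
# Assumed contents of HEVC Table 8-10 for 30 <= qPi <= 43 (ChromaArrayType == 1).
# For qPi < 30, QpC = qPi; for qPi > 43, QpC = qPi - 6.
QPC_TABLE = {
    30: 29, 31: 30, 32: 31, 33: 32, 34: 33, 35: 33, 36: 34,
    37: 34, 38: 35, 39: 35, 40: 36, 41: 36, 42: 37, 43: 37,
}


def clip3(lo, hi, z):
    """Clamp z to the closed range [lo, hi]."""
    return lo if z < lo else hi if z > hi else z


def qpc_from_qpi(qpi):
    """Look up QpC for a given index qPi."""
    if qpi < 30:
        return qpi
    if qpi > 43:
        return qpi - 6
    return QPC_TABLE[qpi]


def chroma_qp(qp_y, pps_offset=0, slice_offset=0, cu_offset=0,
              qp_bd_offset_c=0, chroma_array_type=1):
    """Derive Qp'Cb or Qp'Cr from the luma QP and the chroma QP offsets."""
    qpi = clip3(-qp_bd_offset_c, 57,
                qp_y + pps_offset + slice_offset + cu_offset)
    qp_c = qpc_from_qpi(qpi) if chroma_array_type == 1 else min(qpi, 51)
    return qp_c + qp_bd_offset_c
```

The sketch shows the non-linear part of the dependency: below 30 the chroma QP tracks the luma QP one-to-one, while at higher values the chroma QP grows more slowly, which is the behavior the correction term discussed later is meant to compensate.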
[0243] The derivation of the DRA chroma scale will now be discussed. In video codec systems that employ both uniform scalar quantization in the transform domain and pixel-domain scaling using DRA (such as video encoder 200 or video decoder 300), the derivation of the DRA scale value applied to chroma samples (S_X) may depend on the following:
[0244] – S_Y: the luma scale value of the associated luma sample
[0245] – S_CX: the scale derived from the color gamut of the content, where CX represents Cb or Cr (as applicable)
[0246] – S_corr: a correction scale term that accounts for mismatches between transform coding and DRA scaling, for example, to compensate for the dependency introduced by Table 8-10 of HEVC
[0247] S_X = fun(S_Y, S_CX, S_corr).
[0248] An example is a separable function defined as follows: S_X = f1(S_Y) * f2(S_CX) * f3(S_corr)
[0249] The bumping operation is now described. A decoded picture buffer (DPB), such as DPB 218 or DPB 314, maintains a set of pictures / frames that can be used as references for inter-picture prediction in the coding loop of a codec (e.g., video encoder 200 or video decoder 300). Depending on the coding state, one or more pictures can be output for consumption by, or reading by, an external application. Depending on the coding order, the DPB size, or other conditions, pictures that are no longer used in the coding loop and have been consumed by the external application can be removed from the DPB or replaced by newer reference pictures. The process of outputting pictures from the DPB and potentially removing them is called the bumping process. An example of a bumping process defined for HEVC is quoted below:
[0250] C.5.2.4 "Bumping" process
[0251] The "bumping" process consists of the following ordered steps:
[0252] 1. The picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as "needed for output".
[0253] 2. The picture is cropped using the conformance cropping window specified in the active SPS for the picture, the cropped picture is output, and the picture is marked as "not needed for output".
[0254] 3. When the picture storage buffer that included the picture that was cropped and output contains a picture marked as "unused for reference", the picture storage buffer is emptied.
[0255] NOTE – For any two pictures picA and picB that belong to the same CVS and are output by the "bumping" process, when picA is output earlier than picB, the value of PicOrderCntVal of picA is less than the value of PicOrderCntVal of picB.
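The three ordered steps above can be sketched in Python as follows. This is a simplified model, not the normative process: the `Picture` class and the `bump` function are hypothetical names, the DPB is modeled as a plain list, and cropping is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Picture:
    poc: int                          # PicOrderCntVal
    needed_for_output: bool = True
    used_for_reference: bool = True


def bump(dpb: List[Picture]) -> Optional[Picture]:
    """Output one picture from the DPB following the three ordered steps."""
    candidates = [p for p in dpb if p.needed_for_output]
    if not candidates:
        return None
    # Step 1: select the picture with the smallest PicOrderCntVal.
    pic = min(candidates, key=lambda p: p.poc)
    # Step 2: (cropping omitted) output it and mark it "not needed for output".
    pic.needed_for_output = False
    # Step 3: empty its storage buffer if it is also unused for reference.
    if not pic.used_for_reference:
        dpb[:] = [p for p in dpb if p is not pic]
    return pic
```

Calling `bump` repeatedly on a DPB holding pictures with PicOrderCntVal 8, 2, and 4 outputs them in the order 2, 4, 8, consistent with the note above, and only pictures that are no longer reference pictures free their buffers.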
[0256] The bumping operation with DRA is now described. The draft text of the MPEG-5 EVC specification adopts DRA as a post-processing step, in the form of a modified bumping process. Below is an excerpt of the clauses of the specification text covering the bumping process with the proposed changes. The start of each change is marked with <change> and the end of each change is marked with </change>. Note that Figure C2 mentioned below corresponds to Figure 13 of this disclosure, and the changes are also marked in Figure 13.
[0257] Annex C: Hypothetical reference decoder
[0258] The HRD includes a coded picture buffer (CPB), an instantaneous decoding process, a decoded picture buffer (DPB), output DRA, and cropping, as shown in Figure C2. Figure 13 shows a hypothetical reference decoder 460.
[0259] The operation of the DPB is specified in sub-clause C.3. The output DRA process and cropping are specified in sub-clauses C.3.3 and C.5.2.4.
[0260] C.3.3 Picture decoding and output
[0261] Picture n is decoded, and its DPB output time t_(o,dpb)(n) is derived as follows.
[0262] t_(o,dpb)(n) = t_r(n) + t_c * dpb_output_delay(n) (C-12)
[0263] The output of the current picture is specified as follows.
[0264] – If t_(o,dpb)(n) = t_r(n), the current picture is output.
[0265] – Otherwise (t_(o,dpb)(n) > t_r(n)), the current picture is output later and is stored in the DPB (as specified in sub-clause C.2.4), and it is output at time t_(o,dpb)(n) unless it is indicated not to be output by the decoding or inference of no_output_of_prior_pics_flag equal to 1 at a time that precedes t_(o,dpb)(n).
[0266] <change> The output picture shall be derived by invoking the DRA process specified in sub-clause 8.9.2 and cropped using the cropping rectangle specified in the SPS for the sequence. </change>
[0267] When picture n is a picture that is output and is not the last picture of the bitstream that is output, the value of Δt_(o,dpb)(n) is defined as:
[0268] Δt_(o,dpb)(n) = t_(o,dpb)(n_n) - t_(o,dpb)(n) (C-13)
[0269] where n_n indicates the picture that follows picture n in output order.
[0270] The decoded picture is stored in the DPB.
[0271] C.5.2.4 "Bumping" process
[0272] The "bumping" process is invoked in the following cases.
[0273] – The current picture is an IDR picture, and no_output_of_prior_pics_flag is not equal to 1 and is not inferred to be equal to 1, as specified in sub-clause C.5.2.2.
[0274] – There is no empty picture storage buffer (i.e., the DPB fullness is equal to the DPB size), and an empty picture storage buffer is needed to store a decoded picture, as specified in the sub-clause.
[0275] The "bumping" process consists of the following ordered steps: <change>
[0276] 1. The picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as "needed for output".
[0277] The selected picture consists of a pic_width_in_luma_samples by pic_height_in_luma_samples array of luma samples currPicL and two PicWidthInSamplesC by PicHeightInSamplesC arrays of chroma samples currPicCb and currPicCr. The sample arrays currPicL, currPicCb, and currPicCr correspond to the decoded sample arrays S_L, S_Cb, and S_Cr.
[0278] 2. When dra_table_present_flag is equal to 1, the DRA derivation process specified in clause 8.9 is invoked with the selected picture as input and the output picture as output; otherwise, the sample arrays of the output picture are initialized with the sample arrays of the selected picture. </change>
[0279] 3. The output picture is cropped using the conformance cropping window specified in the active SPS for the picture, the cropped picture is output, and the picture is marked as "not needed for output".
[0280] 4. When the picture storage buffer that included the picture that was <change> mapped, </change> cropped, and output contains a picture marked as "unused for reference", the picture storage buffer is emptied.
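The ordering that the modified steps impose, DRA mapping first and cropping second, can be sketched in Python as shown below. This is a deliberately reduced illustration: only a single sample array is handled, `dra_process` is a hypothetical per-sample callable standing in for the DRA derivation process of clause 8.9, and the crop window is assumed to be given as (left, right, top, bottom) with exclusive right and bottom bounds.

```python
def output_picture(samples, dra_table_present_flag, dra_process, crop_window):
    """Produce the cropped output picture from a selected DPB picture.

    samples:      2-D list of decoded sample values (one color component)
    dra_process:  hypothetical per-sample mapping standing in for clause 8.9
    crop_window:  (left, right, top, bottom), right and bottom exclusive
    """
    if dra_table_present_flag:
        # DRA is applied to a copy, so the stored reference picture is unchanged.
        output = [[dra_process(s) for s in row] for row in samples]
    else:
        # Otherwise the output picture is initialized from the selected picture.
        output = [row[:] for row in samples]
    left, right, top, bottom = crop_window
    return [row[left:right] for row in output[top:bottom]]
```

Because the mapping is applied to a separate output array, the samples held in the DPB remain available, unmodified, for inter-picture prediction. This is also why the DRA parameters must still be valid at output time rather than only at decoding time.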
[0281] Adaptation parameter set (APS) signaling for DRA data is now discussed. The MPEG-5 EVC specification defines the signaling of DRA parameters in the APS. The syntax and semantics of the DRA parameters are provided below:
[0282]
[0283]
[0284]
[0285]
[0286] DRA Data Syntax
[0287]
[0288] A value of 1 for sps_dra_flag specifies that dynamic range adjustment mapping on the output samples should be used. A value of 0 for sps_dra_flag specifies that dynamic range adjustment mapping on the output samples should not be used.
[0289] pic_dra_enabled_present_flag equal to 1 specifies that pic_dra_enabled_flag is present in the PPS. pic_dra_enabled_present_flag equal to 0 specifies that pic_dra_enabled_flag is not present in the PPS. When pic_dra_enabled_present_flag is not present, it is inferred to be equal to 0.
[0290] pic_dra_enabled_flag equal to 1 specifies that DRA is enabled for all decoded pictures referring to the PPS. pic_dra_enabled_flag equal to 0 specifies that DRA is not enabled for all decoded pictures referring to the PPS. When not present, pic_dra_enabled_flag is inferred to be equal to 0.
[0291] pic_dra_aps_id specifies the adaptation_parameter_set_id of the DRA APS that is enabled for decoded pictures referring to the PPS.
[0292] adaptation_parameter_set_id provides an identifier for the APS for reference by other syntax elements.
[0293] aps_params_type specifies the type of APS parameters carried in APS, as shown in Table 2.
[0294]
[0295] Table 2 – APS Parameter Type Codes and APS Parameter Types
[0296] The value of dra_descriptor1 shall be in the range of 0 to 15 (inclusive). In the current version of the specification, the value of the syntax element dra_descriptor1 is restricted to 4, with other values reserved for future use.
[0297] dra_descriptor2 specifies the precision of the fractional part of the DRA scale parameter signaling and reconstruction process. The value of dra_descriptor2 shall be in the range of 0 to 15 (inclusive). In the current version of the specification, the value of the syntax element dra_descriptor2 is restricted to 9, with other values reserved for future use.
[0298] The variable numBitsDraScale is derived as follows:
[0299] numBitsDraScale=dra_descriptor1+dra_descriptor2
[0300] dra_number_ranges_minus1 plus 1 specifies the number of ranges signaled to describe the DRA table. The value of dra_number_ranges_minus1 shall be in the range of 0 to 31 (inclusive).
[0301] dra_equal_ranges_flag equal to 1 specifies that the DRA table is derived using ranges of equal size, with the size specified by the syntax element dra_delta_range[0]. dra_equal_ranges_flag equal to 0 specifies that the DRA table is derived using dra_number_ranges ranges, with the size of each range specified by the syntax element dra_delta_range[j].
[0302] dra_global_offset specifies the starting codeword position for deriving the DRA table and initializes the variable inDraRange[0] as follows:
[0303] inDraRange[0] = dra_global_offset
[0304] The number of bits used to signal dra_global_offset is BitDepthY bits.
[0305] dra_delta_range[j] specifies the size of the j-th codeword range for deriving the DRA table. The value of dra_delta_range[j] shall be in the range from 1 to (1<<BitDepthY)-1 (including the end values).
[0306] The variable inDraRange[j] for j in the range from 1 to dra_number_ranges_minus1 (including the end values) is derived as follows:
[0307] inDraRange[j] = inDraRange[j - 1] + ((dra_equal_ranges_flag == 1) ? dra_delta_range[0] : dra_delta_range[j])
[0308] It is a requirement of bitstream conformance that inDraRange[j] shall be in the range from 0 to (1 << BitDepthY) - 1.
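The derivation of numBitsDraScale and inDraRange[j] described above can be sketched in Python as follows. The function names are hypothetical, the syntax element values are passed in directly rather than parsed from a bitstream, and the conformance requirement is modeled by raising an exception.

```python
def num_bits_dra_scale(dra_descriptor1=4, dra_descriptor2=9):
    """numBitsDraScale = dra_descriptor1 + dra_descriptor2."""
    return dra_descriptor1 + dra_descriptor2


def derive_in_dra_range(dra_global_offset, dra_delta_range,
                        dra_equal_ranges_flag, dra_number_ranges_minus1,
                        bit_depth_y=10):
    """Derive inDraRange[0..dra_number_ranges_minus1] from the signaled values."""
    in_dra_range = [dra_global_offset]
    for j in range(1, dra_number_ranges_minus1 + 1):
        # Equal-size ranges reuse dra_delta_range[0]; otherwise use entry j.
        if dra_equal_ranges_flag == 1:
            delta = dra_delta_range[0]
        else:
            delta = dra_delta_range[j]
        in_dra_range.append(in_dra_range[j - 1] + delta)
    limit = (1 << bit_depth_y) - 1
    if any(v < 0 or v > limit for v in in_dra_range):
        raise ValueError("inDraRange out of range: non-conforming bitstream")
    return in_dra_range
```

With the restricted descriptor values of 4 and 9, each scale value is signaled with 13 bits. For equal ranges of size 100 starting at codeword 64, the call `derive_in_dra_range(64, [100], 1, 3)` yields the range starts 64, 164, 264, and 364.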
[0309] dra_scale_value[j] specifies the DRA scale value associated with the j-th range of the DRA table. The number of bits used to signal dra_scale_value[j] is equal to numBitsDraScale.
[0310] dra_cb_scale_value specifies the scale value for the chroma samples of the Cb component for deriving the DRA table. The number of bits used to signal dra_cb_scale_value is equal to numBitsDraScale. In the current version of the specification, the value of the syntax element dra_cb_scale_value should be less than 4<<dra_descriptor2, and other values are reserved for future use.
[0311] dra_cr_scale_value specifies the scale value for chroma samples of the Cr component used to derive the DRA table. The number of bits used to signal dra_cr_scale_value is equal to numBitsDraScale. In the current version of the specification, the value of the syntax element dra_cr_scale_value should be less than 4 << dra_descriptor2, and other values are reserved for future use.
[0312] The values of dra_scale_value[j], dra_cb_scale_value, and dra_cr_scale_value shall not be equal to 0.
[0313] The dra_table_idx specifies the access entry of the ChromaQpTable used to derive the chroma scale value. The value of dra_table_idx shall be in the range of 0 to 57 (inclusive).
[0314] Some video coders signal DRA data in an APS, as a separate network abstraction layer (NAL) unit, and signal the identifier of the applicable APS in the picture parameter set (PPS) for all pictures that refer to that PPS. The video decoder 300 can apply the inverse DRA process during the output process, which can be decoupled in time from the decoding process, e.g., in a random access coding scenario.
[0315] However, the potential decoupling of the decoding process and the output process may result in the following situation: the output process, and therefore the application of DRA, may refer to a DRA APS that has already been overwritten by a new DRA APS in the DRA APS buffer during the decoding process.
[0316] The goal is to ensure that DRA APS data in the APS buffer is not overwritten by different DRA APS data during the decoding process before the video decoder (such as the video decoder 300) applies DRA during the output process. To this end, a video coder (such as the video encoder 200) can constrain the bitstream such that every DRA APS with a specific ID number shall consist of the same (identical) content (or alternatively include the same content), which prevents DRA APS buffer entries from being overwritten by different data during the decoding process. This effectively implements a static APS buffer of size N, e.g., N equal to 32 entries, as in MPEG-5 EVC.
[0317] For example, video encoder 200 can determine a first DRA APS ID for a first frame of video data and determine a first DRA APS for the first frame. Video encoder 200 can also determine a second DRA APS ID for a second frame of video data and determine a second DRA APS for the second frame. Video encoder 200 can process the first frame according to the first DRA APS and process the second frame according to the second DRA APS. In some examples, video encoder 200 can assign a second DRA APS ID such that when the first DRA APS is different from the second DRA APS, the second DRA APS ID is different from the first DRA APS ID. In some examples, when the second DRA APS ID is the same as the first DRA APS ID, video encoder 200 can determine that the second DRA APS is equal to the first DRA APS. For example, if the first DRA APS ID is equal to the second DRA APS ID, then the first DRA APS is equal to the second DRA APS.
[0318] For example, video decoder 300 can determine a first DRA APS ID for a first image of video data. Video decoder 300 can determine the DRA APS for the first image. Video decoder 300 can store the DRA APS in an APS buffer. Video decoder 300 can determine a second DRA APS ID for a second image of video data. Based on the fact that the second DRA APS ID is equal to the first DRA APS ID, video decoder 300 can prevent the stored DRA APS from being overwritten by different data. For example, video decoder 300 can avoid overwriting stored DRA APS, or video decoder 300 can overwrite stored DRA APS with the same DRA APS.
[0319] The semantics of the APS raw byte sequence payload (RBSP) according to the techniques of this disclosure are now described.
[0320] adaptation_parameter_set_id provides an identifier for the APS for reference by other syntax elements.
[0321] All APS NAL units within a coded video sequence (CVS) with aps_params_type equal to DRA_APS and a particular value of adaptation_parameter_set_id shall have the same content.
[0322] According to the techniques of this disclosure, the following constraint applies for bitstream conformance:
[0323] – When two or more pictures within a CVS refer to multiple APSs of type DRA_APS with the same value of adaptation_parameter_set_id, the multiple APSs of type DRA_APS with the same value of adaptation_parameter_set_id shall have the same content.
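A decoder-side view of this constraint can be sketched in Python as follows. The class `DraApsBuffer` is a hypothetical illustration, not part of any specification: it models a static buffer of 32 entries indexed by adaptation_parameter_set_id, treats the APS content as an opaque byte string, and flags any attempt to replace a stored DRA APS with different content as non-conforming.

```python
class DraApsBuffer:
    """Static-size DRA APS buffer that refuses overwrites with different content."""

    def __init__(self, size=32):
        self.entries = [None] * size  # one slot per adaptation_parameter_set_id

    def store(self, aps_id, payload):
        """Store a DRA APS; re-storing identical content is a harmless no-op."""
        if not 0 <= aps_id < len(self.entries):
            raise IndexError("adaptation_parameter_set_id out of range")
        current = self.entries[aps_id]
        if current is not None and current != payload:
            raise ValueError("DRA APS with this ID already holds different content")
        self.entries[aps_id] = payload

    def get(self, aps_id):
        """Return the DRA APS referenced by a picture at output time."""
        return self.entries[aps_id]
```

Because a given ID can only ever hold one content within the CVS, a picture that is output long after it was decoded still finds the same DRA parameters under its pic_dra_aps_id.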
[0324] Figure 14 is a block diagram of a video encoder and video decoder system including DRA units. A video encoder, such as video encoder 200, may include a forward DRA unit 240 and a codec core 242. In some examples, the codec core 242 may include the units described with respect to Figure 3 and may operate as described above with respect to Figure 3. The video encoder 200 can also determine multiple APSs 244 and multiple PPSs 246, which may include information from the forward DRA unit 240.
[0325] According to the techniques of this disclosure, the forward DRA unit 240 can determine a first DRA APS (e.g., one of the APSs 244) for a first image of video data. The forward DRA unit 240 can assign a first DRA APS ID to the first DRA APS. The forward DRA unit 240 can determine a second DRA APS (e.g., one of the APSs 244) for a second image of video data and assign a second DRA APS ID to the second DRA APS. The codec core 242 can signal the first DRA APS in the bitstream 250. The forward DRA unit 240 can process the first image according to the first DRA APS. The forward DRA unit 240 can determine whether the first DRA APS ID is equal to the second DRA APS ID. If the first DRA APS ID is equal to the second DRA APS ID, the forward DRA unit 240 can process the second image according to the first DRA APS; if the first DRA APS ID is not equal to the second DRA APS ID, the codec core 242 can signal the second DRA APS in the bitstream 250, and the forward DRA unit 240 can process the second image according to the second DRA APS.
[0326] A video decoder, such as video decoder 300, may include a codec core 340 and an output DRA unit 342. In some examples, the codec core 340 may include the units described with respect to Figure 4 and may operate as described above with respect to Figure 4. The video decoder 300 can also determine multiple APSs 344 and multiple PPSs 346, which may include information to be used by the output DRA unit 342.
[0327] According to the techniques of this disclosure, the output DRA unit 342 can determine a first DRA APS ID for a first image of video data. The output DRA unit 342 can determine the DRA APS for the first image. The output DRA unit 342 can store the DRA APS in the APS buffer (APSB) 348. The output DRA unit 342 can determine a second DRA APS ID for a second image of video data. Based on the second DRA APS ID being equal to the first DRA APS ID, the output DRA unit 342 can prevent the stored DRA APS from being overwritten by different data and process the first and second images according to the DRA APS.
[0328] Figure 15 is a flowchart illustrating an example DRA APS encoding technique according to this disclosure. Video encoder 200 can determine a first DRA APS (470) for a first frame of video data. For example, video encoder 200 can determine DRA parameters to be applied to the first frame of video data and include those DRA parameters in the first DRA APS. Video encoder 200 can assign a first DRA APS ID (471) to the first DRA APS. For example, video encoder 200 can determine a first DRA APS ID to identify the first DRA APS. Video encoder 200 can signal the first DRA APS ID in a PPS associated with the first frame. Video encoder 200 can determine a second DRA APS (472) for a second frame of video data. For example, video encoder 200 can determine DRA parameters to be applied to the second frame of video data and signal the second DRA APS based on the determined DRA parameters.
[0329] The video encoder 200 can assign a second DRA APS ID (473) to the second DRA APS. For example, the video encoder 200 can assign a second DRA APS ID to identify the second DRA APS. The video encoder 200 can signal the second DRA APS ID in the PPS associated with the second picture.
[0330] The video encoder 200 can signal the first DRA APS (474) in the bitstream. For example, the video encoder 200 can signal the first DRA APS to the video decoder 300 to store it in the APS buffer.
[0331] The video encoder 200 can process the first image (475) according to the first DRA APS. For example, the video encoder 200 can apply DRA to the first image based on parameters that the video encoder 200 can code in the first DRA APS, and signal the first DRA APS to the video decoder 300 for storage in the APS buffer.
[0332] The video encoder 200 can determine whether the first DRA APS ID is equal to the second DRA APS ID (476). For example, the video encoder 200 can compare the first DRA APS ID with the second DRA APS ID. If the first DRA APS ID is equal to the second DRA APS ID (the "YES" path from box 476 in Figure 15), the video encoder 200 can process the second image according to the first DRA APS (477). For example, the video encoder 200 can apply DRA to the second image according to the first DRA APS. If the first DRA APS ID is not equal to the second DRA APS ID (the "NO" path from box 476 in Figure 15), the video encoder 200 can signal the second DRA APS in the bitstream and process the second picture according to the second DRA APS (478). For example, the video encoder 200 can apply DRA to the second picture according to the second DRA APS and signal the second DRA APS to the video decoder 300 to store it in the APS buffer. In this way, the video encoder 200 can prevent data in the APS buffer of the video decoder 300 from being overwritten by different data.
[0333] In some examples, the video encoder 200 may limit the number of bits in the DRA APS to N, where N is an integer, such as 32. In some examples, the video encoder 200 may signal the first DRA APS ID in the bitstream and signal the second DRA APS ID in the bitstream.
[0334] In some examples, when the first DRA APS is not equal to the second DRA APS, the video encoder 200 can avoid assigning the value of the second DRA APS ID to be equal to the value of the first DRA APS ID. For example, when the first DRA APS and the second DRA APS are different or contain different data, the video encoder 200 can assign a DRA APS ID to the second image that is different from the DRA APS ID that the video encoder 200 can assign to the first image. In some examples, the video encoder can determine whether the first DRA APS is equal to the second DRA APS. In some examples, as part of avoiding assigning the value of the second DRA APS ID to be equal to the value of the first DRA APS ID, the video encoder 200 can assign a value to the second DRA APS ID that is different from the value of the first DRA APS ID. In some examples, the video encoder 200 can determine whether the first DRA APS is equal to the second DRA APS, and based on the first DRA APS being equal to the second DRA APS, the video encoder 200 can determine that the second DRA APS ID is equal to the first DRA APS ID.
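As a non-limiting illustration, one way an encoder might assign DRA APS IDs so that different DRA APS content never shares an ID can be sketched in Python as follows. The function name assign_dra_aps_id, the dictionary of assigned IDs, and the assumed ID range of 32 values are hypothetical choices made for this sketch.

```python
def assign_dra_aps_id(params, assigned, max_ids=32):
    """Pick a DRA APS ID for `params`.

    `assigned` maps already-used IDs to their DRA parameters.
    Identical parameters reuse the existing ID; different parameters
    always receive an ID that is not yet in use.
    """
    for aps_id, existing in assigned.items():
        if existing == params:
            return aps_id  # same DRA APS, so the same ID may be used
    for aps_id in range(max_ids):
        if aps_id not in assigned:
            assigned[aps_id] = params
            return aps_id
    # Assumption of this sketch: running out of IDs is treated as an error.
    raise RuntimeError("no free DRA APS ID")
```

For instance, two requests with the same parameters return the same ID, while a request with different parameters returns a different ID.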
[0335] In some examples, the APS buffer has a static size. In some examples, the APS buffer is configured to store 32 entries. In some examples, the video encoder 200 prevents data loss due to overwriting of the APS buffer in the video decoder 300 by making the second DRA APS equal to the first DRA APS.
[0336] Figure 16 is a flowchart illustrating an example DRA APS decoding technique according to this disclosure. Video decoder 300 can determine a first DRA APS ID (480) for a first image of video data. For example, video decoder 300 can parse syntax elements (such as adaptation_parameter_set_id) in the bitstream, which can be in the PPS associated with the first image, to determine the first DRA APS ID. Video decoder 300 can determine the DRA APS for the first image (482). For example, video decoder 300 can parse the DRA APS associated with the first image in the bitstream to determine the DRA APS for the first image. Video decoder 300 can store the DRA APS in an APS buffer (484). For example, video decoder 300 can store the DRA APS in APS buffer 348 (Figure 14).
[0337] The video decoder 300 can determine a second DRA APS ID for a second image of the video data (486). For example, the video decoder 300 can parse syntax elements (such as adaptation_parameter_set_id) in the bitstream, which can be in the PPS associated with the second image, to determine the second DRA APS ID. Based on the second DRA APS ID being equal to the first DRA APS ID, the video decoder 300 can prevent the stored DRA APS from being overwritten by different data (488). For example, as part of preventing the stored DRA APS from being overwritten by different data, the video decoder 300 can avoid overwriting the stored DRA APS. In another example, as part of preventing the stored DRA APS from being overwritten by different data, the video decoder 300 can overwrite the stored DRA APS with the same DRA APS.
[0338] The video decoder 300 can process the first and second images (490) according to the stored DRA APS. For example, the video decoder 300 can use parameters in the stored DRA APS to perform DRA on the first and second images to create a DRA-adjusted first image and a DRA-adjusted second image. In some examples, the video decoder 300 can output the DRA-adjusted first image and the DRA-adjusted second image. For example, the video decoder 300 can output the DRA-adjusted first image and the DRA-adjusted second image for display on a display device, such as display device 118 of Figure 1.
[0339] Figure 17 is a flowchart illustrating an example method for encoding the current block. The current block may include the current CU. Although described with respect to video encoder 200 (Figures 1 and 3), it should be understood that other devices can be configured to perform a method similar to that of Figure 17.
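As a non-limiting illustration, the decoder-side handling of (484) through (490) might be sketched in Python as follows. The class ApsBuffer, the function params_for_pictures, and the choice to refuse a write that carries different data are hypothetical and are assumptions of this sketch; the default size of 32 entries follows the static buffer size given as an example in this disclosure.

```python
class ApsBuffer:
    """Fixed-size APS buffer indexed by DRA APS ID."""

    def __init__(self, size=32):
        self._slots = [None] * size  # static size, one slot per ID

    def store(self, aps_id, params):
        """Store a DRA APS; return False if the write was refused."""
        stored = self._slots[aps_id]
        if stored is not None and stored != params:
            # Same ID but different data: keep the stored entry.
            # (Refusing the write is an assumption of this sketch.)
            return False
        # Empty slot, or overwriting with the same DRA APS.
        self._slots[aps_id] = params
        return True

    def get(self, aps_id):
        return self._slots[aps_id]


def params_for_pictures(pictures, buffer):
    """pictures: iterable of (aps_id, received_aps_or_None) per picture."""
    used = []
    for aps_id, received in pictures:
        if received is not None:
            buffer.store(aps_id, received)
        used.append(buffer.get(aps_id))  # DRA parameters applied at output
    return used
```

In this sketch, a second picture that references the same DRA APS ID without carrying a new DRA APS is processed with the entry stored for the first picture.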
[0340] In this example, the video encoder 200 initially predicts the current block (350). For example, the video encoder 200 may form a prediction block for the current block. Then, the video encoder 200 may compute the residual block for the current block (352). To compute the residual block, the video encoder 200 may compute the difference between the original, uncoded block and the prediction block for the current block. Then, the video encoder 200 may transform the residual block and quantize the transform coefficients of the residual block (354). Next, the video encoder 200 may scan the quantized transform coefficients of the residual block (356). During or after the scan, the video encoder 200 may entropy encode the transform coefficients (358). For example, the video encoder 200 may encode the transform coefficients using CAVLC or CABAC. Then, the video encoder 200 may output the entropy-encoded data of the block (360). The video encoder 200 may also perform the DRA techniques of Figure 15.
[0341] Figure 18 is a flowchart illustrating an example method for decoding the current block of video data. The current block may include the current CU. Although described with respect to video decoder 300 (Figures 1 and 4), it should be understood that other devices can be configured to perform a method similar to that of Figure 18.
[0342] The video decoder 300 can receive entropy-coded data of the current block (370), such as entropy-coded prediction information and entropy-coded data for the transform coefficients of the residual block corresponding to the current block. The video decoder 300 can entropy decode the entropy-coded data to determine the prediction information of the current block and reproduce the transform coefficients of the residual block (372). The video decoder 300 can predict the current block, for example, using an intra-frame or inter-frame prediction mode indicated by the prediction information for the current block, to compute a prediction block for the current block (374). The video decoder 300 can then inversely scan the reproduced transform coefficients (376) to create a block of quantized transform coefficients. The video decoder 300 can then inversely quantize and inverse transform the transform coefficients to produce a residual block (378). Finally, the video decoder 300 can decode the current block by combining the prediction block and the residual block (380). The video decoder 300 can also apply DRA to decoded images, such as described with respect to Figure 16.
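As a non-limiting illustration, steps (352) through (356) might be sketched in Python for a toy 2x2 block as follows. The function name encode_block, the 2x2 block size, the unnormalized Hadamard transform, the truncating quantizer, and the raster scan order are simplifying assumptions of this sketch and are not the transforms or scans of any particular codec. Entropy coding (358) is omitted.

```python
def encode_block(block, prediction, qstep=2):
    """Return scanned, quantized coefficients for a 2x2 block."""
    # (352) residual = original block minus prediction block
    residual = [[block[r][c] - prediction[r][c] for c in range(2)]
                for r in range(2)]
    a, b = residual[0]
    c, d = residual[1]
    # (354) transform: unnormalized 2x2 Hadamard transform
    coeffs = [[a + b + c + d, a - b + c - d],
              [a + b - c - d, a - b - c + d]]
    # (354) quantize: divide by the step size, truncating toward zero
    quant = [[int(v / qstep) for v in row] for row in coeffs]
    # (356) scan the 2x2 coefficient block into a 1-D list
    scan_order = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [quant[r][c] for r, c in scan_order]
```

The resulting list of quantized coefficients is what an entropy coder such as CAVLC or CABAC would receive in a real encoder.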
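As a non-limiting illustration, steps (376) through (380) might be sketched in Python for a toy 2x2 block as follows. The function name decode_block, the 2x2 block size, the raster scan order, the uniform inverse quantizer, and the unnormalized Hadamard transform are simplifying assumptions of this sketch. Entropy decoding (372) and the derivation of the prediction block (374) are taken as already done.

```python
def decode_block(scanned, prediction, qstep=2):
    """Reconstruct a 2x2 block from scanned quantized coefficients."""
    # (376) inverse scan: place the 1-D list back into a 2x2 block
    scan_order = [(0, 0), (0, 1), (1, 0), (1, 1)]
    quant = [[0, 0], [0, 0]]
    for value, (r, c) in zip(scanned, scan_order):
        quant[r][c] = value
    # (378) inverse quantize: multiply by the step size
    (A, B), (C, D) = [[v * qstep for v in row] for row in quant]
    # (378) inverse transform: the 2x2 Hadamard transform is its own
    # inverse up to a factor of 4
    residual = [[round((A + B + C + D) / 4), round((A - B + C - D) / 4)],
                [round((A + B - C - D) / 4), round((A - B - C + D) / 4)]]
    # (380) reconstruct: prediction block plus residual block
    return [[prediction[r][c] + residual[r][c] for c in range(2)]
            for r in range(2)]
```

DRA, when used, would be applied to the reconstructed pictures after this block-level reconstruction.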
[0343] This disclosure includes the following examples.
[0344] Clause 1A. A method for encoding and decoding video data, the method comprising: determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of the video data; determining a DRA APS identifier (ID) for the first image of the video data; determining whether a DRA APS ID for a second image of the video data is equal to the DRA APS ID for the first image of the video data; determining that a DRA APS for the second image of the video data is equal to the DRA APS for the first image of the video data based on the DRA APS ID for the second image of the video data being equal to the DRA APS ID for the first image of the video data; and processing the second image of the video data based on the DRA APS for the second image of the video data.
[0345] Clause 2A. The method described in Clause 1A further includes storing the DRA APS of the first image of the video data in the APS buffer.
[0346] Clause 3A. The method described in Clause 2A, wherein the APS buffer has a static size.
[0347] Clause 4A. The method described in Clause 3A, wherein the APS buffer is configured to store 32 entries.
[0348] Clause 5A. The method according to any one of Clauses 2A-4A further includes preventing overwriting of the APS buffer based on the fact that the DRA APS ID of the second picture for the video data is equal to the DRA APS ID of the first picture for the video data.
[0349] Clause 6A. The method according to any one of Clauses 1A-5A, wherein encoding / decoding includes decoding.
[0350] Clause 7A. The method according to any one of Clauses 1A-6A, wherein encoding / decoding includes encoding.
[0351] Clause 8A. An apparatus for encoding and decoding video data, the apparatus comprising one or more components for performing the methods described in any one of Clauses 1A-7A.
[0352] Clause 9A. The device as described in Clause 8A, wherein the one or more components include one or more processors implemented in a circuit.
[0353] Clause 10A. The device pursuant to any one of Clauses 8A-9A further includes a memory for storing video data.
[0354] Clause 11A. The device pursuant to any one of Clauses 8A-10A further includes a display configured to display decoded video data.
[0355] Clause 12A. Devices under any of Clauses 8A-11A, wherein the device includes one or more of a camera, computer, mobile device, broadcast receiver device, or set-top box.
[0356] Clause 13A. A device pursuant to any of Clauses 8A-12A, wherein the device includes a video decoder.
[0357] Clause 14A. An apparatus pursuant to any of Clauses 8A-13A, wherein the apparatus includes a video encoder.
[0358] Clause 15A. A computer-readable storage medium having instructions stored thereon, which, when executed, cause one or more processors to perform the method described in any one of Clauses 1A-7A.
[0359] Clause 16A. An apparatus for encoding video data, the apparatus comprising: components for determining a dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of the video data; components for determining a DRA APS ID for the first image of the video data; components for determining whether a DRA APS ID for a second image of the video data is equal to the DRA APS ID for the first image of the video data; components for determining that a DRA APS for the second image of the video data is equal to the DRA APS for the first image of the video data based on the DRA APS ID for the second image of the video data being equal to the DRA APS ID for the first image of the video data; and components for processing the second image of the video data based on the DRA APS for the second image of the video data.
[0360] Clause 1B. A method for encoding video data, the method comprising: determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of the video data; assigning a first DRA APS ID to the first DRA APS;
[0361] determining a second DRA APS for a second image of the video data; assigning a second DRA APS ID to the second DRA APS; signaling the first DRA APS in the bitstream; processing the first image according to the first DRA APS; determining whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, processing the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signaling the second DRA APS in the bitstream and processing the second image according to the second DRA APS.
[0362] Clause 2B. The method described in Clause 1B further includes limiting the number of bits in the DRA APS to N, where N is an integer.
[0363] Clause 3B. The method described under Clause 1B or 2B further includes: signaling a first DRA APS ID in the bitstream; and signaling a second DRA APS ID in the bitstream.
[0364] Clause 4B. The method described under any combination of Clauses 1B-3B further comprises: avoiding assigning a value of the second DRA APS ID to be equal to the value of the first DRA APS ID when the first DRA APS is not equal to the second DRA APS.
[0365] Clause 5B. The method described in Clause 4B, wherein avoiding assigning a value to the second DRA APS ID equal to the value of the first DRA APS ID comprises: assigning a value to the second DRA APS ID that is different from the value of the first DRA APS ID.
[0366] Clause 6B. The method described under any combination of Clauses 1B-5B further comprises: avoiding signaling the second DRA APS in the bitstream when the first DRA APS ID is equal to the second DRA APS ID.
[0367] Clause 7B. The method according to any combination of Clauses 1B-6B, wherein the first DRA APS is signaled to the video decoder for storage in the APS buffer.
[0368] Clause 8B. The method described in accordance with Clause 7B, wherein the APS buffer has a static size.
[0369] Clause 9B. The method described in accordance with Clause 8B, wherein the APS buffer is configured to store 32 entries.
[0370] Clause 10B. An apparatus for encoding video data, the apparatus comprising: a memory configured to store the video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first frame of the video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second frame of the video data; assign a second DRA APS ID to the second DRA APS; signal the first DRA APS in a bitstream; process the first frame according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, process the second frame according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in a bitstream and process the second frame according to the second DRA APS.
[0371] Clause 11B. The device as described in Clause 10B, wherein the one or more processors are further configured to limit the number of bits in the DRA APS to N, where N is an integer.
[0372] Clause 12B. The device according to Clause 10B or 11B, wherein the one or more processors are further configured to: signal a first DRA APS ID in a bitstream; and signal a second DRA APS ID in a bitstream.
[0373] Clause 13B. The device according to any combination of Clauses 10B-12B, wherein the one or more processors are further configured to: avoid assigning a value of the second DRA APS ID equal to the value of the first DRA APS ID when the first DRA APS is not equal to the second DRA APS.
[0374] Clause 14B. The device as described in Clause 13B, wherein, as part of avoiding assigning a value to the second DRA APS ID that is equal to the value of the first DRA APS ID, the one or more processors are configured to assign a value to the second DRA APS ID that is different from the value of the first DRA APS ID.
[0375] Clause 15B. The device according to any combination of Clauses 10B-14B, wherein the one or more processors are further configured to: avoid signaling a second DRA APS in the bitstream when the first DRA APS ID is equal to the second DRA APS ID.
[0376] Clause 16B. The device according to any combination of Clauses 10B-15B, wherein the first DRA APS is signaled to the video decoder for storage in the APS buffer.
[0377] Clause 17B. The device described in accordance with Clause 16B, wherein the APS buffer has a static size.
[0378] Clause 18B. The device as described in Clause 17B, wherein the APS buffer is configured to store 32 entries.
[0379] Clause 19B. The device described under any combination of Clauses 10B-18B further includes: a camera configured to capture video data.
[0380] Clause 20B. A non-transitory computer-readable storage medium for storing instructions that, when executed, cause one or more processors to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first picture of video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second picture of video data; assign a second DRA APS ID to the second DRA APS; signal the first DRA APS in a bitstream; process the first picture according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, process the second picture according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in a bitstream and process the second picture according to the second DRA APS.
[0381] Clause 21B. A method for decoding video data, the method comprising: determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) identifier (ID) for a first image of the video data; determining a DRA APS for the first image; storing the DRA APS in an APS buffer; determining a second DRA APS ID for a second image of the video data; preventing the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and processing the first image and the second image according to the stored DRA APS.
[0382] Clause 22B. The method described in Clause 21B, wherein preventing stored DRA APS from being overwritten by different data includes: avoiding overwriting stored DRA APS.
[0383] Clause 23B. The method described in Clause 21B or 22B, wherein preventing stored DRA APS from being overwritten by different data comprises: overwriting the stored DRA APS with the same DRA APS.
[0384] Clause 24B. The method according to any combination of Clauses 21B-23B, wherein processing the first image and the second image creates a DRA-adjusted first image and a DRA-adjusted second image, the method further comprising: outputting the DRA-adjusted first image and the DRA-adjusted second image.
[0385] Clause 25B. The method according to any combination of Clauses 21B-24B, wherein the APS buffer has a static size.
[0386] Clause 26B. The method described in accordance with Clause 25B, wherein the APS buffer is configured to store 32 entries.
[0387] Clause 27B. An apparatus for decoding video data, the apparatus comprising a memory configured to store the video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first image of the video data; determine the DRA APS for the first image; store the DRA APS in an APS buffer; determine a second DRA APS ID for a second image of the video data; prevent the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and process the first image and the second image according to the stored DRA APS.
[0388] Clause 28B. The device as described in Clause 27B, wherein, as part of preventing the stored DRA APS from being overwritten by different data, the one or more processors are configured to avoid overwriting the stored DRA APS.
[0389] Clause 29B. The device described in Clause 27B or 28B, wherein, as part of preventing stored DRA APS from being overwritten by different data, the one or more processors are configured to overwrite stored DRA APS with the same DRA APS.
[0390] Clause 30B. The device described in any combination of Clauses 27B-29B, wherein processing a first image and a second image creates a DRA-adjusted first image and a DRA-adjusted second image, the one or more processors being further configured to output the DRA-adjusted first image and the DRA-adjusted second image.
[0391] Clause 31B. The device described in any combination of Clauses 27B-30B, wherein the APS buffer has a static size.
[0392] Clause 32B. The device as described in Clause 31B, wherein the APS buffer is configured to store 32 entries.
[0393] Clause 33B. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to perform the following operations: determine a first Dynamic Range Adjustment (DRA) Adaptive Parameter Set (APS) identifier (ID) for a first picture of video data; determine a DRA APS for the first picture; store the DRA APS in an APS buffer; determine a second DRA APS ID for a second picture of video data; prevent the stored DRA APS from being overwritten by different data based on the second DRA APS ID being equal to the first DRA APS ID; and process the first picture and the second picture according to the stored DRA APS.
[0394] It should be recognized that, depending on the example, certain actions or events of any technique described herein may be performed in a different order, may be added, combined, or may be omitted entirely (e.g., not all described actions or events are necessary for practice of the techniques). Furthermore, in some examples, actions or events may be performed concurrently rather than sequentially, for example, through multithreading, interrupt handling, or multiple processors.
[0395] In one or more examples, the described functionality may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functionality may be stored or transmitted as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. A computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, including, for example, any medium that facilitates the transfer of a computer program from one place to another according to a communication protocol. In this way, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures to implement the techniques described in this disclosure. Computer program products may include computer-readable media.
[0396] By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Furthermore, any connection is appropriately referred to as a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but rather refer to non-transitory tangible storage media. Disks and discs, as used herein, include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks typically reproduce data magnetically and discs reproduce data optically using lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0397] Instructions can be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Therefore, the terms "processor" and "processing circuit" as used herein can refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein can be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated into combined codecs. Also, the techniques can be fully implemented in one or more circuits or logic elements.
[0398] The techniques disclosed herein can be implemented in a variety of devices or apparatuses, including wireless handheld devices, integrated circuits (ICs), or a set of integrated circuits (e.g., chipsets). Various components, modules, or units are described in this disclosure to emphasize functional aspects of a device configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, various units can be combined in a codec hardware unit, or provided by a collection of interoperable hardware units including suitable software and / or firmware, said interoperable hardware units including one or more processors as described above.
[0399] Various examples have been described. These and other examples are within the scope of the appended claims.
Claims
1. A method for encoding video data, the method comprising: determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of the video data; assigning a first DRA APS ID to the first DRA APS; determining a second DRA APS for a second image of the video data; assigning a second DRA APS ID to the second DRA APS, including assigning to the second DRA APS ID a value different from the value of the first DRA APS ID when the first DRA APS is not equal to the second DRA APS; signaling the first DRA APS to a video decoder in a bitstream for storage by the video decoder in an APS buffer; processing the first image according to the first DRA APS; determining whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, avoiding signaling the second DRA APS in the bitstream and processing the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signaling the second DRA APS in the bitstream and processing the second image according to the second DRA APS.
2. The method according to claim 1, further comprising: limiting the number of bits in the first DRA APS and the second DRA APS to N, where N is an integer.
3. The method according to claim 1, wherein the APS buffer has a static size.
4. The method according to claim 3, wherein the APS buffer is configured to store 32 entries.
5. An apparatus for encoding video data, the apparatus comprising: a memory configured to store the video data; and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of the video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second image of the video data; assign a second DRA APS ID to the second DRA APS, wherein the one or more processors are further configured to assign to the second DRA APS ID a value different from the value of the first DRA APS ID when the first DRA APS is not equal to the second DRA APS; signal the first DRA APS to a video decoder in a bitstream for storage by the video decoder in an APS buffer; process the first image according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, avoid signaling the second DRA APS in the bitstream and process the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in the bitstream and process the second image according to the second DRA APS.
6. The apparatus according to claim 5, wherein the one or more processors are further configured to limit the number of bits in the first DRA APS and the second DRA APS to N, where N is an integer.
7. The apparatus according to claim 5, wherein the APS buffer has a static size.
8. The apparatus according to claim 7, wherein the APS buffer is configured to store 32 entries.
9. The apparatus according to claim 5, further comprising: a camera configured to capture the video data.
10. A non-transitory computer-readable storage medium storing instructions which, when executed, cause one or more processors to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) for a first image of video data; assign a first DRA APS ID to the first DRA APS; determine a second DRA APS for a second image of the video data; assign a second DRA APS ID to the second DRA APS, wherein the instructions further cause the one or more processors to assign to the second DRA APS ID a value different from the value of the first DRA APS ID when the first DRA APS is not equal to the second DRA APS; signal the first DRA APS to a video decoder in a bitstream for storage by the video decoder in an APS buffer; process the first image according to the first DRA APS; determine whether the first DRA APS ID is equal to the second DRA APS ID; if the first DRA APS ID is equal to the second DRA APS ID, avoid signaling the second DRA APS in the bitstream and process the second image according to the first DRA APS; and if the first DRA APS ID is not equal to the second DRA APS ID, signal the second DRA APS in the bitstream and process the second image according to the second DRA APS.
11. A method for decoding video data, the method comprising: determining a first dynamic range adjustment (DRA) adaptive parameter set (APS) identifier (ID) for a first image of the video data; determining a first DRA APS for the first image; storing the first DRA APS in an APS buffer; determining a second DRA APS ID for a second image of the video data, wherein the second DRA APS ID corresponds to a second DRA APS for the second image of the video data, and wherein, when the first DRA APS is not equal to the second DRA APS, the second DRA APS ID is assigned a value different from the value of the first DRA APS ID; based on the second DRA APS ID being equal to the first DRA APS ID, avoiding overwriting the stored first DRA APS, or overwriting the stored first DRA APS with the second DRA APS, so as to prevent the stored first DRA APS from being overwritten by different data; and processing the first image and the second image according to the stored first DRA APS.
12. The method according to claim 11, wherein processing the first image and the second image creates a DRA-adjusted first image and a DRA-adjusted second image, the method further comprising: outputting the DRA-adjusted first image and the DRA-adjusted second image.
13. The method according to claim 11, wherein the APS buffer has a static size.
14. The method according to claim 13, wherein the APS buffer is configured to store 32 entries.
15. An apparatus for decoding video data, the apparatus comprising: a memory configured to store the video data; and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) identifier (ID) for a first image of the video data; determine a first DRA APS for the first image; store the first DRA APS in an APS buffer; determine a second DRA APS ID for a second image of the video data, wherein the second DRA APS ID corresponds to a second DRA APS for the second image of the video data, and wherein, when the first DRA APS is not equal to the second DRA APS, the second DRA APS ID is assigned a value different from the value of the first DRA APS ID; based on the second DRA APS ID being equal to the first DRA APS ID, avoid overwriting the stored first DRA APS, or overwrite the stored first DRA APS with the second DRA APS, so as to prevent the stored first DRA APS from being overwritten by different data; and process the first image and the second image according to the stored first DRA APS.
16. The apparatus according to claim 15, wherein processing the first image and the second image creates a DRA-adjusted first image and a DRA-adjusted second image, and wherein the one or more processors are further configured to: output the DRA-adjusted first image and the DRA-adjusted second image.
17. The apparatus according to claim 15, wherein the APS buffer has a static size.
18. The apparatus according to claim 17, wherein the APS buffer is configured to store 32 entries.
19. A non-transitory computer-readable storage medium storing instructions which, when executed, cause one or more processors to: determine a first dynamic range adjustment (DRA) adaptive parameter set (APS) identifier (ID) for a first image of video data; determine a first DRA APS for the first image; store the first DRA APS in an APS buffer; determine a second DRA APS ID for a second image of the video data, wherein the second DRA APS ID corresponds to a second DRA APS for the second image of the video data, and wherein, when the first DRA APS is not equal to the second DRA APS, the second DRA APS ID is assigned a value different from the value of the first DRA APS ID; based on the second DRA APS ID being equal to the first DRA APS ID, avoid overwriting the stored first DRA APS, or overwrite the stored first DRA APS with the second DRA APS, so as to prevent the stored first DRA APS from being overwritten by different data; and process the first image and the second image according to the stored first DRA APS.