Method, medium, and computer program product for generating color transform data under coding efficiency constraints

By generating proprietary color transformation matrices and offset vectors on the encoder side and combining them with backward reshaping metadata, the method addresses low coding efficiency in HDR content distribution while providing content protection and HDR image restoration.

CN116391356B · Active · Publication Date: 2026-06-12 · DOLBY LABORATORIES LICENSING CORP

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
DOLBY LABORATORIES LICENSING CORP
Filing Date
2021-10-14
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In HDR content distribution, existing technologies struggle to improve encoding efficiency while maintaining high dynamic range, and traditional decoders cannot effectively protect content, resulting in low encoding efficiency and the risk of content leakage.

Method used

By generating color transformation matrices and offset vectors on the encoder side, combined with coding efficiency constraints and backward shaping metadata, a non-standard transformation from RGB to YCbCr is achieved, and backward shaping is performed on the decoder side to restore HDR images, while content protection is achieved using proprietary color transformation.

Benefits of technology

It improves encoding efficiency, ensures content security, and restores high-quality HDR images while supporting backward compatibility.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure relates to methods, media, and computer program products for generating color transform data under coding efficiency constraints. Using a standard-based RGB to YCbCr color transform, a new RGB to YCC 3x3 transform matrix and 3x1 offset vector are derived under a set of coding efficiency constraints. The new RGB to YCC 3x3 transform includes a luminance scaling factor and a 2x2 chroma sub-matrix, which preserves the energy of the standard-based RGB to YCbCr transform while maintaining or improving coding efficiency. It also adds support for an authorization mechanism or watermarking mechanism in streaming video applications. Examples of using the new color transform with image reshaping are also provided.

Description

[0001] Cross-reference to related applications

[0002] This application claims priority to U.S. Provisional Patent Application No. 63/091,436, filed October 14, 2020, and European Patent Application No. 20201683.8, filed October 14, 2020, both of which are incorporated herein by reference in their entirety.

Technical Field

[0003] This disclosure generally relates to images. More specifically, embodiments of the invention relate to color transformation and processing for high dynamic range (HDR) video with coding efficiency constraints.

Background

[0004] As used herein, the term "dynamic range" (DR) may relate to the capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from the darkest grays (blacks) to the brightest whites (highlights). In this sense, DR relates to a "scene-referred" intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a "display-referred" intensity. Unless a particular sense is explicitly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g., interchangeably.

[0005] As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans some 14 to 15 orders of magnitude of the human visual system (HVS). In practice, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated in relation to HDR. As used herein, the terms visual dynamic range (VDR) or enhanced dynamic range (EDR) may individually or interchangeably relate to the DR that is perceivable within a scene or image by the HVS, including eye movements, allowing for some light adaptation changes across the scene or image. As used herein, VDR may relate to a DR that spans 5 to 6 orders of magnitude. Thus, while perhaps somewhat narrower in relation to true scene-referred HDR, VDR or EDR nonetheless represents a wide DR breadth and may also be referred to as HDR.

[0006] In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr), where each color component is represented with a precision of n bits per pixel (e.g., n = 8). For example, using gamma luminance coding, images where n ≤ 8 (e.g., color 24-bit JPEG images) are considered images of standard dynamic range, while images where n ≥ 10 may be considered images of enhanced dynamic range. HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.

[0007] Most consumer desktop displays currently support a luminance of 200 to 300 cd/m² (nits). Most consumer HDTVs range from 300 to 500 nits, with newer models reaching 1,000 nits (cd/m²). Such conventional displays thus typify a lower dynamic range (LDR), also referred to as standard dynamic range (SDR), in relation to HDR. As the availability of HDR content grows due to advances in both capture equipment (e.g., cameras) and HDR displays (e.g., the PRM-4200 professional reference monitor from Dolby Laboratories), HDR content may be color graded and displayed on HDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more).

[0008] As used herein, the term "forward reshaping" denotes a process of sample-to-sample or codeword-to-codeword mapping of a digital image from its original bit depth and original codeword distribution or representation (e.g., gamma, PQ, HLG, etc.) to an image of the same or a different bit depth and a different codeword distribution or representation. Reshaping allows for improved compressibility or improved image quality at a fixed bit rate. For example, without limitation, reshaping may be applied to 10-bit or 12-bit PQ-coded HDR video to improve coding efficiency in a 10-bit video coding architecture. In a receiver, after decompressing the received signal (which may or may not be reshaped), the receiver may apply an "inverse (or backward) reshaping function" to restore the signal to its original codeword distribution and/or to achieve a higher dynamic range.

[0009] Due to bit depth limitations and the potential need for backward compatibility, in typical single-layer HDR distribution scenarios, HDR content is transmitted as a combination of an SDR base layer and metadata. Traditional decoders can extract the visible SDR stream from the SDR base layer, but newer HDR decoders can apply metadata to the SDR base layer to reconstruct an HDR version that closely approximates the original HDR source. As the inventors understand herein, improved techniques that combine content protection and enhanced coding efficiency are desired in such HDR content distribution.

[0010] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.

Summary of the Invention

[0011] The example embodiments described herein relate to color transformation under coding efficiency constraints in HDR image coding. In an embodiment, a processor:

[0012] receives an input image in a first dynamic range and a first color space;

[0013] obtains a first 3×3 color transformation matrix and a first 3×1 offset vector based on a color transformation from the first color space to a standard-based color space;

[0014] applies (210) the first 3×3 color transformation matrix to the input image to generate a first image with luminance and chrominance components;

[0015] generates minimum and maximum pixel values for the luminance and chrominance components of the first image;

[0016] computes (215) a luminance scaling value based on the minimum and maximum luminance pixel values in the first image;

[0017] computes the elements of a 2×2 chroma transformation matrix based on a coding efficiency constraint and the first 3×3 color transformation matrix, and forms a 3×3 intermediate transformation matrix based on the luminance scaling value and the 2×2 chroma transformation matrix;

[0018] applies (225) the 3×3 intermediate transformation matrix to the first image to generate a second image;

[0019] generates an intermediate 3×1 offset vector based on the first 3×1 offset vector and the minimum and maximum pixel values in the luminance and chrominance components of the second image; and generates (230):

[0020] an output 3×3 color transformation matrix by multiplying the first 3×3 color transformation matrix with the 3×3 intermediate transformation matrix;

[0021] an output 3×1 offset vector by adding the first 3×1 offset vector to the intermediate 3×1 offset vector; and

[0022] an output image in a second color space by adding the output 3×1 offset vector to the pixel values of the second image.

Brief Description of the Drawings

[0023] Embodiments of the invention are illustrated in the accompanying drawings by way of example rather than limitation, and similar reference numerals refer to similar elements, and in the drawings:

[0024] Figure 1A depicts an example single-layer encoder for HDR data using color transformation and reshaping functions according to an embodiment;

[0025] Figure 1B depicts an example HDR decoder corresponding to the encoder of Figure 1A according to an embodiment; and

[0026] Figure 2 depicts an example process for generating color transformation parameters according to an embodiment.

Detailed Description

[0027] This document describes methods for color transformation under coding efficiency constraints in HDR video coding. In the following description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of the invention. However, it will be apparent that the invention can be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the invention.

[0028] Example HDR encoding system

[0029] Figure 1A and Figure 1B illustrate an example single-layer backward-compatible codec framework using both color transformation and image reshaping according to an embodiment. More specifically, Figure 1A illustrates an example encoder architecture, which may be implemented with one or more computing processors in an upstream video encoder. Figure 1B illustrates an example decoder architecture, which may also be implemented with one or more computing processors in one or more downstream video decoders.

[0030] Within this framework, given reference HDR content (105) and corresponding reference SDR content (108) (i.e., content that represents the same images as the HDR content, but color-graded and represented within standard dynamic range), reshaped SDR content (134) is encoded and transmitted as SDR content in a single layer of a coded video signal (144) by an upstream encoding device that implements the encoder architecture. The SDR content is received and decoded, in the single layer of the video signal, by a downstream decoding device that implements the decoder architecture. Backward reshaping metadata (152) is also encoded and transmitted in the video signal together with the reshaped content, so that HDR display devices can reconstruct HDR content based on the (reshaped) SDR content and the backward reshaping metadata. Without loss of generality, in some embodiments, as in non-backward-compatible systems, the reshaped SDR content may not be viewable on its own and must be viewed in combination with a backward reshaping function that generates viewable SDR or HDR content. In other embodiments that support backward compatibility, legacy SDR decoders can still play back the received SDR content without employing the backward reshaping function.

[0031] As illustrated in Figure 1A, given an HDR image (120), its corresponding SDR image (125), and a target dynamic range, after a forward reshaping function is generated in step 130, a forward reshaping mapping step (132) applies the forward reshaping function to the HDR image (120) to generate the reshaped SDR base layer (134). A compression block (142) (e.g., an encoder implemented according to any known video coding algorithm such as AVC, HEVC, AV1, VVC, etc.) compresses/encodes the SDR image (134) in a single layer (144) of a video signal. In addition, a backward reshaping function generator (150) may generate a backward reshaping function that can be transmitted to a decoder as metadata (152). In some embodiments, the metadata (152) may represent the forward reshaping function (130), in which case the backward reshaping function (not shown) is generated by the decoder.

[0032] Examples of backward-shaping metadata representing / specifying the optimal backward-shaping function may include, but are not limited to, any of the following: inverse tone mapping function, inverse luma mapping function, inverse chroma mapping function, lookup table (LUT), polynomial, inverse display management coefficients / parameters, etc. In various embodiments, the luma backward-shaping function and the chroma backward-shaping function may be obtained / optimized jointly or separately, and may be obtained using various techniques, such as, but not limited to, those described later in this disclosure.

[0033] The backward reshaping metadata (152), as generated by the backward reshaping function generator (150) based on the reshaped SDR image (134) and the target HDR image (120), may be multiplexed as part of the video signal (144), for example as a supplemental enhancement information (SEI) message.

[0034] In some embodiments, backward-shaping metadata (152) is carried in the video signal as part of overall image metadata, which is carried separately from the individual layers in the video signal in which the SDR image is encoded. For example, backward-shaping metadata (152) may be encoded in a component stream of an encoded bitstream, which may or may not be separate from the individual layers (of the encoded bitstream) in which the SDR image (134) is encoded.

[0035] Therefore, the backward reshaping metadata (152) can be generated or pre-generated on the encoder side to take advantage of the powerful computing resources and offline encoding flows available on the encoder side (including but not limited to content-adaptive multiple passes, look-ahead operations, inverse luma mapping, inverse chroma mapping, CDF-based histogram approximation and/or transfer, etc.).

[0036] The encoder architecture of Figure 1A can be used to avoid directly encoding the target HDR image (120) into a coded/compressed HDR image in the video signal; instead, the backward reshaping metadata (152) in the video signal can be used to enable downstream decoding devices to backward reshape the SDR image (134) (which is encoded in the video signal) into a reconstructed image that is identical to, or closely/optimally approximates, the reference HDR image (120).

[0037] In the embodiments, the reference SDR and HDR signals may be available in color formats that are not suitable for direct encoding (e.g., in RGB color formats). Traditionally, such signals are converted to codec-friendly formats such as YCbCr, YUV, etc. Color transformation blocks 110 and 115 represent RGB to YCC transformations using a 3×3 RGB matrix and a 3×1 offset vector. As used herein, the term “YCC” stands for a generic luma-chroma class color representation, similar to YUV and YCbCr, but generated from RGB input data using a proprietary color transformation. These blocks may be used to perform conventional color transformations (e.g., according to Rec.709 (Reference [4]) or Rec.2020 (Reference [5])) or color transformations under coding efficiency constraints according to the example embodiments to be presented in subsequent sections.

[0038] In some embodiments, as illustrated in Figure 1B, the decoder side of the codec framework receives as input the video signal encoded with the reshaped SDR image in the single layer (144) and the backward reshaping metadata (152) as part of the overall image metadata. A decompression block (154) decompresses/decodes the compressed video data in the single layer (144) of the video signal into a decoded SDR image (156). Decompression (154) typically corresponds to the inverse of compression (142). The decoded SDR image (156) may be the same as the SDR image (134), subject to quantization errors in the compression block (142) and the decompression block (154), and may have been optimized for SDR display devices. In a backward-compatible system, after a suitable inverse SDR color transformation (175), which represents the inverse of the SDR color transformation (110), the decoded SDR image (156) may be output in an output SDR video signal (177) (e.g., over an HDMI interface, over a video link, etc.) to be rendered on an SDR display device.

[0039] Optionally, alternatively, or additionally, in the same or another embodiment, backward shaping block 158 extracts backward (or forward) shaping metadata (152) from the input video signal, constructs a backward shaping function based on the shaping metadata (152), and performs a backward shaping operation on the decoded SDR image (156) based on the optimal backward shaping function to generate a backward-shaped image (160) (or a reconstructed HDR image). In some embodiments, after a suitable inverse HDR color transformation (170) (representing the inverse transformation of HDR color transformation 115), the backward-shaped image (172) represents an HDR image of the same or close approximation / best approximation of the reference HDR image (105) in production quality or close to production quality. The backward-shaped image (172) may be output in the output HDR video signal (e.g., via an HDMI interface, via a video link, etc.) for rendering on an HDR display device.

[0040] In some embodiments, color transformation (170) may be part of a display management operation specific to an HDR display device and may be performed as part of an HDR image rendering operation that renders the back-shaped image (160) on an HDR display device.

[0041] Content protection via color transformation

[0042] Returning to Figure 1A, in an embodiment, the reference HDR RGB signal (105) is converted to YCbCr (block 115) using a 3×3 RGB-to-YCbCr matrix and a 3×1 offset vector. The corresponding 3×3 YCbCr-to-RGB matrix and 3×1 offset vector are used at the decoder side (170). To ensure reversibility, in this embodiment the backward color transformation matrix is the inverse of the forward matrix. In an embodiment, the values of the backward matrix and offset vector can be part of the metadata (152). Typically, but without limitation, the HDR forward matrix and offset vector are determined using the Rec.709 or Rec.2020 specification.

[0043] In block 110, a 3×3 RGB-to-YCC color matrix and a 3×1 offset vector are used at the encoder side to perform color conversion on the reference SDR RGB signal (108). For the inverse SDR color conversion (175), the corresponding 3×3 YCC-to-RGB matrix and 3×1 offset vector are used at the decoder side. To ensure reversibility, the backward color transformation matrix is the inverse of the forward matrix. In an embodiment, the backward matrix and offset vector can be transmitted as part of the metadata (152). The SDR forward matrix and offset vector are not necessarily defined by the Rec.709 specification (or any other standard-based transformation); rather, they are design parameters to be adjusted according to the example embodiments. Note that after forward reshaping (132), the reshaped signal (134) will be in the same color space as the output (125) of the SDR color transformation (110). In the decoder, as depicted in Figure 1B, the following example operation scenarios are possible:

[0044] Content protection for licensed devices

[0045] As discussed, on the encoder side the base layer is mapped to YCC via the proprietary SDR forward matrix and offset vector. These parameters are designed by the algorithm described later and are not standard parameters (e.g., Rec.709, Rec.2020, etc.). To recover the correct colors, the corresponding backward matrix and offset vector are required. Because these values are not necessarily based on a standard such as Rec.709, a legacy device that has no information about the correct backward matrix and offset vector will convert YCC to RGB using the default Rec.709 matrix, which produces an image that is unwatchable in terms of color and brightness. Conversely, if the receiver can access the correct backward matrix and offset vector, the colors of the reconstructed SDR RGB image will be correct. Therefore, in such scenarios, the proprietary color transformation can be used as an "authorization" mechanism to control viewing of the content, even in its SDR version.
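The effect described above can be illustrated with a minimal sketch. It assumes full-range Rec.709 as the standard transform, signals normalized to [0, 1], and a hypothetical proprietary transform that rotates the chroma plane by 90 degrees; the names `REC709`, `PROPRIETARY`, `encode`, and `decode` are illustrative and do not come from the patent.

```python
import math

# Rec.709 luma weights; full-range signals normalized to [0, 1] (assumption).
KR, KG, KB = 0.2126, 0.7152, 0.0722
REC709 = [
    [KR, KG, KB],
    [-KR / (2 * (1 - KB)), -KG / (2 * (1 - KB)), 0.5],
    [0.5, -KG / (2 * (1 - KR)), -KB / (2 * (1 - KR))],
]
OFFSET = [0.0, 0.5, 0.5]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inv3(m):
    # Inverse of a 3x3 matrix via the adjugate.
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def encode(matrix, rgb):
    return [y + n for y, n in zip(matvec(matrix, rgb), OFFSET)]

def decode(matrix, ycc):
    return matvec(inv3(matrix), [y - n for y, n in zip(ycc, OFFSET)])

# Hypothetical proprietary transform: rotate the chroma plane by 90 degrees.
theta = math.radians(90)
W = [[1.0, 0.0, 0.0],
     [0.0, math.cos(theta), -math.sin(theta)],
     [0.0, math.sin(theta), math.cos(theta)]]
PROPRIETARY = matmul(W, REC709)

rgb = [0.8, 0.2, 0.1]
ycc = encode(PROPRIETARY, rgb)
licensed = decode(PROPRIETARY, ycc)  # receiver holds the correct parameters
legacy = decode(REC709, ycc)         # receiver falls back to Rec.709
```

Here `licensed` reproduces the input pixel up to rounding, while `legacy` lands far from it, which is the behavior the paragraph relies on for content protection.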

[0046] In another embodiment, proprietary color information can be used as a "fingerprint" or "watermark" scheme. For example, a content provider may decide to use different color transformation matrices for different distribution regions (e.g., Europe, Asia, the United States, etc.) or for specific streaming providers (e.g., Amazon, Hulu, etc.).

[0047] Restore HDR content

[0048] In HDR playback mode, an HDR RGB image can be reconstructed from a baseline YCC SDR image using only one color transformation, based on the incoming metadata. In an embodiment, this is possible by using a cross-color predictor such as a multi-channel multivariate regression (MMR) predictor (Reference [1]). The nature of the MMR predictor allows the decoder to recover HDR data in the correct YCbCr colors even if the colors of the SDR source are incorrect. Given a YCbCr HDR output (160), an HDR RGB image can then be reconstructed by using the HDR backward matrix and offset vector carried in the metadata to convert the output from YCbCr to RGB.

[0049] In summary, in an embodiment, the SDR color transformation parameters can play a crucial role in protecting content: if a device does not have the correct parameters, using the default YCC-to-RGB matrix will generate incorrect output. With careful design, these parameters can create a non-viewable base layer while maintaining or even improving coding efficiency. The following sections present methods for constructing the matrix and offset vector to achieve those goals.

[0050] As discussed earlier, the goal is to design, under constraints, an SDR RGB-to-YCC color transformation and the corresponding SDR YCC-to-RGB inverse color transformation so as to maintain or improve coding efficiency. The 12 coefficients of the forward matrix and offset vector are denoted as follows:

[0051]

[0052] as well as

[0053]

[0054] Given the forward transform parameters, the backward or inverse transform can be obtained from the forward transform as follows:

[0055]

[0056] as well as

[0057]
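As a hedged illustration of how a backward transform follows from a forward one, suppose the forward transform is written generically as an affine map with a 3×3 matrix $M$ and a 3×1 offset $n$. These are generic symbols chosen here for illustration; the patent's own notation did not survive extraction.

```latex
\begin{aligned}
y &= M x + n \\
x &= M^{-1}(y - n) \;=\; M^{-1} y \;-\; M^{-1} n
\end{aligned}
```

Under this assumption the backward matrix is $M^{-1}$. If the decoder adds its offset after the matrix multiplication, the backward offset is $-M^{-1} n$; a decoder that subtracts the offset before multiplying would reuse $n$ directly. Which of the two conventions the source adopts cannot be determined from the surviving text.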

[0058] If any coefficient in the RGB-to-YCC matrix deviates from a standard-based matrix (e.g., a Rec.709 matrix), using a regular Rec.709 matrix to convert back to RGB will result in incorrect colors and brightness. Although the forward matrix and offset parameters seem to offer great flexibility, they cannot be chosen arbitrarily: all coding standards are designed under the assumption that they operate in the YCbCr color space, so an arbitrary choice may harm coding efficiency. Therefore, two main issues need to be addressed: (1) maintaining overall video coding efficiency and (2) preventing the leakage of luminance/chrominance information.

[0059] In embodiments, new matrices can be obtained based on existing color space transformation matrices such as Rec.709 or Rec.2020. A standard color transformation set (e.g., those defined in Rec.709 or Rec.2020) can be represented as:

[0060]

[0061] as well as

[0062]

[0063] For example, under Rec.709, for the SMPTE range (for an 8-bit codeword, [16, 235]):

[0064]

[0065] as well as

[0066]

[0067] For the full range ([0, 1)):

[0068]

[0069] as well as

[0070]
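Since the numeric matrices are missing from the text above, the following sketch shows how Rec.709 RGB-to-YCbCr coefficients can be derived from the Rec.709 luma weights. The full-range form is standard. For the SMPTE range, the sketch assumes 8-bit codewords normalized by 2^8 = 256 (luma excursion 219, chroma excursion 224, offsets 16 and 128); that normalization is an assumption and may differ from the values in the source.

```python
# Rec.709 luma weights (Kr and Kb are defined by the standard).
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB

def rec709_matrix(luma_scale=1.0, chroma_scale=1.0):
    """RGB-to-YCbCr matrix for normalized RGB in [0, 1]."""
    y = [KR, KG, KB]
    cb = [-KR / (2 * (1 - KB)), -KG / (2 * (1 - KB)), 0.5]
    cr = [0.5, -KG / (2 * (1 - KR)), -KB / (2 * (1 - KR))]
    return [[luma_scale * c for c in y],
            [chroma_scale * c for c in cb],
            [chroma_scale * c for c in cr]]

# Full range: luma in [0, 1], chroma centered at 0.5.
full_matrix = rec709_matrix()
full_offset = [0.0, 0.5, 0.5]

# SMPTE range, assuming 8-bit codewords normalized by 256.
smpte_matrix = rec709_matrix(219 / 256, 224 / 256)
smpte_offset = [16 / 256, 128 / 256, 128 / 256]
```

Each chroma row sums to zero, so neutral grays map to the chroma offset in both variants.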

[0071] Let

[0072]

[0073] In an embodiment, the new coefficients of the SDR RGB-to-YCC matrix are obtained by multiplying the standard-based matrix by another 3×3 matrix $W_s$, where

[0074]

[0075] therefore:

[0076]

[0077] as well as

[0078]

[0079] If $W_s$ equals the identity matrix $I$, then

[0080]

[0081] The three input components of the i-th pixel of the RGB SDR signal are represented as a 3×1 vector.

[0082]

[0083] After applying the standard-based matrix, the intermediate result for the i-th pixel is $s'_i$:

[0084]

[0085] After multiplication by $W_s$, the result $s''_i = W_s s'_i$ is given by:

[0086]

[0087] Define a new 3×1 offset vector:

[0088]

[0089] The final 3×1 vector is

[0090]

[0091] or

[0092]

[0093] The combination of $s''_i$ and the final offset vector represents the base layer, which is passed to the video compression module:

[0094]

[0095] or

[0096]

[0097] First, consider the case where the luminance component remains unchanged. The design degrees of freedom are then reduced from 12 parameters to 8 parameters:

[0098]

[0099] as well as

[0100]

[0101] The matrix can be further constrained to prevent luminance from leaking into chrominance (i.e., no luminance and chrominance crosstalk), as shown in the following equation:

[0102]

[0103] Since the luminance component remains unchanged, it can be assumed that video compression efficiency depends on the energy of the two chrominance components. Therefore, in an embodiment, coding efficiency can be controlled by adding another constraint: the chroma energy after the transformation by $W_s$ should be proportional to the chroma energy before the transformation. In other words, given a scaling value $\alpha^2$:

[0104] $(s''_{i,1})^2 + (s''_{i,2})^2 = \alpha^2 \left( (s'_{i,1})^2 + (s'_{i,2})^2 \right)$ (16)

[0105] If $\alpha = 1$, the chroma energy before and after the adjustment is the same, and the impact on the video compression codec should be minimal. On the other hand, if it is desirable to widen the margin of the color encryption so that the parameters are harder to guess, a small deviation $\delta \ge 0$ (e.g., $\delta = 0.1$) can be allowed:

[0106] $1 - \delta \le \alpha \le 1 + \delta$,

[0107] while still satisfying the energy constraint.

[0108] Note that increasing this chroma scaling factor is equivalent to allocating more codewords to chroma in the base layer. The baseband signal quantization loss is therefore reduced, and better video coding gain is expected, which has been verified by numerical simulations. The design guidelines for the 3×3 matrix $W_s$ can then be derived as follows. Writing $w_{jk}$ for the entries of the 2×2 chroma sub-matrix of $W_s$ and substituting them into $s''_{i,1}$ and $s''_{i,2}$ gives

[0109] $(w_{11} s'_{i,1} + w_{12} s'_{i,2})^2 + (w_{21} s'_{i,1} + w_{22} s'_{i,2})^2 = \alpha^2 (s'_{i,1})^2 + \alpha^2 (s'_{i,2})^2$

[0110] Rearranging the equation gives

[0111] $(w_{11}^2 + w_{21}^2 - \alpha^2)(s'_{i,1})^2 + (w_{12}^2 + w_{22}^2 - \alpha^2)(s'_{i,2})^2$

[0112] $+\, 2 (w_{11} w_{12} + w_{21} w_{22})\, s'_{i,1} s'_{i,2} = 0$

[0113] For this to hold for all values of $s'_{i,1}$ and $s'_{i,2}$, each coefficient must vanish, so the matrix coefficients must satisfy the following set of equations:

[0114] $w_{11}^2 + w_{21}^2 = \alpha^2$, $\quad w_{12}^2 + w_{22}^2 = \alpha^2$, $\quad w_{11} w_{12} + w_{21} w_{22} = 0$ (17)

[0115] Equation (17) represents the core constraint of the proposed design. Examples, without limitation, that satisfy Equation (17) include:

[0116] First Example

[0117] Let θ represent the control parameter, such that, for example

[0118]

[0119] Note that equation (18) has several other variants that still satisfy equation (17), obtained by swapping the positions of cos(θ) and sin(θ) and changing their signs accordingly. For example:

[0120]

[0121] The θ parameter can be expressed in degrees or radians, for example, θ∈[0,2π] or θ∈[0,360] degrees.
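A quick numerical check of the first example: the sketch below assumes one particular sign convention for the scaled rotation of the chroma plane (the exact form of equation (18) is not recoverable from the text) and confirms that the chroma energy of each sample is scaled by exactly α². The function names and sample values are illustrative.

```python
import math

def chroma_matrix(alpha, theta):
    """Scaled rotation of the (Cb, Cr) plane; theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[alpha * c, -alpha * s],
            [alpha * s, alpha * c]]

def chroma_energy(cb, cr):
    return cb * cb + cr * cr

alpha, theta = 1.05, math.radians(35)
w = chroma_matrix(alpha, theta)

# Hypothetical zero-centered chroma samples (s'_{i,1}, s'_{i,2}).
samples = [(0.12, -0.30), (-0.25, 0.05), (0.40, 0.33)]
ratios = []
for cb, cr in samples:
    cb2 = w[0][0] * cb + w[0][1] * cr
    cr2 = w[1][0] * cb + w[1][1] * cr
    ratios.append(chroma_energy(cb2, cr2) / chroma_energy(cb, cr))
```

Every entry of `ratios` equals α² up to floating-point rounding, regardless of θ, which is why θ can be varied freely for content protection without changing the chroma energy.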

[0122] Second example

[0123] A simple exchange of Cb and Cr also satisfies equation (17), producing

[0124]

[0125] Furthermore, by appropriately changing the signs, alternative variations can be generated, as shown in the following formula:

[0126]
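The swap-based variants can be checked directly against the three coefficient conditions obtained from the rearranged energy equation (column norms equal to α², zero cross term). The sketch below enumerates the four sign combinations of a Cb/Cr swap; `satisfies_eq17` and the counterexample `shear` are illustrative names, not from the patent.

```python
from itertools import product

def satisfies_eq17(w, alpha, tol=1e-12):
    """Check w11^2 + w21^2 = a^2, w12^2 + w22^2 = a^2, w11*w12 + w21*w22 = 0."""
    (w11, w12), (w21, w22) = w
    return (abs(w11 ** 2 + w21 ** 2 - alpha ** 2) < tol
            and abs(w12 ** 2 + w22 ** 2 - alpha ** 2) < tol
            and abs(w11 * w12 + w21 * w22) < tol)

alpha = 0.95

# Cb/Cr swap with every combination of signs on the two off-diagonal entries.
swap_variants = [[[0.0, s1 * alpha], [s2 * alpha, 0.0]]
                 for s1, s2 in product((1, -1), repeat=2)]
results = [satisfies_eq17(w, alpha) for w in swap_variants]

# Counterexample: a shear is neither a scaled rotation nor a scaled reflection.
shear = [[alpha, 0.5], [0.0, alpha]]
shear_ok = satisfies_eq17(shear, alpha)
```

All four swap variants pass, while the shear fails the second condition, showing that the constraint excludes generic 2×2 matrices.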

[0127] To generate the 3×1 offset vector, the combination of the 3×3 matrix $W_s$ and the offset vector needs to satisfy the following constraint so that the final output, for normalized data, stays within the range [0, 1], that is, no data is clipped:

[0128]

[0129] or

[0130]

[0131] Or, more precisely, find a 3×1 offset vector such that

[0132]

[0133] If no such vector exists, then $W_s$ needs to be redesigned. For example, under equation (18), new values need to be chosen for α and θ.
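The feasibility test can be sketched as follows. The per-channel bounds are inferred by analogy with the luminance bounds of equation (25), on the assumption that the final sample in each channel is the transformed value plus the standard-based offset plus the new offset; `offset_bounds` and the sample ranges are hypothetical.

```python
def offset_bounds(s2_min, s2_max, n_std):
    """Per-channel interval of offsets that keeps the output inside [0, 1].

    s2_min / s2_max: per-channel min and max of the transformed data s''.
    n_std: standard-based offset vector.
    Returns a list of (low, high) tuples, or None if any channel cannot fit.
    """
    bounds = []
    for lo, hi, n in zip(s2_min, s2_max, n_std):
        p_lo = -lo - n          # keeps the smallest sample at or above 0
        p_hi = 1.0 - hi - n     # keeps the largest sample at or below 1
        if p_lo > p_hi:
            return None         # channel range exceeds 1: redesign W_s
        bounds.append((p_lo, p_hi))
    return bounds

n_std = [0.0, 0.5, 0.5]
feasible = offset_bounds([0.0, -0.40, -0.45], [0.9, 0.35, 0.50], n_std)
infeasible = offset_bounds([0.0, -0.60, -0.45], [0.9, 0.60, 0.50], n_std)
```

In the second call the Cb channel spans 1.2, so no offset can avoid clipping and the function returns `None`, which corresponds to choosing new values for α and θ.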

[0134] Note that adjusting the offset is equivalent to adjusting the DC value of the chroma. When using lossy video compression, this adjustment will not affect encoding efficiency.

[0135] In an embodiment, simple linear scaling and offset can also be applied to the luminance component to utilize additional base layer codewords and thus reduce baseband quantization error and improve coding efficiency. In this case, equation (5) is reduced from 12 parameters to 10 parameters.

[0136]

[0137] as well as

[0138]

[0139] Furthermore, in addition to satisfying equations (17) and (22), the values of β and $p_0$ need to be constrained so that the reshaped SDR luminance is not clipped. This gives the following constraint:

[0140]

[0141] that is,

[0142] $0 \le \beta s'_{i,0} + p_0 + n_{s,0} \le 1$. (24)

[0143] According to equation (24), given the value of β, the offset $p_0$ is bounded as:

[0144] $-\beta \min\{s'_{i,0}\} - n_{s,0} \le p_0 \le 1 - \beta \max\{s'_{i,0}\} - n_{s,0}$ (25)

[0145] When mapping luminance to the full range of the base layer (e.g., [0, 1)), these parameters can be selected as follows:

[0146]

[0147] $p_0 = -\beta \min\{s'_{i,0}\} - n_{s,0}$ (27)
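A small sketch of the luminance parameters follows. Equation (26) is missing from the text, so the value of β used here is an inference: it is the largest β for which the interval in equation (25) is non-empty, namely one over the luminance range. The offset follows equation (27). The function name is hypothetical.

```python
def luma_scale_and_offset(y_min, y_max, n_s0=0.0):
    """Full-range luminance scale and offset.

    y_min / y_max: min and max of s'_{i,0} (luma after the standard matrix).
    n_s0: luma entry of the standard-based offset vector.
    beta is inferred as the largest scale allowed by eq. (25); p0 is eq. (27).
    """
    if y_max <= y_min:
        raise ValueError("flat luminance: scale is undefined")
    beta = 1.0 / (y_max - y_min)
    p0 = -beta * y_min - n_s0
    return beta, p0

n_s0 = 0.0625
beta, p0 = luma_scale_and_offset(0.1, 0.6, n_s0)
mapped_min = beta * 0.1 + p0 + n_s0
mapped_max = beta * 0.6 + p0 + n_s0
```

With these inputs the darkest sample maps to 0 and the brightest to 1 (up to rounding), so the whole base-layer luminance range is used; any smaller β would leave codewords unused.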

[0148] Figure 2 depicts an example process flow (200) for determining a color transformation matrix according to an embodiment. In step 205, one starts with a simplified form of the intermediate 3×3 matrix $W_s$ that satisfies the constraint for maintaining coding efficiency, namely equation (17). For example, without loss of generality, a combination of equations (18) and (23) can be used to begin with the following form:

[0149]

[0150] as well as

[0151]

[0152] In this example, the parameters β, α, θ and the offset vector need to be determined.

[0153] In this embodiment, the design process can be carried out as follows:

[0154] Step 210: Given a standard-based color transformation matrix (e.g., according to the matrix of Rec.709, etc.) and reference input SDR data (s i ),calculate And determine the minimum and maximum values ​​of the transformed data in each color channel, i.e., min{s′ i,ch} and max{s′ i,ch}, where ch represents one of the three color channels (e.g., Y, Cb, or Cr).

[0155] • Step 215: In this step, a brightness scaling value can be selected. For example, if β ≠ 1, then β needs to be determined. Its upper limit is defined by the maximum and minimum brightness values ​​calculated in step 210. Choosing β > 1 can improve the coding efficiency in brightness, but may force more iterations to generate the output color transformation matrix. According to equation (26),

[0156]

[0157] Step 220: In this step, you can... s In this step, select chroma-related parameters, such as α and θ. Choosing α > 1 can improve encoding efficiency in chroma. Choosing θ ≠ 0 can change the color, thus (optionally) providing a simple content protection mechanism. At the end of this step, matrix W... s Completely limited.

[0158] Step 225: Apply W_s to the output of step 210 to compute, for each pixel, $s''_i = W_s s'_i$. These values assist in selecting the 3×1 offset vector. For example, according to equations (25) and (27), given a standard-based offset vector $[n_{s,0}\ n_{s,1}\ n_{s,2}]^T$,

[0159]

[0160]

[0161] If no feasible solution is found in step 225, it may be necessary to return to step 220 and select alternative values for α and / or θ, after optionally adjusting the brightness scaling value (227). Otherwise, once the offset vector has been defined, the final output matrix and output offset vector can be generated in step 230.
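Steps 210 to 230 can be sketched in Python as follows. This is a minimal illustration, not the patented procedure itself, and it makes several assumptions: the starting point is the normalized full-range Rec. 709 transform, the chroma sub-matrix is a scaled rotation with an assumed sign convention, the bound of equation (25) is applied in the same form to all three channels of s″, and each offset element is chosen as the feasible value closest to zero. All function names are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def rec709_full_range():
    """Normalized full-range Rec. 709 RGB-to-YCbCr matrix and offset (assumed)."""
    kr, kb = 0.2126, 0.0722
    kg = 1.0 - kr - kb
    a = [[kr, kg, kb],
         [-kr / (2 * (1 - kb)), -kg / (2 * (1 - kb)), 0.5],
         [0.5, -kg / (2 * (1 - kr)), -kb / (2 * (1 - kr))]]
    return a, [0.0, 0.5, 0.5]


def design_transform(rgb_pixels, beta=1.0, alpha=1.0, theta=0.0):
    """Return (output matrix, output offset), or None if no offset is feasible."""
    a_std, n_std = rec709_full_range()
    # Step 210: standard-based transform without the offset, s'_i.
    s1 = [matvec(a_std, px) for px in rgb_pixels]
    # Steps 215 and 220: build W_s from beta, alpha, theta (rotation sign assumed).
    c, s = math.cos(theta), math.sin(theta)
    w_s = [[beta, 0.0, 0.0],
           [0.0, alpha * c, -alpha * s],
           [0.0, alpha * s, alpha * c]]
    # Step 225: s''_i = W_s s'_i, then bound each offset element.
    s2 = [matvec(w_s, px) for px in s1]
    p = []
    for ch in range(3):
        lo = -min(px[ch] for px in s2) - n_std[ch]
        hi = 1.0 - max(px[ch] for px in s2) - n_std[ch]
        if lo > hi:
            return None  # infeasible: the caller retries with other parameters (227)
        p.append(min(max(0.0, lo), hi))  # hypothetical rule: offset closest to zero
    # Step 230: combine with the standard-based matrix and offset.
    a_out = matmul(w_s, a_std)
    n_out = [n_std[ch] + p[ch] for ch in range(3)]
    return a_out, n_out


# Hypothetical reference SDR pixels (normalized RGB).
pixels = [(0.2, 0.3, 0.4), (0.8, 0.1, 0.5), (0.5, 0.9, 0.3)]
result = design_transform(pixels, alpha=1.05, theta=0.1)
infeasible = design_transform([(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)], alpha=3.0)
```

A return value of `None` corresponds to the case where the process returns to step 220; in the second call, scaling the chroma of saturated blue and yellow by 3 exceeds the unit range, so no offset can satisfy the bounds.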

[0162] In the decoder, given the output matrix and the output offset vector, the YCC-to-RGB color conversion operation includes:

[0163]
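A minimal Python sketch of the decoder-side conversion is given below. It follows EEE 2 in assuming that the inverse operation subtracts the output offset vector and then multiplies by the inverse of the output 3×3 matrix. The helper names are hypothetical, and the example matrix is a rounded Rec. 709 matrix used only as a stand-in for an encoder-generated output matrix.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def ycc_to_rgb(ycc, a_out, n_out):
    """Recover RGB from YCC: subtract the offset, then apply the inverse matrix."""
    inv = invert_3x3(a_out)
    centered = [ycc[ch] - n_out[ch] for ch in range(3)]
    return [sum(inv[r][c] * centered[c] for c in range(3)) for r in range(3)]


# Stand-in output matrix and offset (rounded Rec. 709 values, for illustration only).
a_out = [[0.2126, 0.7152, 0.0722],
         [-0.1146, -0.3854, 0.5],
         [0.5, -0.4542, -0.0458]]
n_out = [0.0, 0.5, 0.5]

rgb = [0.25, 0.5, 0.75]
ycc = [sum(a_out[r][c] * rgb[c] for c in range(3)) + n_out[r] for r in range(3)]
recovered = ycc_to_rgb(ycc, a_out, n_out)
```

A decoder that does not have the output matrix and offset cannot perform this inversion correctly, which is what makes a non-zero θ usable as a simple protection mechanism.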

[0164] Existing reshaping techniques can be frame-based, where new reshaping metadata is transmitted with each new frame, or scene-based, where new reshaping metadata is transmitted with each new scene. As used herein, for a video sequence (a sequence of frames / images), the term "scene" can refer to a series of consecutive frames in a video sequence that share similar brightness, color, and dynamic range characteristics. Scene-based approaches work well in video workflow pipelines that access the entire scene; however, it is not uncommon for some scenes to consist of only a single frame (e.g., in fade-ins and fade-outs).

[0165] As needed, process 200 can also be updated at the frame level or scene level; however, to maintain coding efficiency, it is preferable to keep the same color transformation throughout a sequence, a scene, or a group of pictures (GOP). To adapt process 200 to a scene or a group of pictures, the frame-based minimum and maximum pixel values can be replaced with scene-based minimum and maximum pixel values. For example, equation (26) can be rewritten as:

[0166]

[0167] where

[0168]

[0169]

[0170] For L consecutive frames, $\max\{s'^{\,j}_{i,0}\}$ and $\min\{s'^{\,j}_{i,0}\}$ denote the maximum and minimum pixel values of the 0th (e.g., luminance) color component of s′ in the j-th frame. At the sequence level, the maximum and minimum values can be computed values or the range limits constrained by SMPTE. The scene-based minimum and maximum pixel values for each color component of s″ can be computed in a similar manner.
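The scene-based adaptation can be sketched in Python as follows. Because the scene-level equations are not reproduced in this text, the sketch assumes that the scene minimum is the smallest per-frame minimum, that the scene maximum is the largest per-frame maximum, and that the brightness scaling limit keeps the form 1 / (max − min). The function names and sample frames are hypothetical.

```python
def scene_luma_range(frames):
    """Return (scene_min, scene_max) over a list of frames.

    frames : list of L frames, each a list of luminance values s'_{i,0}
    """
    scene_min = min(min(frame) for frame in frames)
    scene_max = max(max(frame) for frame in frames)
    return scene_min, scene_max


def scene_beta_limit(frames):
    """Upper limit on the brightness scaling value for the whole scene (assumed form)."""
    lo, hi = scene_luma_range(frames)
    if hi <= lo:
        raise ValueError("scene luminance must span a non-zero range")
    return 1.0 / (hi - lo)


# Three hypothetical frames of one scene.
frames = [[0.2, 0.4], [0.1, 0.3], [0.25, 0.6]]
scene_min, scene_max = scene_luma_range(frames)
beta_limit = scene_beta_limit(frames)
```

Using one range for all frames of the scene keeps the color transformation constant across the scene, which is the stated preference for coding efficiency.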

[0171] Constructing Shaping Functions

[0172] As discussed earlier, the non-standard color transformation (110) can also be used as part of a system that allows a close approximation of the reference HDR input (105) to be reconstructed from the received shaped SDR data. Backward shaping (158) can be applied after an appropriate inverse SDR color transform is applied to the received base layer (156); however, with appropriate forward shaping, the additional inverse color transformation step can be eliminated and the reconstructed HDR content can be extracted directly in an appropriate (e.g., standard-defined) YCbCr color space. This process is described below. Without limitation, some steps are described in the context of a representation known as a three-dimensional mapping table (3DMT or d3DMT), in which, for simplicity, each color component (e.g., Y, Cb, or Cr) of a frame is subdivided into “bins” and the image is represented by the average pixel value within each bin instead of by explicit pixel values. Details of the 3DMT formulation can be found in reference [2].
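The binning idea can be illustrated with a simplified Python sketch. It is not the d3DMT construction of reference [2]; it assumes normalized pixel values in [0, 1), uniform bins indexed by the HDR pixel, and per-bin averages of both the HDR pixels and their co-located SDR pixels. The function name `build_3dmt` and the bin count are hypothetical.

```python
from collections import defaultdict


def build_3dmt(hdr_pixels, sdr_pixels, bins_per_channel=4):
    """Return a list of (mean HDR triplet, mean SDR triplet), one per non-empty bin."""
    # Each entry holds [sum of HDR values, sum of SDR values, pixel count].
    table = defaultdict(lambda: [[0.0] * 3, [0.0] * 3, 0])
    for v, s in zip(hdr_pixels, sdr_pixels):
        # Bin index per channel, clamped so that a value of 1.0 stays in the last bin.
        key = tuple(min(int(c * bins_per_channel), bins_per_channel - 1) for c in v)
        entry = table[key]
        for ch in range(3):
            entry[0][ch] += v[ch]
            entry[1][ch] += s[ch]
        entry[2] += 1
    pairs = []
    for key in sorted(table):
        hdr_sum, sdr_sum, count = table[key]
        pairs.append(([x / count for x in hdr_sum], [x / count for x in sdr_sum]))
    return pairs


# Hypothetical co-located HDR and SDR pixels (Y, Cb, Cr).
hdr = [(0.10, 0.10, 0.10), (0.12, 0.10, 0.10), (0.90, 0.90, 0.90)]
sdr = [(0.20, 0.30, 0.30), (0.24, 0.30, 0.30), (0.80, 0.70, 0.70)]
pairs = build_3dmt(hdr, sdr)
```

Here the first two pixels fall into the same bin and are replaced by their averages, so the three pixels are summarized by two bin pairs.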

[0173] In this embodiment, the step (130) of generating the forward shaping function requires two sets of reference inputs: (1) reference HDR YCbCr pixel values (120) and (2) reference SDR YCC pixel values (125). For the HDR signal, the three color components of the i-th pixel of the reference RGB HDR input are represented as a 3×1 vector:

[0174]

[0175] The RGB to YCbCr matrix and offset vector to be used during HDR color conversion (115) are represented as follows:

[0176]

[0177] and

[0178]

[0179] The YCbCr HDR values are obtained by multiplying by this matrix and adding the offset. That is, for the i-th pixel:

[0180]

[0181] For SDR input, instead of obtaining the output from the existing standard-based RGB to YCbCr transformation, the output of the modified transformation is given by equation (13):

[0182]

[0183] Given these two reference signals, known methods can be applied to generate forward and backward shaping functions. As an example, and not a limitation, the luminance forward shaping function can be constructed via a cumulative density function (CDF) matching process (references [2-3]), where: a) the histogram of the HDR luminance signal, denoted as $\{h^{v}(b)\}$, and the histogram of the color-transformed SDR luminance signal, denoted as $\{h^{s}(b)\}$, are computed; b) the SDR and HDR cumulative density functions are constructed from these two histograms; and c) CDF matching is used to obtain a lookup table (FLUT) that maps the HDR YCbCr input data to the SDR YCC data. This can be expressed as:

[0184]

[0185] where the resulting lookup table (FLUT) denotes the forward mapping to be used in block 132.
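The three CDF matching steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: histograms are given as plain lists of counts, and each HDR bin is mapped to the first SDR bin whose cumulative value reaches the HDR cumulative value, without the interpolation a production implementation would typically add. The function names are hypothetical.

```python
def cumulative(hist):
    """Normalized cumulative density function of a histogram."""
    total = float(sum(hist))
    out, acc = [], 0.0
    for count in hist:
        acc += count
        out.append(acc / total)
    return out


def cdf_matching_flut(hdr_hist, sdr_hist):
    """Map each HDR bin to the first SDR bin whose CDF reaches the HDR CDF."""
    c_v = cumulative(hdr_hist)
    c_s = cumulative(sdr_hist)
    flut, j = [], 0
    for b in range(len(c_v)):
        # The HDR CDF is non-decreasing, so the SDR pointer only moves forward.
        while j < len(c_s) - 1 and c_s[j] < c_v[b]:
            j += 1
        flut.append(j)
    return flut


# Hypothetical histograms: 8 HDR bins and 4 SDR bins, both uniform.
hdr_hist = [1, 1, 1, 1, 1, 1, 1, 1]
sdr_hist = [2, 2, 2, 2]
flut = cdf_matching_flut(hdr_hist, sdr_hist)
```

For two uniform histograms the result is the expected linear mapping in which every pair of adjacent HDR bins shares one SDR bin, and the mapping is non-decreasing by construction.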

[0186] For chroma, a parametric (e.g., MMR) model between the HDR signal and the SDR signal is constructed using the d3DMT representation (reference [2]). Consider the d3DMT pairs of HDR and SDR entries, and assume there are P such bins / pairs. For example, given an MMR expansion model, the expanded HDR entry can be described as:

[0187]

[0188] All P bins of the three-channel HDR inputs can be collected as:

[0189]

[0190] Then, let the ch channel in the SDR be represented as:

[0191]

[0192] Define the “a / B” representation as follows, where T denotes the matrix transpose:

[0193] $$B^{F} = (V^{F})^{T} V^{F},$$

[0194]

[0195] Under the mean-squared-error (MSE) minimization criterion, the optimal MMR coefficients that minimize the prediction error with respect to the target reference SDR signal $s_{ch}$ can be given as:

[0196]

[0197] Furthermore, the forward-shaped SDR d3DMT entry can be expressed as:

[0198]
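The least-squares fit behind the forward chroma mapping can be sketched in Python as follows. The sketch assumes an 8-term first-order MMR expansion with cross terms, since the expansion itself is not reproduced in this text, and solves the normal equations B m = a with B = VᵀV and a = Vᵀs by Gaussian elimination. The names `mmr_expand`, `solve_linear`, and `fit_mmr` and the synthetic data are hypothetical.

```python
def mmr_expand(v):
    """First-order MMR expansion with cross terms (assumed form)."""
    y, c0, c1 = v
    return [1.0, y, c0, c1, y * c0, y * c1, c0 * c1, y * c0 * c1]


def solve_linear(b, a):
    """Solve B m = a by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [a[r]] for r, row in enumerate(b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    m = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * m[c] for c in range(r + 1, n))
        m[r] = (aug[r][n] - tail) / aug[r][r]
    return m


def fit_mmr(hdr_bins, sdr_channel):
    """Return MMR coefficients minimizing the squared error to sdr_channel."""
    v = [mmr_expand(p) for p in hdr_bins]  # one expanded row per bin
    k = len(v[0])
    b = [[sum(row[i] * row[j] for row in v) for j in range(k)] for i in range(k)]
    a = [sum(row[i] * s for row, s in zip(v, sdr_channel)) for i in range(k)]
    return solve_linear(b, a)


# Synthetic bins on a 3x3x3 grid and a target that the model can represent exactly.
grid = [(y, c0, c1) for y in (0.1, 0.5, 0.9)
        for c0 in (0.2, 0.5, 0.8) for c1 in (0.3, 0.5, 0.7)]
target = [0.05 + 0.6 * c0 + 0.1 * y * c1 for (y, c0, c1) in grid]
coeffs = fit_mmr(grid, target)
predicted = [sum(m * e for m, e in zip(coeffs, mmr_expand(p))) for p in grid]
```

Because only the P bin averages enter the fit rather than every pixel, the normal equations stay small regardless of the frame size.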

[0199] The backward luminance shaping function can be generated as follows: first, the forward LUT is tracked in reverse to generate a backward shaping 1D-LUT (BLUT); then, the BLUT mapping is approximated by a piecewise polynomial function (e.g., an 8-piece approximation using second-order polynomials).
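The reverse-tracking step can be illustrated with a simplified Python sketch. It assumes that each SDR codeword is mapped to the average of the HDR codewords that the forward LUT sends to it, and that SDR codewords with no HDR source take the nearest known value. Both rules are illustrative choices rather than the method of references [2-3], and the piecewise polynomial approximation is not shown. The function name `build_blut` is hypothetical.

```python
def build_blut(flut, num_sdr_codewords):
    """Build a backward LUT (SDR codeword -> HDR codeword) from a forward LUT."""
    sums = [0.0] * num_sdr_codewords
    counts = [0] * num_sdr_codewords
    for hdr_codeword, sdr_codeword in enumerate(flut):
        sums[sdr_codeword] += hdr_codeword
        counts[sdr_codeword] += 1
    blut = [sums[s] / counts[s] if counts[s] else None
            for s in range(num_sdr_codewords)]
    # Fill unused SDR codewords from the nearest lower neighbor...
    last = None
    for s in range(num_sdr_codewords):
        if blut[s] is None:
            blut[s] = last
        else:
            last = blut[s]
    # ...and any leading gaps from the nearest higher neighbor.
    nxt = None
    for s in reversed(range(num_sdr_codewords)):
        if blut[s] is None:
            blut[s] = nxt
        else:
            nxt = blut[s]
    return blut


# Hypothetical forward LUT over 6 HDR codewords; SDR codeword 2 is never used.
flut = [0, 0, 1, 1, 3, 3]
blut = build_blut(flut, num_sdr_codewords=4)
```

The resulting table is non-decreasing whenever the forward LUT is, which is what makes a subsequent piecewise polynomial fit well behaved.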

[0200] To generate the chroma backward shaping mapping, a polynomial approximation (e.g., MMR) can also be used, in which a parametric model is considered between the following two items: (a) the forward-shaped 3-channel SDR bins in the d3DMT and (b) the original HDR chroma bins in the d3DMT.

[0201] The MMR expanded form of the i-th forward-shaped SDR bin is expressed as:

[0202]

[0203] All P bins of the three-channel forward-shaped SDR inputs can be collected as:

[0204]

[0205] The vector representation of the ch channel in HDR is as follows:

[0206]

[0207] Another “a / B” representation is defined as:

[0208] $$B^{B} = (S^{B})^{T} S^{B},$$

[0209]

[0210] The optimal MMR coefficients that minimize the prediction error between the reference HDR signal and the reconstructed HDR signal are given by the following equation:

[0211]

[0212] Furthermore, the reconstructed HDR signal (160) can be generated as:

[0213]
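Applying the backward chroma mapping in the decoder can be sketched in Python as follows. The sketch assumes the same 8-term first-order MMR expansion with cross terms as an illustrative model, with one coefficient vector per HDR chroma channel. The function names and the coefficient values are hypothetical and are not taken from the patent.

```python
def mmr_expand(s):
    """First-order MMR expansion with cross terms (assumed form)."""
    y, c0, c1 = s
    return [1.0, y, c0, c1, y * c0, y * c1, c0 * c1, y * c0 * c1]


def predict_hdr_chroma(sdr_pixel, coeffs_cb, coeffs_cr):
    """Predict the HDR (Cb, Cr) pair from one forward-shaped SDR pixel."""
    expanded = mmr_expand(sdr_pixel)
    cb = sum(m * e for m, e in zip(coeffs_cb, expanded))
    cr = sum(m * e for m, e in zip(coeffs_cr, expanded))
    return cb, cr


# Hypothetical backward MMR coefficients, as they might be carried in metadata.
coeffs_cb = [0.1, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # Cb = 0.1 + c0
coeffs_cr = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0]  # Cr = c1 + 0.5 * c0 * c1
cb, cr = predict_hdr_chroma((0.5, 0.25, 0.75), coeffs_cb, coeffs_cr)
```

The prediction is a single dot product per chroma channel, so the decoder cost is independent of how the coefficients were fitted on the encoder side.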

[0214] Metadata (152) includes the values of the parameters that define the backward luminance and chroma shaping functions.

[0215] References

[0216] Each of these references is incorporated herein by reference in its entirety.

[0217] 1. G-M. Su et al., “Multiple color channel multiple regression predictor,” U.S. Patent 8,811,490.

[0218] 2. Q. Song et al., PCT Patent Application Ser. No. PCT/US2019/031620, “High-fidelity full-reference and high-efficiency reduced reference encoding in end-to-end single-layer backward compatible encoding pipeline,” filed on May 9, 2019, published as WO 2019/217751.

[0219] 3. B. Wen et al., “Inverse luma/chroma mappings with histogram transfer and approximation,” U.S. Patent 10,264,287.

[0220] 4. Rec. ITU-R BT.709, “Parameter values for the HDTV standards for production and international program exchange,” ITU, 06/2015.

[0221] 5. Rec. ITU-R BT.2020, “Parameter values for ultra-high definition television systems for production and international program exchange,” ITU, 06/2014.

[0222] Example computer system implementation

[0223] Embodiments of the present invention may be implemented using computer systems, systems configured with electronic circuits and components, integrated circuit (IC) devices (such as microcontrollers, field-programmable gate arrays (FPGAs) or other configurable or programmable logic devices (PLDs), discrete-time or digital signal processors (DSPs), application-specific integrated circuits (ASICs)), and / or apparatus including one or more of such systems, devices, or components. The computer and / or IC may execute, control, or implement instructions related to color transformations under the coding efficiency constraints of HDR video coding, as described herein. The computer and / or IC may calculate any of the various parameters or values related to the color transformations under the coding efficiency constraints of HDR video coding as described herein. The image and video dynamic range extension embodiments may be implemented in hardware, software, firmware, and various combinations thereof.

[0224] Some embodiments of the present invention include a computer processor that executes software instructions that cause the processor to perform the method of the present invention. For example, one or more processors, such as those in a display, encoder, set-top box, transcoder, etc., can implement a method for color transformation under coding efficiency constraints for HDR video coding as described above by executing software instructions in a processor-accessible program memory. The present invention can also be provided in the form of a program product. The program product may include any non-transitory tangible medium carrying a set of computer-readable signals including instructions that, when executed by a data processor, cause the data processor to perform the method of the present invention. The program product according to the present invention can take any of a variety of non-transitory and tangible forms. The program product may include, for example, physical media, such as magnetic data storage media including floppy disks and hard disk drives, optical data storage media including CD-ROMs and DVDs, electronic data storage media including ROMs and flash RAMs, etc. The computer-readable signals on the program product may optionally be compressed or encrypted.

[0225] In the case of the components mentioned above (e.g., software modules, processors, components, devices, circuits, etc.), unless otherwise specified, references to such components (including references to “devices”) should be interpreted as including any component that performs the function of the described component (e.g., functionally equivalent) as an equivalent of that component, including components that are not structurally equivalent to the disclosed structures that perform the functions of the illustrated exemplary embodiments of the invention.

[0226] Equivalents, extensions, alternatives and miscellaneous

[0227] This description presents example embodiments relating to color transformation under coding efficiency constraints for HDR video coding. In the foregoing description, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and of what the applicant intends to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage, or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

[0228] Various aspects of the present invention can be understood from the following enumerated example embodiments (EEE):

[0229] EEE 1. A method for generating color transformation data under coding efficiency constraints, the method comprising:

[0230] receiving an input image in a first dynamic range and a first color space;

[0231] obtaining a first 3×3 color transformation matrix and a first 3×1 offset vector based on a color transformation from the first color space to a standard-based color space;

[0232] applying (210) the first 3×3 color transformation matrix to the input image to generate a first image with luminance and chrominance components;

[0233] generating minimum and maximum pixel values of the first image for the luminance component and the chrominance components;

[0234] computing (215) a brightness scaling value based on the minimum and maximum luminance pixel values in the first image;

[0235] computing elements of a 2×2 chromaticity transformation matrix based on a coding efficiency constraint and the first 3×3 color transformation matrix;

[0236] forming a 3×3 intermediate transformation matrix based on the brightness scaling value and the 2×2 chromaticity transformation matrix; applying (225) the 3×3 intermediate transformation matrix to the first image to generate a second image;

[0237] generating an intermediate 3×1 offset vector based on the first 3×1 offset vector and the minimum and maximum pixel values in the luminance and chrominance components of the second image; and generating (230):

[0238] an output 3×3 color transformation matrix by multiplying the first 3×3 color transformation matrix with the 3×3 intermediate transformation matrix;

[0239] an output 3×1 offset vector by adding the first 3×1 offset vector to the intermediate 3×1 offset vector; and

[0240] an output image in a second color space by adding the output 3×1 offset vector to the pixel values of the second image.

[0241] EEE 2. The method as described in EEE 1, further comprising:

[0242] generating an inverse 3×3 color transformation matrix and an inverse 3×1 offset vector to transform image data from the second color space to the first color space, wherein the inverse 3×1 offset vector is equal to the output 3×1 offset vector, and the inverse 3×3 color transformation matrix includes the inverse of the output 3×3 color transformation matrix.

[0243] EEE 3. The method as described in EEE 1 or 2, wherein the first color space includes the RGB color space, the standard-based color space includes the YCbCr color space, and the first 3×3 color transformation matrix and the first 3×1 offset vector are based on the Rec.709 standard or the Rec.2020 standard.

[0244] EEE 4. The method as described in any of the preceding EEEs, wherein the brightness scaling value is set to 1 or is defined by the following formula:

[0245]

[0246] where $s'_{i,0}$ denotes the luminance value of the i-th pixel in the first image normalized to [0, 1], $\max\{s'_{i,0}\}$ denotes the maximum luminance value in the first image, and $\min\{s'_{i,0}\}$ denotes the minimum luminance value in the first image.

[0247] EEE 5. The method as described in any of the preceding EEEs, wherein, when calculating the elements of the 2×2 chroma transformation matrix, the coding efficiency constraint includes an energy preservation criterion for the chroma values of pixels in the first image before and after the chroma transformation performed by the 2×2 chroma transformation matrix.

[0248] EEE 6. The method as described in EEE 5, wherein the energy conservation criterion includes

[0249] $$(s''_{i,1})^2 + (s''_{i,2})^2 = \alpha^2 \left( (s'_{i,1})^2 + (s'_{i,2})^2 \right),$$

[0250] where, for the i-th pixel, $s'_{i,1}$ and $s'_{i,2}$ denote the chromaticity values of the pixel before the 2×2 chromaticity transformation matrix is applied, and $s''_{i,1}$ and $s''_{i,2}$ denote the chromaticity values of the pixel after the 2×2 chromaticity transformation matrix is applied.

[0251] EEE 7. The method as described in EEE 6, wherein, for a 2×2 transformation matrix

[0252]

[0253] under the energy conservation criterion

[0254]

[0255] where α is a constant close to 1.

[0256] EEE 8. The method as described in EEE 7, wherein the 2×2 transformation matrix includes

[0257]

[0258] where θ is a free parameter in [0, 2π].

[0259] EEE 9. The method as described in any of the preceding EEEs, wherein forming the 3×3 intermediate transformation matrix $W_s$ includes forming

[0260]

[0261] where β represents the brightness scaling value, and α and θ are parameters.

[0262] EEE 10. The method as described in EEE 6 or EEE 9, wherein α = 1, or

[0263] $1 - \delta \le \alpha \le 1 + \delta$,

[0264] where δ is a parameter less than 0.5.

[0265] EEE 11. The method as described in any of the preceding EEEs, wherein generating the intermediate 3×1 offset vector $[p_0\ p_1\ p_2]^T$ includes computing the elements of the offset vector to satisfy the offset vector constraints:

[0266]

[0267] where $s''_{i,j}$ denotes the pixel value of the j-th color component of the i-th pixel in the second image, and $[n_{s,0}\ n_{s,1}\ n_{s,2}]^T$ denotes the first offset vector.

[0268] EEE 12. The method as described in EEE 11, further comprising, upon determining that at least one offset vector element does not satisfy its offset vector constraint, iterating as follows:

[0269] forming an updated 3×3 intermediate transformation matrix based on an updated brightness scaling value and / or an updated 2×2 chromaticity transformation matrix;

[0270] applying (225) the updated 3×3 intermediate transformation matrix to the first image to generate the second image; and

[0271] generating the intermediate 3×1 offset vector to satisfy the offset vector constraints.

[0272] EEE 13. The method as described in any of the preceding EEEs, further comprising:

[0273] receiving a first high dynamic range (HDR) image in the first color space, the first HDR image representing the same scene as the first image but in a second dynamic range higher than the first dynamic range; applying a standard-based color transformation to the first HDR image to generate a second HDR image in the second dynamic range;

[0274] applying a forward shaping function to the second HDR image to generate a base image in the first dynamic range, wherein the forward shaping function maps pixel values from the second dynamic range and the standard-based color space to pixel values in the first dynamic range and the second color space;

[0275] compressing the base image to generate a compressed image in the second color space; and storing an output signal, the output signal including the compressed image, the output 3×3 color transformation matrix, the output 3×1 offset vector, and parameters of a backward shaping function for reconstructing an output HDR image based on the compressed image.

[0276] EEE 14. The method as described in EEE 13, further comprising, in a decoder receiving the output signal:

[0277] decompressing the compressed image to generate an intermediate base signal in the first dynamic range and the second color space; and

[0278] generating, based on the intermediate base signal, the output 3×3 color transformation matrix, and the output 3×1 offset vector, a first output signal in the first color space and the first dynamic range, or

[0279] generating, based on the intermediate base signal and the parameters of the backward shaping function, an HDR output signal in the standard-based color space, wherein the backward shaping function maps pixel values from the second color space and the first dynamic range to the standard-based color space and the second dynamic range.

[0280] EEE 15. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for performing the method according to any one of EEE 1 to 14 using one or more processors.

Claims

1. A method for generating color transformation data under coding efficiency constraints, the method comprising: receiving an input image in a first dynamic range and a first color space; obtaining a first 3×3 color transformation matrix and a first 3×1 offset vector based on a color transformation from the first color space to a standard-based color space; applying (210) the first 3×3 color transformation matrix to the input image to generate a first image with luminance and chrominance components; generating minimum and maximum pixel values for the luminance component of the first image; computing (215) a brightness scaling value based on the minimum and maximum luminance pixel values in the first image; computing elements of a 2×2 chromaticity transformation matrix based on a coding efficiency constraint and the first 3×3 color transformation matrix; forming a 3×3 intermediate transformation matrix based on the brightness scaling value and the 2×2 chromaticity transformation matrix; applying (225) the 3×3 intermediate transformation matrix to the first image to generate a second image; generating minimum and maximum pixel values for the luminance component and the chrominance components of the second image; generating an intermediate 3×1 offset vector based on the first 3×1 offset vector and the minimum and maximum pixel values in the luminance and chrominance components of the second image; and generating (230): an output 3×3 color transformation matrix by multiplying the first 3×3 color transformation matrix with the 3×3 intermediate transformation matrix; an output 3×1 offset vector by adding the first 3×1 offset vector to the intermediate 3×1 offset vector; and an output image in a second color space by adding the output 3×1 offset vector to the pixel values of the second image.

2. The method of claim 1, further comprising: generating an inverse 3×3 color transformation matrix and an inverse 3×1 offset vector to transform image data from the second color space to the first color space, wherein the inverse 3×1 offset vector is equal to the output 3×1 offset vector, and the inverse 3×3 color transformation matrix includes the inverse of the output 3×3 color transformation matrix.

3. The method as described in claim 1 or 2, wherein the first color space includes the RGB color space, the standard-based color space includes the YCbCr color space, and the first 3×3 color transformation matrix and the first 3×1 offset vector are based on the Rec.709 standard or the Rec.2020 standard.

4. The method as described in claim 1 or 2, wherein the brightness scaling value is set to 1 or is defined by the following formula, where $s'_{i,0}$ denotes the luminance value of the i-th pixel in the first image normalized to [0, 1], $\max\{s'_{i,0}\}$ denotes the maximum luminance value in the first image, and $\min\{s'_{i,0}\}$ denotes the minimum luminance value in the first image.

5. The method as described in claim 1 or 2, wherein, when calculating the elements of the 2×2 chroma transformation matrix, the coding efficiency constraint includes an energy preservation criterion for the chroma values of pixels in the first image before and after the chroma transformation performed by the 2×2 chroma transformation matrix.

6. The method of claim 5, wherein the energy conservation criterion includes $(s''_{i,1})^2 + (s''_{i,2})^2 = \alpha^2 \left( (s'_{i,1})^2 + (s'_{i,2})^2 \right)$, where, for the i-th pixel, $s'_{i,1}$ and $s'_{i,2}$ denote the chromaticity values of the pixel before the 2×2 chromaticity transformation matrix is applied, and $s''_{i,1}$ and $s''_{i,2}$ denote the chromaticity values of the pixel after the 2×2 chromaticity transformation matrix is applied.

7. The method of claim 6, wherein, For a 2×2 transformation matrix Under the energy conservation criterion Where α is a constant close to 1.

8. The method of claim 7, wherein, The 2×2 transformation matrix includes Here, θ corresponds to the free parameter in [0, 2π].

9. The method as claimed in claim 1 or 2, wherein forming the 3×3 intermediate transformation matrix $W_s$ includes forming a matrix in which β represents the brightness scaling value, and α and θ are parameters.

10. The method of claim 6, wherein α = 1, or 1-δ≤α≤1+δ, where δ is a parameter less than 0.5.

11. The method as claimed in claim 1 or 2, wherein generating the intermediate 3×1 offset vector $[p_0\ p_1\ p_2]^T$ includes computing the elements of the offset vector to satisfy offset vector constraints, where $s''_{i,j}$ denotes the pixel value of the j-th color component of the i-th pixel in the second image, and $[n_{s,0}\ n_{s,1}\ n_{s,2}]^T$ denotes the first 3×1 offset vector.

12. The method of claim 11, further comprising, upon determining that at least one offset vector element does not satisfy its offset vector constraint, iterating in the following manner: forming an updated 3×3 intermediate transformation matrix based on an updated brightness scaling value and / or an updated 2×2 chromaticity transformation matrix; applying (225) the updated 3×3 intermediate transformation matrix to the first image to generate the second image; and generating the intermediate 3×1 offset vector to satisfy the offset vector constraint.

13. The method of claim 1 or 2, further comprising: receiving a first high dynamic range (HDR) image in the first color space, the first HDR image representing the same scene as the first image, but in a second dynamic range higher than the first dynamic range; applying a standard-based color transformation to the first HDR image to generate a second HDR image in the second dynamic range; applying a forward shaping function to the second HDR image to generate a base image in the first dynamic range, wherein the forward shaping function maps pixel values from the second dynamic range and the standard-based color space to pixel values in the first dynamic range and the second color space; compressing the base image to generate a compressed image in the second color space; and storing an output signal, the output signal including the compressed image, the output 3×3 color transformation matrix, the output 3×1 offset vector, and parameters of a backward shaping function for reconstructing an output HDR image based on the compressed image.

14. The method of claim 13, further comprising, in a decoder receiving the output signal: decompressing the compressed image to generate an intermediate base signal in the first dynamic range and the second color space; and generating, based on the intermediate base signal, the output 3×3 color transformation matrix, and the output 3×1 offset vector, a first output signal in the first color space and the first dynamic range, or generating, based on the intermediate base signal and the parameters of the backward shaping function, an HDR output signal in the standard-based color space, wherein the backward shaping function maps pixel values from the second color space and the first dynamic range to the standard-based color space and the second dynamic range.

15. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for performing the method according to any one of claims 1 to 14 using one or more processors.

16. A computer program product comprising instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 14.