System and method for derivation of cross-component geometry / wedgelet partitions
Geometric partitioning modes enhance video coding by allowing flexible partitioning boundaries, improving motion prediction and overall video quality for complex objects.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- TENCENT AMERICA LLC
- Filing Date
- 2023-06-20
- Publication Date
- 2026-06-26
Abstract
Description
[Technical Field]
[0001] Related applications
[0001] This application claims priority to U.S. Provisional Patent Application No. 63 / 416,362, entitled "Cross-component geometric / wedgelet partition derivation," filed on 14 October 2022, and is a continuation of U.S. Patent Application No. 18 / 208,114, entitled "Systems and Methods for Cross-Component Geometric / Wedgelet Partition Derivation," filed on 9 June 2023, all of which are incorporated herein by reference.
[0002]
[0002] The disclosed embodiments generally relate to video coding, including, but not limited to, systems and methods for cross-component geometric/wedgelet partition derivation.
[Background Art]
[0003]
[0003] Digital video is supported by a variety of electronic devices, including digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video gaming consoles, smartphones, video teleconferencing devices, and video streaming devices. Electronic devices transmit and receive digital video data over communication networks, and / or store digital video data in storage devices. Due to the limited bandwidth capacity of communication networks and the limited memory resources of storage devices, video coding may be used to compress video data according to one or more video coding standards before the video data is transmitted or stored.
[0004]
[0004] Multiple video codec standards have been developed. For example, video coding standards include AOMedia Video 1 (AV1), Versatile Video Coding (VVC), Joint Exploration Test Model (JEM), High Efficiency Video Coding (HEVC/H.265), Advanced Video Coding (AVC/H.264), and Moving Picture Experts Group (MPEG) coding. Video coding generally utilizes prediction methods (e.g., inter-prediction, intra-prediction, etc.) that take advantage of the redundancy inherent in video data. Video coding aims to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
[0005]
[0005] HEVC, also known as H.265, is a video compression standard designed as part of the MPEG-H project. The ITU-T and ISO/IEC published the HEVC/H.265 standard in 2013 (version 1), 2014 (version 2), 2015 (version 3), and 2016 (version 4). Versatile Video Coding (VVC), also known as H.266, is a video compression standard intended as a successor to HEVC. The ITU-T and ISO/IEC published the VVC/H.266 standard in 2020 (version 1) and 2022 (version 2). AV1 is an open video coding format designed as an alternative to HEVC. On January 8, 2019, a validated version 1.0.0 with Errata 1 of the specification was released.
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0006]
[0006] As described above, encoding (compression) reduces bandwidth and / or storage space requirements. Both lossless and lossy compression can be employed, as will be described in detail later. Lossless compression refers to a technique in which an exact copy of the original signal can be reconstructed from the compressed original signal through the decoding process. Lossy compression refers to a coding / decoding process in which the original video information is not necessarily fully preserved during coding and is not necessarily fully recoverable during decoding. When using lossy compression, the reconstructed signal may not be equivalent to the original signal, but the distortion between the original and reconstructed signals is made small enough to make the reconstructed signal useful for the intended application. The amount of distortion that is acceptable depends on the application. For example, users of some consumer video streaming applications may tolerate higher distortion than users of movie or television broadcast applications. The compression ratio achievable by a particular coding algorithm may be selected or tuned to reflect a variety of distortion tolerances, and higher acceptable distortion generally allows for coding algorithms that result in higher loss and higher compression ratios.
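The distinction between lossless and lossy compression drawn above can be illustrated with a minimal sketch. It is not taken from the disclosure: the function names, the use of DEFLATE via Python's `zlib`, and the uniform quantizer with a `step` parameter are all assumptions chosen only to show exact versus approximate reconstruction.

```python
import zlib


def lossless_roundtrip(samples: bytes) -> bytes:
    """Compress and decompress with DEFLATE; the output equals the input exactly."""
    return zlib.decompress(zlib.compress(samples))


def lossy_roundtrip(samples: bytes, step: int) -> bytes:
    """Uniform quantization (illustrative only): keep the index of each
    step-sized bin, then reconstruct every sample at its bin centre."""
    indices = [s // step for s in samples]
    return bytes(min(255, i * step + step // 2) for i in indices)


def max_abs_error(a: bytes, b: bytes) -> int:
    """Largest per-sample distortion between two equally long signals."""
    return max(abs(x - y) for x, y in zip(a, b))


original = bytes([10, 12, 13, 200, 201, 199, 50, 52])
exact = lossless_roundtrip(original)          # identical to the original
coarse = lossy_roundtrip(original, step=8)    # close to, but not equal to, the original
```

In this sketch a larger `step` means fewer distinct values to code (a higher compression potential) at the price of a larger worst-case error of about half a step, which mirrors the trade-off between tolerated distortion and compression ratio described in the paragraph.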
[0007]
[0007] This disclosure describes deriving partitioning boundaries that are not limited to a set of predefined partitioning patterns that use only one straight line as the partitioning boundary. Such single-line partitioning boundaries may not efficiently model irregular partitioning patterns. Thus, existing partitioning modes may be suboptimal for more complex video objects. In these cases, a more accurate partitioning mode can better represent the shape of the video object, thereby improving the accuracy of motion prediction and, in turn, the quality/accuracy of video coding and decoding.
[Means for Solving the Problems]
[0008]
[0008] According to several embodiments, a method for video coding is provided. The method includes receiving video data including a picture, wherein the picture is coded using at least a first color component and a second color component, and the picture includes a first block coded in geometric partition mode, wherein the first block includes a first geometric partition and a second geometric partition; reconstructing a sample of the first color component in the first geometric partition of the first block; deriving a sample of the second color component in the first geometric partition of the first block based on the reconstructed sample of the first color component of the first block; and decoding the first block in the picture based on at least the reconstructed samples in the first geometric partition of the first color component and the second color component of the first block. This method includes obtaining reconstructed data of the first color component of a first block of video data using a first geometric partition, and generating reconstructed data of the second color component of a first block of video data based on the reconstructed data of the first color component of the first block of video data using a second geometric partition.
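As a purely hypothetical illustration of the cross-component idea in the paragraph above, the sketch below derives a two-region partition mask from reconstructed samples of a first color component (luma) and reuses it for a second color component (chroma) at 4:2:0 resolution. The thresholding rule, the subsampling rule, and every name in the code are assumptions made for illustration; this section of the disclosure does not specify the derivation, and the sketch should not be read as the claimed method.

```python
def derive_luma_mask(recon_luma):
    """Hypothetical rule: samples at or above the block mean belong to
    partition 0, all other samples to partition 1."""
    flat = [s for row in recon_luma for s in row]
    mean = sum(flat) / len(flat)
    return [[0 if s >= mean else 1 for s in row] for row in recon_luma]


def chroma_mask_420(luma_mask):
    """Hypothetical 4:2:0 mapping: each chroma position takes the label of
    the top-left luma position of its 2x2 luma group."""
    return [row[::2] for row in luma_mask[::2]]


# A 4x4 reconstructed luma block with a bright region in the upper-left corner.
recon_luma = [
    [200, 200, 200, 40],
    [200, 200, 40, 40],
    [200, 40, 40, 40],
    [40, 40, 40, 40],
]
luma_mask = derive_luma_mask(recon_luma)
chroma_mask = chroma_mask_420(luma_mask)
```

Because the mask is computed only from samples that the decoder has already reconstructed, a scheme of this kind would not need the boundary to be signaled for the second component, and the boundary need not be a single straight line.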
[0009]
[0009] According to some embodiments, a computing system is provided, such as a streaming system, a server system, a personal computer system, or another electronic device. The computing system includes control circuitry and memory storing one or more sets of instructions. The one or more sets of instructions include instructions for performing any of the methods described herein. In some embodiments, the computing system includes an encoder component and/or a decoder component.
[0010]
[0010] According to some embodiments, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores one or more sets of instructions for execution by a computing system. The one or more sets of instructions include instructions for performing any of the methods described herein.
[0011]
[0011] Accordingly, devices and systems are disclosed along with methods for coding video. Such methods, devices, and systems may complement or replace conventional methods, devices, and systems for coding video.
[0012]
[0012] The features and advantages described herein are not necessarily all-inclusive; in particular, additional features and advantages will be apparent to those skilled in the art in view of the drawings, specification, and claims provided herein. Furthermore, it should be noted that the language used herein has been selected primarily for readability and instructional purposes and has not necessarily been selected to delineate or limit the subject matter described herein.
[0013]
[0013] So that the present disclosure can be understood in greater detail, a more particular description may be had by reference to the features of various embodiments, some of which are illustrated in the accompanying drawings. The accompanying drawings, however, merely illustrate pertinent features of the present disclosure and are therefore not necessarily to be considered limiting, because the description may admit other effective features, as those skilled in the art will appreciate upon reading this disclosure.
[Brief Description of the Drawings]
[0014] [Figure 1] A block diagram illustrating an exemplary communication system according to some embodiments.
[0015] [Figure 2A] A block diagram showing exemplary elements of an encoder component according to some embodiments.
[0016] [Figure 2B] A block diagram showing exemplary elements of a decoder component according to some embodiments.
[0017] [Figure 3] A block diagram showing an exemplary server system according to some embodiments.
[0018] [Figure 4A] A diagram showing different examples of geometric partitioning mode (GPM) splits grouped by identical angles according to some embodiments.
[0019] [Figure 4B] A diagram of an example of an extended merge prediction process according to some embodiments.
[0020] [Figure 4C] A diagram of an example of generating blending weight w0 using the geometric partitioning mode according to some embodiments.
[0021] [Figure 5A] A diagram showing an example of geometric partitioning mode prediction according to some embodiments.
[0022] [Figure 5B] A diagram showing an exemplary adjusted partition pattern candidate according to some embodiments.
[Figure 5C] A diagram showing an exemplary adjusted partition pattern candidate according to some embodiments.
[0023] [Figure 5D] A diagram showing an example of adjusting a partition boundary independently of a partition pattern candidate according to some embodiments.
[0024] [Figure 6] A flowchart showing an exemplary method of coding video according to some embodiments.
Best Mode for Carrying Out the Invention
[0015]
[0025] By convention, the various features shown in the drawings are not necessarily depicted to a consistent scale, and the same reference numbers may be used throughout the specification and drawings to indicate similar features.
[0016] This disclosure describes, among other things, the use of various partitioning techniques for partitioning video blocks to achieve better motion prediction and higher-quality encoding. For example, existing partitioning modes may be suboptimal for more complex video objects because their GPM/wedgelet designs allow only a limited set of predefined partitioning patterns that use a single straight line as the partitioning boundary. Such straight-line partitioning boundaries may not provide the most efficient partitioning pattern for irregular objects. In these cases, a more precise partitioning mode can better represent the shape of the video object, thereby improving the accuracy of motion prediction and, consequently, the quality/accuracy of video encoding and decoding.
[0017] Exemplary Systems and Devices
[0026] Figure 1 is a block diagram showing a communication system 100 according to several embodiments. The communication system 100 includes a source device 102 and a plurality of electronic devices 120 (e.g., electronic devices 120-1 to 120-m) that are communicatively coupled to one or more networks. In some embodiments, the communication system 100 is a streaming system for use with video-enabled applications such as video conferencing applications, digital TV applications, and media storage and / or distribution applications.
[0018]
[0027] Source device 102 includes a video source 104 (e.g., a camera component or media storage) and an encoder component 106. In some embodiments, the video source 104 is a digital camera (e.g., configured to create an uncompressed video sample stream). The encoder component 106 generates one or more encoded video bitstreams from the video stream. The video stream from video source 104 may have a higher data volume compared to the encoded video bitstream 108 generated by the encoder component 106. Since the encoded video bitstream 108 has a lower data volume (less data) compared to the video stream from the video source, the encoded video bitstream 108 requires less bandwidth to transmit and less storage space to store compared to the video stream from video source 104. In some embodiments, source device 102 does not include the encoder component 106 (e.g., it is configured to transmit uncompressed video data to one or more networks 110).
[0019]
[0028] One or more networks 110 represent any number of networks that transmit information between the source device 102, the server system 112, and / or the electronic device 120, including, for example, wireline (wired) and / or wireless communication networks. One or more networks 110 may exchange data in circuit-switched channels and / or packet-switched channels. Typical networks include telecommunications networks, local area networks, wide area networks, and / or the Internet.
[0020]
[0029] The one or more networks 110 include a server system 112 (e.g., a distributed/cloud computing system). In some embodiments, the server system 112 is or includes a streaming server (e.g., configured to store and/or deliver video content, such as an encoded video stream from the source device 102). The server system 112 includes a coder component 114 (e.g., configured to encode and/or decode video data). In some embodiments, the coder component 114 includes an encoder component and/or a decoder component. In various embodiments, the coder component 114 is instantiated as hardware, software, or a combination thereof. In some embodiments, the coder component 114 is configured to decode the encoded video bitstream 108 and to re-encode the video data using a different encoding standard and/or methodology to produce encoded video data 116. In some embodiments, the server system 112 is configured to produce multiple video formats and/or encodings from the encoded video bitstream 108.
[0021]
[0030] In some embodiments, the server system 112 functions as a media-aware network element (MANE). For example, the server system 112 may be configured to prune the encoded video bitstream 108 in order to tailor potentially different bitstreams to one or more of the electronic devices 120. In some embodiments, the MANE is provided separately from the server system 112.
[0022]
[0031] Electronic device 120-1 includes a decoder component 122 and a display 124. In some embodiments, the decoder component 122 is configured to decode the encoded video data 116 to generate an outgoing video stream that can be rendered on a display or other type of rendering device. In some embodiments, one or more of the electronic devices 120 do not include a display component (e.g., they are communicatively coupled to an external display device and/or include media storage). In some embodiments, the electronic device 120 is a streaming client. In some embodiments, the electronic device 120 is configured to access the server system 112 to retrieve the encoded video data 116.
[0023]
[0032] The source device 102 and/or the plurality of electronic devices 120 may be referred to as "terminal devices" or "user devices." In some embodiments, the source device 102 and/or one or more of the electronic devices 120 are instances of a server system, a personal computer, a portable device (e.g., a smartphone, tablet, or laptop), a wearable device, a video conferencing device, and/or another type of electronic device.
[0024]
[0033] In an exemplary operation of the communication system 100, the source device 102 transmits an encoded video bitstream 108 to the server system 112. For example, the source device 102 may encode a stream of pictures captured by the source device. The server system 112 receives the encoded video bitstream 108 and may decode and/or encode the encoded video bitstream 108 using the coder component 114. For example, the server system 112 may apply to the video data an encoding that is better suited to network transmission and/or storage. The server system 112 may transmit the encoded video data 116 (e.g., one or more encoded video bitstreams) to one or more of the electronic devices 120. Each electronic device 120 may decode the encoded video data 116 to recover the video pictures and optionally display them.
[0025]
[0034] In some embodiments, the transmission described above is a unidirectional data transmission. Unidirectional data transmission may be used in media serving applications, etc. In some embodiments, the transmission described above is a bidirectional data transmission. Bidirectional data transmission may be used in video conferencing applications. In some embodiments, the encoded video bitstream 108 and / or encoded video data 116 are encoded and / or decoded according to one of the video coding / compression standards described herein, such as HEVC, VVC, and / or AV1.
[0026]
[0035] Figure 2A is a block diagram showing exemplary elements of an encoder component 106 according to several embodiments. The encoder component 106 receives a source video sequence from a video source 104. In some embodiments, the encoder component includes a receiver (e.g., a transceiver) component configured to receive the source video sequence. In some embodiments, the encoder component 106 receives a video sequence from a remote video source (e.g., a video source that is a component of a device different from the encoder component 106). The video source 104 may provide the source video sequence in the form of a digital video sample stream, which may be of any suitable bit depth (e.g., 8-bit, 10-bit, or 12-bit), any color space (e.g., BT.601 Y CrCb or RGB), and any suitable sampling structure (e.g., Y CrCb 4:2:0 or Y CrCb 4:4:4). In some embodiments, the video source 104 is a storage device that stores previously captured/prepared video. In some embodiments, the video source 104 is a camera that captures local image information as a video sequence. Video data may be provided as a series of individual pictures that impart motion when viewed in sequence. The pictures themselves may be organized as a spatial array of pixels, with each pixel containing one or more samples depending on the sampling structure, color space, etc., in use. Those skilled in the art will readily understand the relationship between pixels and samples. The following description focuses on samples.
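To make the relationship between picture size, sampling structure, and bit depth concrete, the following minimal sketch counts the samples in one uncompressed picture. The function names are hypothetical, picture dimensions are assumed to be even, and the byte count assumes samples are packed at exactly `bit_depth` bits each, which real storage formats often do not do.

```python
# (horizontal, vertical) chroma subsampling factors for common sampling structures
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def samples_per_picture(width: int, height: int, chroma_format: str) -> int:
    """One luma plane plus two chroma planes at the subsampled resolution."""
    sx, sy = SUBSAMPLING[chroma_format]
    luma = width * height
    chroma = 2 * (width // sx) * (height // sy)
    return luma + chroma


def raw_bytes_per_picture(width: int, height: int, chroma_format: str, bit_depth: int) -> int:
    """Size of one uncompressed picture, assuming tightly packed samples."""
    bits = samples_per_picture(width, height, chroma_format) * bit_depth
    return (bits + 7) // 8


hd_420 = samples_per_picture(1920, 1080, "4:2:0")   # half as many samples as 4:4:4
hd_444 = samples_per_picture(1920, 1080, "4:4:4")
```

For a 1920x1080 picture, 4:2:0 sampling carries 1.5 samples per pixel and 4:4:4 carries 3, so the choice of sampling structure alone halves the raw data volume before any coding is applied.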
[0027]
[0036] The encoder component 106 is configured to encode and/or compress pictures of the source video sequence into an encoded video sequence 216 in real time or under other time constraints required by the application. Enforcing an appropriate coding speed is one function of the controller 204. In some embodiments, the controller 204 controls, and is functionally coupled to, other functional units as described below. Parameters set by the controller 204 may include rate-control-related parameters (e.g., picture skip, quantizer, and/or the lambda value of rate-distortion optimization techniques), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so forth. Those skilled in the art can readily identify other functions of the controller 204, as they may pertain to an encoder component 106 optimized for a particular system design.
[0028]
[0037] In some embodiments, the encoder component 106 is configured to operate within a coding loop. In a simplified example, the coding loop includes a source coder 202 (responsible for creating symbols, such as a symbol stream, based on, for example, an input picture to be coded and one or more reference pictures) and a (local) decoder 210. The decoder 210 reconstructs the symbols to create sample data in the same manner as a (remote) decoder would (provided that the compression between the symbols and the coded video bitstream is lossless). The reconstructed sample stream (sample data) is input to the reference picture memory 208. Since decoding the symbol stream yields bit-exact results independent of the decoder location (local or remote), the contents of the reference picture memory 208 are also bit-exact between the local encoder and the remote decoder. In other words, the prediction part of the encoder sees as reference picture samples exactly the same sample values that the decoder would see when using prediction during decoding. This principle of reference picture synchronicity (and the resulting drift if synchronicity cannot be maintained, for example because of channel errors) is known to those skilled in the art.
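The role of the local decoder in the coding loop can be illustrated with a one-dimensional toy predictive coder. This is a sketch under stated assumptions, not the encoder of Figure 2A: each "picture" is a single integer sample, the previous reconstruction is the only reference, and the quantizer and all function names are hypothetical. The closed-loop encoder predicts from its own reconstruction, so its reference matches the decoder's; the open-loop variant predicts from the original samples and drifts.

```python
def quantize(residual: int, step: int) -> int:
    """Symmetric round-to-nearest quantizer index (illustrative)."""
    magnitude = (abs(residual) + step // 2) // step
    return magnitude if residual >= 0 else -magnitude


def encode_closed_loop(samples, step):
    """Predict each sample from the locally decoded reference."""
    symbols, recon = [], []
    reference = 0  # stands in for the reference picture memory
    for s in samples:
        q = quantize(s - reference, step)
        symbols.append(q)
        reference += q * step  # local decoder: same arithmetic as decode()
        recon.append(reference)
    return symbols, recon


def encode_open_loop(samples, step):
    """Predict from the ORIGINAL previous sample, which the decoder never sees."""
    symbols, previous = [], 0
    for s in samples:
        symbols.append(quantize(s - previous, step))
        previous = s
    return symbols


def decode(symbols, step):
    """Remote decoder: accumulate dequantized residuals onto its reference."""
    out, reference = [], 0
    for q in symbols:
        reference += q * step
        out.append(reference)
    return out


samples = [3, 6, 9, 12, 15, 18]
symbols, encoder_recon = encode_closed_loop(samples, step=4)
decoder_recon = decode(symbols, step=4)                        # equals encoder_recon
drifting = decode(encode_open_loop(samples, step=4), step=4)   # error grows over time
```

In this toy setting the closed-loop reconstruction stays within half a quantizer step of the source, whereas the open-loop reconstruction accumulates error from sample to sample, which is the drift that reference picture synchronicity avoids.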
[0029]
[0038] The operation of the decoder 210 may be the same as that of a remote decoder, such as the decoder component 122, which is described in detail below in relation to Figure 2B. Referring briefly to Figure 2B, however, because symbols are available and the encoding/decoding of symbols to a coded video sequence by the entropy coder 214 and the parser 254 can be lossless, the entropy decoding parts of the decoder component 122, including the buffer memory 252 and the parser 254, may not be fully implemented in the local decoder 210.
[0030]
[0039] An observation that can be made at this point is that any decoder technology other than the parsing/entropy decoding present in a decoder must also be present, in substantially identical functional form, in the corresponding encoder. For this reason, the disclosed subject matter focuses on decoder operation. The description of encoder technologies can be abbreviated because they are the inverse of the comprehensively described decoder technologies. A more detailed description is required only in certain areas and is provided below.
[0031]
[0040] As part of its operation, the source coder 202 may perform motion-compensated predictive coding, which codes an input frame predictively with reference to one or more previously coded frames from the video sequence that are designated as reference frames. In this way, the coding engine 212 codes the differences between pixel blocks of the input frame and pixel blocks of one or more reference frames that may be selected as prediction references for the input frame. The controller 204 may manage the coding operations of the source coder 202, including, for example, setting parameters and subgroup parameters used to encode the video data.
[0032]
[0041] The decoder 210 decodes coded video data of frames that may be designated as reference frames, based on symbols created by the source coder 202. The operations of the coding engine 212 may advantageously be lossy processes. When the coded video data is decoded at a video decoder (not shown in Figure 2A), the reconstructed video sequence may be a replica of the source video sequence with some errors. The decoder 210 replicates the decoding processes that may be performed on reference frames by a remote video decoder and may cause the reconstructed reference frames to be stored in the reference picture memory 208. In this way, the encoder component 106 locally stores copies of reconstructed reference frames that have the same content as the reconstructed reference frames that will be obtained by the remote video decoder (absent transmission errors).
[0033]
[0042] The predictor 206 may perform a predictive search for the coding engine 212. That is, for a new frame to be coded, the predictor 206 may search the reference picture memory 208 for sample data (as candidate reference pixel blocks) or metadata such as reference picture motion vectors and block shapes that can act as appropriate predictive references for the new picture. The predictor 206 may operate on a sample block-by-pixel-block basis to find appropriate predictive references. In some cases, the input picture may have predictive references drawn from multiple reference pictures stored in the reference picture memory 208, as determined by the search results obtained by the predictor 206.
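One common way to realize a prediction search of this kind is exhaustive block matching with a sum-of-absolute-differences (SAD) cost. The sketch below is an assumption for illustration only: the disclosure does not state which search or cost the predictor 206 uses, and the function names, the list-of-lists picture layout, and the `search_range` parameter are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Try every displacement within +/- search_range that stays inside the
    reference picture; return (cost, dx, dy) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate block would fall outside the picture
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 8x8 reference picture and a current picture whose content is the
# reference shifted by 2 samples horizontally and 1 sample vertically.
ref = [[x * 7 + y * 13 for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]

cost, mv_x, mv_y = full_search(cur, ref, bx=2, by=2, size=4, search_range=2)
```

Here the search recovers the displacement (2, 1) with zero cost. Practical encoders typically replace the exhaustive scan with faster search patterns and add a rate term to the cost, bounded by the maximum motion vector search range set by the controller.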
[0034]
[0043] The outputs of all the aforementioned functional units may be subjected to entropy coding in the entropy coder 214. The entropy coder 214 converts the symbols generated by the various functional units into a coded video sequence by losslessly compressing the symbols according to techniques known to those skilled in the art (e.g., Huffman coding, variable-length coding, and/or arithmetic coding).
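Huffman coding, one of the techniques named above, can be sketched in a few lines to show that entropy coding is lossless: frequent symbols get short codewords, and the decoder recovers the symbol sequence exactly. The helper names and the bit-string representation are assumptions for illustration; production codecs use table-driven or arithmetic coders rather than Python strings.

```python
import heapq
from collections import Counter


def build_code(symbols):
    """Build a prefix-free Huffman code {symbol: bit string} from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # degenerate single-symbol alphabet
    # Heap entries: (count, unique tie-breaker, partial code table for the subtree)
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, t0 = heapq.heappop(heap)
        c1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t0.items()}
        merged.update({s: "1" + code for s, code in t1.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the codeword of each symbol."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Read bits until a complete codeword is seen; prefix-freeness makes this unambiguous."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


residual_symbols = [0, 0, 0, 0, 1, 1, 2, -1]
code = build_code(residual_symbols)
bits = encode(residual_symbols, code)
restored = decode(bits, code)   # identical to residual_symbols
```

For this input the coded length is 14 bits, compared with 16 bits for a fixed 2-bit code over the same four-symbol alphabet; the saving comes entirely from the skewed symbol statistics, with no loss of information.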
[0035]
[0044] In some embodiments, the output of the entropy coder 214 is coupled to a transmitter. The transmitter may be configured to buffer the coded video sequence(s) created by the entropy coder 214 to prepare them for transmission over a communication channel 218, which may be a hardware/software link to a storage device that stores the coded video data. The transmitter may be configured to merge the coded video data from the source coder 202 with other data to be transmitted, such as coded audio data and/or auxiliary data streams (sources not shown). In some embodiments, the transmitter may transmit additional data along with the coded video. The source coder 202 may include such data as part of the coded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, supplemental enhancement information (SEI) messages, video usability information (VUI) parameter set fragments, and the like.
[0036]
[0045] The controller 204 may manage the operation of the encoder component 106. During coding, the controller 204 may assign to each coded picture a coded picture type, which may affect the coding techniques applied to the respective picture. For example, a picture may be assigned as an intra picture (I-picture), a predictive picture (P-picture), or a bi-directionally predictive picture (B-picture). An intra picture can be coded and decoded without using any other frame in the sequence as a source of prediction. Some video codecs allow different types of intra pictures, including, for example, Instantaneous Decoder Refresh (IDR) pictures. Those skilled in the art are aware of these variants of I-pictures and their respective applications and features, so they are not repeated here. A predictive picture can be coded and decoded using intra-prediction or inter-prediction with at most one motion vector and reference index to predict the sample values of each block. A bi-directionally predictive picture can be coded and decoded using intra-prediction or inter-prediction with at most two motion vectors and reference indices to predict the sample values of each block. Similarly, a multiple-predictive picture can use more than two reference pictures and associated metadata for the reconstruction of a single block.
[0037]
[0046] A source picture can typically be subdivided spatially into a plurality of sample blocks (for example, blocks of 4x4, 8x8, 4x8, or 16x16 samples each) and coded block by block. Blocks may be coded predictively with reference to other (already coded) blocks, as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of I-pictures may be coded non-predictively, or they may be coded predictively with reference to already coded blocks of the same picture (spatial prediction or intra-prediction). Pixel blocks of P-pictures may be coded non-predictively, via spatial prediction, or via temporal prediction with reference to one previously coded reference picture. Blocks of B-pictures may be coded non-predictively, via spatial prediction, or via temporal prediction with reference to one or two previously coded reference pictures.
[0038]
[0047] Video can be captured as multiple source pictures (video pictures) in a time sequence. Intra-picture prediction (often abbreviated as intra-prediction) utilizes spatial correlations in a given picture, while inter-picture prediction utilizes (temporal or other) correlations between pictures. In one example, a specific picture to be encoded / decoded, called the current picture, is partitioned into blocks. When a block in the current picture is similar to a reference block in a previously coded and still-buffered reference picture in the video, the block in the current picture can be coded by a vector called a motion vector. The motion vector points to a reference block in the reference picture and may have a third dimension to identify the reference picture if multiple reference pictures are in use.
[0039]
[0048] The encoder component 106 may perform coding operations in accordance with a predetermined video coding technique or standard, such as any of those described herein. In these operations, the encoder component 106 may perform various compression operations, including predictive coding operations that leverage temporal and spatial redundancy in the input video sequence. Thus, the coded video data may conform to the syntax specified by the video coding technique or standard being used.
[0040]
[0049] Figure 2B is a block diagram showing exemplary elements of a decoder component 122 according to several embodiments. The decoder component 122 in Figure 2B is coupled to channel 218 and display 124. In some embodiments, the decoder component 122 includes a transmitter coupled to a loop filter 256 and configured to transmit data to display 124 (for example, via a wired or wireless connection).
[0041]
[0050] In some embodiments, the decoder component 122 includes a receiver coupled to channel 218 and configured to receive data from channel 218 (e.g., via a wired or wireless connection). The receiver may be configured to receive one or more coded video sequences to be decoded by the decoder component 122. In some embodiments, the decoding of each coded video sequence is independent of other coded video sequences. Each coded video sequence may be received from channel 218, which may be a hardware/software link to a storage device that stores coded video data. The receiver may receive the coded video data together with other data, e.g., coded audio data and/or auxiliary data streams, which may be forwarded to their respective using entities (not shown). The receiver may separate the coded video sequence from the other data. In some embodiments, the receiver receives additional (redundant) data along with the coded video. The additional data may be included as part of one or more coded video sequences. The additional data may be used by the decoder component 122 to decode the data and/or to more accurately reconstruct the original video data. The additional data may include, for example, temporal, spatial, or SNR enhancement layers, redundant slices, redundant pictures, or forward error correction codes.
[0042]
[0051] According to some embodiments, the decoder component 122 includes a buffer memory 252, a parser 254 (sometimes called an entropy decoder), a scaler/inverse transform unit 258, an intra-picture prediction unit 262, a motion compensation prediction unit 260, an aggregator 268, a loop filter unit 256, a reference picture memory 266, and a current picture memory 264. In some embodiments, the decoder component 122 is implemented as an integrated circuit, a series of integrated circuits, and/or other electronic circuitry. In some embodiments, the decoder component 122 is implemented at least partially in software.
[0043]
[0052] Buffer memory 252 is coupled between channel 218 and parser 254 (for example, to combat network jitter). In some embodiments, buffer memory 252 is separate from decoder component 122. In some embodiments, a separate buffer memory is provided between the output of channel 218 and decoder component 122. In some embodiments, a separate buffer memory is provided outside decoder component 122 (for example, to combat network jitter) in addition to buffer memory 252 inside decoder component 122 (for example, configured to handle playout timing). When receiving data from a store/forward device with sufficient bandwidth and controllability, or from an isosynchronous network, buffer memory 252 may not be required or may be small. For use in best-effort packet networks such as the Internet, buffer memory 252 may be required, may be relatively large, may advantageously be of adaptive size, and may be at least partially implemented in an operating system or similar element (not shown) outside decoder component 122.
[0044]
[0053] The parser 254 is configured to reconstruct symbols 270 from the coded video sequence. These symbols may include, for example, information used to manage the operation of the decoder component 122 and / or information for controlling rendering devices such as the display 124. The control information for the rendering device(s) may be, for example, in the form of supplemental enhancement information (SEI) messages or video usability information (VUI) parameter set fragments (not shown). The parser 254 parses (entropy decodes) the coded video sequence. The coding of the coded video sequence may follow a video coding technique or standard and may follow principles well known to those skilled in the art, including variable-length coding, Huffman coding, and arithmetic coding with or without context sensitivity. From the coded video sequence, the parser 254 may extract a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to that group. Subgroups can include groups of pictures (GOPs), pictures, tiles, slices, macroblocks, coding units (CUs), blocks, transform units (TUs), and prediction units (PUs). The parser 254 can also extract information such as transform coefficients, quantizer parameter values, and motion vectors from the coded video sequence.
[0045]
[0054] The reconstruction of the symbols 270 may involve multiple different units, depending on the type of coded video picture or part thereof (such as inter and intra pictures, or inter and intra blocks), and other factors. Which units are involved and how they are involved may be controlled by the parser 254 through subgroup control information parsed from the coded video sequence. The flow of such subgroup control information between the parser 254 and the multiple units described below is not illustrated for clarity.
[0046]
[0055] In addition to the functional blocks already described, the decoder component 122 can be conceptually subdivided into several functional units, as described below. In actual implementations operating under commercial constraints, many of these units may interact closely with each other and, at least partially, integrate with one another. However, for the purpose of illustrating the subject matter being disclosed, the conceptual subdivision into the following functional units is maintained.
[0047]
[0056] The scaler / inverse transform unit 258 receives from the parser 254, as one or more symbols 270, the quantized transform coefficients and control information (such as which transform to use, block size, quantization factor, and / or quantization scaling matrices). The scaler / inverse transform unit 258 can output a block containing sample values that can be input to the aggregator 268.
[0048]
[0057] In some cases, the output samples of the scaler / inverse transform unit 258 relate to intra-coded blocks, i.e., blocks that do not use predictive information from previously reconstructed pictures but can use predictive information from previously reconstructed parts of the current picture. Such predictive information may be provided by the intra-picture prediction unit 262. The intra-picture prediction unit 262 may generate a block of the same size and shape as the block being reconstructed, using surrounding already reconstructed information fetched from the current (partially reconstructed) picture in the current picture memory 264. The aggregator 268 may, sample by sample, add the predictive information generated by the intra-picture prediction unit 262 to the output sample information provided by the scaler / inverse transform unit 258.
[0049]
[0058] In other cases, the output samples of the scaler / inverse transform unit 258 relate to an inter-coded and potentially motion-compensated block. In such cases, the motion compensation prediction unit 260 can access the reference picture memory 266 to fetch samples to be used for prediction. After motion-compensating the fetched samples according to the symbols 270 relating to the block, these samples can be added by the aggregator 268 to the output of the scaler / inverse transform unit 258 (called residual samples or a residual signal in this case) to generate output sample information. The addresses in the reference picture memory 266 from which the motion compensation prediction unit 260 fetches the prediction samples may be controlled by motion vectors. The motion vectors may be available to the motion compensation prediction unit 260 in the form of symbols 270, which may have, for example, X, Y, and reference picture components. Motion compensation may also include interpolation of sample values fetched from the reference picture memory 266 when sub-sample-accurate motion vectors are in use, motion vector prediction mechanisms, etc.
[0050]
[0059] The output samples of the aggregator 268 can undergo various loop filtering techniques in the loop filter unit 256. The video compression technique may include in-loop filtering techniques that are controlled by parameters included in the coded video bitstream and made available to the loop filter unit 256 as symbols 270 from the parser 254, but that may also respond to meta-information obtained during decoding of previous portions (in decoding order) of the coded picture or coded video sequence, as well as to previously reconstructed and loop-filtered sample values.
[0051]
[0060] The output of the loop filter unit 256 may be a sample stream that is output to a rendering device such as the display 124 and that can also be stored in the reference picture memory 266 for use in future inter-picture prediction.
[0052]
[0061] Some coded pictures, once fully reconstructed, can be used as reference pictures for future prediction. Once a coded picture is fully reconstructed and has been identified as a reference picture (for example, by the parser 254), the current picture can become part of the reference picture memory 266, and a fresh current picture memory can be reallocated before the reconstruction of the next coded picture begins.
[0053]
[0062] The decoder component 122 may perform decoding operations according to a predetermined video compression technique that may be documented in a standard, such as one of the standards described herein. The coded video sequence may conform to the syntax specified by the video compression technique or standard being used, in the sense that it adheres to the syntax set out in the documentation of that technique or standard and, in particular, in the profiles documented therein. Also, for compliance with some video compression techniques or standards, the complexity of the coded video sequence may be within limits defined by the level of the video compression technique or standard. In some cases, the level restricts the maximum picture size, maximum frame rate, maximum reconstruction sample rate (measured, for example, in megasamples per second), maximum reference picture size, etc. The limits set by the level may, in some cases, be further restricted through the Hypothetical Reference Decoder (HRD) specification and metadata for HRD buffer management signaled in the coded video sequence.
[0054]
[0063] Figure 3 is a block diagram showing a server system 112 according to several embodiments. The server system 112 includes a control circuit 302, one or more network interfaces 304, memory 314, a user interface 306, and one or more communication buses 312 for interconnecting these components. In some embodiments, the control circuit 302 includes one or more processors (e.g., CPU, GPU, and / or DPU). In some embodiments, the control circuit includes one or more field-programmable gate arrays (FPGAs), hardware accelerators, and / or one or more integrated circuits (e.g., application-specific integrated circuits).
[0055]
[0064] The one or more network interfaces 304 may be configured to interface with one or more communication networks (e.g., wireless networks, wireline networks, and / or optical networks). The communication networks may be local, wide-area, metropolitan, automotive and industrial, real-time, latency-tolerant, etc. Examples of communication networks include local area networks such as Ethernet and wireless LAN; cellular networks, including GSM, 3G, 4G, 5G, and LTE; TV wireline or wireless wide-area digital networks, including cable TV, satellite TV, and terrestrial broadcast TV; and automotive and industrial networks, including CANbus. Such communications may be unidirectional receive-only (e.g., broadcast TV), unidirectional transmit-only (e.g., CANbus to several CANbus devices), or bidirectional (e.g., with other computer systems using local or wide-area digital networks). Such communications may include communications to one or more cloud computing networks.
[0056]
[0065] The user interface 306 includes one or more output devices 308 and / or one or more input devices 310. The input devices 310 may include one or more of the following: a keyboard, mouse, trackpad, touchscreen, data glove, joystick, microphone, scanner, camera, etc. The output devices 308 may include one or more of the following: an audio output device (e.g., a speaker), a visual output device (e.g., a display or monitor), etc.
[0057]
[0066] Memory 314 may include high-speed random-access memory (such as DRAM, SRAM, DDR RAM, and / or other random-access solid-state memory devices) and / or non-volatile memory (such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, and / or other non-volatile solid-state memory devices). Memory 314 optionally includes one or more storage devices located remotely from the control circuit 302. Memory 314, or alternatively, one or more non-volatile solid-state memory devices within memory 314, includes a non-transitory computer-readable storage medium. In some embodiments, memory 314, or the non-transitory computer-readable storage medium of memory 314, stores the following programs, modules, instructions, and data structures, or subsets or supersets thereof:
• An operating system 316 that includes procedures for handling various basic system services and for performing hardware-dependent tasks.
• A network communication module 318 used to connect the server system 112 to other computing devices via the one or more network interfaces 304 (for example, via wired and / or wireless connections).
• A coding module 320 for performing various functions related to encoding and / or decoding data, such as video data. In some embodiments, the coding module 320 is an instance of the coder component 114. The coding module 320 includes, but is not limited to, one or more of the following:
  • A decoding module 322 for performing various functions related to decoding encoded data, such as those previously described with respect to the decoder component 122.
  • An encoding module 340 for performing various functions related to encoding data, such as those previously described with respect to the encoder component 106.
• A picture memory 352 for storing pictures and picture data, for example, for use with the coding module 320. In some embodiments, the picture memory 352 includes one or more of the following: the reference picture memory 208, the buffer memory 252, the current picture memory 264, and the reference picture memory 266.
[0058]
[0067] In some embodiments, the decoding module 322 includes a parsing module 324 (configured to perform various functions previously described with respect to, for example, the parser 254), a transform module 326 (configured to perform various functions previously described with respect to, for example, the scaler / inverse transform unit 258), a prediction module 328 (configured to perform various functions previously described with respect to, for example, the motion compensation prediction unit 260 and / or the intra-picture prediction unit 262), and a filter module 330 (configured to perform various functions previously described with respect to, for example, the loop filter unit 256).
[0059]
[0068] In some embodiments, the encoding module 340 includes a coding module 342 (configured to perform various functions previously described with respect to, for example, the source coder 202 and / or the coding engine 212) and a prediction module 344 (configured to perform various functions previously described with respect to, for example, the predictor 206). In some embodiments, the decoding module 322 and / or the encoding module 340 includes a subset of the modules shown in Figure 3. For example, a shared prediction module is used by both the decoding module 322 and the encoding module 340.
[0060]
[0069] Each of the modules identified above and stored in memory 314 corresponds to a set of instructions for performing the functions described herein. The modules identified above (e.g., sets of instructions) do not need to be implemented as separate software programs, procedures, or modules, and therefore various subsets of these modules may be combined or possibly rearranged in various embodiments. For example, the coding module 320 may optionally not include separate decoding and coding modules, but rather use the same set of modules to perform both sets of functions. In some embodiments, memory 314 stores a subset of the modules and data structures identified above. In some embodiments, memory 314 stores additional modules and data structures not described above, such as an audio processing module.
[0061]
[0070] In some embodiments, the server system 112 includes a web or hypertext transfer protocol (HTTP) server, a file transfer protocol (FTP) server, and web pages and applications implemented using Common Gateway Interface (CGI) scripts, PHP hypertext preprocessor (PHP), Active Server Pages (ASP), hypertext markup language (HTML), extensible markup language (XML), Java, JavaScript, asynchronous JavaScript and XML (AJAX), XHP, Javelin, Wireless Universal Resource File (WURFL), and the like.
[0062]
[0071] Figure 3 shows the server system 112 according to some embodiments, but is intended more as a functional description of various features that may be present in one or more server systems than as a structural schematic of the embodiments described herein. In practice, and as will be recognized by those skilled in the art, items shown separately may be combined, and some items may be separated. For example, some items shown separately in Figure 3 may be implemented on a single server, and a single item may be implemented by one or more servers. The actual number of servers used to implement the server system 112, and how features are allocated among them, may vary from implementation to implementation and may optionally depend in part on the amount of data traffic that the server system handles during peak and average usage periods.
[0063] Exemplary coding techniques
[0072] Geometric partitioning modes in VVC
[0064]
[0073] VVC supports a Geometric Partitioning Mode (GPM) for inter-prediction. GPM separates a coding block into two regions by one of 64 predefined types of straight lines, generates inter-predicted samples for each separated region, and then blends them to obtain the final inter-predicted samples. In some embodiments, GPM includes non-horizontal splitting of a block into two parts. GPM may focus on inter-picture predicted CUs. When GPM is applied to a CU in the conventional method, the CU is split into two parts by a straight partitioning boundary. The location of the partitioning boundary can be mathematically defined by an angular parameter φ and an offset parameter ρ. These parameters can be quantized and combined in a GPM partitioning index lookup table. The GPM partitioning index of the current CU can be coded into the bitstream. For example, in VVC, 64 partitioning modes are supported by the GPM for a CU of size $w \times h = 2^k \times 2^l$ (in luma samples) with $k, l \in \{3, \ldots, 6\}$. After partitioning, the two GPM sections (partitions) contain individual motion information that can be used to predict the corresponding section of the CU. In some embodiments, only unidirectional motion-compensated prediction (MCP) is enabled for each section of the GPM, and consequently, the memory bandwidth required for MCP in the GPM is equal to that for normal bidirectional MCP. To simplify motion information coding and reduce the number of possible combinations for the GPM, the motion information may be coded in merge mode. A GPM merge candidate list may be derived from a conventional merge candidate list to ensure that it contains only unidirectional motion information.
[0065]
[0074] Geometric partitioning mode is one type of merge mode. Other types of merge modes include normal merge mode, MMVD mode, CIIP mode, and subblock merge mode. Geometric partitioning mode is signaled using a CU-level flag as one type of merge mode. In total, 64 partitions are supported by geometric partitioning mode for each possible CU size $w \times h = 2^m \times 2^n$ with $m, n \in \{3, \ldots, 6\}$, except for 8×64 and 64×8.
[0066]
[0075] As shown in Figure 4A, when this mode is used, the CU is split into two parts by a geometrically located straight line. Example 400 in Figure 4A shows different GPM splits grouped by identical angles. The location of the splitting line is mathematically derived from the angle and offset parameters of a particular partition. Each part of the geometric partition in the CU is inter-predicted using its own motion. In some embodiments, only single prediction (uni-prediction) is enabled for each partition; that is, each part has one motion vector and one reference index. The single prediction motion constraint ensures that, as in conventional bi-prediction, only two motion-compensated predictions are required for each CU. The single prediction motion for each partition is derived using the process described below.
[0067]
[0076] When the geometric partitioning mode is used for the current CU, a geometric partition index indicating the partition mode of the geometric partition (angle and offset) and two merge indices (one for each partition) are further signaled. The maximum GPM candidate size is explicitly signaled in the SPS and specifies the syntax binarization for the GPM merge indices. After each part of the geometric partition has been predicted, the sample values along the geometric partition edge are adjusted using a blending process with adaptive weights. The result is the prediction signal for the entire CU, and the transform and quantization processes are then applied to the entire CU as in other prediction modes. Finally, the motion field of the CU predicted using the geometric partitioning mode is stored.
[0068]
[0077] Single prediction candidate list construction
[0069]
[0078] The single prediction candidate list is derived directly from the merge candidate list constructed according to the extended merge prediction process. Figure 4B illustrates the single prediction MV selection for the geometric partitioning mode. Let n be the index of the single prediction motion in the geometric single prediction candidate list. The LX motion vector of the nth extended merge candidate, where X is equal to the parity of n, is used as the nth single prediction motion vector for the geometric partitioning mode. These motion vectors are marked with "x" in Table 402 in Figure 4B. If the nth extended merge candidate has no corresponding LX motion vector, the L(1-X) motion vector of the same candidate is used instead as the single prediction motion vector for the geometric partitioning mode.
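The parity rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the representation of a merge candidate as a pair of optional L0 / L1 motion vectors and the function name `build_single_prediction_list` are assumptions made for the example and do not come from any codec implementation.

```python
from typing import List, Optional, Tuple

MotionVector = Tuple[int, int]
# Hypothetical model of an extended merge candidate:
# (L0 motion vector or None, L1 motion vector or None).
MergeCandidate = Tuple[Optional[MotionVector], Optional[MotionVector]]


def build_single_prediction_list(
    merge_candidates: List[MergeCandidate],
) -> List[Tuple[int, MotionVector]]:
    """Return (reference list X, motion vector) for each index n."""
    result = []
    for n, candidate in enumerate(merge_candidates):
        x = n % 2  # X equals the parity of n
        mv = candidate[x]
        if mv is None:
            # No LX motion vector: fall back to L(1-X) of the same candidate.
            x = 1 - x
            mv = candidate[x]
        if mv is None:
            raise ValueError(f"merge candidate {n} carries no motion vector")
        result.append((x, mv))
    return result


candidates = [((1, 2), (3, 4)), ((5, 6), None), (None, (7, 8))]
single_prediction = build_single_prediction_list(candidates)
```

In this example, candidate 1 has no L1 motion vector, so its L0 motion vector is used, and candidate 2 has no L0 motion vector, so its L1 motion vector is used.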
[0070]
[0079] Blending along geometric partitioning edges
[0071]
[0080] After predicting each part of the geometric partition using its own motion, blending is applied to two prediction signals to derive samples around the geometric partition edge. The blending weights for each position of the CU are derived based on the distance between the individual position and the partition edge.
[0072]
[0081] The distance from a position (x, y) to the partition edge is derived as follows:
$$d(x, y) = (2x + 1 - w)\cos(\varphi_i) + (2y + 1 - h)\sin(\varphi_i) - \rho_j \tag{1}$$
$$\rho_j = \rho_{x,j}\cos(\varphi_i) + \rho_{y,j}\sin(\varphi_i) \tag{2}$$
[0073]
(Equation (3): image not reproduced in the text.)
[0074]
(Equation (4): image not reproduced in the text.)
[0075]
[0082] Here, i and j are the indices for the angle and offset of the geometric partition, which depend on the signaled geometric partition index. The signs of $\rho_{x,j}$ and $\rho_{y,j}$ depend on the angle index i.
$$w_1(x, y) = 1 - w_0(x, y) \tag{7}$$
[0078]
[0084] partIdx depends on the angle index i. An example 404 for generating blending weights w0 using geometric partitioning mode is shown in Figure 4C.
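As a rough illustration of equations (1), (2), and (7), the following Python sketch computes the distance of a sample position to the partition edge and a complementary weight pair. The angle and offset are passed in directly as floating-point values rather than being looked up from quantized indices, and the linear ramp that maps distance to w0 (including its `ramp` width and which side of the edge receives the larger weight) is purely an assumption for the example, since the text only states that the weights depend on the distance.

```python
import math
from typing import Tuple


def partition_offset(rho_x: float, rho_y: float, phi: float) -> float:
    """Equation (2): project the x / y offsets onto the edge normal."""
    return rho_x * math.cos(phi) + rho_y * math.sin(phi)


def edge_distance(x: int, y: int, w: int, h: int, phi: float, rho: float) -> float:
    """Equation (1): distance of position (x, y) to the partition edge."""
    return (2 * x + 1 - w) * math.cos(phi) + (2 * y + 1 - h) * math.sin(phi) - rho


def blend_weights(d: float, ramp: float = 8.0) -> Tuple[float, float]:
    """Map a distance to (w0, w1).

    The linear ramp is a hypothetical stand-in for the weight derivation;
    only w1 = 1 - w0 (equation (7)) is taken from the text.
    """
    w0 = min(1.0, max(0.0, 0.5 + d / (2.0 * ramp)))
    return w0, 1.0 - w0


# A vertical edge through the center of an 8x8 block (phi = 0, no offset).
rho = partition_offset(0.0, 0.0, 0.0)
d_left = edge_distance(3, 0, 8, 8, 0.0, rho)
d_right = edge_distance(4, 0, 8, 8, 0.0, rho)
```

For this centered vertical edge, the two columns adjacent to the edge lie at equal distances of opposite sign, so their weights mirror each other.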
[0079]
[0085] Motion field memory for geometric partitioning mode
[0080]
[0086] In some embodiments, Mv1 from a first part of the geometric partition, Mv2 from a second part of the geometric partition, and Mv, which is a combination of Mv1 and Mv2, are stored in the motion field of the geometrically partitioned mode coded CU.
[0081]
[0087] The stored motion vector type for each individual position in the motion field is determined as follows:
[0082]
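[0088] $$\mathrm{sType} = \operatorname{abs}(\mathrm{motionIdx}) < 32 \;?\; 2 : \big(\mathrm{motionIdx} \le 0 \;?\; (1 - \mathrm{partIdx}) : \mathrm{partIdx}\big) \tag{8}$$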
[0083]
[0089] Here, motionIdx is equal to d(4x+2,4y+2), which is recalculated from equation (1). partIdx depends on the angle index i.
[0084]
[0090] If sType is equal to 0 or 1, Mv1 or Mv2, respectively, is stored in the corresponding motion field; otherwise, if sType is equal to 2, a combined Mv from Mv1 and Mv2 is stored. The combined Mv is generated using the following process:
[0085]
[0091] If Mv1 and Mv2 are from different reference picture lists (one from L0 and the other from L1), then Mv1 and Mv2 are simply combined to form a bipredictive motion vector.
[0086]
[0092] Otherwise, if Mv1 and Mv2 are from the same list, only the single prediction motion Mv2 is stored.
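The storage rule of equation (8) and the combination process can be sketched as follows. This is a minimal illustration under stated assumptions: motion information is modeled as a hypothetical `(reference_list, (mvx, mvy))` tuple, sType 0 is taken to select the motion of the first part and sType 1 the motion of the second part, and `motion_idx` is assumed to be supplied already computed as d(4x+2, 4y+2).

```python
from typing import List, Tuple

# Hypothetical motion model: (reference picture list index, (mvx, mvy)).
Motion = Tuple[int, Tuple[int, int]]


def stored_motion_type(motion_idx: int, part_idx: int) -> int:
    """Equation (8): 2 near the partition edge, otherwise 0 or 1."""
    if abs(motion_idx) < 32:
        return 2
    return (1 - part_idx) if motion_idx <= 0 else part_idx


def stored_motion(s_type: int, mv1: Motion, mv2: Motion) -> List[Motion]:
    """Return the motion stored for one position of the motion field."""
    if s_type == 0:
        return [mv1]  # assumption: sType 0 selects the first part
    if s_type == 1:
        return [mv2]  # assumption: sType 1 selects the second part
    if mv1[0] != mv2[0]:
        # Different reference lists: combine into bi-predictive motion.
        return [mv1, mv2]
    # Same reference list: only the second motion is kept.
    return [mv2]


near_edge = stored_motion_type(10, 0)
far_negative = stored_motion_type(-40, 0)
far_positive = stored_motion_type(40, 0)
```

Positions whose motionIdx magnitude is below 32 sit close to the partition edge and therefore store the combined motion; all other positions store the motion of the part they belong to.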
[0087]
[0093] Wedgelet partition in AV1
[0088]
[0094] Several coding techniques (e.g., AV1) apply wedgelet (or wedge) partitioning for inter prediction. Wedge-based prediction is a compound prediction mode (e.g., in AV1), similar to GPM. Wedge-based prediction can be used for both inter-inter combinations and inter-intra combinations. The boundaries of moving objects are often difficult to approximate with on-grid block partitions. The solution is to predefine a codebook of 16 possible wedge partitions and to signal the wedge index in the bitstream when a coding unit chooses to be further partitioned in that manner. In current wedge designs in AV1 and AVM, 16 modes are supported, since up to 16 symbols can be signaled in a single syntax element using the multi-symbol adaptive context coding used in AV1 and AVM. Extending the number of wedge modes can further increase coding performance. A 16-ary shape codebook, including partition orientations that are horizontal, vertical, or oblique (e.g., with a slope of ±2 or ±0.5), is designed for both square blocks 540 and rectangular blocks 542, as shown in Figure 5D. To mitigate spurious high-frequency components, which are often generated by directly juxtaposing two predictors, a soft-cliff-shaped 2D wedge mask may be employed to smooth the edges around the intended partition (e.g., m(i,j) is close to 0.5 around the edge and gradually changes to a binary weight at either extreme).
[0089]
[0095] In wedge-based prediction mode, for each block size, a set of 16 predefined 2D weighting arrays is hardcoded. In each array, the weights are configured in such a way that they support a predefined wedge partitioning pattern. In each wedge partitioning pattern, two wedge partitions are specified along a certain edge direction and position. In some embodiments, a sample located in one of the two wedge partitions has a weight set to 64, and a sample located in the other wedge partition has a weight set to 0. In some embodiments, along the wedge partition boundary, the weights are set to values between 0 and 64 according to a predefined lookup table.
[0090]
[0096] In wedge-based prediction mode, two syntax elements are defined: wedge_index, which specifies a wedge partitioning pattern index (ranging from 0 to 15 for a set of 16 predefined 2D weighting arrays), and wedge_sign, which specifies which of the two partitions should be assigned to which prediction.
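The role of the 0 to 64 weights and of wedge_sign can be sketched with a small blending routine. This is an illustrative sketch, not the codec's normative process: the 6-bit fixed-point blend with a rounding offset of 32 is an assumption in line with common fixed-point practice, the interpretation of wedge_sign as swapping which prediction receives the dominant weight is likewise assumed, and the function name `wedge_blend` and the tiny mask are made up for the example.

```python
from typing import List

Block = List[List[int]]


def wedge_blend(pred_a: Block, pred_b: Block, mask: Block, wedge_sign: int = 0) -> Block:
    """Blend two predictions with per-sample weights in the range 0..64."""
    blended = []
    for row_a, row_b, row_m in zip(pred_a, pred_b, mask):
        out_row = []
        for a, b, m in zip(row_a, row_b, row_m):
            if wedge_sign:
                m = 64 - m  # assumed meaning: swap the roles of the two predictions
            # Weighted average in 6-bit fixed point with rounding.
            out_row.append((m * a + (64 - m) * b + 32) >> 6)
        blended.append(out_row)
    return blended


pred_a = [[100, 100, 100]]
pred_b = [[20, 20, 20]]
mask = [[64, 32, 0]]  # full weight, boundary weight, zero weight
blend_default = wedge_blend(pred_a, pred_b, mask)
blend_flipped = wedge_blend(pred_a, pred_b, mask, wedge_sign=1)
```

A sample with weight 64 takes the first prediction unchanged, a sample with weight 0 takes the second, and samples along the boundary receive a mixture of the two.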
[0091]
[0097] The wedge-based prediction mode can also be applied to compound inter-intra prediction, i.e., wedge-based inter-intra prediction. In some embodiments, the prediction block is a combination of intra-prediction blocks and inter-prediction blocks, and the weights are specified using a wedge partitioning pattern specified by wedge_index (ranging from 0 to 15). The wedge-based inter-intra motion prediction mode differs from the normal wedge-based motion prediction mode described above in that the wedge_sign value, which specifies the partition with the dominant weight, is derived as 0 instead of being signaled.
[0092]
[0098] Unlike current GPM / wedgelet designs, the methods and systems described herein do not limit partitioning boundaries to a limited set of predefined partitioning patterns that use only a single straight line as the partitioning boundary. Such single-line partitioning boundaries may not efficiently model irregular partitioning patterns.
[0093]
[0099] The methods and systems described herein derive a geometric / wedgelet partition for one color component using a reconstructed sample of another color component. In some embodiments, the first and second color components may each be one of the Y (luma) color component, the Cb (chroma) color component, and the Cr (chroma) color component. In some embodiments, the first and second color components may each be one of the R (red), G (green), and B (blue) color components. For example, the first color component is Y (luma) and the second color component is Cb and / or Cr (chroma). The methods described herein may be applied to both geometric partitions and wedgelet partitions (for example, they may be applied interchangeably to geometric partitions and wedgelet partitions, replacing a geometric partition with a wedgelet partition or vice versa).
[0094]
[0100] The methods and systems described herein use a reconstructed sample of the first color component to derive a geometric partition for the second color component. In some embodiments, different geometric partitions are applied for the first and second color components. For example, for a currently coded block coded using geometric partitions, assuming a selected / signaled geometric partition (e.g., for the first color component), the geometric partition for the second color component is adjusted using a reconstructed sample of the first color component.
[0095]
[0101] In some embodiments, the second color component is adjusted using a predefined group of candidate geometric partitions based on a selected / signaled geometric partition (for example, of the first color component). Each of the candidate geometric partitions is evaluated with respect to a reconstructed block of the first color component in order to calculate a cost value. The candidate geometric partition that minimizes the cost value is selected as the adjusted geometric partition for the second color component.
[0096]
[0102] Figure 5A shows a rectangular block partitioned using a geometric partition mode with straight lines, where the partition boundary is indicated by a solid line 502 to a first portion P1 to the right of the solid line 502 and a second portion P0 to the left of the solid line 502. However, the actual object boundary 503 separating the actual first region 504 from the actual second region 506 is a curved boundary. The rectangular block shown in Figure 5A also shows a reconstructed sample 501 of the first color component. In some embodiments, the reconstructed sample 501 is the result of processing the rectangular block, which includes partitioning the rectangular block using a geometric partition mode indicated by a solid line 502.
[0097]
[0103] Figure 5B shows several modified partition pattern candidates 508, 510, 512, and 514 with linear partition boundaries. Modified partition pattern candidates 508, 510, 512, and 514 all intersect the solid line 502 at the same location but are oriented at different angles to each other. Modified partition pattern candidates 512 and 514 have steeper slopes (e.g., larger angles relative to the horizontal) compared to the solid line 502, while modified partition pattern candidates 508 and 510 have gentler slopes (e.g., smaller angles relative to the horizontal) compared to the solid line 502.
[0098]
[0104] Figure 5C shows that candidate geometric partitions are not limited to straight lines. Candidate geometric partition 516 is a curve that is convex when viewed from the second portion P0 (that is, geometric partition 516 is concave when viewed from the first portion P1). In contrast, candidate geometric partition 518 is a curve that is concave when viewed from the second portion P0 (that is, geometric partition 518 is convex when viewed from the first portion P1).
[0099]
[0105] In some embodiments, each candidate geometric partition (e.g., each partition pattern candidate) is evaluated with respect to one or more reconstructed samples (e.g., reconstructed sample 501) of the first color component in order to calculate a cost value. The partition pattern candidate with the minimum cost value is then selected as the partition pattern for the second color component.
[0100]
[0106] In some embodiments, the cost value is calculated as the variance of the samples contained within a particular partition of each candidate partition pattern. For example, the variance is the sum over all sample locations (or "samples") of the difference between the value of the variable at a particular sample location and the mean value of the variable, for a given partition. Mathematically, exemplary cost functions can be expressed using equations (9) and (10) as follows:
[0101]
[0107]
(Equation (9): image not reproduced in the text.)
[0102]
[0108]
(Equation (10): image not reproduced in the text.)
[0103]
[0109] Here,
[0104]
(Symbol definitions for equations (9) and (10): image not reproduced in the text.)
[0105]
[0110] For example, referring to the reconstructed sample in Figure 5A, all samples in the first region 504 are assigned a value of 0, and all samples in the second region 506 are assigned a value of 1. In some embodiments, the reconstructed sample 501 is a reconstructed sample of the first color component (e.g., Y (luma)). When the candidate geometric partition 516 in Figure 5C is used, the mean value in the second portion P0 will be 0, and the variance for the second portion will be 0. Similarly, the mean value in the first portion P1 will be 1, and the variance for the first portion will also be 0. In contrast, when the candidate geometric partition 510 in Figure 5B is used, the mean value in the second portion P0 will be greater than 0 because some portions of the second region 506 (e.g., sample locations with a value of 1) will be included in the second portion P0. Similarly, the mean value in the first portion P1 will be less than 1 because some portions of the first region 504 (e.g., sample locations with a value of 0) will be included in the first portion P1. As a result, the variance for both the first portion P1 and the second portion P0 is no longer zero. Consequently, the cost value associated with candidate geometric partition 516 is lower than the cost value for candidate geometric partition 510, and candidate geometric partition 516 is selected as the geometric partitioning mode for the second color component. In some embodiments, the geometric partition selected for the second color component is also called the adjusted partition pattern for the second color component.
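The comparison in this example can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: each candidate partition is represented as a hypothetical per-sample mask (0 for P0, 1 for P1), and the cost is taken to be the sum of squared deviations from the per-partition mean, which is one common reading of "variance" and is not necessarily the exact form of equations (9) and (10).

```python
from typing import Dict, List

Block = List[List[int]]


def partition_cost(samples: Block, mask: Block) -> float:
    """Sum, over both partitions, of squared deviations from the partition mean."""
    cost = 0.0
    for part in (0, 1):
        values = [
            s
            for row_s, row_m in zip(samples, mask)
            for s, m in zip(row_s, row_m)
            if m == part
        ]
        if not values:
            continue
        mean = sum(values) / len(values)
        cost += sum((v - mean) ** 2 for v in values)
    return cost


def select_partition(samples: Block, candidates: Dict[str, Block]) -> str:
    """Return the name of the candidate mask with the minimum cost."""
    return min(candidates, key=lambda name: partition_cost(samples, candidates[name]))


# Reconstructed first-component samples with a curved object boundary (0 / 1 regions).
luma = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
candidates = {
    "straight": [[0, 0, 1, 1]] * 4,      # vertical straight-line split
    "curved": [row[:] for row in luma],  # split that follows the object boundary
}
best = select_partition(luma, candidates)
```

The curved candidate separates the two regions exactly, so both partition variances are zero, whereas the straight candidate mixes samples of value 0 and 1 in P0 and incurs a positive cost.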
[0106]
[0111] In some embodiments, instead of testing different partition pattern candidates as described above with respect to Figures 5B and 5C, the partition pattern is adjusted using only the reconstructed samples of the first color component to find an adjusted partition pattern for the second color component. As an example, in Figure 5D, the partition pattern may be adjusted sample by sample across each row or each column. For example, Figure 5D shows samples 530, 532, 534, and 536. To adjust the partition pattern for the samples across each row, the sample in the first row 522 is processed before the sample in the second row 524 is processed. In some embodiments, each of the samples 530, 532, 534, and 536 is divided into smaller samples.
[0107]
[0112] In Figure 5D, the portion of the partition boundary indicated by the solid line 502 lies within samples 530, 534, and 536. While processing sample 530, the location of the portion of the solid line 502 is shifted left or right to obtain updated cost function values, or variance values, for the partitions associated with the shifted location (e.g., a first partition to the left of the shifted location of the portion of the solid line 502, and a second partition to the right of the shifted location of the portion of the solid line 502). For example, the solid line 502 may be shifted away from sample 530 and moved into sample 532. The sample mean is calculated based on the updated location of line 502 such that samples to the left of the updated location of line 502 are treated as belonging to the first partition, and samples to the right of the updated location of line 502 are treated as belonging to the second partition. The process is then repeated for the second row 524 to obtain the arrangement of line 502 within each row that minimizes the cost function or the variance (for example, by shifting the position of the portion of line 502 to the left or right within that particular row). In some embodiments, only samples adjacent to a portion of line 502 are processed, rather than processing all samples within a particular row. The position of the portion of line 502 within a particular row that minimizes the cost function or variance is then set as the adjusted partition pattern for that respective row. The process continues by adjusting the position of the portion of line 502 in the second row 524 in the same manner, to obtain the updated position of the portion of line 502 that minimizes the cost function or variance.
[0108]
[0113] In contrast, to adjust the partitioning pattern for samples across each column, the sample in the Nth column 526 is processed before the sample in the (N+1)th column 528 is processed, or vice versa. For example, while processing sample 530 in the Nth column 526, the location of the portion of the solid line 502 is shifted up or down to obtain updated cost function or variance values for the partitions associated with the shifted location (e.g., a first partition above the shifted location of the portion of the solid line 502, and a second partition below the shifted location of the portion of the solid line 502). In one scenario, the solid line 502 may be shifted down from sample 530 and moved into sample 534. The sample mean is calculated based on the updated location of line 502 such that samples above the updated location of line 502 are treated as belonging to the first partition, and samples below the updated location of line 502 are treated as belonging to the second partition. The process is then repeated for the (N+1)th column 528 to obtain the arrangement of the line 502 in each column that minimizes the cost function or variance (for example, by shifting the position of the portion of the line 502 upward or downward within that particular column). In some embodiments, only samples adjacent to the portion of the line 502 are processed, rather than processing all samples in a particular column. The position of the portion of the line 502 in a particular column that minimizes the cost function or variance is then set as the adjusted partition pattern for that respective column. The process continues by adjusting the position of the portion of the line 502 in the (N+1)th column 528 in the same manner, to obtain the updated position of the portion of the line 502 that minimizes the cost function or variance. In some embodiments, the amount of shifting (for example, left or right shift, and / or up or down shift) has a fixed offset value.
In some embodiments, the amount of shifting (for example, left or right shift, and / or up or down shift) has a variable offset value. The offset value that provides the minimum variance within each partition is applied to obtain the adjusted partition boundary.
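As a minimal sketch of the row-by-row adjustment described above, the following Python example shifts the boundary position in each row by offsets in [-N, +N] and keeps the position with the lowest cost. The representation of the boundary as one column index per row, the use of the sum of squared deviations as the per-row cost, and all function and parameter names are assumptions made for illustration. A column-by-column adjustment could reuse the same routine on the transposed block.

```python
def segment_sse(values):
    """Sum of squared deviations from the mean (0 for an empty segment)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def adjust_row_boundaries(block, reference, max_offset):
    """Shift the boundary in each row left/right to minimize the row cost.

    block:      2-D list of reconstructed first-component samples.
    reference:  per-row boundary column; samples with column < boundary belong
                to the first partition, the remaining samples to the second.
    max_offset: N, so that offsets in [-N, +N] are tried for each row.
    """
    adjusted = []
    for row, ref in zip(block, reference):
        best_pos, best_cost = ref, None
        # Try the zero offset first so that ties keep the reference position.
        for offset in sorted(range(-max_offset, max_offset + 1), key=abs):
            pos = ref + offset
            if not 0 <= pos <= len(row):
                continue  # shifted boundary would fall outside the block
            cost = segment_sse(row[:pos]) + segment_sse(row[pos:])
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = pos, cost
        adjusted.append(best_pos)
    return adjusted

block = [[0, 0, 0, 1, 1, 1],
         [0, 0, 0, 0, 1, 1]]
print(adjust_row_boundaries(block, reference=[2, 3], max_offset=2))  # [3, 4]
```

Here the reference boundary sits one sample to the left of the actual edge in each row, and a shift of +1 per row gives zero variance on both sides of the boundary.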
[0109]
[0114] As a result, instead of selecting an adjusted partition pattern from several candidate partition patterns, a specific partition pattern (e.g., a linear partition pattern, a curved partition pattern, an irregular partition pattern, or another type of partition pattern) is adjusted row by row and / or column by column.
[0110]
[0115] In some embodiments, portions of the partition boundary are adjusted by left / right displacement and / or up / down displacement while maintaining the boundary angle. In some embodiments, the amounts of left / right displacement and / or up / down displacement are fixed, and the angle of the partition boundary is varied. In some embodiments, the adjustments applied to portions of the partition boundary correspond to offsets having indices in a lookup table. For example, an entry in the lookup table contains the angle of the current partition boundary and offset values for the left / right offset and / or the up / down offset.
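One possible shape for such a lookup table is sketched below in Python. The entry layout, the table contents, and the representation of the boundary by an angle and an anchor point are hypothetical placeholders chosen for illustration; no particular table is specified in this description.

```python
from typing import NamedTuple

class BoundaryAdjustment(NamedTuple):
    angle_deg: float  # angle of the partition boundary for this entry
    dx: int           # left/right offset in samples (negative = left)
    dy: int           # up/down offset in samples (negative = up)

# Placeholder entries; the values are illustrative only.
ADJUSTMENT_LUT = [
    BoundaryAdjustment(45.0, 0, 0),
    BoundaryAdjustment(45.0, -1, 0),
    BoundaryAdjustment(45.0, 1, 0),
    BoundaryAdjustment(45.0, 0, -1),
    BoundaryAdjustment(45.0, 0, 1),
]

def apply_adjustment(anchor, index):
    """Return (angle, shifted anchor point) for the table entry at `index`."""
    entry = ADJUSTMENT_LUT[index]
    x, y = anchor
    return entry.angle_deg, (x + entry.dx, y + entry.dy)

print(apply_adjustment((4, 4), 1))  # (45.0, (3, 4))
```

With a table of this kind, an adjustment can be identified by a single index rather than by separate angle and displacement values.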
[0111]
[0116] In some embodiments, the offset value is within a predefined range, for example, [-N, +N], where exemplary values of N include, but are not limited to, 1, 2, 3, 4, ..., 16, ..., 32. In some embodiments, the motion vector used for each partition of the first color component is reused for the adjusted partition and applied to the second color component.
[0112]
[0117] In some embodiments, the blending intensity along the partition boundary of the second color component is further adjusted based on a reconstructed sample of the first color component.
[0113]
[0118] In some embodiments, it is signaled for the current block whether the same geometric partition is applied to the second color component or whether a different, adjusted geometric partition is applied. In some embodiments, whether the same geometric partition is applied to the second color component or whether a different, adjusted geometric partition is applied is implicitly determined using the residual of the first color component (for example, whether the residual is zero, whether the residual is below a threshold, whether the residual is below a dynamically determined threshold, or whether the residual is below a static / predetermined threshold). For example, the residual may be a cost function or variance value of the first color component.
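The implicit determination can be sketched as follows, assuming that the residual of the first color component is summarized by the sum of its absolute values and compared against a threshold. The function name, the choice of summary measure, and the threshold parameter are hypothetical.

```python
def use_adjusted_partition(first_component_residual, threshold=0):
    """Decide, without explicit signaling, whether to derive an adjusted partition.

    first_component_residual: 2-D list of residual samples of the first color
                              component (e.g., luma).
    threshold: hypothetical limit on the sum of absolute residuals; 0 corresponds
               to the "residual is zero / non-zero" test.
    Returns True when a different, adjusted partition is used for the second
    color component, and False when the same partition is reused.
    """
    energy = sum(abs(r) for row in first_component_residual for r in row)
    return energy > threshold

print(use_adjusted_partition([[0, 0], [0, 0]]))               # False
print(use_adjusted_partition([[0, 1], [-2, 0]]))              # True
print(use_adjusted_partition([[0, 1], [-2, 0]], threshold=5)) # False
```

Because the residual of the first color component is available to both the encoder and the decoder, a rule of this kind can be evaluated on both sides without transmitting an additional flag.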
[0114]
[0119] In some embodiments, the first and second color components do not have the same dimensions (for example, 4:2:0 video content or 4:2:2 video content). In such cases, the mask for the geometric partitioning mode is first downsampled to a smaller dimension before the geometric partitioning method described herein is applied to the second color component. In some embodiments, downsampling is performed using a downsampling filter. In some embodiments, downsampling is performed without a downsampling filter.
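As an illustrative sketch for 4:2:0 content, the following Python example halves a partition mask in both dimensions, either with a simple 2x2 averaging filter or without a filter by keeping one sample per 2x2 group. The choice of filter, the rounding rule, and the function name are assumptions; 4:2:2 content would instead be halved in the horizontal dimension only.

```python
def downsample_mask(mask, use_filter=True):
    """Halve a partition mask in both dimensions (4:2:0-style subsampling).

    mask: 2-D list with even width and height; entries are 0/1 labels or
          integer blending weights.
    """
    height, width = len(mask), len(mask[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            if use_filter:
                # 2x2 box filter with rounding (one possible filter choice).
                total = (mask[y][x] + mask[y][x + 1]
                         + mask[y + 1][x] + mask[y + 1][x + 1])
                row.append((total + 2) // 4)
            else:
                # No filter: keep the top-left sample of each 2x2 group.
                row.append(mask[y][x])
        out.append(row)
    return out

luma_mask = [[0, 0, 0, 1],
             [0, 0, 1, 1],
             [0, 1, 1, 1],
             [1, 1, 1, 1]]
print(downsample_mask(luma_mask, use_filter=True))   # [[0, 1], [1, 1]]
print(downsample_mask(luma_mask, use_filter=False))  # [[0, 0], [0, 1]]
```

The two variants can place the boundary differently in the smaller mask, as the example output shows.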
[0115]
[0120] The proposed methods may be used separately or in any combination. Furthermore, the proposed methods may be implemented by processing circuits (e.g., one or more processors or one or more integrated circuits). In one example, one or more processors execute a program stored in a non-transitory computer-readable medium.
[0116]
[0121] Figure 6 is a flowchart showing a method 600 for coding video according to several embodiments. The method 600 can be implemented in a computing system (e.g., a server system 112, a source device 102, or an electronic device 120) having a control circuit and a memory for storing instructions for execution by the control circuit. In some embodiments, the method 600 is implemented by executing instructions stored in the memory of the computing system (e.g., memory 314).
[0117]
[0122] The system receives video data including a picture (602), wherein the picture is coded using at least a first color component and a second color component, the picture includes a first block coded in geometric partition mode, and the first block includes a first geometric partition and a second geometric partition. The system reconstructs a sample of the first color component in the first geometric partition of the first block (604). Based on the reconstructed sample of the first color component of the first block, the system derives a sample of the second color component in the first geometric partition of the first block (606). The system decodes the first block in the picture based on at least the reconstructed samples in the first geometric partition of the first color component and the second color component of the first block (608).
[0118]
[0123] In some embodiments, the second geometric partition is separate from the first geometric partition, and the second geometric partition is determined based on the first geometric partition.
[0119]
[0124] In some embodiments, the second geometric partition is selected from a predefined group of candidate geometric partitions related to the first geometric partition. In some embodiments, the first geometric partition is the selected / signaled geometric partition. In some embodiments, the predefined group of candidate geometric partitions includes adjustments from the first geometric partition. In some embodiments, the predefined group of candidate geometric partitions includes various straight partition boundaries and / or curved partition boundaries.
[0120]
[0125] In some embodiments, the method includes evaluating the cost value for each predefined candidate geometric partition in a group of predefined candidate geometric partitions with respect to the reconstruction data of the first color component. In some embodiments, the predefined candidate geometric partition from the group of predefined candidate geometric partitions that minimizes the cost value is selected as the second geometric partition for the second color component.
[0121]
[0126] In some embodiments, evaluating the cost value involves calculating a variance value associated with each of the predefined candidate geometric partitions. In some embodiments, the variance value measures the deviation of data points (e.g., pixels) of a quantity (e.g., a luma value) from the mean value of that quantity within each partition defined by the predefined candidate geometric partitions.
[0122]
[0127] In some embodiments, the second geometric partition is determined using reconstruction data of the first color component. In some embodiments, the second geometric partition is determined without using a predefined candidate geometric partition.
[0123]
[0128] In some embodiments, the partition boundary of a second geometric partition is determined by shifting the partition boundary from a reference position to an offset position and selecting the offset position having the minimized cost value as the location of the partition boundary. In some embodiments, the reference position is the position of the first geometric partition, and the offset position is along one of the two dimensions of the data block (e.g., row or column).
[0124]
[0129] In some embodiments, shifting a partition boundary from a reference position consists of one or more elements selected from the group comprising: shifting the partition boundary along one or more of the horizontal and vertical dimensions while fixing the angle of the partition boundary; changing the angle of the partition boundary; shifting the partition boundary by an offset corresponding to an index in a lookup table containing entries having offset values for the angle and displacement of the partition boundary; and shifting the partition boundary by an offset value within a predefined range.
[0125]
[0130] In some embodiments, the method includes obtaining a motion vector for each partition in the reconstruction data of the first color component, and applying the motion vector for each partition to the partition in the reconstruction data of the second color component. In some embodiments, the motion vectors are reused, and / or the partitions in the reconstruction data of the second color component are adjusted partitions obtained using the second geometric partition.
[0126]
[0131] In some embodiments, the method includes adjusting the blending intensity along the partition boundary of the second geometric partition in the reconstruction data of the second color component based on the reconstruction data of the first color component.
[0127]
[0132] In some embodiments, the method includes providing an indication of whether a second geometric partition applied to a second color component is equivalent to a first geometric partition, or whether the second geometric partition is different from the first geometric partition. In some embodiments, the indication includes providing a signal (as metadata) to the decoder.
[0128]
[0133] In some embodiments, providing the indication includes determining a residual related to the first color component, and the method includes providing an indication that the second geometric partition applied to the second color component is equivalent to the first geometric partition in accordance with a determination that the residual related to the first color component is 0, and providing an indication that the second geometric partition applied to the second color component is different from the first geometric partition in accordance with a determination that the residual related to the first color component is non-zero.
[0129]
[0134] In some embodiments, the method includes downsampling a mask for a first geometric partition used to generate reconstructed data for the first color component, before generating reconstructed data for the second color component of the first block of video data, based on the determination that the dimension of the first color component in the first block of video data is different from the dimension of the second color component in the first block of video data. In some embodiments, the video content includes 4:2:0 video content. In some embodiments, downsampling the mask for the first geometric partition is performed using a downsampling filter. In some embodiments, downsampling the mask for the first geometric partition is performed without using a downsampling filter.
[0130]
[0135] In some embodiments, the first color component is luma (Y), and the second color component is chroma (e.g., Cb and / or Cr).
[0131]
[0136] Figure 6 shows several logical stages in a specific order, but the order-independent stages can be rearranged, and the other stages can be combined or separated. Some rearrangements or other groupings not described in detail will be apparent to those skilled in the art, and therefore the orderings and groupings presented herein are not exhaustive. Furthermore, it should be recognized that the stages can be implemented in hardware, firmware, software, or any combination thereof.
[0132]
[0137] Reference is now made to some exemplary embodiments.
[0133]
[0138] (A1) In one aspect, some embodiments include a method for video coding (e.g., method 600). In some embodiments, the video coding method is implemented in a computing system (e.g., server system 112) having memory and one or more processors. In some embodiments, the method is implemented in a coding module (e.g., coding module 320). The method includes receiving video data including a picture, wherein the picture is coded using at least a first color component and a second color component, and the picture includes a first block coded in geometric partition mode, the first block including a first geometric partition and a second geometric partition; reconstructing a sample of the first color component in the first geometric partition of the first block; deriving a sample of the second color component in the first geometric partition of the first block based on the reconstructed sample of the first color component of the first block; and decoding the first block in the picture based on at least the reconstructed samples in the first geometric partition of the first color component and the second color component of the first block.
[0134]
[0139] In some embodiments, the method includes obtaining reconstructed data of a first color component of a first block of video data using a first geometric partition, and generating reconstructed data of a second color component of the first block of video data based on the reconstructed data of the first color component of the first block of video data using a second geometric partition. In some embodiments, the second geometric partition is different from the first geometric partition. In some embodiments, the second geometric partition is the same as the first geometric partition. In some embodiments, the first color component is luma and the second color component is chroma (Cb or Cr).
[0135]
[0140] (A2) In some embodiments of A1, the second geometric partition is separate from the first geometric partition, and the second geometric partition is determined based on the first geometric partition.
[0136]
[0141] (A3) In some embodiments of A1 or A2, the second geometric partition is selected from a predefined group of candidate geometric partitions related to the first geometric partition. In some embodiments, the first geometric partition is the selected / signaled geometric partition. In some embodiments, the predefined group of candidate geometric partitions includes adjustments from the first geometric partition. In some embodiments, the predefined group of candidate geometric partitions includes various straight partition boundaries and / or curved partition boundaries.
[0137]
[0142] (A4) In some embodiments of A3, the method includes evaluating the cost value for each predefined candidate geometric partition in a group of predefined candidate geometric partitions with respect to the reconstruction data of the first color component. In some embodiments, the predefined candidate geometric partition from the group of predefined candidate geometric partitions that minimizes the cost value is selected as the second geometric partition for the second color component.
[0138]
[0143] (A5) In some embodiments of A4, evaluating the cost value involves calculating a variance value associated with each of the predefined candidate geometric partitions. In some embodiments, the variance value measures the deviation of data points (e.g., pixels) of a quantity (e.g., a luma value) from the mean value of that quantity within each partition defined by the predefined candidate geometric partitions.
[0139]
[0144] (A6) In some embodiments of A1 to A5, the second geometric partition is determined using reconstruction data of the first color component. In some embodiments, the second geometric partition is determined without using a predefined candidate geometric partition.
[0140]
[0145] (A7) In some embodiments of A6, the partition boundary of the second geometric partition is determined by shifting the partition boundary from a reference position to an offset position and selecting the offset position having the minimized cost value as the location of the partition boundary. In some embodiments, the reference position is the position of the first geometric partition, and the offset position is along one of the two dimensions of the data block (e.g., row or column).
[0141]
[0146] (A8) In some embodiments of A7, shifting the partition boundary from a reference position consists of one or more elements selected from the group consisting of: shifting the partition boundary along one or more of the horizontal and vertical dimensions while fixing the angle of the partition boundary; changing the angle of the partition boundary; shifting the partition boundary by an offset corresponding to an index in a lookup table that includes entries having offset values for the angle and displacement of the partition boundary; and shifting the partition boundary by an offset value within a predefined range.
[0142]
[0147] (A9) In some embodiments of A1 to A8, the method includes obtaining a motion vector for each partition in the reconstruction data of the first color component, and applying the motion vector for each partition to the partition in the reconstruction data of the second color component. In some embodiments, the motion vectors are reused, and / or the partition in the reconstruction data of the second color component is an adjusted partition obtained using the second geometric partition.
[0143]
[0148] (A10) In some embodiments of A1 to A9, the method includes adjusting the blending intensity along the partition boundary of the second geometric partition in the reconstruction data of the second color component based on the reconstruction data of the first color component.
[0144]
[0149] (A11) In some embodiments of A1 to A10, the method includes providing an indication of whether a second geometric partition applied to a second color component is equivalent to a first geometric partition, or whether the second geometric partition is different from the first geometric partition. In some embodiments, the indication includes providing a signal (as metadata) to the decoder.
[0145]
[0150] (A12) In some embodiments of A11, providing the indication includes determining a residual related to the first color component, and the method includes providing an indication that the second geometric partition applied to the second color component is equivalent to the first geometric partition in accordance with a determination that the residual related to the first color component is 0, and providing an indication that the second geometric partition applied to the second color component is different from the first geometric partition in accordance with a determination that the residual related to the first color component is non-zero.
[0146]
[0151] (A13) In some embodiments of A1 to A12, the method includes downsampling a mask for a first geometric partition used to generate reconstruction data for the first color component, before generating reconstruction data for the second color component of the first block of video data, according to the determination that the dimension of the first color component in the first block of video data is different from the dimension of the second color component in the first block of video data. In some embodiments, the video content includes 4:2:0 video content. In some embodiments, downsampling the mask for the first geometric partition is performed using a downsampling filter. In some embodiments, downsampling the mask for the first geometric partition is performed without using a downsampling filter.
[0147]
[0152] (A14) In some embodiments of A1 to A13, the first color component is luma (Y) and the second color component is chroma (e.g., Cb and / or Cr).
[0148]
[0153] In another aspect, some embodiments include a computing system (e.g., server system 112) that includes a control circuit (e.g., control circuit 302) and a memory coupled to the control circuit (e.g., memory 314), the memory storing one or more sets of instructions configured to be executed by the control circuit, the one or more sets of instructions including instructions for performing any of the methods described herein (e.g., A1 to A14 above).
[0149]
[0154] In another aspect, some embodiments include a non-transitory computer-readable storage medium that stores one or more sets of instructions for execution by a control circuit of a computing system, the one or more sets of instructions including instructions for performing any of the methods described herein (for example, A1 to A14 above).
[0150]
[0155] Terms such as "first," "second," etc., may be used herein to describe various elements, but it should be understood that these elements should not be limited by these terms. These terms are used merely to distinguish one element from another.
[0151]
[0156] The technical terms used herein are for the purpose of describing specific embodiments and do not limit the scope of the claims. In the descriptions of those embodiments and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context explicitly indicates otherwise. Furthermore, the term “and / or” as used herein should be understood to refer to and encompass any and all possible combinations of one or more of the associated listed items. In addition, the terms “comprises” and / or “comprising” as used herein specify the presence of the described features, assemblies, processes, operations, elements, and / or components, but should not be understood to exclude the presence or addition of one or more other features, assemblies, processes, operations, elements, components, and / or groups thereof.
[0152]
[0157] As used herein, the term “if” may be interpreted, depending on the context, to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting” that the stated condition precedent is true. Similarly, the phrases “if it is determined [that the stated condition precedent is true]” or “if [the stated condition precedent is true]” or “when [the stated condition precedent is true]” may be interpreted, depending on the context, to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true.
[0153]
[0158] The above description has been provided with reference to specific embodiments for illustrative purposes. However, the above exemplary description is neither exhaustive nor does it limit the claims to the exact form disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments have been selected and described to best illustrate the principles of operation and practical applications, thereby enabling others skilled in the art.
Claims
1. A method for video decoding performed in a computing system having memory and one or more processors, the method comprising: receiving video data including a picture, wherein the picture is coded using at least a first color component and a second color component, the picture includes a first block coded in geometric partition mode, and the first block includes a first geometric partition and a second geometric partition; reconstructing a sample of the first color component in the first geometric partition of the first block; deriving, based on the reconstructed sample of the first color component of the first block, a sample of the second color component in the first geometric partition of the first block; and decoding the first block in the picture based on at least the reconstructed samples in the first geometric partition of the first color component and the second color component of the first block.
2. The method according to claim 1, wherein the second geometric partition is separate from the first geometric partition, and the second geometric partition is determined based on the first geometric partition.
3. The method according to claim 1, wherein the second geometric partition is selected from a predefined group of candidate geometric partitions related to the first geometric partition.
4. The method according to claim 3, further comprising evaluating a cost value for each predefined candidate geometric partition in the group of predefined candidate geometric partitions with respect to the reconstruction data of the first color component, wherein the predefined candidate geometric partition from the group of predefined candidate geometric partitions that minimizes the cost value is selected as the second geometric partition for the second color component.
5. The method according to claim 4, wherein evaluating the cost value includes calculating the variance value associated with each of the predefined candidate geometric partitions.
6. The method according to claim 1, wherein the second geometric partition is determined using the reconstruction data of the first color component.
7. The method according to claim 6, wherein the partition boundary of the second geometric partition is determined by shifting the partition boundary from a reference position to an offset position and selecting the offset position having a minimized cost value as the location of the partition boundary.
8. The method according to claim 7, wherein shifting the partition boundary from the reference position comprises one or more elements selected from the group consisting of: shifting the partition boundary along one or more of the horizontal and vertical dimensions while fixing the angle of the partition boundary; changing the angle of the partition boundary; shifting the partition boundary by an offset corresponding to an index in a lookup table that includes entries having offset values for the angle and displacement of the partition boundary; and shifting the partition boundary by an offset value within a predefined range.
9. The method according to claim 1, further comprising: obtaining a motion vector for each partition in the reconstruction data of the first color component; and applying the respective motion vector for each partition to the partitions in the reconstruction data of the second color component.
10. The method according to claim 1, further comprising: adjusting, based on the reconstruction data of the first color component, the blending intensity along the partition boundary of the second geometric partition in the reconstruction data of the second color component.
11. The method according to claim 1, further comprising: providing an indication of whether the second geometric partition applied to the second color component is equivalent to the first geometric partition, or whether the second geometric partition is different from the first geometric partition.
12. The method according to claim 11, wherein providing the indication includes determining a residual related to the first color component, the method further comprising: providing an indication that the second geometric partition applied to the second color component is equivalent to the first geometric partition, in accordance with a determination that the residual related to the first color component is 0; and providing an indication that the second geometric partition applied to the second color component is different from the first geometric partition, in accordance with a determination that the residual related to the first color component is non-zero.
13. The method according to claim 1, further comprising: in accordance with a determination that the sampling structure of the first color component in the first block of the video data is different from the sampling structure of the second color component in the first block of the video data, downsampling the mask for the first geometric partition used to generate the reconstruction data for the first color component before generating the reconstruction data for the second color component of the first block of the video data.
14. The method according to claim 1, wherein the first color component is luma (Y) and the second color component is chroma.
15. A computing system comprising: a control circuit; memory; and one or more sets of instructions stored in the memory and configured for execution by the control circuit, wherein the one or more sets of instructions include instructions for: receiving video data including a picture, wherein the picture is coded using at least a first color component and a second color component, the picture includes a first block coded in geometric partition mode, and the first block includes a first geometric partition and a second geometric partition; reconstructing a sample of the first color component in the first geometric partition of the first block; deriving, based on the reconstructed sample of the first color component of the first block, a sample of the second color component in the first geometric partition of the first block; and decoding the first block in the picture based on at least the reconstructed samples in the first geometric partition of the first color component and the second color component of the first block.
16. The computing system according to claim 15, wherein the second geometric partition is separate from the first geometric partition, and the second geometric partition is determined based on the first geometric partition.
17. The computing system according to claim 15, wherein the second geometric partition is selected from a predefined group of candidate geometric partitions related to the first geometric partition.
18. The computing system according to claim 17, wherein the one or more sets of instructions further comprise instructions for evaluating a cost value for each predefined candidate geometric partition in the group of predefined candidate geometric partitions with respect to the reconstruction data of the first color component, wherein the predefined candidate geometric partition from the group of predefined candidate geometric partitions that minimizes the cost value is selected as the second geometric partition for the second color component.
19. A non-transitory computer-readable storage medium storing one or more sets of instructions configured for execution by a computing device having control circuitry and memory, the one or more sets of instructions comprising instructions for: receiving video data including a picture, wherein the picture is coded using at least a first color component and a second color component, and the picture includes a first block coded in a geometric partition mode, wherein the first block includes a first geometric partition and a second geometric partition; reconstructing samples in the first geometric partition of the first color component of the first block; deriving, based on the reconstructed samples of the first color component of the first block, samples of the second color component of the first block in the first geometric partition; and decoding at least the first block in the picture based on the reconstructed samples in the first geometric partition of the first color component and the second color component of the first block.
20. The non-transitory computer-readable storage medium according to claim 19, wherein the second geometric partition is determined using reconstruction data of the first color component.
21. A program for causing a computer to perform the method according to any one of claims 1 to 14.
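Illustrative example (not part of the claims). The mask downsampling of claim 13 and the cost-based candidate selection of claims 17 and 18 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not fix a mask parameterization, a downsampling filter, or a cost function, so the straight-line mask (`wedge_mask`), the 2x2 majority vote for a 4:2:0 sampling structure (`downsample_mask_420`), and the within-partition squared-deviation cost (`partition_cost`) are all hypothetical choices made for illustration, as are every function and parameter name.

```python
from typing import List, Sequence

# 1 = sample belongs to the first geometric partition, 0 = second partition.
Mask = List[List[int]]


def wedge_mask(width: int, height: int, a: int, b: int, c: int) -> Mask:
    """Binary mask for a straight partition line a*x + b*y + c >= 0.

    The (a, b, c) parameterization is an assumption made for this sketch.
    """
    return [[1 if a * x + b * y + c >= 0 else 0 for x in range(width)]
            for y in range(height)]


def downsample_mask_420(mask: Mask) -> Mask:
    """Halve a first-component mask in both dimensions (assumed 4:2:0 layout).

    Each 2x2 group is reduced by majority vote; ties go to partition 1.
    Assumes even width and height.
    """
    h, w = len(mask), len(mask[0])
    return [[1 if (mask[y][x] + mask[y][x + 1]
                   + mask[y + 1][x] + mask[y + 1][x + 1]) >= 2 else 0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def partition_cost(recon: Sequence[Sequence[int]], mask: Mask) -> float:
    """Assumed cost: squared deviation of each reconstructed first-component
    sample from the mean of the partition it falls in (lower = better fit)."""
    cost = 0.0
    for label in (0, 1):
        vals = [recon[y][x]
                for y in range(len(mask))
                for x in range(len(mask[0]))
                if mask[y][x] == label]
        if vals:
            mean = sum(vals) / len(vals)
            cost += sum((v - mean) ** 2 for v in vals)
    return cost


def select_partition(recon: Sequence[Sequence[int]],
                     candidates: Sequence[Mask]) -> int:
    """Return the index of the candidate mask that minimizes the cost."""
    return min(range(len(candidates)),
               key=lambda i: partition_cost(recon, candidates[i]))


# Usage: a 4x4 reconstructed luma block that is dark on the left, bright on the right.
recon_luma = [[10, 10, 200, 200] for _ in range(4)]
horizontal = wedge_mask(4, 4, 0, 1, -2)   # splits top / bottom
vertical = wedge_mask(4, 4, 1, 0, -2)     # splits left / right
best = select_partition(recon_luma, [horizontal, vertical])
chroma_mask = downsample_mask_420([horizontal, vertical][best])
```

In this usage, the vertical candidate separates the two luma regions exactly, so it is the one selected, and its mask is then reduced to the chroma resolution before being applied to the second color component.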