Method and apparatus for signaling sub-picture division information
By signaling subpicture subdivision information in video encoding, the method improves compression efficiency and error tolerance, addressing the limitations of existing standards like HEVC and VVC.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- ALIBABA GROUP HOLDING LTD
- Filing Date
- 2026-04-03
- Publication Date
- 2026-07-02
AI Technical Summary
Existing video encoding standards like HEVC and VVC lack efficient methods for signaling subpicture subdivision information, which is crucial for enhancing compression efficiency and error tolerance in video processing.
The method involves determining the presence of subpicture information in a bitstream and signaling details such as the number, position, and identifier of subpictures, as well as loop filtering options, to improve video encoding and decoding processes.
This approach enhances the compression efficiency and error tolerance of video encoding by allowing independent processing of video regions, reducing memory and bandwidth requirements while maintaining image quality.
Abstract
Description
Technical Field
[0001] Cross-Reference to Related Applications
[0001] This disclosure claims priority to U.S. Provisional Patent Application No. 62/954,014, filed Dec. 27, 2019, which is incorporated herein by reference in its entirety.
[0002] Technical Field
[0002] This disclosure generally relates to video processing, and more particularly, to methods and apparatuses for signaling sub-picture segmentation information.
Background Art
[0003] Background
[0003] Video is a set of static pictures (or “frames”) that capture visual information. To reduce storage memory and transmission bandwidth, video can be compressed before storage or transmission and decompressed before display. The compression process is usually referred to as encoding, and the decompression process is usually referred to as decoding. There are various video encoding formats that use standardized video encoding techniques, most commonly based on prediction, transform, quantization, entropy encoding, and in-loop filtering. Video encoding standards that specify particular video encoding formats, such as the High-Efficiency Video Coding (HEVC/H.265) standard, the Versatile Video Coding (VVC/H.266) standard, and the AVS standard, have been developed by standardization bodies. As more advanced video encoding techniques are adopted into video standards, the encoding efficiency of each new video encoding standard becomes higher.
Summary of the Invention
Means for Solving the Problems
[0004]
[0004] In some embodiments, exemplary methods for signaling subpicture subdivision information include determining whether a bitstream contains subpicture information according to a subpicture information presence flag signaled in the bitstream, and in response to the bitstream containing subpicture information, signaling in the bitstream at least one of the number of subpictures in the picture, the width, height, position and identifier (ID) mapping of the target subpicture, subpic_treated_as_pic_flag, and loop_filter_across_subpic_enabled_flag.
[0005]
[0005] In some embodiments, an exemplary video processing device includes at least one memory for storing instructions and at least one processor. The at least one processor is configured to execute the instructions to cause the device to determine whether the bitstream contains subpicture information according to a subpicture information presence flag signaled in the bitstream, and, in response to the bitstream containing subpicture information, to signal in the bitstream at least one of the following: the number of subpictures in the picture, the width, height, position and identifier (ID) mapping of the target subpicture, subpic_treated_as_pic_flag, and loop_filter_across_subpic_enabled_flag.
[0006]
[0006] In some embodiments, an exemplary non-transitory computer-readable storage medium stores a set of instructions. The set of instructions can be executed by one or more processing units to cause a video processing device to determine whether the bitstream contains subpicture information according to a subpicture information presence flag signaled in the bitstream, and, in response to the bitstream containing subpicture information, to signal in the bitstream at least one of the following: the number of subpictures in the picture, the width, height, position and identifier (ID) mapping of the target subpicture, subpic_treated_as_pic_flag, and loop_filter_across_subpic_enabled_flag.
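The conditional signaling in paragraphs [0004] to [0006] can be illustrated with a small parser. This is a minimal sketch under stated assumptions: the bit layout below (a 1-bit presence flag; Exp-Golomb ue(v) codes for the subpicture count, CTU position, size and ID; single bits for subpic_treated_as_pic_flag and loop_filter_across_subpic_enabled_flag) is hypothetical, and the names `BitReader`, `SubpicInfo` and `parse_subpic_info` are illustrative only. It is not the exact syntax of this disclosure or of VVC.

```python
from dataclasses import dataclass
from typing import List, Optional


class BitReader:
    """Reads bits MSB-first from a byte string (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer."""
        value = 0
        for _ in range(n):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


@dataclass
class SubpicInfo:
    ctu_x: int                # top-left position, in CTUs
    ctu_y: int
    width_in_ctus: int
    height_in_ctus: int
    subpic_id: int
    treated_as_pic: bool      # subpic_treated_as_pic_flag
    loop_filter_across: bool  # loop_filter_across_subpic_enabled_flag


def parse_subpic_info(r: BitReader) -> Optional[List[SubpicInfo]]:
    """Parse the hypothetical syntax; return None when the presence flag is 0."""
    if r.u(1) == 0:            # subpicture information presence flag
        return None            # nothing else is signaled
    num_subpics = r.ue() + 1   # coded as "minus 1"
    subpics = []
    for _ in range(num_subpics):
        ctu_x, ctu_y = r.ue(), r.ue()
        width, height = r.ue() + 1, r.ue() + 1  # coded as "minus 1"
        subpic_id = r.ue()
        treated_as_pic = bool(r.u(1))
        loop_filter_across = bool(r.u(1))
        subpics.append(SubpicInfo(ctu_x, ctu_y, width, height, subpic_id,
                                  treated_as_pic, loop_filter_across))
    return subpics


# 0xF5 0x24 = 1 | 1 | 1 | 1 | 010 | 1 | 00100 | 1 | 0 (+ 1 padding bit)
print(parse_subpic_info(BitReader(bytes([0xF5, 0x24]))))
print(parse_subpic_info(BitReader(b"\x00")))  # None
```

The first call decodes one subpicture at CTU (0, 0), two CTUs wide and one CTU high, with ID 3, treated as a picture, and with loop filtering across its boundary disabled. The second call shows that when the presence flag is 0, no further subpicture fields are read.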
[0007] Brief Description of the Drawings
[0007] Embodiments and various aspects of the present disclosure are illustrated in the following detailed description and the accompanying figures. Various features shown in the figures are not drawn to scale.
[0008] Figure 1 is a schematic diagram showing the structure of an exemplary video sequence, according to some embodiments of the present disclosure.
[0009] Figure 2A is a schematic diagram illustrating an exemplary encoding process of a hybrid video coding system, according to embodiments of this disclosure.
[0010] Figure 2B is a schematic diagram illustrating another exemplary encoding process of a hybrid video coding system, according to embodiments of this disclosure.
[0011] Figure 3A is a schematic diagram illustrating an exemplary decoding process of a hybrid video coding system, according to embodiments of this disclosure.
[0012] Figure 3B is a schematic diagram illustrating another exemplary decoding process of a hybrid video coding system, according to embodiments of this disclosure.
[0013] Figure 4 is a block diagram of an exemplary apparatus for encoding or decoding video, according to some embodiments of the present disclosure.
[0014] Figure 5 is a schematic diagram showing an example of a picture divided into coding tree units (CTUs), according to some embodiments of the present disclosure.
[0015] Figure 6 is a schematic diagram showing an example of a picture divided into tiles and raster-scan slices, according to some embodiments of the present disclosure.
[0016] Figure 7 is a schematic diagram showing an example of a picture divided into tiles and rectangular slices, according to some embodiments of the present disclosure.
[0017] Figure 8 is a schematic diagram showing another example of a picture divided into tiles and rectangular slices, according to some embodiments of the present disclosure.
[0018] Figure 9 is a schematic diagram showing an example of a picture divided into subpictures, according to some embodiments of the present disclosure.
[0019] Figure 10 shows Table 1, which illustrates exemplary sequence parameter set (SPS) syntax for subpicture splitting, according to some embodiments of this disclosure.
[0020] Figure 11 shows Table 2, which illustrates exemplary SPS syntax for subpicture identifiers, according to some embodiments of this disclosure.
[0021] Figure 12 shows Table 3, which illustrates exemplary picture parameter set (PPS) syntax for subpicture identifiers, according to some embodiments of this disclosure.
[0022] Figure 13 shows Table 4, which illustrates exemplary picture header (PH) syntax for subpicture identifiers, according to some embodiments of this disclosure.
[0023] Figure 14 is a schematic diagram illustrating exemplary bitstream conformance constraints, according to some embodiments of the present disclosure.
[0024] Figure 15 shows another exemplary PH syntax for subpicture identifiers, according to some embodiments of the present disclosure.
[0025] Figure 16 shows yet another exemplary PH syntax for subpicture identifiers, according to some embodiments of the present disclosure.
[0026] Figure 17A shows an exemplary SPS syntax, according to some embodiments of the present disclosure.
[0027] Figure 17B shows another exemplary SPS syntax, according to some embodiments of the present disclosure.
[0028] Figure 18 shows another exemplary SPS syntax, according to some embodiments of the present disclosure.
[0029] Figure 19 shows another exemplary SPS syntax, according to some embodiments of the present disclosure.
[0030] Figure 20 shows another exemplary SPS syntax, according to some embodiments of the present disclosure.
[0031] Figure 21 shows a flowchart of an exemplary video processing method, according to some embodiments of the present disclosure.
[0032] Figure 22 shows a flowchart of another exemplary video processing method, according to some embodiments of the present disclosure.
[0033] Figure 23 shows a flowchart of another exemplary video processing method, according to some embodiments of the present disclosure.
[0034] Figure 24 shows a flowchart of another exemplary video processing method, according to some embodiments of the present disclosure.
[0035] Figure 25 shows a flowchart of another exemplary video processing method, according to some embodiments of the present disclosure.
Detailed Description of the Invention
[0009] Detailed explanation
[0036] Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which the same reference numerals in different drawings represent the same or similar elements unless otherwise indicated. The implementations set forth in the following description of exemplary embodiments do not represent all implementations consistent with the invention. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the invention as recited in the appended claims. Particular aspects of the present disclosure are described in greater detail below. If there is any conflict between terms and/or definitions incorporated by reference and those provided herein, the terms and definitions provided herein control.
[0010]
[0037] The Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (ITU-T VCEG) and the ISO/IEC Moving Picture Experts Group (ISO/IEC MPEG) is currently developing the Versatile Video Coding (VVC/H.266) standard. The VVC standard aims to double the compression efficiency of its predecessor, the High Efficiency Video Coding (HEVC/H.265) standard. In other words, the goal of VVC is to achieve the same subjective quality as HEVC/H.265 using half the bandwidth.
[0011]
[0038] To achieve the same subjective quality as HEVC/H.265 using half the bandwidth, JVET has been developing technologies beyond HEVC using the Joint Exploration Model (JEM) reference software. As coding technologies were incorporated into the JEM, the JEM achieved substantially higher coding performance than HEVC.
[0012]
[0039] The VVC standard is a relatively recent development and continues to incorporate more encoding techniques that deliver better compression performance. VVC is based on the same hybrid video encoding system used in modern video compression standards such as HEVC, H.264/AVC, MPEG-2, and H.263.
[0013]
[0040] Video is a set of static pictures (or "frames") arranged in temporal order to store visual information. A video capture device (e.g., a camera) can be used to capture and store these pictures in temporal order, and a video playback device (e.g., a television, computer, smartphone, tablet computer, video player, or any end-user terminal with display capabilities) can be used to display such pictures in temporal order. In some applications, the video capture device can also transmit the captured video in real time to a video playback device (e.g., a computer with a monitor) for purposes such as surveillance, conferencing, or live broadcasting.
[0014]
[0041] To reduce the storage space and transmission bandwidth required by such applications, video can be compressed before storage and transmission and decompressed before display. Compression and decompression can be implemented by software executed by a processor (e.g., a processor of a general-purpose computer) or by specialized hardware. The module for compression is generally called an "encoder," and the module for decompression is generally called a "decoder." Encoders and decoders can be collectively called a "codec." Encoders and decoders can be implemented as any of various suitable hardware, software, or combinations thereof. For example, hardware implementations of encoders and decoders can include circuitry such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, or any combination thereof. Software implementations of encoders and decoders can include program code, computer-executable instructions, firmware, or any suitable computer-implemented algorithm or process fixed in a computer-readable medium. Video compression and decompression can be implemented by various algorithms or standards, such as MPEG-1, MPEG-2, MPEG-4, the H.26x series, or the like. In some applications, a codec can decompress video encoded with a first encoding standard and recompress the decompressed video using a second encoding standard, in which case the codec can be referred to as a "transcoder."
[0015]
[0042] A video encoding process can identify and keep useful information that can be used to reconstruct a picture, while disregarding information that is unimportant for the reconstruction. If the disregarded, unimportant information cannot be fully reconstructed, such an encoding process can be referred to as "lossy." Otherwise, it can be referred to as "lossless." Most encoding processes are lossy, which is a trade-off made to reduce the required storage space and transmission bandwidth.
[0016]
[0043] Useful information about the picture being encoded (referred to as the "current picture") includes changes relative to the reference picture (e.g., a previously encoded and reconstructed picture). Such changes can include changes in pixel position, brightness, or color, with position being the most important. Changes in the position of groups of pixels representing an object can reflect the movement of the object between the reference picture and the current picture.
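A change in the position of a group of pixels between the reference picture and the current picture is commonly estimated by block matching. The sketch below assumes grayscale pictures stored as nested lists, an exhaustive search over a small window, and the sum of absolute differences (SAD) as the matching cost; the function names and the search strategy are illustrative assumptions, not a method prescribed by this disclosure.

```python
from typing import List, Tuple

Picture = List[List[int]]


def sad(cur: Picture, ref: Picture, x: int, y: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the size x size block of cur at (x, y)
    and the block of ref displaced by (dx, dy)."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur: Picture, ref: Picture, x: int, y: int,
                size: int, search_range: int) -> Tuple[int, int]:
    """Return the displacement (dx, dy) within +/- search_range that minimizes SAD."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that fall outside the reference picture.
            if not (0 <= x + dx <= width - size and 0 <= y + dy <= height - size):
                continue
            cost = sad(cur, ref, x, y, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


# A bright 2x2 object at (2, 2) in the reference moves one pixel right in the current picture.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for j in (2, 3):
    for i in (2, 3):
        ref[j][i] = 200
        cur[j][i + 1] = 200

print(full_search(cur, ref, 3, 2, 2, 2))  # (-1, 0)
```

The block at (3, 2) in the current picture is best matched one pixel to the left in the reference picture, which is the displacement that reflects the object's movement.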
[0017]
[0044] A picture encoded without referencing another picture (i.e., it is its own reference picture) is called an "I picture". A picture encoded using a previous picture as its reference picture is called a "P picture". A picture encoded using both a previous picture and a future picture as reference pictures (i.e., the reference is "bidirectional") is called a "B picture".
[0018]
[0045] Figure 1 shows the structure of an exemplary video sequence 100 according to some embodiments of the present disclosure. The video sequence 100 may be live video or video that has been captured and archived. The video sequence 100 may be real-life video, computer-generated video (e.g., computer game video), or a combination thereof (e.g., real-life video with augmented reality effects). The video sequence 100 may be input from a video capture device (e.g., a camera), a video archive containing previously captured video (e.g., video files stored in a storage device), or a video feed interface for receiving video from a video content provider (e.g., a video broadcast transceiver).
[0019]
[0046] As shown in Figure 1, the video sequence 100 may include a series of pictures arranged temporally along a timeline, including pictures 102, 104, 106, and 108. Pictures 102-106 are consecutive, and there are more pictures between pictures 106 and 108. In Figure 1, picture 102 is an I picture, and its reference picture is picture 102 itself. Picture 104 is a P picture, and its reference picture is picture 102, as indicated by the arrow. Picture 106 is a B picture, and its reference pictures are pictures 104 and 108, as indicated by the arrows. In some embodiments, the reference picture of a picture (e.g., picture 104) may not immediately precede or follow that picture. For example, the reference picture of picture 104 may be a picture preceding picture 102. Note that the reference pictures of pictures 102-106 are merely examples, and this disclosure does not limit embodiments of the reference pictures to the examples shown in Figure 1.
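Because B picture 106 references picture 108, which follows it in display order, picture 108 has to be decoded before picture 106. The sketch below derives a decoding order from a reference map with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). Figure 1 does not specify the reference of picture 108; making it reference picture 104 is a hypothetical choice for this example only, and the pictures between 106 and 108 are omitted.

```python
from graphlib import TopologicalSorter

# Each picture maps to the pictures it references (the entry for 108 is hypothetical).
references = {
    102: [],          # I picture: references no other picture
    104: [102],       # P picture: references an earlier picture
    106: [104, 108],  # B picture: references an earlier and a later picture
    108: [104],       # assumption for this example only
}

# Referenced pictures must be decoded before the pictures that use them.
decoding_order = list(TopologicalSorter(references).static_order())
display_order = sorted(references)

print("display order: ", display_order)   # [102, 104, 106, 108]
print("decoding order:", decoding_order)  # [102, 104, 108, 106]
```

With these references the decoding order is unique: picture 108 must be available before picture 106 can be predicted from it.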
[0020]
[0047] Typically, because of the computational complexity of such tasks, video codecs do not encode or decode an entire picture at once. Instead, they can split the picture into basic segments and encode or decode the picture segment by segment. Such basic segments are referred to in this disclosure as basic processing units ("BPUs"). For example, structure 110 in Figure 1 shows an exemplary structure of a picture of video sequence 100 (e.g., any of pictures 102-108). In structure 110, the picture is divided into 4x4 basic processing units, whose boundaries are shown as dashed lines. In some embodiments, basic processing units may be referred to as "macroblocks" in some video encoding standards (e.g., the MPEG family, H.261, H.263, or H.264/AVC) or as "coding tree units" ("CTUs") in some other video encoding standards (e.g., H.265/HEVC or H.266/VVC). Basic processing units can have variable sizes within a picture, such as 128×128, 64×64, 32×32, 16×16, 4×8, or 16×32 pixels, or any arbitrary shape and size of pixels. The size and shape of the basic processing units can be selected for a picture based on a balance between encoding efficiency and the level of detail to be kept in the basic processing unit.
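Dividing a picture into fixed-size basic processing units yields a grid whose dimensions are the picture dimensions divided by the unit size, rounded up, so that partial units cover the right and bottom edges. A minimal sketch, assuming square units (e.g., 128×128 CTUs) and ignoring any standard-specific constraints; `bpu_grid` is a hypothetical helper name:

```python
from typing import List, Tuple


def bpu_grid(width: int, height: int, unit: int) -> Tuple[int, int, List[Tuple[int, int, int, int]]]:
    """Return (columns, rows, blocks) where each block is (x, y, w, h) in pixels."""
    cols = -(-width // unit)   # ceiling division
    rows = -(-height // unit)
    blocks = []
    for r in range(rows):
        for c in range(cols):
            x, y = c * unit, r * unit
            # Units on the right or bottom edge are cropped by the picture boundary.
            blocks.append((x, y, min(unit, width - x), min(unit, height - y)))
    return cols, rows, blocks


cols, rows, blocks = bpu_grid(1920, 1080, 128)
print(cols, rows, len(blocks))  # 15 9 135
print(blocks[-1])               # (1792, 1024, 128, 56)
```

For a 1920×1080 picture, the bottom row of 128×128 units is only 56 pixels high, because 1080 is not a multiple of 128.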
[0021]
[0048] A basic processing unit can be a logical unit that can include a group of different types of video data stored in computer memory (e.g., in a video frame buffer). For example, a basic processing unit of a color picture may include a luma component (Y) representing achromatic brightness information, one or more chroma components (e.g., Cb and Cr) representing color information, and associated syntax elements, where the luma and chroma components may have the same size as the basic processing unit. The luma and chroma components may be referred to as "coding tree blocks" ("CTBs") in some video encoding standards (e.g., H.265/HEVC or H.266/VVC). Any operation performed on a basic processing unit may be repeated on each of its luma and chroma components.
[0022]
[0049] Video encoding involves multiple stages of operation, examples of which are shown in Figures 2A-2B and 3A-3B. For each stage, the basic processing unit may still be too large for processing, and it can therefore be further divided into segments referred to in this disclosure as “basic processing subunits.” In some embodiments, a basic processing subunit may be referred to as a “block” in some video encoding standards (e.g., the MPEG family, H.261, H.263, or H.264/AVC) or as a “coding unit” (“CU”) in some other video encoding standards (e.g., H.265/HEVC or H.266/VVC). A basic processing subunit may be the same size as or smaller than a basic processing unit. Like a basic processing unit, a basic processing subunit is a logical unit that can include a group of different types of video data (e.g., Y, Cb, Cr, and associated syntax elements) stored in computer memory (e.g., in a video frame buffer). Any operation performed on a basic processing subunit can be repeated on each of its luma and chroma components. Note that such division can be carried out to further levels as processing needs require. Also note that different stages can divide the basic processing unit using different schemes.
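Paragraph [0049] notes that division can be carried out to further levels as needed. One common way to do this is recursive quadtree splitting, sketched below. The `should_split` predicate and the minimum block size are hypothetical placeholders for whatever decision a given encoder stage actually makes.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size)


def quadtree_split(x: int, y: int, size: int, min_size: int,
                   should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four quadrants; return the leaf blocks."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(quadtree_split(x + dx, y + dy, half, min_size, should_split))
    return leaves


# Split a 64x64 unit, refining only blocks anchored at the top-left corner, down to 16x16.
leaves = quadtree_split(0, 0, 64, 16, lambda x, y, s: x == 0 and y == 0)
print(leaves)
```

The result contains four 16×16 subunits in the top-left quadrant and three 32×32 subunits elsewhere. This shows how different areas of the same unit can be divided to different depths.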
[0023]
[0050] For example, in a mode decision stage (an example of which is shown in Figure 2B), the encoder can decide which prediction mode (e.g., intra-picture prediction or inter-picture prediction) to use for a basic processing unit, but the basic processing unit may be too large to make such a decision. The encoder can split the basic processing unit into multiple basic processing subunits (e.g., CUs, as in H.265/HEVC or H.266/VVC) and decide a prediction type for each individual basic processing subunit.
[0024]
[0051] As another example, in a prediction stage (an example of which is shown in Figures 2A and 2B), the encoder can perform prediction operations at the level of basic processing subunits (e.g., CUs). However, in some cases, a basic processing subunit may still be too large to process. The encoder can further split the basic processing subunit into smaller segments (e.g., referred to as "prediction blocks" or "PBs" in H.265/HEVC or H.266/VVC), at the level of which the prediction operations can be performed.
[0025]
[0052] As another example, in a transform stage (an example of which is shown in Figures 2A and 2B), the encoder can perform transform operations on residual basic processing subunits (e.g., CUs). However, in some cases, a basic processing subunit may still be too large to process. The encoder can further split the basic processing subunit into smaller segments (e.g., referred to as "transform blocks" or "TBs" in H.265/HEVC or H.266/VVC), at the level of which the transform operations can be performed. Note that the division schemes for the same basic processing subunit may differ between the prediction stage and the transform stage. For example, in H.265/HEVC or H.266/VVC, the prediction blocks and transform blocks of the same CU may have different sizes and numbers.
[0026]
[0053] In the structure 110 of Figure 1, the basic processing unit 112 is further divided into 3x3 basic processing subunits, their boundaries shown as dotted lines. Different basic processing units of the same picture may be divided into basic processing subunits in different ways.
[0027]
[0054] In some implementations, to provide parallel processing capability and error tolerance in video encoding and decoding, a picture can be divided into regions for processing, such that, for a given region of the picture, the encoding or decoding process does not depend on information from any other region of the picture. In other words, each region of the picture can be processed independently. This allows the codec to process different regions of the picture in parallel, thereby increasing encoding efficiency. Furthermore, if the data of a region is corrupted during processing or lost during network transmission, the codec can still correctly encode or decode other regions of the same picture without relying on the corrupted or lost data, thus providing error tolerance. Some video encoding standards allow a picture to be divided into different types of regions. For example, H.265/HEVC and H.266/VVC provide two types of regions: "slices" and "tiles." Note also that different pictures in video sequence 100 may use different schemes for dividing the picture into regions.
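The benefit of independently coded regions can be illustrated with a toy model. In the sketch below, each region's samples are compressed on their own with `zlib`, which stands in for a real video encoder only because it is in the Python standard library. Regions are processed in parallel with `concurrent.futures`, and the payload of one region is deliberately corrupted. The three-region layout and the helper names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def encode_region(samples: bytes) -> bytes:
    # Each region is compressed without reference to any other region.
    return zlib.compress(samples)


def decode_region(payload: bytes) -> Optional[bytes]:
    try:
        return zlib.decompress(payload)
    except zlib.error:
        return None  # corrupted or lost data affects only this region


regions = [bytes([value]) * 64 for value in (10, 20, 30)]  # three hypothetical regions

with ThreadPoolExecutor() as pool:
    payloads = list(pool.map(encode_region, regions))   # regions encoded in parallel

# Simulate transmission damage to the second region only.
payloads[1] = b"\x00" + payloads[1][1:]

with ThreadPoolExecutor() as pool:
    decoded: List[Optional[bytes]] = list(pool.map(decode_region, payloads))

print([d is not None for d in decoded])  # [True, False, True]
```

Because no region depends on another, the damage is confined to the second region, and the first and third regions decode exactly.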
[0028]
[0055] For example, in Figure 1, structure 110 is divided into three regions 114, 116, and 118, whose boundaries are shown as solid lines inside structure 110. Region 114 includes four basic processing units. Regions 116 and 118 each include six basic processing units. Note that the basic processing units, basic processing subunits, and regions of structure 110 in Figure 1 are merely examples, and this disclosure does not limit its embodiments to them.
[0029]
[0056] Figure 2A shows a schematic diagram of an exemplary encoding process 200A according to an embodiment of the present disclosure. For example, the encoding process 200A may be performed by an encoder. As shown in Figure 2A, the encoder can encode a video sequence 202 into a video bitstream 228 according to process 200A. Similar to video sequence 100 in Figure 1, video sequence 202 may include a set of pictures (referred to as “original pictures”) arranged in temporal order. Similar to structure 110 in Figure 1, each original picture in video sequence 202 may be divided by the encoder into basic processing units, basic processing subunits, or regions for processing. In some embodiments, the encoder may perform process 200A at the level of basic processing units for each original picture in video sequence 202. For example, the encoder may perform process 200A iteratively, encoding one basic processing unit in each iteration of process 200A. In some embodiments, the encoder can perform process 200A in parallel for the regions (e.g., regions 114-118) of each original picture in video sequence 202.
[0030]
[0057] In Figure 2A, the encoder can feed a basic processing unit (referred to as an "original BPU") of an original picture of video sequence 202 to the prediction stage 204 to generate prediction data 206 and a predicted BPU 208. The encoder can subtract the predicted BPU 208 from the original BPU to generate a residual BPU 210. The encoder can feed the residual BPU 210 to the transform stage 212 and the quantization stage 214 to generate quantized transform coefficients 216. The encoder can feed the prediction data 206 and the quantized transform coefficients 216 to the binary coding stage 226 to generate the video bitstream 228. Components 202, 204, 206, 208, 210, 212, 214, 216, 226, and 228 may be referred to as the "forward path." During process 200A, after the quantization stage 214, the encoder may feed the quantized transform coefficients 216 to the inverse quantization stage 218 and the inverse transform stage 220 to generate a reconstructed residual BPU 222. The encoder may add the reconstructed residual BPU 222 to the predicted BPU 208 to generate a prediction reference 224, which is used in the prediction stage 204 for the next iteration of process 200A. Components 218, 220, 222, and 224 of process 200A may be referred to as the “reconstruction path.” The reconstruction path can be used to ensure that both the encoder and the decoder use the same reference data for prediction.
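The forward path and the reconstruction path of process 200A can be sketched on one-dimensional "BPUs" of samples. This toy model is an assumption for illustration only: prediction repeats the last reconstructed sample of the previous BPU, the transform stage 212 is omitted (treated as identity), and quantization divides by a fixed step and rounds. It demonstrates the point made in paragraph [0057]: the encoder predicts from its own reconstruction (224), not from the original samples, so a decoder that repeats the reconstruction path arrives at exactly the same reference data.

```python
from typing import List, Tuple

STEP = 4  # hypothetical quantization step


def predict(prev_recon: List[int], n: int) -> List[int]:
    # Toy prediction: repeat the last reconstructed sample (128 for the first BPU).
    value = prev_recon[-1] if prev_recon else 128
    return [value] * n


def encode(bpus: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (quantized residuals per BPU, encoder-side reconstructions)."""
    coded, recons, prev = [], [], []
    for original in bpus:
        pred = predict(prev, len(original))                 # prediction stage
        residual = [o - p for o, p in zip(original, pred)]  # residual BPU
        q = [round(r / STEP) for r in residual]             # quantization; round() rounds halves to even
        recon = [p + qi * STEP for p, qi in zip(pred, q)]   # reconstruction path
        coded.append(q)
        recons.append(recon)
        prev = recon  # reference for the next iteration
    return coded, recons


def decode(coded: List[List[int]]) -> List[List[int]]:
    recons, prev = [], []
    for q in coded:
        pred = predict(prev, len(q))
        recon = [p + qi * STEP for p, qi in zip(pred, q)]
        recons.append(recon)
        prev = recon
    return recons


bpus = [[130, 133, 135], [140, 141, 139]]
coded, enc_recons = encode(bpus)
assert decode(coded) == enc_recons  # encoder and decoder stay in sync
print(coded, enc_recons)
```

If the encoder instead predicted from the original samples, the decoder, which never sees those samples, would drift away from the encoder over successive iterations.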
[0031]
[0058] The encoder can perform process 200A iteratively to encode each original BPU of the original picture (in the forward path) and to generate the prediction reference 224 for encoding the next original BPU of the original picture (in the reconstruction path). After encoding all the original BPUs of the original picture, the encoder can proceed to encode the next picture in video sequence 202.
[0032]
[0059] Referring to process 200A, the encoder can receive video sequence 202 generated by a video capture device (e.g., a camera). As used herein, the term “receive” can refer to receiving, inputting, acquiring, retrieving, obtaining, reading, accessing, or any action in any manner for inputting data.
[0033]
[0060] In prediction stage 204, at the current iteration, the encoder can receive an original BPU and prediction reference 224, and perform a prediction operation to generate prediction data 206 and predicted BPU 208. Prediction reference 224 can be generated from the reconstruction path of the previous iteration of process 200A. The purpose of prediction stage 204 is to reduce information redundancy by extracting prediction data 206, which can be used together with prediction reference 224 to reconstruct the original BPU as predicted BPU 208.
[0034]
[0061] Ideally, predicted BPU 208 can be identical to the original BPU. However, due to non-ideal prediction and reconstruction operations, predicted BPU 208 is generally slightly different from the original BPU. To record such differences, after generating predicted BPU 208, the encoder can subtract it from the original BPU to generate residual BPU 210. For example, the encoder can subtract the values of the pixels of predicted BPU 208 (e.g., grayscale values or RGB values) from the values of the corresponding pixels of the original BPU. Each pixel of residual BPU 210 can have a residual value resulting from this subtraction between the corresponding pixels of the original BPU and predicted BPU 208. Compared with the original BPU, prediction data 206 and residual BPU 210 can have fewer bits, but they can be used to reconstruct the original BPU without significant quality degradation. Thus, the original BPU is compressed.
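A small numeric example of the residual computation follows. Subtracting the predicted BPU from the original BPU sample by sample gives the residual BPU. Adding the residual back to the prediction recovers the original exactly, before any transform or quantization is applied. The 2×2 sample values are made up for illustration.

```python
original = [[52, 55], [61, 59]]
predicted = [[50, 54], [60, 62]]

# Residual BPU: original minus prediction, pixel by pixel.
residual = [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(original, predicted)]
print(residual)  # [[2, 1], [1, -3]]

# Prediction plus residual restores the original samples.
restored = [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(predicted, residual)]
assert restored == original
```

The residual values are much smaller in magnitude than the original samples, which is why they typically need fewer bits to represent.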
[0035]
[0062] To further compress residual BPU 210, in transform stage 212, the encoder can reduce its spatial redundancy by decomposing residual BPU 210 into a set of two-dimensional "basis patterns," each basis pattern being associated with a "transform coefficient." The basis patterns can have the same size (e.g., the size of residual BPU 210). Each basis pattern can represent a variation-frequency component (e.g., a frequency of brightness variation) of residual BPU 210. None of the basis patterns can be reproduced from any combination (e.g., a linear combination) of the other basis patterns. In other words, the decomposition can decompose the variations of residual BPU 210 into the frequency domain. Such a decomposition is analogous to a discrete Fourier transform of a function, in which the basis patterns are analogous to the basis functions (e.g., trigonometric functions) of the discrete Fourier transform, and the transform coefficients are analogous to the coefficients associated with the basis functions.
[0036]
[0063] Different transform algorithms can use different basis patterns. Various transform algorithms, such as a discrete cosine transform, a discrete sine transform, or the like, can be used in transform stage 212. The transform in transform stage 212 is invertible. That is, the encoder can restore residual BPU 210 by an inverse operation of the transform (referred to as an "inverse transform"). For example, to restore a pixel of residual BPU 210, the inverse transform can multiply the values of the corresponding pixels of the basis patterns by their respective associated coefficients and add the products to produce a weighted sum. For a video coding standard, both the encoder and the decoder can use the same transform algorithm (and thus the same basis patterns). Thus, the encoder can record only the transform coefficients, from which the decoder can reconstruct residual BPU 210 without receiving the basis patterns from the encoder. Compared with residual BPU 210, the transform coefficients can have fewer bits, but they can be used to reconstruct residual BPU 210 without significant quality degradation. Thus, residual BPU 210 is further compressed.
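The decomposition into basis patterns, and the inverse transform as a weighted sum of those patterns, can be shown with a one-dimensional orthonormal DCT-II, one of the transform families named above. This plain-Python sketch works on a short row of residual samples only; real codecs use two-dimensional integer approximations of such transforms, which are not reproduced here.

```python
import math
from typing import List


def dct_basis(n: int) -> List[List[float]]:
    """Orthonormal DCT-II basis patterns; basis[k] is the k-th pattern (k = 0 is the DC pattern)."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return basis


def forward(residual: List[float], basis: List[List[float]]) -> List[float]:
    # Each coefficient is the projection of the residual onto one basis pattern.
    return [sum(r * b for r, b in zip(residual, pattern)) for pattern in basis]


def inverse(coeffs: List[float], basis: List[List[float]]) -> List[float]:
    # Inverse transform: weighted sum of the basis patterns, weighted by their coefficients.
    n = len(basis[0])
    return [sum(c * pattern[i] for c, pattern in zip(coeffs, basis)) for i in range(n)]


basis = dct_basis(4)
residual = [2.0, 1.0, 1.0, -3.0]
coeffs = forward(residual, basis)
restored = inverse(coeffs, basis)
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(restored, residual))
print([round(c, 3) for c in coeffs])
```

Because the basis is orthonormal, the inverse is simply the transpose of the forward transform. As the text notes, the decoder only needs the coefficients, since it can generate the same basis patterns itself.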
[0037]
[0064] The encoder can further compress the transform coefficients in quantization stage 214. In the transform process, different basis patterns can represent different variation frequencies (e.g., brightness variation frequencies). Because the human eye is generally better at recognizing low-frequency variations, the encoder can disregard information about high-frequency variations without causing significant quality degradation in decoding. For example, in quantization stage 214, the encoder can generate quantized transform coefficients 216 by dividing each transform coefficient by an integer value (referred to as a "quantization parameter") and rounding the quotient to its nearest integer. After such an operation, some transform coefficients of high-frequency basis patterns can be converted to zero, and the transform coefficients of low-frequency basis patterns can be converted to smaller integers. The encoder can disregard the zero-valued quantized transform coefficients 216, thereby further compressing the transform coefficients. The quantization process also has an inverse operation (referred to as "inverse quantization"), in which quantized transform coefficients 216 are scaled back into transform coefficients.
[0065] Because the encoder discards the remainder of such divisions when rounding, the quantization stage 214 is irreversible (lossy). Typically, the quantization stage 214 contributes the greatest information loss in process 200A. The greater the information loss, the fewer bits are required for the quantized transform coefficients 216. To obtain different levels of information loss, the encoder can use different values of the quantization parameter or of any other parameter of the quantization process.
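To illustrate the trade-off between information loss and the quantization parameter, the following sketch assumes the approximate H.264/HEVC relationship Qstep = 2^((QP - 4) / 6), under which the step size doubles for every increase of 6 in QP. The coefficients are hypothetical, and the bit cost is only approximated by counting zero-valued levels.

```python
import math

def qstep_from_qp(qp):
    """Approximate H.264/HEVC step size: it doubles for every increase of 6 in QP."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize_and_measure(coeffs, qp):
    """Quantize, dequantize, and return (number of zero levels, squared error)."""
    step = qstep_from_qp(qp)
    # Round half away from zero, as in the usual description of scalar quantization.
    levels = [int(math.floor(abs(c) / step + 0.5)) * (1 if c >= 0 else -1)
              for c in coeffs]
    recon = [level * step for level in levels]
    zeros = sum(1 for level in levels if level == 0)
    sse = sum((c - r) ** 2 for c, r in zip(coeffs, recon))
    return zeros, sse

coeffs = [52.0, -14.0, 6.0, 3.0, -2.0, 1.0]  # hypothetical coefficients
for qp in (22, 32, 42):
    zeros, sse = quantize_and_measure(coeffs, qp)
    print(f"QP {qp}: {zeros} zero levels, squared error {sse:.1f}")
```

With these inputs, the number of zero levels grows from 3 at QP 22 to 5 at QP 42 while the squared reconstruction error increases: more information is discarded, and fewer nonzero levels remain to be coded.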
[0066] In the binary coding stage 226, the encoder can encode the prediction data 206 and the quantized transform coefficients 216 using a binary coding technique such as entropy coding, variable-length coding, arithmetic coding, Huffman coding, context-adaptive binary arithmetic coding, or any other lossless or lossy compression algorithm. In some embodiments, in addition to the prediction data 206 and the quantized transform coefficients 216, the encoder can encode other information in the binary coding stage 226, such as the prediction mode used in the prediction stage 204, the parameters of the prediction operation, the type of transform used in the transform stage 212, the parameters of the quantization process (e.g., quantization parameters), encoder control parameters (e.g., bitrate control parameters), or the like. The encoder can generate a video bitstream 228 from the output data of the binary coding stage 226. In some embodiments, the video bitstream 228 can be further packetized for network transmission.
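As one concrete instance of the variable-length coding mentioned above, the following sketch implements the order-0 Exp-Golomb code, which H.264, HEVC, and VVC use for many header syntax elements (the ue(v) descriptor); coefficient data in those standards is typically coded with CABAC instead. Bit strings are represented as Python strings for readability, and the function names are hypothetical.

```python
def ue_encode(value):
    """Order-0 Exp-Golomb codeword for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    bits = bin(value + 1)[2:]             # binary representation of value + 1
    return "0" * (len(bits) - 1) + bits   # prefix of (length - 1) zeros

def ue_decode(bitstring, pos=0):
    """Decode one ue(v) codeword starting at pos; return (value, next position)."""
    leading_zeros = 0
    while bitstring[pos + leading_zeros] == "0":
        leading_zeros += 1
    start = pos + leading_zeros
    code_num = int(bitstring[start:start + leading_zeros + 1], 2)
    return code_num - 1, start + leading_zeros + 1

print([ue_encode(v) for v in range(4)])  # ['1', '010', '011', '00100']
```

Smaller values receive shorter codewords, and no codeword is a prefix of another, so a decoder can parse a concatenation of codewords without separators.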
[0067] Referring to the reconstruction path of process 200A, in the inverse quantization stage 218, the encoder can perform inverse quantization on the quantized transform coefficients 216 to generate reconstructed transform coefficients. In the inverse transform stage 220, the encoder can generate a reconstructed residual BPU 222 based on the reconstructed transform coefficients. The encoder can add the reconstructed residual BPU 222 to the prediction BPU 208 to generate a prediction reference 224, which is used in the next iteration of process 200A.
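The reconstruction step above amounts to a sample-wise addition of the residual to the prediction. The sketch assumes 2-D lists of integer samples and, as H.26x decoders commonly do, clips each sum to the valid range for the bit depth; the clipping and the function name are assumptions, not taken from this disclosure.

```python
def reconstruct(pred, residual, bit_depth=8):
    """Add the reconstructed residual to the prediction, sample by sample.

    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]

pred = [[100, 250], [3, 128]]      # hypothetical prediction block
residual = [[5, 10], [-7, 0]]      # hypothetical reconstructed residual
recon = reconstruct(pred, residual)  # [[105, 255], [0, 128]]
```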
[0068] It should be noted that other variations of process 200A can also be used to encode the video sequence 202. In some embodiments, the steps of process 200A may be performed in a different order by the encoder. In some embodiments, one or more steps of process 200A may be combined into a single step. In some embodiments, a single step of process 200A may be divided into multiple steps. For example, the transformation step 212 and the quantization step 214 may be combined into a single step. In some embodiments, process 200A may include additional steps. In some embodiments, process 200A may omit one or more steps in Figure 2A.
[0069] Figure 2B shows a schematic diagram of another exemplary encoding process 200B according to an embodiment of the present disclosure. Process 200B may be a modification of process 200A. For example, process 200B may be used by an encoder compliant with a hybrid video coding standard (e.g., the H.26x series). Compared to process 200A, the forward path of process 200B additionally includes a mode determination stage 230 and divides the prediction stage 204 into a spatial prediction stage 2042 and a temporal prediction stage 2044. The reconstruction path of process 200B additionally includes a loop filter stage 232 and a buffer 234.
[0070] Generally, prediction techniques can be classified into two types: spatial prediction and temporal prediction. Spatial prediction (e.g., intra-picture prediction or "intra-prediction") can use pixels from one or more already encoded neighboring BPUs within the same picture to predict the current BPU. That is, the prediction reference 224 in spatial prediction can include the neighboring BPUs. Spatial prediction can reduce the inherent spatial redundancy of a picture. Temporal prediction (e.g., inter-picture prediction or "inter-prediction") can use regions from one or more already encoded pictures to predict the current BPU. That is, the prediction reference 224 in temporal prediction can include the encoded pictures. Temporal prediction can reduce the inherent temporal redundancy between pictures.
[0071] Referring to process 200B, in the forward path, the encoder performs prediction operations in the spatial prediction stage 2042 and the temporal prediction stage 2044. For example, in the spatial prediction stage 2042, the encoder may perform intra-prediction. For an original BPU of the picture being encoded, the prediction reference 224 may include one or more neighboring BPUs that have been encoded (in the forward path) and reconstructed (in the reconstruction path) within the same picture. The encoder may generate a prediction BPU 208 by extrapolating the neighboring BPUs. The extrapolation technique may include, for example, linear extrapolation or interpolation, polynomial extrapolation or interpolation, or the like. In some embodiments, the encoder may perform extrapolation at the pixel level, for example, by extrapolating the value of the corresponding pixel for each pixel of the prediction BPU 208. The neighboring BPUs used for extrapolation can be located in various directions relative to the original BPU, such as vertically (e.g., above the original BPU), horizontally (e.g., to the left of the original BPU), diagonally (e.g., below-left, below-right, above-left, or above-right of the original BPU), or in any direction defined in the video coding standard used. For intra-prediction, the prediction data 206 may include, for example, the locations (e.g., coordinates) of the neighboring BPUs used, the sizes of the neighboring BPUs used, the extrapolation parameters, the directions of the neighboring BPUs used relative to the original BPU, or the like.
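Extrapolation from neighboring BPUs can be sketched with three simple modes loosely modeled on the vertical, horizontal, and DC intra modes of H.26x codecs. The mode names and the intra_predict function are hypothetical, and real codecs add many angular modes, reference-sample filtering, and handling of unavailable neighbors.

```python
def intra_predict(top, left, size, mode):
    """Predict a size x size block from already reconstructed neighboring samples.

    top:  samples of the row directly above the block (at least `size` values)
    left: samples of the column directly left of the block (at least `size` values)
    """
    if mode == "vertical":    # extend the row above downwards
        return [list(top[:size]) for _ in range(size)]
    if mode == "horizontal":  # extend the left column to the right
        return [[left[y]] * size for y in range(size)]
    if mode == "dc":          # flat block at the rounded mean of all neighbors
        total = sum(top[:size]) + sum(left[:size])
        dc = (total + size) // (2 * size)
        return [[dc] * size for _ in range(size)]
    raise ValueError(f"unsupported mode: {mode}")

top, left = [10, 20, 30, 40], [50, 60, 70, 80]  # hypothetical neighbors
dc_block = intra_predict(top, left, 4, "dc")    # every sample is 45
```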
[0072] As another example, in the temporal prediction stage 2044, the encoder can perform inter-prediction. For an original BPU of the current picture, the prediction reference 224 may include one or more pictures (referred to as "reference pictures") that have been encoded (in the forward path) and reconstructed (in the reconstruction path). In some embodiments, a reference picture may be encoded and reconstructed BPU by BPU. For example, the encoder may add the reconstructed residual BPU 222 to the prediction BPU 208 to generate a reconstructed BPU. When all reconstructed BPUs of the same picture have been generated, the encoder can generate a reconstructed picture as a reference picture. The encoder may perform a "motion estimation" operation to search for a matching region within a range of the reference picture (referred to as the "search window"). The location of the search window in the reference picture may be determined based on the location of the original BPU in the current picture. For example, the search window may be centered at the location in the reference picture having the same coordinates as the original BPU in the current picture and may extend outward by a predetermined distance. When the encoder identifies a region in the search window that is similar to the original BPU (for example, by using a pel-recursive algorithm, a block-matching algorithm, or the like), the encoder can determine such a region to be the matching region. The matching region may have different dimensions from the original BPU (for example, it may be smaller than, equal to, larger than, or of a different shape than the original BPU). Because the reference picture and the current picture are separated in time (for example, as shown in Figure 1), the matching region can be regarded as "moving" to the location of the original BPU over time. The encoder can record the direction and distance of such movement as a "motion vector". When multiple reference pictures are used (for example, for picture 106 in Figure 1), the encoder can search for a matching region in each reference picture and determine its associated motion vector. In some embodiments, the encoder can assign weights to the pixel values of the matching regions of the respective matching reference pictures.
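Block-matching motion estimation can be sketched as an exhaustive search over a square search window centered on the BPU's own position, using the sum of absolute differences (SAD) as the similarity measure. The SAD criterion, integer-pixel precision, the function names, and the sign convention (the vector points from the block in the current picture to the matching region in the reference picture) are assumptions of this sketch.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def full_search(cur, ref, bx, by, size, search_range):
    """Exhaustive block matching in a window of +/- search_range around (bx, by).

    Returns the motion vector (dx, dy) with the lowest SAD.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate region falls outside the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]

# Hypothetical 8x8 pictures: the 4x4 block at (2, 2) of the current picture
# matches the reference region whose top-left corner is at (3, 1).
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x + 1] if y >= 1 and x + 1 < 8 else 0 for x in range(8)]
       for y in range(8)]
mv = full_search(cur, ref, bx=2, by=2, size=4, search_range=2)  # (1, -1)
```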
[0073] Motion estimation can be used to identify various types of motion, such as translation, rotation, zooming, or the like. For inter-prediction, the prediction data 206 may include, for example, the locations (e.g., coordinates) of the matching regions, the motion vectors associated with the matching regions, the number of reference pictures, the weights associated with the reference pictures, or the like.
[0074] To generate a prediction BPU 208, the encoder can perform a “motion compensation” operation. Motion compensation can be used to reconstruct the prediction BPU 208 based on the prediction data 206 (e.g., motion vectors) and the prediction reference 224. For example, the encoder can move the matching region of a reference picture according to the motion vector, thereby predicting the original BPU of the current picture. When multiple reference pictures are used (e.g., for picture 106 in Figure 1), the encoder can move the matching region of each reference picture according to its respective motion vector and average the pixel values of the moved matching regions. In some embodiments, if the encoder has assigned weights to the pixel values of the matching regions of the respective matching reference pictures, the encoder can compute a weighted sum of the pixel values of the moved matching regions.
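Motion compensation with one or more reference pictures can be sketched as fetching each displaced block and combining the blocks by a weighted average. The sketch assumes integer motion vectors and floating-point weights (equal weights by default); actual codecs use sub-pixel interpolation and integer weights with rounding offsets. All names are hypothetical.

```python
def fetch_block(ref, bx, by, mv, size):
    """Copy the size x size region of ref displaced from (bx, by) by the integer vector mv."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(size)] for y in range(size)]

def motion_compensate(refs, mvs, bx, by, size, weights=None):
    """Predict a block from one or more reference pictures.

    The displaced blocks are combined by a weighted average; with a single
    reference and the default weight of 1.0 the displaced block is returned.
    """
    blocks = [fetch_block(r, bx, by, mv, size) for r, mv in zip(refs, mvs)]
    if weights is None:
        weights = [1.0 / len(blocks)] * len(blocks)
    return [[sum(w * blk[y][x] for w, blk in zip(weights, blocks))
             for x in range(size)] for y in range(size)]

ref0 = [[10 * y + x for x in range(6)] for y in range(6)]  # hypothetical pictures
ref1 = [[100] * 6 for _ in range(6)]
uni = motion_compensate([ref0], [(1, 1)], bx=0, by=0, size=2)
bi = motion_compensate([ref0, ref1], [(1, 1), (0, 0)], bx=0, by=0, size=2)
```

Here uni is [[11, 12], [21, 22]] and bi averages it with the flat second reference, giving [[55.5, 56.0], [60.5, 61.0]].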
[0075] In some embodiments, inter-prediction can be unidirectional or bidirectional. Unidirectional inter-prediction can use one or more reference pictures in the same temporal direction relative to the current picture. For example, picture 104 in Figure 1 is a unidirectionally inter-predicted picture, in which the reference picture (i.e., picture 102) precedes picture 104. Bidirectional inter-prediction can use one or more reference pictures in both temporal directions relative to the current picture. For example, picture 106 in Figure 1 is a bidirectionally inter-predicted picture, in which the reference pictures (i.e., pictures 104 and 108) lie in both temporal directions relative to picture 106.
[0076] Still referring to the forward path of process 200B, after the spatial prediction stage 2042 and the temporal prediction stage 2044, in the mode determination stage 230, the encoder may select a prediction mode (e.g., either intra-prediction or inter-prediction) for the current iteration of process 200B. For example, the encoder may perform a rate-distortion optimization technique, in which the encoder selects the prediction mode that minimizes the value of a cost function depending on the bit rate of each candidate prediction mode and the distortion of the reconstructed reference picture under that candidate prediction mode. Depending on the selected prediction mode, the encoder may generate the corresponding prediction BPU 208 and prediction data 206.
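A common form of such a rate-distortion cost is the Lagrangian J = D + lambda * R. The disclosure only states that the cost depends on bit rate and distortion, so this form, the lambda values, and the candidate measurements below are assumptions, and select_mode is a hypothetical helper.

```python
def select_mode(candidates, lam):
    """Pick the candidate minimizing J = distortion + lam * rate_bits."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate_bits"])

# Hypothetical measurements for one BPU.
candidates = [
    {"mode": "intra", "distortion": 1200, "rate_bits": 40},
    {"mode": "inter", "distortion": 1500, "rate_bits": 12},
]
print(select_mode(candidates, lam=10)["mode"])  # intra (J = 1600 vs 1620)
print(select_mode(candidates, lam=20)["mode"])  # inter (J = 2000 vs 1740)
```

A larger lambda weights rate more heavily, so the candidate that is cheaper to code wins once lambda reaches 20 in this example.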
[0077] In the reconstruction path of process 200B, if the intra-prediction mode has been selected in the forward path, after generating the prediction reference 224 (e.g., the current BPU that has been encoded and reconstructed in the current picture), the encoder can feed the prediction reference 224 directly to the spatial prediction stage 2042 for later use (e.g., for extrapolation of the next BPU of the current picture). If the inter-prediction mode has been selected in the forward path, after generating the prediction reference 224 (e.g., the current picture in which all BPUs have been encoded and reconstructed), the encoder can feed the prediction reference 224 to the loop filter stage 232, where the encoder can apply a loop filter to the prediction reference 224 to reduce or eliminate distortions (e.g., blocking artifacts) introduced by inter-prediction. The encoder can apply various loop filtering techniques in the loop filter stage 232, such as deblocking, sample adaptive offset, adaptive loop filtering, or the like. Loop-filtered reference pictures may be stored in buffer 234 (or "decoded picture buffer") for later use (e.g., as inter-prediction reference pictures for future pictures of video sequence 202). The encoder may store one or more reference pictures in buffer 234 for use in the temporal prediction stage 2044. In some embodiments, the encoder may encode loop filter parameters (e.g., loop filter strength) along with the quantized transform coefficients 216, the prediction data 206, and other information in the binary coding stage 226.
[0078] Figure 3A shows a schematic diagram of an exemplary decoding process 300A according to an embodiment of the present disclosure. Process 300A may be a decompression process corresponding to the compression process 200A in Figure 2A. In some embodiments, process 300A may be similar to the reconstruction path of process 200A. The decoder can decode the video bitstream 228 into a video stream 304 according to process 300A. The video stream 304 may be very similar to the video sequence 202. However, due to information loss in the compression and decompression processes (e.g., in the quantization stage 214 in Figures 2A-2B), the video stream 304 is generally not identical to the video sequence 202. Similar to processes 200A and 200B in Figures 2A-2B, the decoder may perform process 300A at the level of basic processing units (BPUs) for each picture encoded in the video bitstream 228. For example, the decoder can perform process 300A iteratively, decoding one basic processing unit in each iteration of process 300A. In some embodiments, the decoder can perform process 300A in parallel for the regions (e.g., regions 114-118) of each picture encoded in the video bitstream 228.
[0079] In Figure 3A, the decoder can supply a portion of the video bitstream 228 associated with a basic processing unit of the encoded picture (referred to as the "encoded BPU") to the binary decoding stage 302. In the binary decoding stage 302, the decoder can decode this portion into prediction data 206 and quantized transform coefficients 216. The decoder can supply the quantized transform coefficients 216 to the inverse quantization stage 218 and the inverse transform stage 220 to generate the reconstructed residual BPU 222. The decoder can supply the prediction data 206 to the prediction stage 204 to generate the prediction BPU 208. The decoder can add the reconstructed residual BPU 222 to the prediction BPU 208 to generate the prediction reference 224. In some embodiments, the prediction reference 224 can be stored in a buffer (e.g., a decoded picture buffer in computer memory). The decoder can supply the prediction reference 224 to the prediction stage 204 to perform the prediction operation in the next iteration of process 300A.
[0080] The decoder can perform process 300A iteratively to decode each encoded BPU of the encoded picture and generate a prediction reference 224 for decoding the next encoded BPU of the encoded picture. After decoding all encoded BPUs of the encoded picture, the decoder can output the picture to the video stream 304 for display and proceed to decode the next encoded picture in the video bitstream 228.
[0081] In the binary decoding stage 302, the decoder can perform the inverse operation of the binary coding technique used by the encoder (e.g., entropy coding, variable-length coding, arithmetic coding, Huffman coding, context-adaptive binary arithmetic coding, or any other lossless compression algorithm). In some embodiments, in addition to the prediction data 206 and the quantized transform coefficients 216, the decoder can decode other information in the binary decoding stage 302, such as the prediction mode, parameters of the prediction operation, the transform type, parameters of the quantization process (e.g., quantization parameters), encoder control parameters (e.g., bitrate control parameters), or the like. In some embodiments, if the video bitstream 228 is transmitted over a network in packets, the decoder can depacketize the video bitstream 228 before supplying it to the binary decoding stage 302.
[0082] Figure 3B shows a schematic diagram of another exemplary decoding process 300B according to an embodiment of the present disclosure. Process 300B may be modified from process 300A. For example, process 300B may be used with a decoder compliant with a hybrid video coding standard (e.g., the H.26x series). Compared to process 300A, process 300B further divides the prediction stage 204 into a spatial prediction stage 2042 and a temporal prediction stage 2044, and additionally includes a loop filter stage 232 and a buffer 234.
[0083] In process 300B, the prediction data 206 that the decoder decodes in the binary decoding stage 302 for an encoded basic processing unit (the "current BPU") of the encoded picture being decoded (the "current picture") may include various types of data, depending on which prediction mode the encoder used to encode the current BPU. For example, if the encoder used intra-prediction to encode the current BPU, the prediction data 206 may include a prediction mode indicator (e.g., a flag value) indicating intra-prediction, parameters of the intra-prediction operation, or the like. Parameters of the intra-prediction operation may include, for example, the locations (e.g., coordinates) of one or more neighboring BPUs used as references, the sizes of the neighboring BPUs, extrapolation parameters, the directions of the neighboring BPUs relative to the original BPU, or the like. As another example, if the encoder used inter-prediction to encode the current BPU, the prediction data 206 may include a prediction mode indicator (e.g., a flag value) indicating inter-prediction, parameters of the inter-prediction operation, or the like. Parameters of the inter-prediction operation may include, for example, the number of reference pictures associated with the current BPU, the weights associated with each reference picture, the locations (e.g., coordinates) of one or more matching regions within each reference picture, one or more motion vectors associated with each matching region, or the like.
[0084] Based on the prediction mode indicator, the decoder can determine whether to perform spatial prediction (e.g., intra-prediction) in the spatial prediction stage 2042 or temporal prediction (e.g., inter-prediction) in the temporal prediction stage 2044. Details of such spatial and temporal prediction are described in connection with Figure 2B and are not repeated here. After performing the spatial or temporal prediction, the decoder can generate a prediction BPU 208. The decoder can then add the prediction BPU 208 and the reconstructed residual BPU 222 to generate a prediction reference 224, as described in connection with Figure 3A.
[0085] In process 300B, the decoder may supply the prediction reference 224 to the spatial prediction stage 2042 or the temporal prediction stage 2044 to perform the prediction operation in the next iteration of process 300B. For example, if the current BPU is decoded using intra-prediction in the spatial prediction stage 2042, after generating the prediction reference 224 (e.g., the decoded current BPU), the decoder may supply the prediction reference 224 directly to the spatial prediction stage 2042 for later use (e.g., for extrapolation of the next BPU of the current picture). If the current BPU is decoded using inter-prediction in the temporal prediction stage 2044, after generating the prediction reference 224 (e.g., a reference picture in which all BPUs have been decoded), the decoder may supply the prediction reference 224 to the loop filter stage 232 to reduce or eliminate distortion (e.g., blocking artifacts). The decoder may apply the loop filter to the prediction reference 224 in the manner described in connection with Figure 2B. Loop-filtered reference pictures may be stored in buffer 234 (e.g., a decoded picture buffer in computer memory) for later use (e.g., as inter-prediction reference pictures for future encoded pictures of the video bitstream 228). The decoder may store one or more reference pictures in buffer 234 for use in the temporal prediction stage 2044. In some embodiments, when the prediction mode indicator indicates that inter-prediction was used to encode the current BPU, the prediction data 206 may further include loop filter parameters (e.g., loop filter strength).
[0086] Figure 4 is a block diagram of an exemplary device 400 for encoding or decoding video according to an embodiment of the present disclosure. As shown in Figure 4, the device 400 may include a processor 402. When the processor 402 executes the instructions described herein, the device 400 can become a specialized machine for video encoding or decoding. The processor 402 may be any type of circuitry capable of manipulating or processing information. For example, the processor 402 may include any combination of any number of a central processing unit (or "CPU"), a graphics processing unit (or "GPU"), a neural processing unit ("NPU"), a microcontroller unit ("MCU"), an optical processor, a programmable logic controller, a microcontroller, a microprocessor, a digital signal processor, an intellectual property (IP) core, a programmable logic array (PLA), a programmable array logic (PAL), a generic array logic (GAL), a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a system on chip (SoC), an application-specific integrated circuit (ASIC), or the like. In some embodiments, the processor 402 may also be a set of processors grouped as a single logical component. For example, as shown in Figure 4, the processor 402 may include multiple processors, including processor 402a, processor 402b, and processor 402n.
[0087] The device 400 may also include a memory 404 configured to store data (e.g., a set of instructions, computer code, intermediate data, or the like). For example, as shown in Figure 4, the stored data may include program instructions (e.g., program instructions for implementing the stages of processes 200A, 200B, 300A, or 300B) and data for processing (e.g., the video sequence 202, the video bitstream 228, or the video stream 304). The processor 402 can access the program instructions and the data for processing (e.g., via a bus 410), execute the program instructions, and perform operations or manipulations on the data for processing. The memory 404 may include a high-speed random-access storage device or a non-volatile storage device. In some embodiments, the memory 404 may include any combination of any number of random-access memories (RAM), read-only memories (ROM), optical discs, magnetic disks, hard drives, solid-state drives, flash drives, Secure Digital (SD) cards, memory sticks, CompactFlash® (CF) cards, or the like. The memory 404 can also be a group of memories grouped together as a single logical component (not shown in Figure 4).
[0088] Bus 410 may be a communication device that transfers data between internal components of the device 400, such as an internal bus (e.g., a CPU-memory bus), an external bus (e.g., a Universal Serial Bus port, a Peripheral Component Interconnect Express port), or similar.
[0089] To facilitate explanation without creating ambiguity, the processor 402 and other data processing circuits are collectively referred to as “data processing circuits” in this disclosure. The data processing circuits may be implemented entirely as hardware, or as a combination of software, hardware, or firmware. In addition, the data processing circuits may be a single, standalone module, or may be fully or partially integrated with any other component of the device 400.
[0090] The device 400 may further include a network interface 406 for providing wired or wireless communication with a network (e.g., the Internet, an intranet, a local area network, a mobile communication network, or the like). In some embodiments, the network interface 406 may include any combination of any number of a network interface controller (NIC), a radio frequency (RF) module, a transponder, a transceiver, a modem, a router, a gateway, a wired network adapter, a wireless network adapter, a Bluetooth® adapter, an infrared adapter, a near-field communication ("NFC") adapter, a cellular network chip, or the like.
[0091] In some embodiments, the device 400 may optionally further include a peripheral interface 408 for providing connectivity to one or more peripheral devices. As shown in Figure 4, peripheral devices may include, but are not limited to, cursor control devices (e.g., mouse, touchpad or touchscreen), keyboards, displays (e.g., cathode ray tube displays, liquid crystal displays or light-emitting diode displays), video input devices (e.g., cameras or input interfaces coupled to video archives), or similar.
[0092] It should be noted that the video codec (for example, the codec that performs processes 200A, 200B, 300A, or 300B) may be implemented as any combination of software or hardware modules within the device 400. For example, some or all stages of processes 200A, 200B, 300A, or 300B may be implemented as one or more software modules of the device 400, such as program instructions that can be loaded into memory 404. As another example, some or all stages of processes 200A, 200B, 300A, or 300B may be implemented as one or more hardware modules of the device 400, such as specialized data processing circuits (e.g., FPGA, ASIC, NPU, or the like).
[0093] In quantization and dequantization function blocks (e.g., quantization 214 and dequantization 218 in Figure 2A or 2B, and dequantization 218 in Figure 3A or 3B), the quantization parameter (QP) is used to determine the amount of quantization (and dequantization) applied to the prediction residual. The initial QP value used for encoding a picture or slice can be signaled at a high level, for example, using the init_qp_minus26 syntax element in the picture parameter set (PPS) and the slice_qp_delta syntax element in the slice header. Furthermore, the QP value can be adapted at a local level per CU using the delta QP value transmitted at the granularity of the quantization group.
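The high-level QP signaling can be illustrated numerically. The slice-level formula below follows the HEVC/VVC derivation SliceQpY = 26 + init_qp_minus26 + slice_qp_delta. The local adjustment is a hypothetical simplification: it adds the delta QP to an already predicted QP and omits the neighbor-based QP prediction and range wrapping that the standards define.

```python
def slice_qp(init_qp_minus26, slice_qp_delta):
    """Initial luma QP of a slice: SliceQpY = 26 + init_qp_minus26 + slice_qp_delta."""
    return 26 + init_qp_minus26 + slice_qp_delta

def cu_qp(predicted_qp, cu_qp_delta):
    """Hypothetical simplified local QP: predicted QP plus the signaled delta QP.

    The standards derive predicted_qp from neighboring quantization groups and
    wrap the result into the valid QP range; both steps are omitted here.
    """
    return predicted_qp + cu_qp_delta

qp_slice = slice_qp(init_qp_minus26=0, slice_qp_delta=6)  # 32
qp_local = cu_qp(predicted_qp=qp_slice, cu_qp_delta=-3)   # 29
```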
[0094] In the disclosed embodiments, to encode a picture, the picture is divided into a sequence of coding tree units (CTUs). Multiple CTUs may form a tile, a slice, or a subpicture. For a picture having three sample arrays, a CTU consists of an N×N block of luma samples together with two corresponding blocks of chroma samples. Figure 5 shows an example of a picture divided into multiple CTUs according to some embodiments of the disclosure.
[0095] According to some embodiments, the maximum allowed size of the luma block in a CTU is 128×128 (although the maximum size of a luma transform block may be 64×64), and the minimum allowed size of the luma block in a CTU is 32×32.
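Because CTUs at the right and bottom picture edges may be partial, the dimensions of the CTU grid are obtained by rounding up. The sketch assumes the CTU sizes 32, 64, and 128 (the powers of two within the limits stated above); ctu_grid is a hypothetical helper.

```python
import math

VALID_CTB_SIZES = (32, 64, 128)  # assumed power-of-two CTU sizes

def ctu_grid(pic_width, pic_height, ctb_size):
    """Return (columns, rows) of CTUs covering the picture; edge CTUs may be partial."""
    if ctb_size not in VALID_CTB_SIZES:
        raise ValueError(f"unsupported CTU size {ctb_size}")
    return math.ceil(pic_width / ctb_size), math.ceil(pic_height / ctb_size)

cols, rows = ctu_grid(1920, 1080, 128)
print(cols, rows, cols * rows)  # 15 9 135
```

For a 1080-line picture the bottom CTU row covers only 1080 - 8 × 128 = 56 luma rows.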
[0096] A picture is divided into one or more tile rows and one or more tile columns. A tile is a sequence of CTUs that covers a rectangular region of the picture. A slice contains either an integer number of complete tiles or an integer number of consecutive complete CTU rows within a tile of the picture. Two slice modes may be supported: raster-scan slice mode and rectangular slice mode. In raster-scan slice mode, a slice contains a sequence of complete tiles in the tile raster scan of the picture. In rectangular slice mode, a slice contains either several complete tiles that collectively form a rectangular region of the picture, or several consecutive complete CTU rows of a single tile that collectively form a rectangular region of the picture. Tiles within a rectangular slice are scanned in tile raster-scan order within the rectangular region corresponding to that slice.
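The tile scan described above (tiles in raster order across the picture, and CTUs in raster order within each tile) can be sketched as a list of CTU raster-scan addresses in tile-scan order. Column widths and row heights are given in CTUs; tile_scan_order is a hypothetical illustration, similar in spirit to the CTB address conversion tables of HEVC.

```python
def tile_scan_order(col_widths, row_heights):
    """Return CTU raster-scan addresses listed in tile-scan order.

    col_widths / row_heights: tile column widths and tile row heights in CTUs.
    """
    pic_width = sum(col_widths)
    col_starts = [sum(col_widths[:i]) for i in range(len(col_widths))]
    row_starts = [sum(row_heights[:j]) for j in range(len(row_heights))]
    order = []
    for row, y0 in enumerate(row_starts):           # tile rows, top to bottom
        for col, x0 in enumerate(col_starts):       # tiles in the row, left to right
            for y in range(y0, y0 + row_heights[row]):       # CTU rows in the tile
                for x in range(x0, x0 + col_widths[col]):    # CTUs in that row
                    order.append(y * pic_width + x)
    return order

# A 4x2-CTU picture split into two 2x2-CTU tiles.
print(tile_scan_order([2, 2], [2]))  # [0, 1, 4, 5, 2, 3, 6, 7]
```

In raster-scan slice mode, each slice would then correspond to a run of consecutive complete tiles in this order.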
[0097] A subpicture contains one or more slices that collectively cover a rectangular region of the picture.
[0098] Figure 6 shows an example of a picture divided into tiles and raster scan slices according to some embodiments of the present disclosure. As shown in Figure 6, the picture is divided into 12 tiles (4 tile rows and 3 tile columns) and 3 raster scan slices.
[0099] Figure 7 shows an example of a picture divided into tiles and rectangular slices according to some embodiments of the present disclosure. As shown in Figure 7, the picture is divided into 20 tiles (5 tile rows and 4 tile columns) and 9 rectangular slices.
[0100] Figure 8 shows another example of a picture divided into tiles and rectangular slices according to some embodiments of the present disclosure. As shown in Figure 8, the picture is divided into four tiles (two tile rows and two tile columns) and four rectangular slices.
[0101] Figure 9 shows an example of a picture divided into subpictures according to some embodiments of the present disclosure. As shown in Figure 9, the picture is divided into 20 tiles (5 tile columns and 4 tile rows): 12 tiles on the left, each covering one slice of 4×4 CTUs, and 8 tiles on the right, each covering two vertically stacked slices of 2×2 CTUs. This results in 28 slices in total (12 + 8 × 2) and 28 subpictures of varying dimensions (each slice is a subpicture).
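An encoder or decoder may want to check that a set of subpicture rectangles, expressed in CTU units, fits the picture. The sketch maps every CTU to a subpicture index and, assuming (as in VVC) that subpictures must cover the picture without gaps or overlaps, rejects layouts that violate this. The (x, y, width, height) tuple layout and the function name are hypothetical.

```python
def subpic_map(pic_w_ctus, pic_h_ctus, subpics):
    """Map each CTU to the index of the subpicture containing it.

    subpics: list of (top_left_x, top_left_y, width, height) in CTU units.
    Raises ValueError on out-of-picture, overlapping, or missing coverage.
    """
    owner = [[None] * pic_w_ctus for _ in range(pic_h_ctus)]
    for i, (x0, y0, w, h) in enumerate(subpics):
        if x0 + w > pic_w_ctus or y0 + h > pic_h_ctus:
            raise ValueError(f"subpicture {i} extends beyond the picture")
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                if owner[y][x] is not None:
                    raise ValueError(f"CTU ({x}, {y}) is in subpictures {owner[y][x]} and {i}")
                owner[y][x] = i
    if any(o is None for row in owner for o in row):
        raise ValueError("some CTUs are not covered by any subpicture")
    return owner

# A 4x2-CTU picture: one 2x2 subpicture on the left, two 2x1 on the right.
print(subpic_map(4, 2, [(0, 0, 2, 2), (2, 0, 2, 1), (2, 1, 2, 1)]))
# [[0, 0, 1, 1], [0, 0, 2, 2]]
```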
[0102] According to some embodiments disclosed, subpicture splitting information is signaled within a sequence parameter set (SPS). Figure 10 shows exemplary Table 1 illustrating exemplary SPS syntax for subpicture splitting according to some embodiments of the present disclosure.
[0103] In Table 1, the syntax element sps_num_subpics_minus1 plus 1 specifies the number of subpictures within a single picture; the syntax elements subpic_ctu_top_left_x[i] and subpic_ctu_top_left_y[i] specify the top-left CTU position of the i-th subpicture in CtbSizeY units; and the syntax elements subpic_width_minus1[i] plus 1 and subpic_height_minus1[i] plus 1 specify the width and height of the i-th subpicture in CtbSizeY units, respectively. The semantics of these syntax elements are as follows:
[0104] subpics_present_flag equal to 1 specifies that subpicture parameters are present in the SPS RBSP syntax. subpics_present_flag equal to 0 specifies that subpicture parameters are not present in the SPS RBSP syntax.
[0105] sps_num_subpics_minus1 plus 1 specifies the number of subpictures. The value of sps_num_subpics_minus1 shall be in the range of 0 to 254, inclusive. When not present, the value of sps_num_subpics_minus1 is inferred to be equal to 0.
[0106] subpic_ctu_top_left_x[i] specifies the horizontal position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_x[i] is inferred to be equal to 0.
[0107] subpic_ctu_top_left_y[i] specifies the vertical position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_y[i] is inferred to be equal to 0.
[0108] subpic_width_minus1[i] plus 1 specifies the width of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. When not present, the value of subpic_width_minus1[i] is inferred to be equal to Ceil(pic_width_max_in_luma_samples÷CtbSizeY)-1.
[0109] subpic_height_minus1[i] plus 1 specifies the height of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. When not present, the value of subpic_height_minus1[i] is inferred to be equal to Ceil(pic_height_max_in_luma_samples÷CtbSizeY)-1.
[0110] subpic_treated_as_pic_flag[i] equal to 1 specifies that the i-th subpicture of each encoded picture in the coded layer video sequence (CLVS) is treated as a picture in the decoding process, excluding in-loop filtering operations. subpic_treated_as_pic_flag[i] equal to 0 specifies that the i-th subpicture of each encoded picture in the CLVS is not treated as a picture in the decoding process, excluding in-loop filtering operations. If not present, the value of subpic_treated_as_pic_flag[i] is inferred to be equal to 0.
[0111] loop_filter_across_subpic_enabled_flag[i] equal to 1 indicates that in-loop filtering can be performed across the boundary of the i-th subpicture in each encoded picture in the CLVS. loop_filter_across_subpic_enabled_flag[i] equal to 0 indicates that in-loop filtering is not performed across the boundary of the i-th subpicture in each encoded picture in the CLVS. If not present, the value of loop_filter_across_subpic_enabled_flag[i] is inferred to be equal to 1.
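As a rough sketch of how a decoder might read the Table 1 fields, the Python below assumes the field order described for Table 1 (the subpicture count, then for each subpicture its top-left position, width, height, and the two control flags) and assumes that sps_num_subpics_minus1 is an 8-bit fixed-length field, which fits its 0 to 254 range but is not stated here. The BitReader class and the other names are hypothetical, and inference of absent values is not modeled.

```python
from dataclasses import dataclass
from typing import List


class BitReader:
    """Hypothetical MSB-first bit reader over a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read an n-bit unsigned value, most significant bit first (n may be 0)."""
        value = 0
        for _ in range(n):
            bit = (self.data[self.pos >> 3] >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


@dataclass
class SubpicLayout:
    ctu_top_left_x: int
    ctu_top_left_y: int
    width_minus1: int
    height_minus1: int
    treated_as_pic_flag: int
    loop_filter_across_flag: int


def parse_subpic_info(r: BitReader, pic_w: int, pic_h: int, ctb: int) -> List[SubpicLayout]:
    # Field lengths Ceil(Log2(pic_dim / CtbSizeY)), computed with integers.
    x_bits = ((pic_w + ctb - 1) // ctb - 1).bit_length()
    y_bits = ((pic_h + ctb - 1) // ctb - 1).bit_length()
    layouts: List[SubpicLayout] = []
    if not r.u(1):  # subpics_present_flag
        return layouts
    num_subpics = r.u(8) + 1  # sps_num_subpics_minus1 (assumed 8 bits)
    for _ in range(num_subpics):
        # Python evaluates arguments left to right, matching the assumed field order:
        # x, y, width_minus1, height_minus1, treated_as_pic, loop_filter_across.
        layouts.append(SubpicLayout(r.u(x_bits), r.u(y_bits), r.u(x_bits),
                                    r.u(y_bits), r.u(1), r.u(1)))
    return layouts
```

For a 256x128 picture with 128x128 CTUs, horizontal fields take 1 bit and vertical fields take 0 bits, so two side-by-side subpictures fit in 17 bits (bytes 0x80 0x95 0x00 after padding).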
[0112] Each subpicture can be assigned an identifier according to the embodiments disclosed herein. Subpicture identifier information can be signaled within a sequence parameter set (SPS), picture parameter set (PPS), or picture header (PH). Figure 11 shows exemplary Table 2 illustrating exemplary SPS syntax for subpicture identifiers according to some embodiments of this disclosure. Figure 12 shows exemplary Table 3 illustrating exemplary PPS syntax for subpicture identifiers according to some embodiments of this disclosure. Figure 13 shows exemplary Table 4 illustrating exemplary PH syntax for subpicture identifiers according to some embodiments of this disclosure.
[0113] As shown in Tables 2 to 4, the syntax element sps_subpic_id_present_flag indicates whether the subpicture ID mapping is present in the SPS. The syntax elements sps_subpic_id_signaling_present_flag, pps_subpic_id_signaling_present_flag, and ph_subpic_id_signaling_present_flag indicate whether the subpicture ID mapping is signaled in the SPS, PPS, or PH, respectively. The syntax elements sps_subpic_id_len_minus1 plus 1, pps_subpic_id_len_minus1 plus 1, and ph_subpic_id_len_minus1 plus 1 specify the number of bits used to represent sps_subpic_id[i], pps_subpic_id[i], and ph_subpic_id[i], respectively, which are the subpicture IDs signaled in the SPS, PPS, and PH.
[0114] The semantics of the above syntactic elements and the related bitstream conformance requirements are described below.
[0115] A sps_subpic_id_present_flag equal to 1 indicates that the subpicture ID mapping is present in the SPS, while a sps_subpic_id_present_flag equal to 0 indicates that the subpicture ID mapping is not present in the SPS.
[0116] A value of sps_subpic_id_signaling_present_flag equal to 1 indicates that the subpicture ID mapping is signaled within the SPS, while a value of sps_subpic_id_signaling_present_flag equal to 0 indicates that the subpicture ID mapping is not signaled within the SPS. If not present, the value of the sps_subpic_id_signaling_present_flag is inferred to be equal to 0.
[0117] sps_subpic_id_len_minus1 plus 1 specifies the number of bits used to represent the syntactic element sps_subpic_id[i]. The value of the syntactic element sps_subpic_id_len_minus1 can be in the range of 0 to 15.
[0118] sps_subpic_id[i] specifies the subpicture ID of the i-th subpicture. The length of sps_subpic_id[i] is sps_subpic_id_len_minus1+1 bits. If not present and sps_subpic_id_present_flag is equal to 0, the value of sps_subpic_id[i] is inferred to be equal to i for each i in the range of 0 to sps_num_subpics_minus1.
[0119] pps_subpic_id_signaling_present_flag equal to 1 indicates that the subpicture ID mapping is signaled within the PPS. pps_subpic_id_signaling_present_flag equal to 0 indicates that the subpicture ID mapping is not signaled within the PPS. If sps_subpic_id_present_flag is equal to 0 or sps_subpic_id_signaling_present_flag is equal to 1, pps_subpic_id_signaling_present_flag is equal to 0.
[0120] pps_num_subpics_minus1 plus 1 specifies the number of subpictures in the encoded pictures referring to the PPS. It is a requirement of bitstream conformance that the value of pps_num_subpics_minus1 be equal to sps_num_subpics_minus1.
[0121] pps_subpic_id_len_minus1 plus 1 specifies the number of bits used to represent the syntax element pps_subpic_id[i]. The value of pps_subpic_id_len_minus1 is in the range of 0 to 15. It is a requirement of bitstream conformance that the value of pps_subpic_id_len_minus1 be the same for all PPSs referenced by encoded pictures in the CLVS.
[0122] pps_subpic_id[i] specifies the subpicture ID of the i-th subpicture. The length of the syntax element pps_subpic_id[i] is pps_subpic_id_len_minus1+1 bits.
[0123] A ph_subpic_id_signaling_present_flag equal to 1 indicates that the subpicture ID mapping is signaled within the PH, while a ph_subpic_id_signaling_present_flag equal to 0 indicates that the subpicture ID mapping is not signaled within the PH.
[0124] ph_subpic_id_len_minus1 plus 1 specifies the number of bits used to represent the syntax element ph_subpic_id[i]. The value of ph_subpic_id_len_minus1 is in the range of 0 to 15. It is a requirement of bitstream conformance that the value of ph_subpic_id_len_minus1 be the same for all PHs of encoded pictures in the CLVS.
[0125] ph_subpic_id[i] specifies the subpicture ID of the i-th subpicture. The length of the syntactic element ph_subpic_id[i] is ph_subpic_id_len_minus1+1 bits.
[0126] After these subpicture ID syntax elements are parsed, the subpicture ID list SubpicIdList is derived using the following syntax (1):
for( i = 0; i <= sps_num_subpics_minus1; i++ )
    SubpicIdList[ i ] = sps_subpic_id_present_flag ?
        ( sps_subpic_id_signaling_present_flag ? sps_subpic_id[ i ] :
        ( ph_subpic_id_signaling_present_flag ? ph_subpic_id[ i ] : pps_subpic_id[ i ] ) ) : i        Syntax (1)
[0127] However, the above signaling for subpicture partitioning has several problems. First, according to the semantics, the syntax elements sps_subpic_id_present_flag and sps_subpic_id_signaling_present_flag both specify whether a subpicture ID is in the SPS. According to Table 2, subpicture ID information is signaled in the SPS only if both of these syntax elements are true. Thus, the signaling is redundant. Second, even if sps_subpic_id_present_flag is true, the subpicture ID may still be signaled in the PPS or PH if pps_subpic_id_signaling_present_flag or ph_subpic_id_signaling_present_flag is true. Third, according to syntax (1), if sps_subpic_id_present_flag is false, a default ID equal to the subpicture index is assigned to each subpicture. If sps_subpic_id_present_flag is true, SubpicIdList[i] is derived as sps_subpic_id[i], ph_subpic_id[i], or pps_subpic_id[i]. However, when sps_subpic_id_present_flag is true, the subpicture ID may be absent from the SPS, PPS, and PH alike, in which case an undefined value of pps_subpic_id is assigned to SubpicIdList. Specifically, in syntax (1), if sps_subpic_id_present_flag is true and both sps_subpic_id_signaling_present_flag and ph_subpic_id_signaling_present_flag are false, pps_subpic_id[i] is assigned to SubpicIdList[i] regardless of the value of pps_subpic_id_signaling_present_flag. If pps_subpic_id_signaling_present_flag is false, pps_subpic_id[i] is undefined.
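The third problem can be reproduced with a direct transcription of syntax (1). In this Python sketch the function name and argument layout are hypothetical, and None stands in for the undefined value that results when pps_subpic_id[i] is read although it was never signaled.

```python
from typing import List, Optional, Sequence


def subpic_id_list_syntax1(num_subpics: int,
                           sps_present: int, sps_sig: int, ph_sig: int,
                           sps_ids: Optional[Sequence[int]] = None,
                           ph_ids: Optional[Sequence[int]] = None,
                           pps_ids: Optional[Sequence[int]] = None) -> List[Optional[int]]:
    """Transcription of syntax (1); pps_subpic_id_signaling_present_flag is never consulted."""
    out: List[Optional[int]] = []
    for i in range(num_subpics):
        if not sps_present:
            out.append(i)  # default ID equal to the subpicture index
        elif sps_sig:
            out.append(sps_ids[i])
        elif ph_sig:
            out.append(ph_ids[i])
        else:
            # Falls through to pps_subpic_id[i] even if the PPS did not signal it;
            # None models the resulting undefined value.
            out.append(pps_ids[i] if pps_ids is not None else None)
    return out


if __name__ == "__main__":
    print(subpic_id_list_syntax1(2, 0, 0, 0))  # no ID mapping: default IDs
    print(subpic_id_list_syntax1(2, 1, 0, 0))  # mapping present but no IDs anywhere
```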
[0128] Furthermore, as shown in Table 1 of Figure 10, if subpics_present_flag is true, the number of subpictures is signaled first, followed by the top-left position, width, and height of each subpicture and the two control flags subpic_treated_as_pic_flag and loop_filter_across_subpic_enabled_flag. These are signaled even if there is only one subpicture (that is, when sps_num_subpics_minus1 is equal to 0). However, if a picture contains only one subpicture, this information does not need to be signaled, because the subpicture is identical to the picture and the information can therefore be derived from the picture itself.
[0129] Furthermore, subpictures are obtained by dividing a picture, and the picture is formed by merging all subpictures. The position and size of the last subpicture can be derived from the size of the entire picture and the positions and sizes of all previous subpictures. Therefore, it is not necessary to signal the position, width, and height information of the last subpicture.
[0130] Furthermore, as shown in Table 2 of Figure 11, the syntactic element sps_subpic_id_present_flag is always signaled regardless of the value of the syntactic element subpics_present_flag. Therefore, with the above signaling method, the subpicture identifier may still be signaled even if there are no subpictures, which is meaningless.
[0131] This disclosure provides a signaling method for solving the above-mentioned problems. Several exemplary embodiments are described in detail below.
[0132] In some embodiments of this disclosure, signaling of the subpicture ID in the picture header can be enforced if the syntax element sps_subpic_id_present_flag is true, but both the syntax elements sps_subpic_id_signaling_present_flag and pps_subpic_id_signaling_present_flag are false. This can avoid cases where the subpicture ID is not defined.
[0133] For example, a bitstream conformance constraint can be imposed in either of the following two ways. In the first way, the semantics with the bitstream conformance constraint are as follows (emphasized in italics): ph_subpic_id_signaling_present_flag equal to 1 specifies that the subpicture ID mapping is signaled within the PH. ph_subpic_id_signaling_present_flag equal to 0 specifies that the subpicture ID mapping is not signaled within the PH. When sps_subpic_id_present_flag is equal to 1, sps_subpic_id_signaling_present_flag is equal to 0, and pps_subpic_id_signaling_present_flag is equal to 0, it is a requirement of bitstream conformance that the value of ph_subpic_id_signaling_present_flag be equal to 1.
[0134] In the second way, the semantics with the bitstream conformance constraint are as follows (emphasized in italics): ph_subpic_id_signaling_present_flag equal to 1 indicates that the subpicture ID mapping is signaled within the PH. ph_subpic_id_signaling_present_flag equal to 0 indicates that the subpicture ID mapping is not signaled within the PH. When sps_subpic_id_present_flag is equal to 1, sps_subpic_id_signaling_present_flag is equal to 0, and pps_subpic_id_signaling_present_flag is equal to 0 in all PPSs referenced by the encoded pictures in the CLVS, it is a requirement of bitstream conformance that ph_subpic_id_signaling_present_flag be equal to 1 in at least one of the PHs of the encoded pictures in the CLVS.
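The two constraints differ mainly in scope: the first is checked per picture, the second across the whole CLVS. The Python below is a hypothetical checker sketch, not code from any reference decoder; flags are passed as plain integers and the function names are illustrative.

```python
from typing import Sequence


def first_way_ok(sps_present: int, sps_sig: int, pps_sig: int, ph_sig: int) -> bool:
    """First way: checked for each picture against that picture's PPS and PH."""
    if sps_present == 1 and sps_sig == 0 and pps_sig == 0:
        return ph_sig == 1
    return True


def second_way_ok(sps_present: int, sps_sig: int,
                  pps_sig_flags: Sequence[int], ph_sig_flags: Sequence[int]) -> bool:
    """Second way: checked over a CLVS, i.e. all referenced PPSs and all PHs."""
    if sps_present == 1 and sps_sig == 0 and all(f == 0 for f in pps_sig_flags):
        return any(f == 1 for f in ph_sig_flags)
    return True
```

Under the second way, a CLVS in which only one picture header carries ph_subpic_id_signaling_present_flag equal to 1 still conforms, whereas the first way requires the flag in every such PH.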
[0135] Figure 14 is a schematic diagram illustrating exemplary bitstream conformance constraints of this second method according to some embodiments of the present disclosure.
[0136] The semantics of the syntactic element sps_subpic_id_present_flag are not explicitly defined in the current VVC draft and can be changed as follows (emphasized in italics):
[0137] A sps_subpic_id_present_flag equal to 1 indicates that the subpicture ID mapping is in SPS, PPS, or PH. A sps_subpic_id_present_flag equal to 0 indicates that the subpicture ID mapping is not in SPS, PPS, or PH.
[0138] This ensures that the syntactic element ph_subpic_id is signaled when the syntactic element sps_subpic_id_present_flag is true, but both the syntactic elements sps_subpic_id_signaling_present_flag and pps_subpic_id_signaling_present_flag are false. This syntax is shown in Tables 2 to 4 of Figures 11 to 13.
[0139] In the above embodiment, if the subpicture ID presence flag is true (sps_subpic_id_present_flag=1), the subpicture IDs to be used are signaled in the bitstream (in the SPS, PPS, or PH), and no inference rule is required. Because sps_subpic_id_present_flag equal to 1 indicates that subpicture IDs are present, forcing them to be signaled in one of the SPS, PPS, or PH may be preferable to deriving them with inference rules without signaling them in the bitstream.
[0140] As another example, Figure 15 shows an exemplary Table 5 illustrating another exemplary PH syntax for subpicture identifiers according to some embodiments of the present disclosure. Table 5 shows modifications of the PH syntax of Table 4 (shown in box 1501 and highlighted in italics). Referring to Table 5, signaling of ph_subpic_id is enforced by inferring ph_subpic_id_signaling_present_flag to be true when sps_subpic_id_present_flag is true but both sps_subpic_id_signaling_present_flag and pps_subpic_id_signaling_present_flag are false.
[0141] The syntactic element ph_subpic_id_signaling_present_flag can have the following two alternative semantics (emphasized in italics):
[0142] The first semantics include the following (emphasized in italics): A ph_subpic_id_signaling_present_flag equal to 1 indicates that the subpicture ID mapping is signaled within the PH. A ph_subpic_id_signaling_present_flag equal to 0 indicates that the subpicture ID mapping is not signaled within the PH. If not present, the value of the ph_subpic_id_signaling_present_flag is inferred to be 1.
[0143] The second semantics include the following (emphasized in italics): ph_subpic_id_signaling_present_flag equal to 1 indicates that the subpicture ID mapping is signaled within the PH, while ph_subpic_id_signaling_present_flag equal to 0 indicates that the subpicture ID mapping is not signaled within the PH. If ph_subpic_id_signaling_present_flag is not present and sps_subpic_id_present_flag is equal to 1, the value of ph_subpic_id_signaling_present_flag is inferred to be 1.
[0144] The semantics of the syntax element sps_subpic_id_present_flag can be changed as follows (emphasized in italics):
[0145] A sps_subpic_id_present_flag equal to 1 indicates that the subpicture ID mapping is in SPS, PPS, or PH. A sps_subpic_id_present_flag equal to 0 indicates that the subpicture ID mapping is not in SPS, PPS, or PH.
[0146] In some embodiments, no subpicture ID inference rule is required when sps_subpic_id_present_flag is true. Therefore, if the subpicture ID presence flag is true (sps_subpic_id_present_flag=1) but the subpicture ID is not signaled in the SPS or PPS (sps_subpic_id_signaling_present_flag=0 and pps_subpic_id_signaling_present_flag=0), signaling of ph_subpic_id_signaling_present_flag is skipped and its value is inferred to be 1. This saves one bit in the PH.
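The one-bit saving can be illustrated by comparing how many bits the PH flag costs under the Table 4 and Table 5 behaviors. This Python sketch only models the PH branch reached when sps_subpic_id_present_flag is 1 and sps_subpic_id_signaling_present_flag is 0; that branch condition, the function names, and the read_bit callback are assumptions for illustration, not the exact table layout.

```python
from typing import Callable, Iterable, Iterator, Tuple


def ph_flag_table4(read_bit: Callable[[], int], pps_sig: int) -> Tuple[int, int]:
    """Table 4 behavior as described: the flag is always read in this branch
    (pps_sig is accepted only for signature parity and is not used).
    Returns (ph_subpic_id_signaling_present_flag, bits consumed)."""
    return read_bit(), 1


def ph_flag_table5(read_bit: Callable[[], int], pps_sig: int) -> Tuple[int, int]:
    """Table 5 sketch: with no IDs in the SPS or PPS, the flag is skipped and inferred to be 1."""
    if pps_sig:
        return read_bit(), 1
    return 1, 0


def bit_source(bits: Iterable[int]) -> Callable[[], int]:
    """Hypothetical helper turning a list of bits into a read_bit callback."""
    it: Iterator[int] = iter(bits)
    return lambda: next(it)
```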
[0147] As another example, if sps_subpic_id_present_flag is true but both sps_subpic_id_signaling_present_flag and pps_subpic_id_signaling_present_flag are false (i.e., the subpicture ID is not signaled in the SPS or PPS), ph_subpic_id is always signaled, which in this case means the subpicture ID is signaled in the PH. If pps_subpic_id is signaled (sps_subpic_id_present_flag is true, sps_subpic_id_signaling_present_flag is false, and pps_subpic_id_signaling_present_flag is true), ph_subpic_id is not signaled. Figure 16 shows an exemplary Table 6 illustrating another exemplary PH syntax for subpicture identifiers according to some embodiments of the present disclosure (emphasis is shown in box 1601 and highlighted in italics).
[0148] The list SubpicIdList[i] is derived according to syntax (2) as follows:
for( i = 0; i <= sps_num_subpics_minus1; i++ )
    SubpicIdList[ i ] = sps_subpic_id_present_flag ?
        ( sps_subpic_id_signaling_present_flag ? sps_subpic_id[ i ] :
        ( pps_subpic_id_signaling_present_flag ? pps_subpic_id[ i ] : ph_subpic_id[ i ] ) ) : i        Syntax (2)
[0149] If the syntactic element sps_subpic_id_present_flag is true, the subpicture ID inference rule is not necessary in some embodiments. The syntactic element ph_subpic_id_signaling_present_flag can be removed. Thus, one bit is saved if the syntactic element sps_subpic_id_present_flag is equal to 1, the syntactic element sps_subpic_id_signaling_present_flag is equal to 0, and the syntactic element pps_subpic_id_signaling_present_flag is equal to 1.
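The Table 6 approach pairs a parsing condition with syntax (2). The Python sketch below transcribes syntax (2) and the described condition under which ph_subpic_id is parsed; the function names are hypothetical, and the condition is inferred from the description of Table 6 rather than from the table itself.

```python
from typing import List, Optional, Sequence


def ph_ids_signaled_table6(sps_present: int, sps_sig: int, pps_sig: int) -> bool:
    """Table 6 sketch: ph_subpic_id is parsed exactly when IDs are present but not in the SPS or PPS."""
    return bool(sps_present and not sps_sig and not pps_sig)


def subpic_id_list_syntax2(num_subpics: int, sps_present: int, sps_sig: int, pps_sig: int,
                           sps_ids: Optional[Sequence[int]] = None,
                           pps_ids: Optional[Sequence[int]] = None,
                           ph_ids: Optional[Sequence[int]] = None) -> List[int]:
    """Transcription of syntax (2): SPS first, then PPS, then PH."""
    out: List[int] = []
    for i in range(num_subpics):
        if not sps_present:
            out.append(i)
        elif sps_sig:
            out.append(sps_ids[i])
        elif pps_sig:
            out.append(pps_ids[i])
        else:
            out.append(ph_ids[i])  # present whenever ph_ids_signaled_table6(...) is True
    return out
```

Because the last branch is reached only when the Table 6 condition holds, SubpicIdList never reads an unsignaled value in this sketch.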
[0150] If the subpicture ID is already signaled within the PPS, some embodiments may give the encoder the option to override the subpicture ID in the PPS by signaling the subpicture ID again within the PH. This is a more flexible approach for the encoder.
[0151] In some embodiments of this disclosure, inference rules are given to ensure that a subpicture ID list SubpicIdList can be derived.
[0152] For example, if a subpicture ID is not signaled within the SPS, PPS, or PH, an inference rule for pps_subpic_id is given so that the subpicture ID list SubpicIdList can be derived from the default value of pps_subpic_id inferred by that rule.
[0153] The semantics of the syntax element pps_subpic_id are as follows (emphasized in italics): pps_subpic_id[i] specifies the subpicture ID of the i-th subpicture. The length of pps_subpic_id[i] is pps_subpic_id_len_minus1+1 bits. If not present, the value of pps_subpic_id[i] is inferred to be equal to i for each i in the range of 0 to pps_num_subpics_minus1.
[0154] As another example, the inference rule is given within the derivation process of SubpicIdList. If sps_subpic_id_present_flag is true and sps_subpic_id_signaling_present_flag, pps_subpic_id_signaling_present_flag, and ph_subpic_id_signaling_present_flag are all false, SubpicIdList[i] is set to the default value i.
[0155] The derivation of SubpicIdList follows syntax (3) below (emphasized in italics):
for( i = 0; i <= sps_num_subpics_minus1; i++ )
    SubpicIdList[ i ] = sps_subpic_id_present_flag ?
        ( sps_subpic_id_signaling_present_flag ? sps_subpic_id[ i ] :
        ( ph_subpic_id_signaling_present_flag ? ph_subpic_id[ i ] :
        ( pps_subpic_id_signaling_present_flag ? pps_subpic_id[ i ] : i ) ) ) : i        Syntax (3)
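Syntax (3) can be transcribed directly; the innermost default of i is what keeps SubpicIdList defined when no IDs are signaled. The Python function name and keyword arguments below are hypothetical.

```python
from typing import List, Optional, Sequence


def subpic_id_list_syntax3(num_subpics: int, sps_present: int, sps_sig: int,
                           ph_sig: int, pps_sig: int,
                           sps_ids: Optional[Sequence[int]] = None,
                           ph_ids: Optional[Sequence[int]] = None,
                           pps_ids: Optional[Sequence[int]] = None) -> List[int]:
    """Transcription of syntax (3): SPS, then PH, then PPS, then the default index i."""
    out: List[int] = []
    for i in range(num_subpics):
        if not sps_present:
            out.append(i)
        elif sps_sig:
            out.append(sps_ids[i])
        elif ph_sig:
            out.append(ph_ids[i])
        elif pps_sig:
            out.append(pps_ids[i])
        else:
            out.append(i)  # inference rule: default ID when nothing is signaled
    return out
```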
[0156] In some embodiments, providing an inference rule for pps_subpic_id, or within the derivation of SubpicIdList, ensures that SubpicIdList can be derived even if no subpicture IDs are signaled in the bitstream. Thus, bits that would otherwise be used for signaling the subpicture IDs can be saved.
[0157] In some embodiments of this disclosure, the derivation rules for the subpicture ID list SubpicIdList can be modified to give higher priority to subpicture IDs signaled within the PPS than to PH. Thus, the syntax element pps_subpic_id_signaling_present_flag is checked before the syntax element ph_subpic_id_signaling_present_flag.
[0158] As an example, the derivation of SubpicIdList follows syntax (4) below (emphasized in italics), and an inference rule is given for the syntax element ph_subpic_id:
for( i = 0; i <= sps_num_subpics_minus1; i++ )
    SubpicIdList[ i ] = sps_subpic_id_present_flag ?
        ( sps_subpic_id_signaling_present_flag ? sps_subpic_id[ i ] :
        ( pps_subpic_id_signaling_present_flag ? pps_subpic_id[ i ] : ph_subpic_id[ i ] ) ) : i        Syntax (4)
[0159] The semantics (emphasized in italics) accompanying syntax (4) are as follows: ph_subpic_id[i] specifies the subpicture ID of the i-th subpicture. The length of ph_subpic_id[i] is ph_subpic_id_len_minus1+1 bits. If not present, the value of ph_subpic_id[i] is inferred to be equal to i for each i in the range of 0 to ph_num_subpics_minus1.
[0160] As another example, the derivation of SubpicIdList follows syntax (5) below (highlighted in italics); in this example, there is no additional inference rule for the syntax element ph_subpic_id:
for( i = 0; i <= sps_num_subpics_minus1; i++ )
    SubpicIdList[ i ] = sps_subpic_id_present_flag ?
        ( sps_subpic_id_signaling_present_flag ? sps_subpic_id[ i ] :
        ( pps_subpic_id_signaling_present_flag ? pps_subpic_id[ i ] :
        ( ph_subpic_id_signaling_present_flag ? ph_subpic_id[ i ] : i ) ) ) : i        Syntax (5)
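Syntax (5) checks the PPS before the PH, so when both carry IDs the PPS values win, and the trailing default i removes the need for a separate ph_subpic_id inference rule. A hypothetical Python transcription:

```python
from typing import List, Optional, Sequence


def subpic_id_list_syntax5(num_subpics: int, sps_present: int, sps_sig: int,
                           pps_sig: int, ph_sig: int,
                           sps_ids: Optional[Sequence[int]] = None,
                           pps_ids: Optional[Sequence[int]] = None,
                           ph_ids: Optional[Sequence[int]] = None) -> List[int]:
    """Transcription of syntax (5): SPS, then PPS, then PH, then the default index i."""
    out: List[int] = []
    for i in range(num_subpics):
        if not sps_present:
            out.append(i)
        elif sps_sig:
            out.append(sps_ids[i])
        elif pps_sig:
            out.append(pps_ids[i])  # PPS takes priority over PH
        elif ph_sig:
            out.append(ph_ids[i])
        else:
            out.append(i)  # default ID when nothing is signaled
    return out
```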
[0161] To ensure that SubpicIdList can be derived correctly even when no subpicture IDs are signaled within the bitstream, inference rules are provided for ph_subpic_id or for the derivation process of SubpicIdList. Therefore, when the default subpicture IDs inferred by these rules are used, bits that would otherwise be used for signaling subpicture IDs can be saved.
[0162] In some embodiments of the present disclosure, when the number of subpictures is equal to 1, redundant information signaled with respect to a subpicture can be removed.
[0163] As an example, the SPS syntax is shown in Table 7A in Figure 17A (emphasis is shown in italics within boxes 1701-1702) or Table 7B in Figure 17B (emphasis is shown in italics within boxes 1711-1712). It should be understood that Tables 7A and 7B are equivalent. The semantics (emphasized in italics) that follow the syntax in Tables 7A and 7B are shown below.
[0164] subpic_ctu_top_left_x[i] specifies the horizontal position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_x[i] is inferred to be equal to 0.
[0165] subpic_ctu_top_left_y[i] specifies the vertical position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_y[i] is inferred to be equal to 0.
[0166] subpic_width_minus1[i] plus 1 specifies the width of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of subpic_width_minus1[i] is inferred to be equal to Ceil(pic_width_max_in_luma_samples÷CtbSizeY)-1, where "Ceil()" is a function that rounds up to the nearest integer. Thus, Ceil(pic_width_max_in_luma_samples÷CtbSizeY)-1 is equal to (pic_width_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1, where " / " is integer division.
[0167] subpic_height_minus1[i] plus 1 specifies the height of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of subpic_height_minus1[i] is inferred to be equal to Ceil(pic_height_max_in_luma_samples÷CtbSizeY)-1, where "Ceil()" is a function that rounds up to the nearest integer. Thus, Ceil(pic_height_max_in_luma_samples÷CtbSizeY)-1 is equal to (pic_height_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1, where " / " is integer division.
[0168] subpic_treated_as_pic_flag[i] equal to 1 specifies that the i-th subpicture of each encoded picture in the CLVS is treated as a picture in the decoding process, excluding in-loop filtering operations. subpic_treated_as_pic_flag[i] equal to 0 specifies that the i-th subpicture of each encoded picture in the CLVS is not treated as a picture in the decoding process, excluding in-loop filtering operations. When not present, the value of subpic_treated_as_pic_flag[i] is inferred as follows: if subpics_present_flag is equal to 1 and sps_num_subpics_minus1 is equal to 0, it is inferred to be equal to 1; otherwise, it is inferred to be equal to 0.
[0169] loop_filter_across_subpic_enabled_flag[i] equal to 1 specifies that in-loop filtering can be performed across the boundary of the i-th subpicture in each encoded picture within the CLVS. loop_filter_across_subpic_enabled_flag[i] equal to 0 specifies that in-loop filtering is not performed across the boundary of the i-th subpicture in each encoded picture within the CLVS. When not present, the value of loop_filter_across_subpic_enabled_flag[i] is inferred as follows: if subpics_present_flag is equal to 1 and sps_num_subpics_minus1 is equal to 0, it is inferred to be equal to 0; otherwise, it is inferred to be equal to 1.
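The inference rules for Tables 7A and 7B can be collected into one helper that returns what a decoder would use for subpicture 0 when these fields are absent. This is a sketch under the semantics above; the function and dictionary key names are hypothetical, and the multi-subpicture case (where the fields are signaled) is not modeled.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer form of Ceil(a / b) for positive integers."""
    return (a + b - 1) // b


def inferred_subpic_values(subpics_present_flag: int, sps_num_subpics_minus1: int,
                           pic_w: int, pic_h: int, ctb: int) -> dict:
    """Values inferred for subpicture 0 when the Table 7A/7B fields are not present."""
    single = subpics_present_flag == 1 and sps_num_subpics_minus1 == 0
    return {
        "subpic_ctu_top_left_x": 0,
        "subpic_ctu_top_left_y": 0,
        "subpic_width_minus1": ceil_div(pic_w, ctb) - 1,   # whole picture width in CTUs, minus 1
        "subpic_height_minus1": ceil_div(pic_h, ctb) - 1,  # whole picture height in CTUs, minus 1
        # Single subpicture: inferred 1; otherwise inferred 0.
        "subpic_treated_as_pic_flag": 1 if single else 0,
        # Single subpicture: inferred 0; otherwise inferred 1.
        "loop_filter_across_subpic_enabled_flag": 0 if single else 1,
    }
```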
[0170] As another example, the SPS syntax is shown in Table 8 of Figure 18 (emphasis is shown in italics within boxes 1801-1802). The semantics (emphasized in italics) that follow the syntax in Table 8 are shown below.
[0171] subpic_ctu_top_left_x[i] specifies the horizontal position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_x[i] is inferred to be equal to 0.
[0172] subpic_ctu_top_left_y[i] specifies the vertical position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_y[i] is inferred to be equal to 0.
[0173] subpic_width_minus1[i] plus 1 specifies the width of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of subpic_width_minus1[i] is inferred to be equal to Ceil(pic_width_max_in_luma_samples÷CtbSizeY)-1, where "Ceil()" is a function that rounds up to the nearest integer. Thus, Ceil(pic_width_max_in_luma_samples÷CtbSizeY)-1 is equal to (pic_width_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1, where " / " is integer division.
[0174] subpic_height_minus1[i] plus 1 specifies the height of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of subpic_height_minus1[i] is inferred to be equal to Ceil(pic_height_max_in_luma_samples÷CtbSizeY)-1, where "Ceil()" is a function that rounds up to the nearest integer. Thus, Ceil(pic_height_max_in_luma_samples÷CtbSizeY)-1 is equal to (pic_height_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1, where " / " is integer division.
[0175] In some embodiments of this disclosure, the position and / or size information of the last subpicture can be skipped and derived from the size of the whole picture and the sizes and positions of all previous subpictures. Figure 19 shows an exemplary Table 9 illustrating another exemplary SPS syntax according to some embodiments of this disclosure. In Table 9 (emphasized in boxes 1901-1902 and highlighted in italics), the width and height of the last subpicture, which is the subpicture with an index equal to sps_num_subpics_minus1, are skipped.
[0176] The width and height of the last subpicture are derived from the width and height of the whole picture and the top-left position of the last subpicture.
[0177] The following semantics (emphasized in italics) accompany the syntax in Table 9.
[0178] subpic_ctu_top_left_x[i] specifies the horizontal position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_x[i] is inferred to be equal to 0.
[0179] subpic_ctu_top_left_y[i] specifies the vertical position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of the syntax element subpic_ctu_top_left_y[i] is inferred to be equal to 0.
[0180] subpic_width_minus1[i] plus 1 specifies the width of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_width_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of subpic_width_minus1[i] is inferred to be equal to Ceil(pic_width_max_in_luma_samples÷CtbSizeY)-1-(i==sps_num_subpics_minus1?subpic_ctu_top_left_x[sps_num_subpics_minus1]:0). Here, "Ceil()" is a function that rounds up to the nearest integer. In other words, Ceil(pic_width_max_in_luma_samples÷CtbSizeY)-1 is equal to (pic_width_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1, where " / " is integer division. sps_num_subpics_minus1 plus 1 is the number of subpictures in the picture. For the last subpicture in the picture, i is equal to sps_num_subpics_minus1, and in this case subpic_width_minus1[i] is inferred to be equal to (pic_width_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1-subpic_ctu_top_left_x[sps_num_subpics_minus1]. If there is only one subpicture in the picture, i can only be 0, and subpic_ctu_top_left_x[0] is 0. Therefore, subpic_width_minus1[0] is inferred to be equal to (pic_width_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1-subpic_ctu_top_left_x[0], that is, (pic_width_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1.
[0181] subpic_height_minus1[i] plus 1 specifies the height of the i-th subpicture in units of CtbSizeY. The length of this syntax element is Ceil(Log2(pic_height_max_in_luma_samples÷CtbSizeY)) bits. If not present, the value of subpic_height_minus1[i] is inferred to be equal to Ceil(pic_height_max_in_luma_samples÷CtbSizeY)-1-(i==sps_num_subpics_minus1?subpic_ctu_top_left_y[i]:0). Here, "Ceil()" is a function that rounds up to the nearest integer. In other words, Ceil(pic_height_max_in_luma_samples÷CtbSizeY)-1 is equal to (pic_height_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1, where " / " is integer division. sps_num_subpics_minus1 plus 1 is the number of subpictures in the picture. For the last subpicture in the picture, i is equal to sps_num_subpics_minus1, and in this case subpic_height_minus1[i] is inferred to be equal to (pic_height_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1-subpic_ctu_top_left_y[sps_num_subpics_minus1]. If there is only one subpicture in the picture, i can only be 0, and subpic_ctu_top_left_y[0] is 0. Therefore, subpic_height_minus1[0] is inferred to be equal to (pic_height_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1-subpic_ctu_top_left_y[0], that is, (pic_height_max_in_luma_samples+CtbSizeY-1) / CtbSizeY-1.
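The Table 9 inference for the last subpicture reduces to subtracting its top-left CTU position from the picture size in CTUs. A minimal Python sketch of that arithmetic, with hypothetical function names:

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer form of Ceil(a / b) for positive integers."""
    return (a + b - 1) // b


def last_subpic_size_minus1(pic_w: int, pic_h: int, ctb: int,
                            last_x: int, last_y: int) -> Tuple[int, int]:
    """Inferred (subpic_width_minus1, subpic_height_minus1) of the last subpicture,
    from the picture size and the last subpicture's top-left CTU position."""
    width_minus1 = ceil_div(pic_w, ctb) - 1 - last_x
    height_minus1 = ceil_div(pic_h, ctb) - 1 - last_y
    return width_minus1, height_minus1
```

For a 1920x1080 picture with 128x128 CTUs (15 x 9 CTUs) whose last subpicture starts at CTU (8, 4), the sketch infers a 7 x 5 CTU subpicture, i.e. (6, 4) in minus-1 form; with a single subpicture at (0, 0) it yields (14, 8).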
[0182] In some embodiments of this disclosure, a subpicture ID is signaled only if a subpicture exists. Figure 20 shows an exemplary Table 10 illustrating another exemplary SPS syntax (emphasized in boxes 2001-2002 and highlighted in italics) according to some embodiments of this disclosure.
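The gating idea of Table 10 can be sketched as a parse step that reads sps_subpic_id_present_flag only when subpictures are present. The Python below is an illustration only: Table 10 itself is not reproduced here, so the function name, the read_bit callback, and treating the absent flag as 0 are assumptions.

```python
from typing import Callable, Tuple


def parse_sps_subpic_id_present_flag(read_bit: Callable[[], int],
                                     subpics_present_flag: int) -> Tuple[int, int]:
    """Return (sps_subpic_id_present_flag, bits consumed) under the Table 10 idea."""
    if subpics_present_flag:
        return read_bit(), 1
    # No subpictures: the flag is not parsed; treating it as 0 is an assumption here.
    return 0, 0
```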
[0183] Figure 21 shows a flowchart of an exemplary image processing method 2100 according to some embodiments of the present disclosure. Method 2100 may be executed by an encoder (e.g., by process 200A in Figure 2A or process 200B in Figure 2B), by a decoder (e.g., by process 300A in Figure 3A or process 300B in Figure 3B), or by one or more software or hardware components of a device (e.g., device 400 in Figure 4). For example, a processor (e.g., processor 402 in Figure 4) can execute Method 2100. In some embodiments, Method 2100 may be implemented by a computer program product embodied in a computer-readable medium that includes computer-executable instructions, such as program code, executed by a computer (e.g., device 400 in Figure 4).
[0184] In step 2101, it can be determined whether a subpicture ID mapping is present in the bitstream. In some embodiments, method 2100 may include signaling a flag indicating whether a subpicture ID mapping is present in the bitstream. For example, this flag may be sps_subpic_id_present_flag as shown in Table 2 of Figure 11, Table 4 of Figure 13, Table 5 of Figure 15, or Table 6 of Figure 16.
[0185] In step 2103, it can be determined whether one or more subpicture IDs are signaled in the first syntax or the second syntax. In step 2105, in response to determining that a subpicture ID mapping is present and that the one or more subpicture IDs are signaled in neither the first syntax nor the second syntax, the one or more subpicture IDs are signaled in the third syntax. Each of the first, second, and third syntaxes is one of the SPS, the PPS, and the PH; for example, the first, second, and third syntaxes are the SPS, the PPS, and the PH, respectively. Therefore, if a subpicture ID mapping is present (e.g., sps_subpic_id_present_flag=1), the subpicture IDs are forced to be signaled in the SPS, the PPS, or the PH.
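The decision made in steps 2101 to 2105 can be sketched as a small function. This is a hypothetical Python illustration, assuming the presence of the ID mapping and of IDs in the first and second syntaxes have already been determined as booleans; the names `Syntax` and `subpic_id_location` are invented here, and the default ordering follows the SPS/PPS/PH example in [0185].

```python
from enum import Enum
from typing import Optional


class Syntax(Enum):
    SPS = "SPS"
    PPS = "PPS"
    PH = "PH"


def subpic_id_location(id_mapping_present: bool,
                       ids_in_first: bool,
                       ids_in_second: bool,
                       first: Syntax = Syntax.SPS,
                       second: Syntax = Syntax.PPS,
                       third: Syntax = Syntax.PH) -> Optional[Syntax]:
    """Return the syntax structure that carries the subpicture IDs, or None."""
    if not id_mapping_present:    # step 2101: no ID mapping in the bitstream
        return None
    if ids_in_first:              # step 2103: already in the first syntax
        return first
    if ids_in_second:             # step 2103: already in the second syntax
        return second
    return third                  # step 2105: forced into the third syntax


print(subpic_id_location(True, False, False))  # Syntax.PH
```

The point of the last branch is that a present ID mapping can never end up with no carrier for the IDs.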
[0186] In some embodiments, method 2100 may include signaling a first flag (e.g., sps_subpic_id_signaling_present_flag, shown in Table 2 of Figure 11, Table 5 of Figure 15, or Table 6 of Figure 16) indicating that one or more subpicture IDs are signaled within a first syntax (e.g., SPS, shown in Table 2 of Figure 11). In some embodiments, method 2100 may include signaling a second flag (e.g., pps_subpic_id_signaling_present_flag, shown in Table 3 of Figure 12, Table 5 of Figure 15, or Table 6 of Figure 16) indicating that one or more subpicture IDs are signaled within a second syntax (e.g., PPS, shown in Table 3 of Figure 12). In some embodiments, method 2100 may include signaling a third flag (e.g., ph_subpic_id_signaling_present_flag, shown in Table 5 of Figure 15) that indicates that one or more subpicture IDs are signaled within a third syntax (e.g., PH, shown in Table 5 of Figure 15).
[0187] In some embodiments, method 2100 may include determining whether the bitstream contains a third flag indicating that one or more subpicture IDs are signaled in a third syntax, and, in response to the bitstream not containing the third flag, signaling one or more subpicture IDs in the third syntax. For example, the third syntax may be PH, and the third flag may be ph_subpic_id_signaling_present_flag. If ph_subpic_id_signaling_present_flag is not signaled in PH, then ph_subpic_id_signaling_present_flag can be inferred to be 1, and one or more subpicture IDs are signaled in PH.
[0188] In some embodiments, method 2100 may include, in response to determining that one or more subpicture IDs are not signaled in the first syntax, signaling the one or more subpicture IDs in the second and third syntaxes (e.g., Table 5 of Figure 15).
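Paragraphs [0186] to [0188] can be illustrated together with an encoder-side sketch that lists the syntax elements to write, plus the decoder-side inference of an absent PH flag. This is a hypothetical Python model: the element names pps_subpic_id[i] and ph_subpic_id[i], the flag-before-IDs ordering, and the helper names are assumptions, since Tables 2, 3, and 5 are not reproduced here.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, str, int]  # (syntax structure, syntax element, value)


def write_subpic_ids(subpic_ids: List[int], ids_in_sps: bool,
                     emit_ph_flag: bool = True) -> List[Element]:
    """List, in order, the syntax elements an encoder would write (hypothetical)."""
    out: List[Element] = [("SPS", "sps_subpic_id_signaling_present_flag", int(ids_in_sps))]
    if ids_in_sps:
        out += [("SPS", f"sps_subpic_id[{i}]", v) for i, v in enumerate(subpic_ids)]
        return out
    # Not in the SPS: carry the IDs in both the PPS and the PH ([0188]).
    out.append(("PPS", "pps_subpic_id_signaling_present_flag", 1))
    out += [("PPS", f"pps_subpic_id[{i}]", v) for i, v in enumerate(subpic_ids)]
    if emit_ph_flag:
        out.append(("PH", "ph_subpic_id_signaling_present_flag", 1))
    out += [("PH", f"ph_subpic_id[{i}]", v) for i, v in enumerate(subpic_ids)]
    return out


def ph_flag_value(ph_elements: Dict[str, int]) -> int:
    # Decoder side ([0187]): an absent PH flag is inferred to be 1,
    # so the IDs are then expected in the PH.
    return ph_elements.get("ph_subpic_id_signaling_present_flag", 1)
```

With `emit_ph_flag=False` the PH flag is omitted, and `ph_flag_value({})` returns 1, which matches the inference described in [0187].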
[0189] Figure 22 shows a flowchart of an exemplary image processing method 2200 according to some embodiments of the present disclosure. Method 2200 may be executed by an encoder (e.g., by process 200A in Figure 2A or process 200B in Figure 2B), by a decoder (e.g., by process 300A in Figure 3A or process 300B in Figure 3B), or by one or more software or hardware components of a device (e.g., device 400 in Figure 4). For example, a processor (e.g., processor 402 in Figure 4) can execute method 2200. In some embodiments, method 2200 may be implemented by a computer program product embodied in a computer-readable medium containing computer-executable instructions, such as program code, executed by a computer (e.g., device 400 in Figure 4).
[0190] In step 2201, it can be determined whether one or more subpicture IDs are signaled in at least one of the SPS, PH, or PPS. In some embodiments, method 2200 may include determining whether one or more subpicture IDs are signaled in the PH before determining whether one or more subpicture IDs are signaled in the PPS. In some embodiments, method 2200 may include determining whether one or more subpicture IDs are signaled in the PPS before determining whether one or more subpicture IDs are signaled in the PH.
[0191] In step 2203, in response to the determination that one or more subpicture IDs are not signaled within SPS, PH, and PPS, it can be determined that one or more subpicture IDs have default values.
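Method 2200 amounts to a lookup with a fallback. The following minimal Python sketch makes several assumptions not stated in [0190] and [0191]: the SPS is checked before the PH and PPS, `None` means "not signaled in this structure", and the default value of subpicture i is i, which is the convention VVC uses but which this disclosure only calls "default values". The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def resolve_subpic_ids(num_subpics: int,
                       sps_ids: Optional[Sequence[int]],
                       pps_ids: Optional[Sequence[int]],
                       ph_ids: Optional[Sequence[int]],
                       ph_before_pps: bool = True) -> List[int]:
    """Return the subpicture IDs; None means "not signaled in this structure"."""
    # Step 2201: look in the SPS first (assumed), then PH/PPS in the chosen order.
    later = [ph_ids, pps_ids] if ph_before_pps else [pps_ids, ph_ids]
    for ids in [sps_ids, *later]:
        if ids is not None:
            return list(ids)
    # Step 2203: nothing signaled anywhere, so use default values.
    # Assumed default: subpicture i gets ID i (the convention VVC uses).
    return list(range(num_subpics))
```

The `ph_before_pps` switch covers both embodiments in [0190]: checking the PH before the PPS, or the PPS before the PH.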
[0192] Figure 23 shows a flowchart of an exemplary image processing method 2300 according to some embodiments of the present disclosure. Method 2300 may be executed by an encoder (e.g., by process 200A in Figure 2A or process 200B in Figure 2B), by a decoder (e.g., by process 300A in Figure 3A or process 300B in Figure 3B), or by one or more software or hardware components of a device (e.g., device 400 in Figure 4). For example, a processor (e.g., processor 402 in Figure 4) can execute method 2300. In some embodiments, method 2300 may be implemented by a computer program product embodied in a computer-readable medium containing computer-executable instructions, such as program code, which is executed by a computer (e.g., device 400 in Figure 4).
[0193] In step 2301, it can be determined whether the number of subpictures in the coded picture is equal to 1. For example, as shown in Table 8 of Figure 18, this can be determined by checking whether sps_num_subpic_minus1 is greater than 0; the number of subpictures is equal to 1 when it is not.
[0194] In step 2303, in response to determining that the number of subpictures is equal to 1, the subpicture of the coded picture can be treated as a picture in the decoding process. For example, in response to determining that the number of subpictures is equal to 1, the flag subpic_treated_as_pic_flag[i] can be inferred to be equal to 1. In some embodiments, in response to determining that the number of subpictures is equal to 1, the in-loop filtering operation can be excluded.
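Steps 2301 and 2303 can be sketched as a parsing loop that reads the two per-subpicture flags only when there is more than one subpicture and otherwise infers them. The inferred values (1 for subpic_treated_as_pic_flag, 0 for loop_filter_across_subpic_enabled_flag) follow clause 54 of this disclosure. The `read_u1` callable that returns one bit, the flag order, and the function name are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple


def parse_subpic_flags(num_subpics_minus1: int,
                       read_u1: Callable[[], int]) -> Tuple[List[int], List[int]]:
    """Return per-subpicture (subpic_treated_as_pic_flag, loop_filter_across_subpic_enabled_flag)."""
    treated, lf_across = [], []
    for _ in range(num_subpics_minus1 + 1):
        if num_subpics_minus1 > 0:
            # Two or more subpictures: both flags are read from the bitstream.
            treated.append(read_u1())
            lf_across.append(read_u1())
        else:
            # Step 2303: a single subpicture is treated as a picture, and
            # in-loop filtering across subpicture boundaries is inferred off.
            treated.append(1)
            lf_across.append(0)
    return treated, lf_across


bits = iter([0, 1, 1, 0])
print(parse_subpic_flags(1, lambda: next(bits)))  # ([0, 1], [1, 0])
print(parse_subpic_flags(0, lambda: 0))           # ([1], [0])
```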
[0195] Figure 24 shows a flowchart of an exemplary image processing method 2400 according to some embodiments of the present disclosure. Method 2400 may be executed by an encoder (e.g., by process 200A in Figure 2A or process 200B in Figure 2B), by a decoder (e.g., by process 300A in Figure 3A or process 300B in Figure 3B), or by one or more software or hardware components of a device (e.g., device 400 in Figure 4). For example, a processor (e.g., processor 402 in Figure 4) can execute method 2400. In some embodiments, method 2400 may be implemented by a computer program product embodied in a computer-readable medium containing computer-executable instructions, such as program code, which is executed by a computer (e.g., device 400 in Figure 4).
[0196] In step 2401, it can be determined whether a subpicture is the last subpicture of the picture. In step 2403, in response to determining that the subpicture is the last subpicture, information about the position or size of the subpicture can be derived from the size of the picture and the sizes and positions of the subpictures preceding it in the picture.
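One way to realize step 2403 is to mark the CTBs already covered by the preceding subpictures and take the remainder as the last subpicture. The following is a minimal Python sketch under stated assumptions: the subpictures tile the picture without overlap, rectangles are given as (x, y, width, height) in CTB units, and the last subpicture starts at the first uncovered CTB in raster-scan order and extends to the right and bottom picture edges. The width and height then follow the subtraction in clause 56. The function name is hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in CTB units


def derive_last_subpic(pic_w_ctbs: int, pic_h_ctbs: int, preceding: List[Rect]) -> Rect:
    """Derive the last subpicture's position and size from the preceding ones."""
    covered = [[False] * pic_w_ctbs for _ in range(pic_h_ctbs)]
    for x, y, w, h in preceding:
        for row in range(y, y + h):
            for col in range(x, x + w):
                covered[row][col] = True
    # Assumption: the last subpicture starts at the first uncovered CTB in
    # raster-scan order and extends to the right and bottom picture edges.
    for y in range(pic_h_ctbs):
        for x in range(pic_w_ctbs):
            if not covered[y][x]:
                return x, y, pic_w_ctbs - x, pic_h_ctbs - y
    raise ValueError("the preceding subpictures already cover the whole picture")


# 4x3-CTB picture: a full-width top row, then a 2x2 block on the left.
print(derive_last_subpic(4, 3, [(0, 0, 4, 1), (0, 1, 2, 2)]))  # (2, 1, 2, 2)
```

With no preceding subpictures the function returns the whole picture, which is consistent with the single-subpicture case in [0181].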
[0197] Figure 25 shows a flowchart of an exemplary image processing method 2500 according to some embodiments of the present disclosure. Method 2500 may be executed by an encoder (e.g., by process 200A in Figure 2A or process 200B in Figure 2B), by a decoder (e.g., by process 300A in Figure 3A or process 300B in Figure 3B), or by one or more software or hardware components of a device (e.g., device 400 in Figure 4). For example, a processor (e.g., processor 402 in Figure 4) can execute method 2500. In some embodiments, method 2500 may be implemented by a computer program product embodied in a computer-readable medium containing computer-executable instructions, such as program code, executed by a computer (e.g., device 400 in Figure 4).
[0198] In step 2501, it can be determined whether a subpicture exists within the picture. For example, this determination can be made based on a flag (e.g., subpics_present_flag shown in Table 10 of Figure 20).
[0199] In step 2503, in response to the determination that one or more subpictures are present in the picture, a first flag can be signaled to indicate whether a subpicture ID mapping is present in the SPS. For example, the first flag may be sps_subpic_id_present_flag, as shown in Table 10 of Figure 20.
[0200] In some embodiments, method 2500 may include, in response to the first flag indicating that the subpicture ID mapping is present in the SPS, signaling a second flag indicating whether the subpicture ID mapping is signaled in the SPS. In response to the second flag indicating that the subpicture ID mapping is signaled in the SPS, the subpicture IDs of the one or more subpictures may be signaled in the SPS. For example, the second flag may be sps_subpic_id_signaling_present_flag, shown in Table 10 of Figure 20. If sps_subpic_id_signaling_present_flag is equal to 1, sps_subpic_id[i] may be signaled.
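The conditional structure of method 2500 can be sketched as a decoder-side parser. This is a hypothetical Python model, not the actual Table 10 syntax: it skips any SPS elements that sit between these flags, reads each flag as a single bit, and assumes the number of subpictures and a fixed ID length are already known and passed in as parameters. `BitReader` and `parse_subpic_id_info` are invented names.

```python
from typing import Dict, List, Union


class BitReader:
    """Hypothetical MSB-first reader over a list of 0/1 values."""

    def __init__(self, bits: List[int]):
        self.bits = bits
        self.pos = 0

    def u(self, n: int) -> int:
        # Read an n-bit unsigned value, most significant bit first.
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def parse_subpic_id_info(r: BitReader, num_subpics: int,
                         id_len_bits: int) -> Dict[str, Union[int, List[int]]]:
    info: Dict[str, Union[int, List[int]]] = {"subpics_present_flag": r.u(1)}
    if not info["subpics_present_flag"]:
        return info  # step 2501: no subpictures, so no ID-mapping syntax at all
    info["sps_subpic_id_present_flag"] = r.u(1)  # step 2503: first flag
    if info["sps_subpic_id_present_flag"]:
        info["sps_subpic_id_signaling_present_flag"] = r.u(1)  # [0200]: second flag
        if info["sps_subpic_id_signaling_present_flag"]:
            info["sps_subpic_id"] = [r.u(id_len_bits) for _ in range(num_subpics)]
    return info


# Flags 1, 1, 1 followed by two 4-bit IDs: 0b0101 and 0b1111.
print(parse_subpic_id_info(BitReader([1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1]), 2, 4))
```

In the usage line, the parser reads the three flags as 1 and then yields sps_subpic_id equal to [5, 15]. A bitstream that starts with a 0 bit stops after subpics_present_flag, which is the saving [0182] describes.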
[0201] Embodiments can be further described using the following clauses:
1. A video processing method, comprising: determining whether a subpicture ID mapping is present in a bitstream; determining whether one or more subpicture IDs are signaled in a first syntax or a second syntax; and in response to determining that the subpicture ID mapping is present and that the one or more subpicture IDs are not signaled in the first and second syntaxes, signaling the one or more subpicture IDs in a third syntax.
2. The method of clause 1, wherein the first syntax, the second syntax, or the third syntax is one of a sequence parameter set (SPS), a picture parameter set (PPS), and a picture header (PH).
3. The method of clause 1 or 2, further comprising: signaling a first flag indicating that the one or more subpicture IDs are signaled in the first syntax; or signaling a second flag indicating that the one or more subpicture IDs are signaled in the second syntax.
4. The method of any one of clauses 1 to 3, further comprising: signaling a third flag indicating that the one or more subpicture IDs are signaled in the third syntax.
5. The method of any one of clauses 1 to 4, further comprising: determining whether the bitstream contains a third flag indicating that the one or more subpicture IDs are signaled in the third syntax; and in response to the bitstream not containing the third flag, signaling the one or more subpicture IDs in the third syntax.
6. The method of any one of clauses 1 to 5, further comprising: in response to determining that the one or more subpicture IDs are not signaled in the first syntax, signaling the one or more subpicture IDs in the second and third syntaxes.
7. The method of any one of clauses 1 to 6, further comprising: signaling a fourth flag indicating whether the subpicture ID mapping is present in the bitstream.
8. A video processing apparatus, comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions to cause the apparatus to perform: determining whether a subpicture ID mapping is present in a bitstream; determining whether one or more subpicture IDs are signaled in a first syntax or a second syntax; and in response to determining that the subpicture ID mapping is present and that the one or more subpicture IDs are not signaled in the first and second syntaxes, signaling the one or more subpicture IDs in a third syntax.
9. The apparatus of clause 8, wherein the first syntax, the second syntax, or the third syntax is one of a sequence parameter set (SPS), a picture parameter set (PPS), and a picture header (PH).
10. The apparatus of clause 8 or 9, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: signaling a first flag indicating that the one or more subpicture IDs are signaled in the first syntax; or signaling a second flag indicating that the one or more subpicture IDs are signaled in the second syntax.
11. The apparatus of any one of clauses 8 to 10, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: signaling a third flag indicating that the one or more subpicture IDs are signaled in the third syntax.
12. The apparatus of any one of clauses 8 to 11, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: determining whether the bitstream contains a third flag indicating that the one or more subpicture IDs are signaled in the third syntax; and in response to the bitstream not containing the third flag, signaling the one or more subpicture IDs in the third syntax.
13. The apparatus of any one of clauses 8 to 12, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: in response to determining that the one or more subpicture IDs are not signaled in the first syntax, signaling the one or more subpicture IDs in the second and third syntaxes.
14. The apparatus of any one of clauses 8 to 13, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: signaling a fourth flag indicating whether the subpicture ID mapping is present in the bitstream.
15. A non-transitory computer-readable storage medium storing a set of instructions that are executable by one or more processing units to cause a video processing apparatus to perform a method comprising: determining whether a subpicture ID mapping is present in a bitstream; determining whether one or more subpicture IDs are signaled in a first syntax or a second syntax; and in response to determining that the subpicture ID mapping is present and that the one or more subpicture IDs are not signaled in the first and second syntaxes, signaling the one or more subpicture IDs in a third syntax.
16. The non-transitory computer-readable storage medium of clause 15, wherein the first syntax, the second syntax, or the third syntax is one of a sequence parameter set (SPS), a picture parameter set (PPS), and a picture header (PH).
17. The non-transitory computer-readable storage medium of clause 15 or 16, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: signaling a first flag indicating that the one or more subpicture IDs are signaled in the first syntax; or signaling a second flag indicating that the one or more subpicture IDs are signaled in the second syntax.
18. The non-transitory computer-readable storage medium of any one of clauses 15 to 17, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: signaling a third flag indicating that the one or more subpicture IDs are signaled in the third syntax.
19. The non-transitory computer-readable storage medium of any one of clauses 15 to 18, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: determining whether the bitstream contains a third flag indicating that the one or more subpicture IDs are signaled in the third syntax; and in response to the bitstream not containing the third flag, signaling the one or more subpicture IDs in the third syntax.
20. The non-transitory computer-readable storage medium of any one of clauses 15 to 19, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: in response to determining that the one or more subpicture IDs are not signaled in the first syntax, signaling the one or more subpicture IDs in the second and third syntaxes.
21. The non-transitory computer-readable storage medium of any one of clauses 15 to 20, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: signaling a fourth flag indicating whether the subpicture ID mapping is present in the bitstream.
22. A video processing method, comprising: determining whether one or more subpicture IDs are signaled in at least one of a sequence parameter set (SPS), a picture header (PH), or a picture parameter set (PPS); and in response to determining that the one or more subpicture IDs are not signaled in the SPS, the PH, and the PPS, determining that the one or more subpicture IDs have default values.
23. The method of clause 22, wherein whether the one or more subpicture IDs are signaled in the PH is determined before determining whether the one or more subpicture IDs are signaled in the PPS.
24. The method of clause 22, wherein whether the one or more subpicture IDs are signaled in the PPS is determined before determining whether the one or more subpicture IDs are signaled in the PH.
25. A video processing method, comprising: determining whether a number of subpictures in a coded picture is equal to 1; and in response to determining that the number of subpictures is equal to 1, treating the subpicture of the coded picture as a picture in a decoding process.
26. The method of clause 25, further comprising: excluding an in-loop filtering operation in response to determining that the number of subpictures is equal to 1.
27. A video processing method, comprising: determining whether a subpicture is the last subpicture of a picture; and in response to determining that the subpicture is the last subpicture, deriving position or size information of the subpicture from a size of the picture and sizes and positions of the subpictures preceding it in the picture.
28. A video processing method, comprising: determining whether one or more subpictures are present in a picture; and in response to determining that one or more subpictures are present in the picture, signaling a first flag indicating whether a subpicture ID mapping is present in a sequence parameter set (SPS).
29. The method of clause 28, further comprising: in response to the first flag indicating that the subpicture ID mapping is present in the SPS, signaling a second flag indicating whether the subpicture ID mapping is signaled in the SPS.
30. The method of clause 29, further comprising: in response to the second flag indicating that the subpicture ID mapping is signaled in the SPS, signaling subpicture IDs of the one or more subpictures in the SPS.
31. A video processing apparatus, comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions to cause the apparatus to perform: determining whether one or more subpicture IDs are signaled in at least one of a sequence parameter set (SPS), a picture header (PH), or a picture parameter set (PPS); and in response to determining that the one or more subpicture IDs are not signaled in the SPS, the PH, and the PPS, determining that the one or more subpicture IDs have default values.
32. The apparatus of clause 31, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: determining whether the one or more subpicture IDs are signaled in the PH before determining whether the one or more subpicture IDs are signaled in the PPS.
33. The apparatus of clause 31, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: determining whether the one or more subpicture IDs are signaled in the PPS before determining whether the one or more subpicture IDs are signaled in the PH.
34. A video processing apparatus, comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions to cause the apparatus to perform: determining whether a number of subpictures in a coded picture is equal to 1; and in response to determining that the number of subpictures is equal to 1, treating the subpicture of the coded picture as a picture in a decoding process.
35. The apparatus of clause 34, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: in response to determining that the number of subpictures is equal to 1, excluding an in-loop filtering operation.
36. A video processing apparatus, comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions to cause the apparatus to perform: determining whether a subpicture is the last subpicture of a picture; and in response to determining that the subpicture is the last subpicture, deriving position or size information of the subpicture from a size of the picture and sizes and positions of the subpictures preceding it in the picture.
37. A video processing apparatus, comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions to cause the apparatus to perform: determining whether one or more subpictures are present in a picture; and in response to determining that one or more subpictures are present in the picture, signaling a first flag indicating whether a subpicture ID mapping is present in a sequence parameter set (SPS).
38. The apparatus of clause 37, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: in response to the first flag indicating that the subpicture ID mapping is present in the SPS, signaling a second flag indicating whether the subpicture ID mapping is signaled in the SPS.
39. The apparatus of clause 38, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: in response to the second flag indicating that the subpicture ID mapping is signaled in the SPS, signaling subpicture IDs of the one or more subpictures in the SPS.
40. A non-transitory computer-readable storage medium storing a set of instructions that are executable by one or more processing units to cause a video processing apparatus to perform a method comprising: determining whether one or more subpicture IDs are signaled in at least one of a sequence parameter set (SPS), a picture header (PH), or a picture parameter set (PPS); and in response to determining that the one or more subpicture IDs are not signaled in the SPS, the PH, and the PPS, determining that the one or more subpicture IDs have default values.
41. The non-transitory computer-readable storage medium of clause 40, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to perform: determining whether the one or more subpicture IDs are signaled in the PH before determining whether the one or more subpicture IDs are signaled in the PPS.
42. The non-transitory computer-readable storage medium of clause 40, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to perform: determining whether the one or more subpicture IDs are signaled in the PPS before determining whether the one or more subpicture IDs are signaled in the PH.
43. A non-transitory computer-readable storage medium storing a set of instructions that are executable by one or more processing units to cause a video processing apparatus to perform a method comprising: determining whether a number of subpictures in a coded picture is equal to 1; and in response to determining that the number of subpictures is equal to 1, treating the subpicture of the coded picture as a picture in a decoding process.
44. The non-transitory computer-readable storage medium of clause 43, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: in response to determining that the number of subpictures is equal to 1, excluding an in-loop filtering operation.
45. A non-transitory computer-readable storage medium storing a set of instructions that are executable by one or more processing units to cause a video processing apparatus to perform a method comprising: determining whether a subpicture is the last subpicture of a picture; and in response to determining that the subpicture is the last subpicture, deriving position or size information of the subpicture from a size of the picture and sizes and positions of the subpictures preceding it in the picture.
46. A non-transitory computer-readable storage medium storing a set of instructions that are executable by one or more processing units to cause a video processing apparatus to perform a method comprising: determining whether one or more subpictures are present in a picture; and in response to determining that one or more subpictures are present in the picture, signaling a first flag indicating whether a subpicture ID mapping is present in a sequence parameter set (SPS).
47. The non-transitory computer-readable storage medium of clause 46, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: in response to the first flag indicating that the subpicture ID mapping is present in the SPS, signaling a second flag indicating whether the subpicture ID mapping is signaled in the SPS.
48. The non-transitory computer-readable storage medium of clause 47, wherein the set of instructions is executable by the one or more processing units to cause the video processing apparatus to further perform: in response to the second flag indicating that the subpicture ID mapping is signaled in the SPS, signaling subpicture IDs of the one or more subpictures in the SPS.
49. A video processing method, comprising: determining whether a bitstream contains subpicture information according to a subpicture information present flag signaled in the bitstream; and in response to the bitstream containing the subpicture information, signaling in the bitstream at least one of: a number of subpictures in a picture; a width, a height, a position, and an identifier (ID) mapping of a target subpicture; a subpic_treated_as_pic_flag; and a loop_filter_across_subpic_enabled_flag.
50. The method of clause 49, wherein signaling at least one of the width, the height, and the position of the target subpicture is based on the number of subpictures in the picture.
51. The method of clause 50, further comprising: if there are at least two subpictures in the picture, signaling the subpic_treated_as_pic_flag, the loop_filter_across_subpic_enabled_flag, and at least one of the width, the height, and the position of the target subpicture; and if the picture contains only one subpicture, skipping signaling at least one of the width, the height, and the position of the target subpicture; wherein the subpic_treated_as_pic_flag indicates whether the subpictures of each coded picture in a coded layer video sequence (CLVS) are treated as pictures in the decoding process excluding in-loop filtering operations, and the loop_filter_across_subpic_enabled_flag indicates whether in-loop filtering operations across subpicture boundaries are enabled for each coded picture in the CLVS.
52. The method of clause 51, further comprising: if the width of the target subpicture is not signaled, determining the width of the target subpicture to be the width of the picture; and if the height of the target subpicture is not signaled, determining the height of the target subpicture to be the height of the picture.
53. The method of clause 52, further comprising: if the width of the target subpicture is not signaled, determining the width of the target subpicture in units of the coding tree block (CTB) size to be the width of the picture in units of the CTB size; and if the height of the target subpicture is not signaled, determining the height of the target subpicture in units of the CTB size to be the height of the picture in units of the CTB size.
54. The method of clause 51, further comprising: if the subpic_treated_as_pic_flag is not signaled in the bitstream, determining that the subpic_treated_as_pic_flag has a value of 1; and if the loop_filter_across_subpic_enabled_flag is not signaled in the bitstream, determining that the loop_filter_across_subpic_enabled_flag has a value of 0.
55. The method of clause 49, further comprising: if the target subpicture is the last subpicture in the picture, skipping signaling at least one of the width and the height of the target subpicture.
56. The method of clause 55, further comprising: if the width of the target subpicture is not signaled, determining the width of the target subpicture in units of the coding tree block (CTB) size by subtracting the horizontal position, in units of the CTB size, of the top-left coding tree unit (CTU) of the target subpicture from the width of the picture in units of the CTB size, or determining the width of the target subpicture by subtracting the horizontal position of the top-left CTU of the target subpicture from the width of the picture; and if the height of the target subpicture is not signaled, determining the height of the target subpicture in units of the CTB size by subtracting the vertical position, in units of the CTB size, of the top-left CTU of the target subpicture from the height of the picture in units of the CTB size, or determining the height of the target subpicture by subtracting the vertical position of the top-left CTU of the target subpicture from the height of the picture.
57. The method of clause 49, wherein signaling the ID mapping of the target subpicture further comprises: signaling a first flag in the bitstream; and in response to the first flag being equal to 1, signaling the ID mapping of the target subpicture in a first data unit or a second data unit; wherein the first flag equal to 0 indicates that the ID mapping of the target subpicture is not signaled in the bitstream.
58. The method of clause 57, further comprising: in response to the first flag being equal to 1 and the ID mapping of the target subpicture not being signaled in the first data unit, signaling the ID mapping of the target subpicture in the second data unit; or in response to the first flag being equal to 0 or the ID mapping of the target subpicture being signaled in the first data unit, skipping signaling the ID mapping of the target subpicture in the second data unit.
59. The method of clause 58, wherein each of the first and second data units is a sequence parameter set (SPS), a picture parameter set (PPS), or a picture header (PH).
60. A video processing apparatus, comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions to cause the apparatus to perform: determining whether a bitstream contains subpicture information according to a subpicture information present flag signaled in the bitstream; and in response to the bitstream containing the subpicture information, signaling in the bitstream at least one of: a number of subpictures in a picture; a width, a height, a position, and an identifier (ID) mapping of a target subpicture; a subpic_treated_as_pic_flag; and a loop_filter_across_subpic_enabled_flag.
61. The apparatus of clause 60, wherein signaling at least one of the width, the height, and the position of the target subpicture is based on the number of subpictures in the picture.
62. The apparatus of clause 61, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: if there are at least two subpictures in the picture, signaling the subpic_treated_as_pic_flag, the loop_filter_across_subpic_enabled_flag, and at least one of the width, the height, and the position of the target subpicture; and if the picture contains only one subpicture, skipping signaling at least one of the width, the height, and the position of the target subpicture; wherein the subpic_treated_as_pic_flag indicates whether the subpictures of each coded picture in a coded layer video sequence (CLVS) are treated as pictures in the decoding process excluding in-loop filtering operations, and the loop_filter_across_subpic_enabled_flag indicates whether in-loop filtering operations across subpicture boundaries are enabled for each coded picture in the CLVS.
63. The apparatus of clause 62, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: if the width of the target subpicture is not signaled, determining the width of the target subpicture to be the width of the picture; and if the height of the target subpicture is not signaled, determining the height of the target subpicture to be the height of the picture.
64. The apparatus of clause 63, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: if the width of the target subpicture is not signaled, determining the width of the target subpicture in units of the coding tree block (CTB) size to be the width of the picture in units of the CTB size; and if the height of the target subpicture is not signaled, determining the height of the target subpicture in units of the CTB size to be the height of the picture in units of the CTB size.
65. The apparatus of clause 62, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: if the subpic_treated_as_pic_flag is not signaled in the bitstream, determining that the subpic_treated_as_pic_flag has a value of 1; and if the loop_filter_across_subpic_enabled_flag is not signaled in the bitstream, determining that the loop_filter_across_subpic_enabled_flag has a value of 0.
66. The apparatus of clause 60, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: if the target subpicture is the last subpicture in the picture, skipping signaling at least one of the width and the height of the target subpicture.
67. The apparatus of clause 66, wherein the at least one processor is configured to execute the instructions to cause the apparatus to further perform: if the width of the target subpicture is not signaled, determining the width of the target subpicture in units of the coding tree block (CTB) size by subtracting the horizontal position, in units of the CTB size, of the top-left coding tree unit (CTU) of the target subpicture from the width of the picture in units of the CTB size, or determining the width of the target subpicture by subtracting the horizontal position of the top-left CTU of the target subpicture from the width of the picture; and if the height of the target subpicture is not signaled, determining the height of the target subpicture in units of the CTB size by subtracting the vertical position, in units of the CTB size, of the top-left CTU of the target subpicture from the height of the picture in units of the CTB size, or determining the height of the target subpicture by subtracting the vertical position of the top-left CTU of the target subpicture from the height of the picture.
68.
The equipment described in Clause 63, configured to execute instructions in order to cause the equipment to perform the following. 65. At least one processor, If subpic_treated_as_pic_flag is not signaled in the bitstream, determine that subpic_treated_as_pic_flag has a value of 1, and If loop_filter_across_subpic_enabled_flag is not signaled in the bitstream, determine that loop_filter_across_subpic_enabled_flag has a value of 0. The equipment described in Clause 62, configured to execute instructions in order to cause the equipment to perform the following. 66. At least one processor, If the target subpicture is the last subpicture in the picture, skip at least one signaling for the width and height of the target subpicture. The equipment described in Clause 60, configured to execute instructions in order to cause the equipment to perform the following. 67. At least one processor, If the width of the target subpicture is not signaled, the width of the target subpicture in units of the coded tree block (CTB) size is determined by subtracting the horizontal position of the top-left coded tree unit (CTU) of the target subpicture in units of the CTB size from the width of the picture in units of the CTB size, or the width of the target subpicture is determined by subtracting the horizontal position of the top-left coded tree unit (CTU) of the target subpicture from the width of the picture, and If the height of the target subpicture is not signaled, the height of the target subpicture in CTB size units is determined by subtracting the vertical position of the top-left CTU of the target subpicture in CTB size units from the height of the picture in CTB size units, or by subtracting the vertical position of the top-left CTU of the target subpicture from the height of the picture. The equipment described in Clause 66, configured to execute instructions in order to cause the equipment to perform the following. 68. 
A non-temporary computer-readable storage medium for storing a set of instructions, wherein the set of instructions is: Determine whether the bitstream contains subpicture information according to the subpicture information presence flag signaled within the bitstream, and In response to the bitstream containing subpicture information, The number of sub-pictures within a picture, Target subpicture width, height, position and identifier (ID) mapping, subpic_treated_as_pic_flag, and loop_filter_across_subpic_enabled_flag Signaling at least one of these within the bitstream A non-temporary computer-readable storage medium that can be executed by one or more processing units to cause a video processing device to perform a method including the above.
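The conditional signaling and default-value rules of clauses 53 to 56 and 62 to 67 can be illustrated with a short round-trip sketch. It rests on assumptions that are not part of the disclosure: SubpicInfo, fields_to_signal, signal_subpictures, and infer_subpictures are hypothetical names; a missing dictionary key stands in for a syntax element absent from the bitstream; positions and sizes are kept in CTB units; and the last subpicture is assumed to keep its signaled position while omitting its width and height. It models only the inference logic and is not VVC reference decoder code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubpicInfo:
    x: int                    # top-left CTU column, in CTB units
    y: int                    # top-left CTU row, in CTB units
    width: int                # in CTB units
    height: int               # in CTB units
    treated_as_pic: bool      # models subpic_treated_as_pic_flag
    loop_filter_across: bool  # models loop_filter_across_subpic_enabled_flag


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def fields_to_signal(num_subpics: int, idx: int) -> List[str]:
    """Fields a (hypothetical) encoder writes for subpicture `idx`."""
    if num_subpics < 2:
        # Single subpicture: layout and both flags are skipped and inferred.
        return []
    if idx == num_subpics - 1:
        # Last subpicture: position is sent, width and height are inferred.
        layout = ["x", "y"]
    else:
        layout = ["x", "y", "width", "height"]
    return layout + ["treated_as_pic", "loop_filter_across"]


def signal_subpictures(subpics: List[SubpicInfo]) -> List[Dict[str, int]]:
    """Keep only the fields that would actually be written to the bitstream."""
    n = len(subpics)
    return [{f: getattr(sp, f) for f in fields_to_signal(n, i)}
            for i, sp in enumerate(subpics)]


def infer_subpictures(pic_w: int, pic_h: int, ctb: int,
                      signaled: List[Dict[str, int]]) -> List[SubpicInfo]:
    """Rebuild every subpicture, inferring any field that was not signaled."""
    pic_w_ctb = ceil_div(pic_w, ctb)  # picture width in CTB units
    pic_h_ctb = ceil_div(pic_h, ctb)  # picture height in CTB units
    result = []
    for s in signaled:
        x = s.get("x", 0)
        y = s.get("y", 0)
        result.append(SubpicInfo(
            x=x,
            y=y,
            # Absent size: picture size minus top-left CTU position (clause 56);
            # for a lone subpicture at (0, 0) this equals the picture size (clause 53).
            width=s.get("width", pic_w_ctb - x),
            height=s.get("height", pic_h_ctb - y),
            treated_as_pic=bool(s.get("treated_as_pic", True)),          # inferred as 1
            loop_filter_across=bool(s.get("loop_filter_across", False)),  # inferred as 0
        ))
    return result
```

For a 1920x1080 picture with 128x128 CTBs (15x9 CTBs), a left subpicture of 8x9 CTBs followed by a right subpicture at column 8 is recovered exactly, even though the right one carries no width or height. Because an omitted size is always recovered as the distance to the right or bottom picture edge, the round trip is lossless only when the last subpicture actually extends to those edges.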
[0175]
[0202] In some embodiments, a non-transitory computer-readable storage medium including instructions is also provided, and the instructions can be executed by a device (such as the encoder and decoder of this disclosure) to perform the above-described methods. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, a hard disk, a solid-state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM or any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH®-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same. The device may include one or more processors (CPUs), an input/output interface, a network interface, and/or a memory.
[0176]
[0203] It should be noted that relational terms in this specification, such as "first" and "second," are used solely to distinguish one entity or action from another and do not require or imply any actual relationship or order between these entities or actions. Furthermore, the words "comprising," "having," "containing," and "including," and other similar forms, are intended to be equivalent in meaning and open-ended, in that an element or elements following any one of these words is not meant to be an exhaustive listing of such element or elements, nor is it meant to be limited to only the listed element or elements.
[0177]
[0204] As used herein, unless specifically stated otherwise, the term "or" encompasses all possible combinations, except where infeasible. For example, if it is stated that a database can include A or B, then, unless specifically stated otherwise or infeasible, the database can include A, or B, or A and B. As a second example, if it is stated that a database can include A, B, or C, then, unless specifically stated otherwise or infeasible, the database can include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.
[0178]
[0205] It is understood that the above-described embodiments can be implemented by hardware, software (program code), or a combination of hardware and software. If implemented by software, the software can be stored in the above-described computer-readable medium and, when executed by a processor, can perform the methods of the present disclosure. The computing units and other functional units described in the present disclosure can likewise be implemented by hardware, software, or a combination of hardware and software. Those skilled in the art will also understand that multiple ones of the above-described modules/units can be combined into one module/unit, and each of the above-described modules/units can be further divided into a plurality of sub-modules/sub-units.
[0179]
[0206] In the foregoing specification, embodiments have been described with reference to numerous specific details that can vary from implementation to implementation. Certain adaptations and modifications of the described embodiments can be made. Other embodiments can be apparent to those skilled in the art from consideration of this specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the appended claims. It is also intended that the sequence of steps shown in the figures is for illustrative purposes only and is not limited to any particular sequence of steps. Accordingly, those skilled in the art can appreciate that these steps can be performed in a different order while implementing the same method.
[0180]
[0207] In the drawings and this specification, exemplary embodiments have been disclosed. However, many variations and modifications can be made to these embodiments. Accordingly, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims
1. A video processing method, comprising: determining whether a bitstream contains subpicture information according to a subpicture information presence flag signaled in the bitstream; and in response to the bitstream containing the subpicture information, signaling, in the bitstream, at least one of: a number of subpictures in a picture; a width, a height, a position, and an identifier (ID) mapping of a target subpicture; a subpic_treated_as_pic_flag; and a loop_filter_across_subpic_enabled_flag [specific details omitted].
2. The method according to claim 1, wherein the signaling of at least one of the width, height, and position of the target subpicture is based on the number of subpictures in the picture.
3. The method according to claim 2, further comprising: if the picture contains at least two subpictures, signaling the subpic_treated_as_pic_flag, the loop_filter_across_subpic_enabled_flag, and at least one of the width, height, and position of the target subpicture; and if the picture contains only one subpicture, skipping the signaling of the at least one of the width, height, and position of the target subpicture, wherein the subpic_treated_as_pic_flag indicates whether the subpictures of each coded picture in the coded layer video sequence (CLVS) are treated as pictures in the decoding process excluding in-loop filtering operations, and the loop_filter_across_subpic_enabled_flag indicates whether in-loop filtering operations can be performed across the subpicture boundaries of each coded picture in the CLVS.
4. The method according to claim 3, further comprising: if the width of the target subpicture is not signaled, determining the value of the width of the target subpicture as the width of the picture; and if the height of the target subpicture is not signaled, determining the value of the height of the target subpicture as the height of the picture.
5. The method according to claim 4, further comprising: if the width of the target subpicture is not signaled, determining the value of the width of the target subpicture in units of the coding tree block (CTB) size as the width of the picture in units of the CTB size; and if the height of the target subpicture is not signaled, determining the value of the height of the target subpicture in units of the CTB size as the height of the picture in units of the CTB size.
6. The method according to claim 3, further comprising: if the subpic_treated_as_pic_flag is not signaled in the bitstream, determining that the subpic_treated_as_pic_flag has a value of 1; and if the loop_filter_across_subpic_enabled_flag is not signaled in the bitstream, determining that the loop_filter_across_subpic_enabled_flag has a value of 0.
7. The method according to claim 1, further comprising: if the target subpicture is the last subpicture in the picture, skipping the signaling of at least one of the width and height of the target subpicture.
8. The method according to claim 7, further comprising: if the width of the target subpicture is not signaled, determining the value of the width of the target subpicture in units of the coding tree block (CTB) size by subtracting the horizontal position of the top-left coding tree unit (CTU) of the target subpicture in units of the CTB size from the width of the picture in units of the CTB size, or determining the value of the width of the target subpicture by subtracting the horizontal position of the top-left CTU of the target subpicture from the width of the picture; and if the height of the target subpicture is not signaled, determining the value of the height of the target subpicture in units of the CTB size by subtracting the vertical position of the top-left CTU of the target subpicture in units of the CTB size from the height of the picture in units of the CTB size, or determining the value of the height of the target subpicture by subtracting the vertical position of the top-left CTU of the target subpicture from the height of the picture.
9. The method according to claim 1, wherein signaling the ID mapping of the target subpicture further comprises: signaling a first flag in the bitstream; and in response to the first flag being equal to 1, signaling the ID mapping of the target subpicture in a first data unit or a second data unit, wherein the first flag being equal to 0 indicates that the ID mapping of the target subpicture is not signaled in the bitstream.
10. The method according to claim 9, further comprising: in response to the first flag being equal to 1 and the ID mapping of the target subpicture not being signaled in the first data unit, signaling the ID mapping of the target subpicture in the second data unit; or in response to the first flag being equal to 0 or the ID mapping of the target subpicture being signaled in the first data unit, skipping the signaling of the ID mapping of the target subpicture in the second data unit.
11. The method according to claim 10, wherein each of the first data unit and the second data unit is one of a sequence parameter set (SPS), a picture parameter set (PPS), or a picture header (PH).
12. A video processing apparatus, comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions to cause the apparatus to perform: determining whether a bitstream contains subpicture information according to a subpicture information presence flag signaled in the bitstream; and in response to the bitstream containing the subpicture information, signaling, in the bitstream, at least one of: a number of subpictures in a picture; a width, a height, a position, and an identifier (ID) mapping of a target subpicture; a subpic_treated_as_pic_flag; and a loop_filter_across_subpic_enabled_flag.
13. The apparatus according to claim 12, wherein signaling at least one of the width, height, and position of the target subpicture is based on the number of subpictures in the picture.
14. The apparatus according to claim 13, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: if the picture contains at least two subpictures, signaling the subpic_treated_as_pic_flag, the loop_filter_across_subpic_enabled_flag, and at least one of the width, height, and position of the target subpicture; and if the picture contains only one subpicture, skipping the signaling of the at least one of the width, height, and position of the target subpicture, wherein the subpic_treated_as_pic_flag indicates whether the subpictures of each coded picture in the coded layer video sequence (CLVS) are treated as pictures in the decoding process excluding in-loop filtering operations, and the loop_filter_across_subpic_enabled_flag indicates whether in-loop filtering operations can be performed across the subpicture boundaries of each coded picture in the CLVS.
15. The apparatus according to claim 14, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: if the width of the target subpicture is not signaled, determining the value of the width of the target subpicture as the width of the picture; and if the height of the target subpicture is not signaled, determining the value of the height of the target subpicture as the height of the picture.
16. The apparatus according to claim 15, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: if the width of the target subpicture is not signaled, determining the value of the width of the target subpicture in units of the coding tree block (CTB) size as the width of the picture in units of the CTB size; and if the height of the target subpicture is not signaled, determining the value of the height of the target subpicture in units of the CTB size as the height of the picture in units of the CTB size.
17. The apparatus according to claim 14, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: if the subpic_treated_as_pic_flag is not signaled in the bitstream, determining that the subpic_treated_as_pic_flag has a value of 1; and if the loop_filter_across_subpic_enabled_flag is not signaled in the bitstream, determining that the loop_filter_across_subpic_enabled_flag has a value of 0.
18. The apparatus according to claim 12, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: if the target subpicture is the last subpicture in the picture, skipping the signaling of at least one of the width and height of the target subpicture.
19. The apparatus according to claim 18, wherein the at least one processor is configured to execute the instructions to cause the apparatus to perform: if the width of the target subpicture is not signaled, determining the value of the width of the target subpicture in units of the coding tree block (CTB) size by subtracting the horizontal position of the top-left coding tree unit (CTU) of the target subpicture in units of the CTB size from the width of the picture in units of the CTB size, or determining the value of the width of the target subpicture by subtracting the horizontal position of the top-left CTU of the target subpicture from the width of the picture; and if the height of the target subpicture is not signaled, determining the value of the height of the target subpicture in units of the CTB size by subtracting the vertical position of the top-left CTU of the target subpicture in units of the CTB size from the height of the picture in units of the CTB size, or determining the value of the height of the target subpicture by subtracting the vertical position of the top-left CTU of the target subpicture from the height of the picture.
20. A non-transitory computer-readable storage medium storing a set of instructions that are executable by one or more processing units to cause a video processing apparatus to perform a method comprising: determining whether a bitstream contains subpicture information according to a subpicture information presence flag signaled in the bitstream; and in response to the bitstream containing the subpicture information, signaling, in the bitstream, at least one of: a number of subpictures in a picture; a width, a height, a position, and an identifier (ID) mapping of a target subpicture; a subpic_treated_as_pic_flag; and a loop_filter_across_subpic_enabled_flag.
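The first-flag logic of claims 9 to 11 determines where, if anywhere, the subpicture ID mapping is carried. The sketch below models each data unit as a small record and follows that logic on both the writing and reading side. DataUnit, write_id_mapping, read_id_mapping, and the use_first parameter are hypothetical names, and the fallback of using each subpicture's index as its ID when the first flag is 0 is an assumption (it mirrors the default used in VVC), not something the claims state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataUnit:
    kind: str                                # "SPS", "PPS", or "PH"
    subpic_ids: Optional[List[int]] = None   # None: mapping not carried here


def write_id_mapping(ids: Optional[List[int]], first_kind: str,
                     second_kind: str, use_first: bool
                     ) -> Tuple[int, DataUnit, DataUnit]:
    """Return (first_flag, first_unit, second_unit) for a hypothetical encoder."""
    first, second = DataUnit(first_kind), DataUnit(second_kind)
    flag = 0 if ids is None else 1
    if flag == 1:
        if use_first:
            # Carried in the first data unit, so the second unit skips it.
            first.subpic_ids = list(ids)
        else:
            # Flag is 1 but the first unit omits it: the second unit carries it.
            second.subpic_ids = list(ids)
    return flag, first, second


def read_id_mapping(flag: int, first: DataUnit, second: DataUnit,
                    num_subpics: int) -> List[int]:
    """Resolve subpicture IDs in the order a decoder would parse the units."""
    if flag == 0:
        # Not signaled anywhere. Assumption: the ID equals the subpicture index.
        return list(range(num_subpics))
    if first.subpic_ids is not None:
        return first.subpic_ids
    if second.subpic_ids is None:
        raise ValueError("first flag is 1 but neither data unit carries the mapping")
    return second.subpic_ids
```

For example, writing IDs [7, 3] with use_first=False leaves the SPS without a mapping and places it in the PPS, and the reader then falls through to the PPS. Checking the first unit before the second reflects the parse order implied by claim 10: the second data unit only needs to carry the mapping when the first one did not.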