Video encoder, video decoder, and corresponding method
The use of identifier-based addressing in video coding systems addresses inefficiencies in handling sub-pictures by allowing sub-bitstream extraction with reduced resource usage and processing overhead, improving coding efficiency for applications like virtual reality.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2024-10-23
- Publication Date
- 2026-07-01
- Estimated Expiration
- Not applicable · inactive patent
AI Technical Summary
Existing video coding systems face inefficiencies when addressing sub-pictures in applications like virtual reality, as index-based addressing schemes fail to correctly identify the top-left corner of sub-pictures, requiring processor-intensive dynamic rewriting of slice headers for each user request.
An addressing scheme that uses identifiers instead of indices to map slice addresses from picture-based to sub-picture-based locations, allowing extraction of sub-bitstreams without rewriting slice headers, reducing resource usage and processing overhead.
This approach enhances coding efficiency by enabling sub-bitstream extraction with reduced network, memory, and processing resources, without the need for dynamic slice header rewriting.
Abstract
Description
Technical Field
[0001] The present disclosure generally relates to video coding, and more particularly, to address management when extracting sub-pictures from a picture in video coding.
Background Art
[0002] The amount of video data required to represent even a relatively short video can be substantial, which can cause difficulties when the data is to be streamed or otherwise communicated across a communication network with limited bandwidth capacity. Thus, video data is generally compressed before being communicated across modern telecommunications networks. The size of a video can also be an issue when the video is stored on a storage device, since memory resources may be limited. Video compression devices often use software and / or hardware at the source to encode the video data prior to transmission or storage, thereby reducing the amount of data required to represent digital video images. The compressed data is then received at the destination by a video decompression device that decodes the video data. With limited network resources and ever-increasing demand for higher video quality, improved compression and decompression techniques that raise the compression ratio with little to no sacrifice in picture quality are desirable.
Summary of the Invention
[0003] In embodiments, the disclosure includes a method implemented in a decoder, comprising: receiving, by a receiver of the decoder, a sub-bitstream including sub-pictures of a picture partitioned into multiple slices including a first slice, a parameter set associated with the picture and the sub-pictures, and a slice header associated with the first slice; parsing, by a processor of the decoder, the parameter set to obtain an identifier and the length of the slice address of the first slice; determining, by the processor, the slice address of the first slice from the slice header based on the identifier and the length of the slice address; decoding, by the processor, the sub-bitstream to generate a video sequence of the sub-pictures including the first slice; and transferring, by the processor, the video sequence of the sub-pictures for display. In some video coding systems, slices (also known as tile groups) may be addressed based on a set of indices. Such indices may increment in raster scan order, starting at index zero in the upper-left corner of the picture and ending at index N in the lower-right corner of the picture, where N is the number of indices minus 1. Such systems work well for most applications. However, certain applications, such as virtual reality (VR), render only sub-pictures of a picture. Some systems may improve coding efficiency when streaming VR content by transmitting only a sub-bitstream of the bitstream to the decoder, where the sub-bitstream contains the sub-pictures to be rendered. In such cases, index-based addressing schemes may not work correctly because the top-left corner of the sub-pictures received by the decoder generally has some index other than zero. To address such concerns, the encoder (or an associated slicer) may be required to rewrite each slice header to change the indices of the sub-pictures so that the top-left index starts at zero and the remaining sub-picture slices are adjusted accordingly. Dynamically rewriting the slice header (e.g., for each user request) can be very processor-intensive. The disclosed system employs an addressing scheme that allows for the extraction of sub-bitstreams containing sub-pictures without the need to rewrite the slice header. Each slice is addressed based on an identifier (ID) rather than an index (e.g., a sub-picture ID). In this way, the decoder can consistently determine all relevant addresses regardless of which sub-picture is received and regardless of the position of the received sub-picture relative to the upper-left corner of the complete picture. Since the ID is arbitrarily defined (e.g., selected by the encoder), the ID is encoded in a variable-length field. Thus, the length of the slice address is also signaled. The IDs associated with the sub-pictures are also signaled. The length is used to interpret the slice address, and the sub-picture ID is used to map the slice address from a picture-based position to a sub-picture-based position. By employing these mechanisms, the encoder, the decoder, and / or an associated slicer can be improved. For example, a sub-bitstream may be extracted and transmitted instead of the entire bitstream, which reduces the use of network resources, memory resources, and / or processing resources. Furthermore, such extraction of sub-bitstreams can be performed without rewriting each slice header for each user request, which further reduces the use of network resources, memory resources, and / or processing resources.
[0004] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the identifier is associated with a sub-picture.
[0005] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the length of the slice address indicates the number of bits contained in the slice address.
[0006] Optionally, in any of the above embodiments, another implementation of the embodiment provides that determining the slice address of the first slice comprises: the processor using the length from the parameter set to determine a bit boundary for interpreting the slice address from the slice header; and the processor using the identifier and the slice address to map the slice address from a picture-based location to a sub-picture-based location.
[0007] Optionally, in any of the above embodiments, another implementation of the embodiment further comprises the processor parsing the parameter set to obtain an identifier (ID) flag, the ID flag indicating that a mapping is available to map slice addresses from picture-based locations to sub-picture-based locations.
[0008] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the mapping between picture-based locations and sub-picture-based locations aligns the slice header with the sub-picture without requiring the slice header to be rewritten.
[0009] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the slice address contains a defined value rather than an index.
[0010] In embodiments, the disclosure includes a method implemented in an encoder, comprising: encoding, by a processor of the encoder, a picture into a bitstream, the picture having a plurality of slices including a first slice; encoding into the bitstream a slice header that includes the slice address of the first slice; encoding into the bitstream a parameter set that includes an identifier and the length of the slice address of the first slice; extracting a sub-bitstream of the bitstream by extracting the first slice based on the slice address of the first slice, the length of the slice address, and the identifier, without rewriting the slice header; and storing the sub-bitstream in a memory of the encoder for communication to a decoder. In some video coding systems, slices (also known as tile groups) may be addressed based on a set of indices. Such indices may increment in raster scan order, starting at index zero in the upper-left corner of the picture and ending at index N in the lower-right corner of the picture, where N is the number of indices minus 1. Such systems work well for most applications. However, certain applications, such as virtual reality (VR), render only sub-pictures of a picture. Some systems can improve coding efficiency when streaming VR content by transmitting only a sub-bitstream of the bitstream to the decoder, where the sub-bitstream contains the sub-picture to be rendered. In such cases, index-based addressing schemes may not work correctly because the top-left corner of the sub-picture received by the decoder typically has some non-zero index. To address such concerns, the encoder (or an associated slicer) may be required to rewrite each slice header to change the indices of the sub-pictures so that the top-left index starts at zero and the remaining sub-picture slices are adjusted accordingly. Dynamically rewriting the slice header (e.g., for each user request) can be very processor-intensive. The disclosed system employs an addressing scheme that allows the extraction of sub-bitstreams containing sub-pictures without the need to rewrite the slice header. Each slice is addressed based on an identifier (ID) rather than an index (e.g., a sub-picture ID). In this way, the decoder can consistently determine all relevant addresses regardless of which sub-picture is received and regardless of the position of the received sub-picture relative to the top-left corner of the complete picture. Since the ID is arbitrarily defined (e.g., selected by the encoder), the ID is encoded in a variable-length field. Thus, the length of the slice address is also signaled. The ID associated with the sub-picture is also signaled. The length is used to interpret the slice address, and the sub-picture ID is used to map the slice address from the picture-based location to the sub-picture-based location. By employing these mechanisms, the encoder, the decoder, and / or an associated slicer can be improved. For example, a sub-bitstream may be extracted and transmitted instead of the entire bitstream, which reduces the use of network resources, memory resources, and / or processing resources. Furthermore, such sub-bitstream extraction can be performed without rewriting each slice header for each user request, which further reduces the use of network resources, memory resources, and / or processing resources.
[0011] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the identifier is associated with a sub-picture.
[0012] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the length of the slice address indicates the number of bits contained in the slice address.
[0013] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the length in the parameter set contains enough data to interpret the slice address from the slice header, and the identifier contains enough data to map the slice address from a picture-based location to a sub-picture-based location.
[0014] Optionally, in any of the above embodiments, another implementation of the embodiment further comprises the processor encoding an identifier (ID) flag in the parameter set, the ID flag indicating that a mapping is available to map a slice address from a picture-based location to a sub-picture-based location.
[0015] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the slice address contains a defined value rather than an index.
[0016] Optionally, in any of the above embodiments, another implementation of the embodiment provides that extracting the sub-bitstream of the bitstream includes extracting a sub-picture of the picture, wherein the sub-picture includes the first slice, and the sub-bitstream includes the sub-picture, the slice header, and the parameter set.
[0017] In embodiments, the disclosure includes a video coding device having a processor, memory, a receiver coupled to the processor, and a transmitter coupled to the processor, wherein the processor, memory, receiver, and transmitter are configured to perform any of the methods described above.
[0018] In embodiments, the disclosure includes a non-transitory computer-readable medium having a computer program product for use by a video coding device, wherein the computer program product has computer-executable instructions stored in the non-transitory computer-readable medium that, when executed by a processor, cause the video coding device to perform any of the methods described above.
[0019] In embodiments, the disclosure includes a decoder having: receiving means for receiving a subbitstream including subpictures of a picture partitioned into a plurality of slices including a first slice, a parameter set associated with the picture and subpictures, and a slice header associated with the first slice; parsing means for parsing the parameter set to obtain an identifier and the length of the slice address of the first slice; determining means for determining the slice address of the first slice from the slice header based on the identifier and the length of the slice address; decoding means for decoding the subbitstream to generate a video sequence of the subpictures including the first slice; and transferring means for transferring the video sequence of the subpictures for display.
[0020] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the decoder is further configured to perform any of the methods of the above embodiments.
[0021] In embodiments, the disclosure includes an encoder having: encoding means for encoding a picture into a bitstream, the picture having a plurality of slices including a first slice, for encoding into the bitstream a slice header that includes the slice address of the first slice, and for encoding into the bitstream a parameter set that includes an identifier and the length of the slice address of the first slice; extraction means for extracting a sub-bitstream of the bitstream by extracting the first slice based on the slice address of the first slice, the length of the slice address, and the identifier, without rewriting the slice header; and storage means for storing the sub-bitstream for communication to a decoder.
[0022] Optionally, in any of the above embodiments, another implementation of the embodiment provides that the encoder is further configured to perform any of the methods of the above embodiments.
[0023] For clarity, any one of the above embodiments may be combined with any one or more of the other above embodiments to bring about new embodiments within the scope of the present disclosure.
[0024] These and other features will be more clearly understood from the following detailed description read in conjunction with the accompanying drawings and the claims.
[0025] For a more complete understanding of the present disclosure, reference is now made to the following brief description, read in conjunction with the accompanying drawings and detailed description. The same reference numbers represent the same parts.
Brief Description of the Drawings
[0026]
[Figure 1] A flowchart of an example method of coding a video signal.
[Figure 2] A schematic diagram of an example coding and decoding (codec) system for video coding.
[Figure 3] A schematic diagram of an example video encoder.
[Figure 4] A schematic diagram of an example video decoder.
[Figure 5] A schematic diagram of an example sub-bitstream extracted from a bitstream.
[Figure 6] A schematic diagram of an example picture partitioned for coding.
[Figure 7] A schematic diagram of an example sub-picture extracted from a picture.
[Figure 8] A schematic diagram of an example video coding device.
[Figure 9] A flowchart of an example method of encoding a bitstream of a picture so as to support extracting a sub-bitstream of a sub-picture without rewriting a slice header, by employing explicit address signaling.
[Figure 10] A flowchart of an example method of decoding a sub-bitstream of a sub-picture by employing explicit address signaling.
[Figure 11] A schematic diagram of an example system for transmitting a sub-bitstream of a sub-picture by employing explicit address signaling.
Modes for Carrying Out the Invention
[0027] Exemplary implementations of one or more embodiments are given below. It should first be understood that the disclosed system and / or method may be implemented using any number of techniques, regardless of whether currently known or existing. The disclosure should in no way be limited to the exemplary implementations, drawings, and techniques described below, including the exemplary designs and implementations illustrated and described herein, but may be varied within the full scope of the appended claims and their equivalents.
[0028] Various acronyms are used here, including coding tree block (CTB), coding tree unit (CTU), coding unit (CU), coded video sequence (CVS), Joint Video Experts Team (JVET), motion constrained tile set (MCTS), maximum transmission unit (MTU), network abstraction layer (NAL), picture order count (POC), raw byte sequence payload (RBSP), sequence parameter set (SPS), versatile video coding (VVC), and working draft (WD).
[0029] Many video compression techniques can be used to reduce the size of video files with minimal loss of data. For example, video compression techniques may include performing spatial (e.g., intra-picture) prediction and / or temporal (e.g., inter-picture) prediction to reduce or remove data redundancy within a video sequence. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be called tree blocks, coding tree blocks (CTBs), coding tree units (CTUs), coding units (CUs), and / or coding nodes. Video blocks in an intra-coded (I) slice of a picture are coded using spatial prediction with respect to reference samples in neighboring blocks within the same picture. Video blocks in an inter-coded (P) or bidirectional (B) slice of a picture may be coded using spatial prediction with respect to reference samples in neighboring blocks within the same picture, or using temporal prediction with respect to reference samples in other reference pictures. A picture may be called a frame and / or image, and a reference picture may be called a reference frame and / or reference image. Spatial or temporal prediction yields a predictive block representing an image block. Residual data represents the pixel differences between the original image block and the predictive block. Thus, an inter-coded block is encoded according to a motion vector pointing to the block of reference samples that forms the predictive block, together with residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain. This yields residual transform coefficients, which may be quantized. The quantized transform coefficients may initially be arranged in a two-dimensional array. The quantized transform coefficients may be scanned to generate a one-dimensional vector of transform coefficients. Entropy coding may be applied to achieve further compression. These video compression techniques are explained in more detail below.
[0030] To ensure that encoded video can be accurately decoded, video is encoded and decoded according to corresponding video coding standards. Video coding standards include ITU-T H.261, ITU-T H.262 or ISO / IEC MPEG-2 Part 2, ITU-T H.263, ISO / IEC MPEG-4 Part 2, Advanced Video Coding (AVC), also known as ITU-T H.264 or ISO / IEC MPEG-4 Part 10, and High Efficiency Video Coding (HEVC), also known as ITU-T H.265 or MPEG-H Part 2. AVC includes extensions such as Scalable Video Coding (SVC), Multiview Video Coding (MVC), Multiview Video Coding Plus Depth (MVC+D), and 3D AVC (3D-AVC). HEVC includes extensions such as Scalable HEVC (SHVC), Multiview HEVC (MV-HEVC), and 3D HEVC (3D-HEVC). The Joint Video Experts Team (JVET) of ITU-T and ISO / IEC has begun developing a video coding standard called Versatile Video Coding (VVC). VVC is described in working drafts (WDs), including JVET-L1001.
[0031] To code a video image, the image is first partitioned, and the partitions are coded into the bitstream. Various picture partitioning schemes are available. For example, an image may be partitioned according to regular slices, dependent slices, tiles, and / or wavefront parallel processing (WPP). For simplicity, HEVC restricts the encoder so that only regular slices, dependent slices, tiles, WPP, and combinations thereof can be used when partitioning slices into groups of CTBs for video coding. Such partitioning may be applied to support matching Maximum Transmission Unit (MTU) sizes, parallel processing, and reduced end-to-end latency. The MTU represents the maximum amount of data that can be transmitted in a single packet. If a packet payload exceeds the MTU, the payload is split into two packets through a process called fragmentation.
[0032] Regular slices, also simply called slices, are partitioned portions of an image that can be reconstructed independently of other regular slices within the same picture, notwithstanding some interdependence due to loop filtering operations. Each regular slice is encapsulated in its own Network Abstraction Layer (NAL) unit for transmission. Furthermore, in-picture prediction (intra-sample prediction, motion information prediction, coding mode prediction) and entropy coding dependencies across slice boundaries may be disabled to support independent reconstruction. Such independent reconstruction supports parallelization. For example, parallelization based on regular slices requires minimal inter-processor or inter-core communication. However, because each regular slice is independent, each slice is associated with a separate slice header. The use of regular slices can incur considerable coding overhead due to the bit cost of the slice header for each slice and the lack of prediction across slice boundaries. Furthermore, regular slices may be used to support MTU size matching. Specifically, because a regular slice is encapsulated in its own NAL unit and can be coded independently, each regular slice should be smaller than the MTU in MTU size matching schemes to avoid splitting the slice into multiple packets. As such, the goals of parallelization and of MTU size matching can place conflicting demands on the slice layout within the picture.
[0033] Dependent slices are similar to regular slices but have a shortened slice header and allow partitioning of the image at tree block boundaries without breaking in-picture prediction. Thus, dependent slices allow a regular slice to be fragmented into multiple NAL units, which reduces end-to-end delay by allowing part of a regular slice to be sent out before the encoding of the entire regular slice is complete.
[0034] A tile is a partitioned portion of an image, formed by horizontal and vertical boundaries that create rows and columns of tiles. Tiles may be coded in raster scan order (left to right and top to bottom). The scan order of CTBs is local within a tile. Therefore, the CTBs in a first tile are coded in raster scan order before processing proceeds to the CTBs in the next tile. Similar to regular slices, tiles break both entropy decoding dependencies and in-picture prediction dependencies. However, tiles do not have to be contained within individual NAL units, and therefore tiles do not have to be used for MTU size matching. Each tile can be processed by one processor / core, and the inter-processor / inter-core communication used for in-picture prediction between processing units decoding adjacent tiles may be limited to conveying a shared slice header (when adjacent tiles are in the same slice) and sharing reconstructed samples and metadata for loop filtering. If a slice contains more than one tile, the entry point byte offset for each tile, other than the first entry point offset in the slice, may be signaled in the slice header. For each slice and tile, at least one of the following conditions must be met: 1) all coded tree blocks within a slice belong to the same tile, and 2) all coded tree blocks within a tile belong to the same slice.
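As an illustration of the tile-local scan order described above, the following Python sketch lists CTB addresses in coding order for a picture split into a grid of tiles. The function name, the boundary-list representation of the tile grid, and the example layout are assumptions made for illustration only and are not taken from the disclosure.

```python
def ctb_coding_order(col_bounds, row_bounds):
    """Return picture-raster CTB addresses in tile-based coding order.

    col_bounds / row_bounds are tile boundary positions in CTB units,
    including 0 and the picture width / height (a hypothetical layout
    description chosen for this sketch).
    """
    pic_width = col_bounds[-1]
    order = []
    # Tiles are visited in raster order: left to right, then top to bottom.
    for r0, r1 in zip(row_bounds, row_bounds[1:]):
        for c0, c1 in zip(col_bounds, col_bounds[1:]):
            # Inside one tile, CTBs are again visited in raster order.
            for y in range(r0, r1):
                for x in range(c0, c1):
                    order.append(y * pic_width + x)
    return order


# A picture 4 CTBs wide and 2 CTBs high, split into two tile columns.
example_order = ctb_coding_order(col_bounds=[0, 2, 4], row_bounds=[0, 2])
print(example_order)  # [0, 1, 4, 5, 2, 3, 6, 7]
```

In this example, all four CTBs of the left tile (addresses 0, 1, 4, 5) are coded before any CTB of the right tile, even though addresses 2 and 3 come earlier in the picture-wide raster scan.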
[0035] In WPP, an image is partitioned into single rows of CTBs. The entropy decoding and prediction mechanism may use data from CTBs in other rows. Parallel processing is enabled through parallel decoding of CTB rows. For example, the current row may be decoded in parallel with the preceding row. However, the decoding of the current row is delayed by only two CTBs from the decoding process of the preceding row. This delay ensures that data about the CTB above and to the right of the current CTB in the current row is available before the current CTB is coded. This approach appears as a wavefront when represented graphically. This time-delayed start allows for parallelization with no more than the same number of processors / cores as the number of CTB rows contained in the image. Since intra-picture prediction between adjacent tree block rows within a picture is allowed, the amount of inter-processor / inter-core communication required to enable intra-picture prediction can be considerable. WPP partitioning does not consider NAL unit size. Therefore, WPP does not support MTU size matching. However, regular slicing can be used with WPP, albeit with certain coding overhead, to implement MTU size matching as desired.
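The two-CTB delay between consecutive rows can be made concrete with a small timing model. The Python sketch below assumes, purely for illustration, that every CTB takes one time step and that each CTB row is handled by its own core; the function names are hypothetical.

```python
def wpp_start_step(row, col, delay=2):
    """Earliest time step at which CTB (row, col) can start.

    Assumes one time step per CTB and one core per CTB row; each row
    starts `delay` CTBs behind the row above it.
    """
    return col + delay * row


def wpp_schedule(rows, cols, delay=2):
    """Start step of every CTB in a rows x cols grid of CTBs."""
    return [[wpp_start_step(r, c, delay) for c in range(cols)]
            for r in range(rows)]


schedule = wpp_schedule(rows=3, cols=5)
# schedule[0] is [0, 1, 2, 3, 4]
# schedule[1] is [2, 3, 4, 5, 6]
# schedule[2] is [4, 5, 6, 7, 8]
```

Under this model, a CTB never starts before the CTB above and to the right of it has finished, and cells with the same start step form the diagonal wavefront mentioned above.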
[0036] Tiles may also include motion-constrained tile sets. A motion-constrained tile set (MCTS) is a tile set designed such that the associated motion vectors are restricted to point to full-sample positions within the MCTS and to fractional-sample positions that require only full-sample positions within the MCTS for interpolation. Furthermore, the use of motion vector candidates for temporal motion vector prediction derived from blocks outside the MCTS is not permitted. In this way, each MCTS can be decoded independently, without the presence of tiles not included in the MCTS. Temporal MCTS supplemental enhancement information (SEI) messages may be used to indicate the presence of MCTSs in the bitstream and to signal the MCTSs. MCTS SEI messages provide supplemental information that may be used in MCTS sub-bitstream extraction (specified as part of the semantics of the SEI message) to generate a conforming bitstream for an MCTS set. The information includes a number of extraction information sets, each defining a number of MCTS sets and containing raw byte sequence payload (RBSP) bytes of the replacement video parameter set (VPS), sequence parameter set (SPS), and picture parameter set (PPS) to be used during the MCTS sub-bitstream extraction process. When extracting a sub-bitstream according to the MCTS sub-bitstream extraction process, the parameter sets (VPS, SPS, and PPS) may be rewritten or replaced, and the slice header may be updated because one or all of the syntax elements related to the slice address (including first_slice_segment_in_pic_flag and slice_segment_address) may take different values in the extracted sub-bitstream.
[0037] The above scheme may contain some problems. In some systems, when a picture has more than one tile / slice, the address of a tile group may be signaled as an index in the tile group header by using a syntax element such as tile_group_address. tile_group_address identifies the tile address of the first tile in the tile group. The length of tile_group_address may be determined to be Ceil(Log2(NumTilesInPic)) bits, where NumTilesInPic contains the number of tiles in the picture. The value of tile_group_address may be between zero and NumTilesInPic-1, and the value of tile_group_address may not be equal to the value of tile_group_address of any other coded tile group NAL unit in the same coded picture. tile_group_address may be inferred to be zero if it does not exist in the bitstream. The tile address described above includes the tile index. However, using the tile index as the address for each tile group can lead to some coding inefficiencies.
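The length rule Ceil(Log2(NumTilesInPic)) can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and integer arithmetic is used in place of a floating-point logarithm as an implementation choice for this example.

```python
def tile_group_address_bits(num_tiles_in_pic):
    """Number of bits for an index-based tile_group_address.

    Equals Ceil(Log2(NumTilesInPic)) for num_tiles_in_pic >= 1,
    computed with integer arithmetic to avoid rounding issues.
    """
    if num_tiles_in_pic < 1:
        raise ValueError("a picture contains at least one tile")
    return (num_tiles_in_pic - 1).bit_length()


for n in (1, 2, 3, 4, 24):
    print(n, tile_group_address_bits(n))
```

For a picture with 24 tiles the address occupies 5 bits, and for a picture with a single tile it occupies 0 bits, which matches the case where the address is absent and inferred to be zero.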
[0038] For example, certain use cases may require modification of the AVC or HEVC slice segment header between encoding and decoding, either on the client side or by some network-based media processing entity, just before passing the bitstream to the decoder. One example of such a use case is tiled streaming. In tiled streaming, panoramic video is encoded using HEVC tiles, but the decoder decodes only a portion of these tiles. By rewriting the HEVC slice segment header (SSH) along with the SPS / PPS, the bitstream can be manipulated to change the subset of tiles being decoded and their spatial arrangement within the decoded video frame. Such rewriting is costly in CPU time. One reason for this cost is that AVC and HEVC slice segment headers use variable-length fields and have a byte alignment field at the end. This means that any change to any field within the SSH may affect the byte alignment field at the end of the SSH, which must then be rewritten as well. And since all fields are encoded with variable length, the only way to know the position of the byte alignment field is to parse all the preceding fields. This results in significant processing overhead, especially when tiles are used and one second of video may contain hundreds of NAL units. Some systems support explicit signaling of tile identifiers (IDs). However, some syntax elements may not be optimized and may contain unnecessary and / or redundant bits during signaling. Furthermore, some constraints associated with explicit tile ID signaling are not specified.
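The cost of locating a field that sits behind variable-length fields can be illustrated with unsigned Exp-Golomb codes, one of the variable-length codings used in AVC and HEVC headers. The Python sketch below uses a made-up header of three such fields, which is an assumption for illustration and not the real slice segment header layout; the helper names are likewise hypothetical.

```python
def write_ue(value):
    """Encode a non-negative integer as an unsigned Exp-Golomb bit string."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def read_ue(bits, pos):
    """Decode one unsigned Exp-Golomb code; return (value, next_position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros + 1  # skip the leading zeros and the marker '1'
    suffix = bits[start:start + zeros]
    value = (1 << zeros) - 1 + (int(suffix, 2) if zeros else 0)
    return value, start + zeros


def field_offset(bits, field_index):
    """Bit offset of a field, found by parsing every field before it."""
    pos = 0
    for _ in range(field_index):
        _, pos = read_ue(bits, pos)
    return pos


# Two toy headers that differ only in their first field.
header_a = write_ue(0) + write_ue(3) + write_ue(1)
header_b = write_ue(7) + write_ue(3) + write_ue(1)
print(field_offset(header_a, 2))  # 6
print(field_offset(header_b, 2))  # 12
```

Changing only the first field moves the third field from bit 6 to bit 12, so a rewriter cannot jump to a fixed offset and has to parse the preceding fields, and any trailing alignment bits would have to be recomputed as well.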
[0039] The above mechanisms allow a picture to be partitioned and compressed. For instance, a picture may be partitioned into slices, tiles, and / or tile groups. In some examples, a tile group may be used synonymously with a slice. Such slices and / or tile groups may be addressed based on a set of indices. Such indices may increment in raster scan order, starting at index zero at the top-left corner of the picture and ending at index N at the bottom-right corner of the picture, where N is the number of indices minus 1. Such a system works well for most applications. However, certain applications, such as virtual reality (VR), render only sub-pictures of a picture. Such sub-pictures are sometimes called regions of interest. Some systems can improve coding efficiency when streaming VR content by transmitting only a sub-bitstream of the bitstream to the decoder, where the sub-bitstream contains the sub-pictures to be rendered. In such cases, an index-based addressing scheme may not work correctly because the top-left corner of the sub-picture received by the decoder generally has some index other than zero. To address such concerns, the encoder (or an associated slicer) may be required to rewrite each slice header to change the indices of the sub-pictures so that the top-left index starts at zero and the remaining sub-picture slices are adjusted accordingly. Dynamically rewriting slice headers (e.g., for each user request) can be very processor-intensive.
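The index mismatch can be seen in a small Python sketch. It assumes, for illustration only, a picture laid out as a regular grid with one slice per grid cell and raster-scan indices; the function names and the 4-column example are hypothetical.

```python
def picture_based_indices(pic_cols, top, left, rows, cols):
    """Picture-wide raster indices of the slices covered by a sub-picture
    whose top-left slice sits at grid position (top, left)."""
    return [(top + r) * pic_cols + (left + c)
            for r in range(rows) for c in range(cols)]


def rewrite_map(indices):
    """Old index -> new index so that the sub-picture starts at zero."""
    return {old: new for new, old in enumerate(indices)}


# A picture 4 slices wide; the sub-picture is the 2 x 2 block whose
# top-left slice is in row 1, column 2.
received = picture_based_indices(pic_cols=4, top=1, left=2, rows=2, cols=2)
print(received)               # [6, 7, 10, 11]
print(rewrite_map(received))  # {6: 0, 7: 1, 10: 2, 11: 3}
```

A decoder that expects indices 0 through 3 would instead see 6, 7, 10, and 11, so under index-based addressing every slice header in the extracted sub-bitstream would need the rewrite shown by the second output.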
[0040] This document discloses various mechanisms for improving coding efficiency and reducing processing overhead when extracting a sub-bitstream containing a sub-picture from an encoded bitstream containing pictures. The disclosed system employs an addressing scheme that enables the extraction of a sub-bitstream containing a sub-picture without the need to rewrite the slice headers. Each slice / tile group is addressed based on an ID rather than an index. For example, a slice may be addressed by a value that can be mapped to an index and that is stored in the slice header. This allows the decoder to read the slice address from the slice header and map the address from a picture-based location to a sub-picture-based location. Since the slice address is not a predefined index, the slice address is encoded in a variable-length field. Therefore, the length of the slice address is also signaled. The IDs associated with the sub-picture are also signaled. The sub-picture IDs and the length may be signaled in the PPS. A flag may also be signaled in the PPS to indicate that an explicit addressing scheme is employed. Upon reading the flag, the decoder can obtain the length and the sub-picture IDs. The length is used to interpret the slice address from the slice header. The sub-picture IDs are used to map the slice address from the picture-based position to the sub-picture-based position. In this way, the decoder can consistently determine all relevant addresses regardless of which sub-picture is received and regardless of the position of the received sub-picture relative to the upper-left corner of the complete picture. Furthermore, this mechanism allows such determinations to be made without rewriting the slice header to change the slice address value and / or without changing the byte alignment field associated with the slice address.
By employing the above mechanism, the encoder, the decoder, and / or an associated slicer can be improved. For example, a sub-bitstream may be extracted and transmitted instead of the entire bitstream, which reduces the use of network resources, memory resources, and / or processing resources. Furthermore, such extraction of sub-bitstreams can be performed without rewriting each slice header for each user request, which further reduces the use of network resources, memory resources, and / or processing resources.
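The ID-based mapping described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `ParameterSet` class, its field names, the ID values, and the list-lookup mapping are hypothetical and are not the syntax elements of any standard or of the disclosed bitstream.

```python
from dataclasses import dataclass


@dataclass
class ParameterSet:
    """Hypothetical stand-in for the PPS fields described in the text."""
    explicit_ids: bool       # flag: explicit ID-based addressing is in use
    id_length_bits: int      # signaled length of each slice address
    subpic_ids: list[int]    # IDs of the sub-picture slices, in raster order


def read_slice_address(header_bits: str, pps: ParameterSet) -> int:
    """Interpret the first id_length_bits bits of a header as the slice ID."""
    return int(header_bits[:pps.id_length_bits], 2)


def to_subpicture_index(slice_id: int, pps: ParameterSet) -> int:
    """Map a picture-wide slice ID to a raster index inside the sub-picture."""
    if not pps.explicit_ids:
        raise ValueError("explicit addressing is not signaled")
    return pps.subpic_ids.index(slice_id)


# A 4x3 picture whose slices carry the arbitrary IDs 100..111 in raster order.
# The sub-picture covers columns 2-3 of rows 1-2 (picture indices 6, 7, 10, 11).
pps = ParameterSet(explicit_ids=True, id_length_bits=7,
                   subpic_ids=[106, 107, 110, 111])

header = format(106, "07b") + "0101"               # slice ID, then other header bits
slice_id = read_slice_address(header, pps)         # same value as in the full picture
subpic_index = to_subpicture_index(slice_id, pps)  # position within the sub-picture
```

Here the slice header still carries the ID 106 that it had in the full picture, yet the decoder resolves it to position 0 of the sub-picture. Only the parameter set differs between the full bitstream and the extracted sub-bitstream, so no slice header needs to be touched.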
[0041] Figure 1 is a flowchart of an example operation method 100 for coding a video signal. Specifically, the video signal is encoded by an encoder. The encoding process compresses the video signal by employing various mechanisms to reduce the video file size. A smaller file size allows the compressed video file to be transmitted to the user while reducing the associated bandwidth overhead. The decoder then decodes the compressed video file to reconstruct the original video signal for display to the end user. The decoding process generally mirrors the encoding process to enable the decoder to consistently reconstruct the video signal.
[0042] In step 101, the video signal is input to the encoder. For example, the video signal may be an uncompressed video file stored in memory. In another example, the video file may be captured by a video capture device such as a video camera and encoded to support live streaming of the video. The video file may contain both audio and video components. The video component contains a series of image frames that, when viewed sequentially, give a visual impression of motion. Each frame contains pixels, which are represented in terms of light, here called the luma component (or luma samples), and color, here called the chroma component (or chroma samples). In some examples, the frames may also contain depth values to support three-dimensional display.
[0043] In step 103, the video is partitioned into blocks. Partitioning involves subdividing the pixels within each frame into square and / or rectangular blocks for compression. For example, in High Efficiency Video Coding (HEVC) (also known as H.265 and MPEG-H Part 2), a frame may first be divided into coding tree units (CTUs), which are blocks of a predefined size (e.g., 64x64 pixels). A CTU contains both luma and chroma samples. A coding tree may be used to divide the CTUs into blocks, and then to recursively subdivide the blocks until a configuration supporting further encoding is achieved. For example, the luma component of a frame may be subdivided until the individual blocks contain relatively uniform brightness values. Furthermore, the chroma component of a frame may be subdivided until the individual blocks contain relatively uniform color values. Thus, the partitioning mechanism varies depending on the content of the video frame.
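The recursive subdivision described above can be sketched as a quad-tree split that stops when a block is sufficiently uniform. The uniformity test (spread between the largest and smallest sample), the thresholds, and the function name `split_until_uniform` are assumptions chosen for illustration; an actual encoder would typically decide splits through rate-distortion analysis.

```python
def split_until_uniform(frame, x, y, size, min_size=4, max_spread=8):
    """Return the leaf blocks of a quad-tree as (x, y, size) tuples.

    A block is split into four quadrants while the spread of its sample
    values exceeds max_spread and the block is larger than min_size.
    """
    samples = [frame[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size <= min_size or max(samples) - min(samples) <= max_spread:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_until_uniform(frame, x + dx, y + dy, half,
                                          min_size, max_spread)
    return leaves


# A flat 16x16 luma block with a single bright sample in the top-left corner.
frame = [[50] * 16 for _ in range(16)]
frame[0][0] = 200
leaves = split_until_uniform(frame, 0, 0, 16)
```

In this example only the quadrant containing the bright sample is subdivided further, which yields three 8x8 leaves and four 4x4 leaves. The partitioning therefore follows the content of the frame, as the paragraph above states.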
[0044] In step 105, various compression mechanisms are used to compress the image blocks partitioned in step 103. For example, inter-prediction and / or intra-prediction may be used. Inter-prediction is designed to take advantage of the fact that objects in a common scene tend to appear in successive frames. Thus, a block depicting an object in a reference frame does not need to be described repeatedly in adjacent frames. Specifically, an object such as a table may remain in the same position across multiple frames. Thus, the table is described once, and adjacent frames can refer back to the reference frame. Pattern matching mechanisms may be used to match objects across multiple frames. Furthermore, moving objects may be represented across multiple frames, for example, due to the motion of the object or the motion of the camera. As a concrete example, a video may show a car moving across the screen over multiple frames. Motion vectors may be used to describe such motion. A motion vector is a two-dimensional vector that provides an offset from the coordinates of an object in one frame to the coordinates of the object in a reference frame. As such, inter-prediction can encode an image block in the current frame as a set of motion vectors indicating the offset from the corresponding block in the reference frame.
[0045] Intra-prediction encodes blocks within a common frame. It leverages the fact that luma and chroma components tend to cluster within a frame. For example, a green patch in a part of a tree tends to be adjacent to similar green patches. Intra-prediction uses multiple directional prediction modes (e.g., 33 in HEVC), planar mode, and direct current (DC) mode. Directional modes indicate that the current block is similar to / identical to samples of adjacent blocks in the corresponding direction. Planar mode indicates that a series of blocks along a row / column (e.g., a plane) can be interpolated based on adjacent blocks at the ends of the row. Planar mode effectively shows smooth light / color transitions across rows / columns by using a relatively constant slope of changing values. DC mode is used for boundary smoothing and indicates that a block is similar to / identical to the average value associated with samples of all adjacent blocks related to the angular direction of the directional prediction mode. Thus, intra-prediction blocks can represent image blocks as various related prediction mode values instead of actual values. Furthermore, inter-prediction blocks can represent image blocks as motion vector values instead of actual values. In either case, the prediction block may not exactly represent the image block in some cases. Any differences are stored in residual blocks. Transforms may be applied to the residual blocks to further compress the file.
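As a simplified illustration of how a prediction mode plus a residual can stand in for actual sample values, the following sketch uses a common form of DC prediction, in which every sample of the block is predicted as the rounded mean of the neighbouring samples above and to the left. The function names and sample values are assumptions, and the sketch omits the boundary smoothing mentioned above.

```python
def dc_predict(top, left, size):
    """Predict a size x size block as the rounded mean of its neighbours."""
    neighbours = list(top[:size]) + list(left[:size])
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * size for _ in range(size)]


def residual(block, prediction):
    """Sample-wise difference between the original block and its prediction."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]


top = [100, 102, 104, 106]     # reconstructed samples above the block
left = [98, 100, 102, 104]     # reconstructed samples to the left of the block
block = [[101, 102, 103, 104] for _ in range(4)]   # original samples

prediction = dc_predict(top, left, 4)
res = residual(block, prediction)
```

With these values the predicted sample is 102 everywhere and each residual row is `[-1, 0, 1, 2]`. The encoder would signal the mode and these small residual values instead of the original samples.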
[0046] In step 107, various filtering techniques may be applied. In HEVC, filters are applied according to an in-loop filtering scheme. The block-based prediction described above may result in blocky images at the decoder. Furthermore, the block-based prediction scheme may encode a block and then reconstruct the encoded block for later use as a reference block. The in-loop filtering scheme iteratively applies noise suppression filters, deblocking filters, adaptive loop filters, and sample adaptive offset (SAO) filters to the blocks / frames. These filters mitigate such blocking artifacts so that the encoded file can be accurately reconstructed. Furthermore, these filters mitigate artifacts in the reconstructed reference blocks so that the artifacts are less likely to cause further artifacts in subsequent blocks encoded based on the reconstructed reference blocks.
[0047] Once the video signal is partitioned, compressed, and filtered, the resulting data is encoded in a bitstream in step 109. The bitstream includes the above data, in addition to any signaling data desirable to support proper video signal reconstruction at the decoder. For example, such data may include partition data, prediction data, residual blocks, and various flags that provide coding instructions to the decoder. The bitstream may be stored in memory for transmission to the decoder upon request. The bitstream may also be broadcast and / or multicast to multiple decoders. The generation of the bitstream is an iterative process. Therefore, steps 101, 103, 105, 107, and 109 may be performed sequentially and / or simultaneously across a large number of frames and blocks. The order shown in Figure 1 is presented for clarity and ease of discussion and is not intended to limit the video coding process to a specific order.
[0048] The decoder receives the bitstream and begins the decoding process in step 111. Specifically, the decoder uses an entropy decoding scheme to convert the bitstream into corresponding syntax and video data. The decoder uses the syntax data from the bitstream to determine the partitioning of the frame in step 111. The partitioning should match the result of block partitioning in step 103. The entropy encoding / decoding used in step 111 is now described. The encoder makes many choices during the compression process, such as selecting a block partitioning scheme from several possible choices based on the spatial positioning of values in the input image. Signaling the precise choice may involve using a large number of bins. As used here, a bin is a binary value treated as a variable (e.g., a bit value that can change depending on the context). Entropy coding allows the encoder to discard any choice that is clearly not feasible in a particular case, leaving a set of acceptable choices. Each acceptable choice is then assigned a codeword. The length of the codeword depends on the number of acceptable choices (e.g., one bin for two choices, two bins for three or four choices, etc.). The encoder then encodes the codeword for the selected choice. This scheme reduces the size of the codewords, because a codeword only needs to be large enough to uniquely indicate a choice from a small subset of acceptable choices, rather than a choice from a potentially large set of all possible choices. The decoder then decodes the choice by determining the set of acceptable choices in a manner similar to the encoder. Having determined the set of acceptable choices, the decoder can read the codeword and determine the choice made by the encoder.
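The relationship between the number of acceptable choices and the codeword length can be sketched as follows. This is a fixed-length simplification with hypothetical helper names and mode names; it does not model the probability adaptation performed by arithmetic coders such as CABAC.

```python
def bins_needed(num_choices: int) -> int:
    """Smallest number of bins that can distinguish num_choices options."""
    if num_choices < 1:
        raise ValueError("at least one choice is required")
    return (num_choices - 1).bit_length()


def assign_codewords(choices):
    """Give each acceptable choice a fixed-length binary codeword."""
    width = bins_needed(len(choices))
    return {choice: format(i, "b").zfill(width) if width else ""
            for i, choice in enumerate(choices)}


# Hypothetical case: 35 possible modes, of which only three remain acceptable.
all_modes = [f"mode_{i}" for i in range(35)]
acceptable = ["mode_0", "mode_1", "mode_26"]

full_width = bins_needed(len(all_modes))     # bins without discarding choices
codewords = assign_codewords(acceptable)     # shorter codewords after discarding
```

Because the decoder can derive the same set of acceptable choices, it can interpret a two-bin codeword here where six bins would otherwise be needed.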
[0049] In step 113, the decoder performs block decoding. Specifically, the decoder uses an inverse transform to generate residual blocks. The decoder then uses the residual blocks and corresponding prediction blocks to reconstruct the image blocks according to the partitioning. The prediction blocks may include both intra-prediction blocks and inter-prediction blocks generated by the encoder in step 105. The reconstructed image blocks are then positioned within the frame of the reconstructed video signal according to the partitioning data determined in step 111. The syntax for step 113 may also be signaled in the bitstream via entropy coding, as described above.
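The reconstruction of an image block from a prediction block and a residual block can be sketched as a sample-wise addition. The function name, the 8-bit sample depth, and the clipping to the valid sample range are assumptions made for this illustration.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the residual to the prediction and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


prediction = [[100, 250], [0, 128]]
residual = [[5, 10], [-3, 0]]
reconstructed = reconstruct_block(prediction, residual)
```

In this example the sums 260 and -3 fall outside the 8-bit range and are clipped to 255 and 0, respectively.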
[0050] In step 115, filtering is performed on the frames of the reconstructed video signal in a manner similar to that in step 107 in the encoder. For example, noise suppression filters, deblocking filters, adaptive loop filters, and SAO filters may be applied to the frames to remove blocking artifacts. Once the frames have been filtered, the video signal may be output to a display in step 117 for viewing by the end user.
[0051] Figure 2 is a schematic diagram of an example coding and decoding (codec) system 200 for video coding. Specifically, the codec system 200 provides functionality to support the implementation of operation method 100. The codec system 200 is generalized to represent components used in both encoder and decoder. The codec system 200 receives and partitions a video signal, as described with respect to steps 101 and 103 of operation method 100, thereby obtaining a partitioned video signal 201. When operating as an encoder, the codec system 200 then compresses the partitioned video signal 201 into a coded bitstream, as described with respect to steps 105, 107, and 109 of method 100. When operating as a decoder, the codec system 200 generates an output video signal from the bitstream, as described with respect to steps 111, 113, 115, and 117 of operation method 100. The codec system 200 includes a general-purpose coder control component 211, a transform scaling and quantization component 213, an intra-picture estimation component 215, an intra-picture prediction component 217, a motion compensation component 219, a motion estimation component 221, a scaling and inverse transform component 229, a filter control analysis component 227, an in-loop filter component 225, a decoding picture buffer component 223, and a header formatting and context adaptive binary arithmetic coding (CABAC) component 231. Such components are combined as shown. In Figure 2, black lines indicate the movement of data to be encoded / decoded, while dashed lines indicate the movement of control data that controls the operation of other components. The encoder may contain all the components of the codec system 200. 
The decoder may contain a subset of the components of the codec system 200. For example, the decoder may include an intra-picture prediction component 217, a motion compensation component 219, a scaling and inverse transform component 229, an in-loop filter component 225, and a decoding picture buffer component 223. These components will be described below.
[0052] The partitioned video signal 201 is a captured video sequence partitioned into blocks of pixels by a coding tree. The coding tree employs various splitting modes to subdivide blocks of pixels into smaller blocks. These blocks can then be further subdivided into even smaller blocks. Blocks are sometimes called nodes on the coding tree. Larger parent nodes are split into smaller child nodes. The number of times a node is subdivided is called the depth of the node / coding tree. Divided blocks may in some cases be contained within a coding unit (CU). For example, a CU may be a sub-part of a CTU that includes a luma block, red difference chroma (Cr) blocks, and blue difference chroma (Cb) blocks, along with the corresponding syntax instructions for the CU. Splitting modes may include binary trees (BT), ternary trees (TT), and quad trees (QT), used to partition a node into two, three, or four child nodes, respectively, whose shape varies depending on the splitting mode used. The partitioned video signal 201 is transferred for compression to the general-purpose coder control component 211, the transform scaling and quantization component 213, the intra-picture estimation component 215, the filter control analysis component 227, and the motion estimation component 221.
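The three splitting modes can be sketched as operations on a rectangle. The mode labels, the `split` helper, and the 1:2:1 proportions used for the ternary split are assumptions for illustration; the text above specifies only the number of child nodes produced by each mode.

```python
def split(x, y, w, h, mode):
    """Split the node (x, y, w, h) into child nodes according to the mode.

    Dimensions are assumed to be divisible by 4 so that every split is exact.
    """
    if mode == "QT":       # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_V":     # two halves, side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BT_H":     # two halves, one above the other
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_V":     # three columns in assumed 1:2:1 proportions
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TT_H":     # three rows in assumed 1:2:1 proportions
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown splitting mode: {mode}")


children_qt = split(0, 0, 32, 32, "QT")
children_bt = split(0, 0, 32, 32, "BT_V")
children_tt = split(0, 0, 32, 32, "TT_V")
```

Applying `split` again to any child yields the deeper levels of the coding tree, with the depth increasing by one per split.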
[0053] The general-purpose coder control component 211 is configured to make decisions regarding the coding of images in a video sequence into a bitstream, according to applicable constraints. For example, the general-purpose coder control component 211 manages the optimization of bitrate / bitstream size for reconstruction quality. Such decisions may be based on memory space / bandwidth availability and image resolution requirements. The general-purpose coder control component 211 also manages buffer utilization in relation to the transmission rate to mitigate buffer underrun and overrun problems. To manage these problems, the general-purpose coder control component 211 manages partitioning, prediction, and filtering by other components. For example, the general-purpose coder control component 211 may dynamically increase compression complexity to increase resolution and bandwidth utilization, or decrease compression complexity to decrease resolution and bandwidth utilization. Thus, the general-purpose coder control component 211 controls other components of the codec system 200 to balance video signal reconstruction quality and bitrate concerns. The general-purpose coder control component 211 generates control data that controls the operation of other components. The control data is also transferred to the header formatting and CABAC component 231 so that it is encoded into the bitstream to signal parameters for decoding by the decoder.
[0054] The partitioned video signal 201 is also sent to the motion estimation component 221 and the motion compensation component 219 for inter-prediction. A frame or slice of the partitioned video signal 201 may be divided into multiple video blocks. The motion estimation component 221 and the motion compensation component 219 perform inter-predictive coding of a received video block relative to one or more blocks within one or more reference frames to provide temporal prediction. The codec system 200 may perform multiple coding passes, for example, to select an appropriate coding mode for each block of video data.
[0055] The motion estimation component 221 and the motion compensation component 219 may be highly integrated, but are represented separately for conceptual purposes. The motion estimation performed by the motion estimation component 221 is a process that generates motion vectors, which estimate the motion of video blocks. A motion vector may, for example, represent the displacement of a coded object relative to a predicted block. A predicted block is a block that is found to closely match the block to be coded in terms of pixel difference. Predicted blocks are sometimes called reference blocks. Such pixel differences may be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD), or other difference metrics. HEVC uses several coded objects, including CTUs, coding tree blocks (CTBs), and CUs. For example, a CTU may be split into CTBs, which may then be split into coding blocks (CBs) for inclusion in CUs. A CU may be encoded as prediction units (PUs) containing prediction data and / or transform units (TUs) containing transformed residual data for the CU. The motion estimation component 221 generates motion vectors, PUs, and TUs by using rate-distortion analysis as part of a rate-distortion optimization process. For example, the motion estimation component 221 may determine a number of reference blocks, a number of motion vectors, etc., for the current block / frame, and select the reference blocks, motion vectors, etc., that have the best rate-distortion characteristics. The best rate-distortion characteristics balance both the quality of video reconstruction (e.g., the amount of data loss due to compression) and coding efficiency (e.g., the size of the final encoding).
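A motion search based on the sum of absolute differences can be sketched as an exhaustive search over a small window. This is an illustrative simplification: the function names and the search window are assumptions, and the sketch selects by SAD alone rather than by the full rate-distortion analysis described above.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between cur and the ref block at (rx, ry)."""
    n = len(cur)
    return sum(abs(cur[r][c] - ref[ry + r][rx + c])
               for r in range(n) for c in range(n))


def full_search(cur, ref, bx, by, search_range):
    """Return (motion_vector, cost) of the best match around position (bx, by)."""
    n = len(cur)
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > len(ref[0]) or ry + n > len(ref):
                continue
            cost = sad(cur, ref, rx, ry)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


# Reference frame with unique sample values; the current 2x2 block, located at
# (3, 3) in the current frame, is a copy of the reference block at (4, 2).
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [row[4:6] for row in ref[2:4]]
mv, cost = full_search(cur, ref, bx=3, by=3, search_range=2)
```

The search returns the motion vector (1, -1) with a SAD of zero, meaning that the block can be represented by this offset alone with an all-zero residual.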
[0056] In some examples, the codec system 200 may calculate values for sub-integer pixel positions of a reference picture stored in the decoding picture buffer component 223. For example, the codec system 200 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference picture. Thus, the motion estimation component 221 may perform a motion search over the full pixel positions and the fractional pixel positions and output motion vectors with fractional pixel precision. The motion estimation component 221 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU with the position of a predicted block of the reference picture. The motion estimation component 221 outputs the calculated motion vectors as motion data to the header formatting and CABAC component 231 for encoding, and as motion data to the motion compensation component 219.
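The calculation of a value at a fractional pixel position can be sketched with bilinear interpolation in quarter-pixel units. Bilinear weighting is used here only because it is short; it is an assumption of this sketch, and practical codecs generally use longer interpolation filters. The function name and sample values are likewise illustrative.

```python
def sample_at(ref, x_q, y_q):
    """Bilinearly interpolate ref at a position given in quarter-pixel units."""
    x0, fx = divmod(x_q, 4)                 # integer position and fraction (0..3)
    y0, fy = divmod(y_q, 4)
    x1 = min(x0 + 1, len(ref[0]) - 1)       # clamp at the right edge
    y1 = min(y0 + 1, len(ref) - 1)          # clamp at the bottom edge
    top = ref[y0][x0] * (4 - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (4 - fx) + ref[y1][x1] * fx
    # The weights sum to 16; adding 8 before dividing rounds to nearest.
    return (top * (4 - fy) + bottom * fy + 8) // 16


ref = [[10, 20],
       [30, 40]]
full_pel = sample_at(ref, 0, 0)      # integer position (0, 0)
half_pel_x = sample_at(ref, 2, 0)    # halfway between the samples 10 and 20
centre = sample_at(ref, 2, 2)        # centre of the four samples
```

A motion search could evaluate candidate blocks built from such interpolated samples in the same way as full-pixel candidates, which allows motion vectors with fractional precision.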
[0057] Motion compensation performed by the motion compensation component 219 may include fetching or generating a predicted block based on a motion vector determined by the motion estimation component 221. As before, the motion estimation component 221 and the motion compensation component 219 may be functionally integrated in some examples. Upon receiving a motion vector for the PU of the current video block, the motion compensation component 219 may locate the predicted block to which the motion vector points. A residual video block is then formed by subtracting the pixel values of the predicted block from the pixel values of the current video block being coded, which forms pixel difference values. Generally, the motion estimation component 221 performs motion estimation on the luma component, and the motion compensation component 219 uses the motion vectors calculated based on the luma component for both the chroma and luma components. The predicted block and residual block are then transferred to the transform scaling and quantization component 213.
[0058] The partitioned video signal 201 is also sent to the intra-picture estimation component 215 and the intra-picture prediction component 217. Similar to the motion estimation component 221 and the motion compensation component 219, the intra-picture estimation component 215 and the intra-picture prediction component 217 may be highly integrated, but are represented separately for conceptual purposes. The intra-picture estimation component 215 and the intra-picture prediction component 217 intra-predict the current block relative to blocks within the current frame, as an alternative to the inter-prediction performed between frames by the motion estimation component 221 and the motion compensation component 219 as described above. In particular, the intra-picture estimation component 215 determines the intra-prediction mode to be used to encode the current block. In some examples, the intra-picture estimation component 215 selects an appropriate intra-prediction mode for encoding the current block from several tested intra-prediction modes. The selected intra-prediction mode is then forwarded to the header formatting and CABAC component 231 for encoding.
[0059] For example, the intra-picture estimation component 215 calculates rate-distortion values for the various tested intra-prediction modes using rate-distortion analysis and selects the intra-prediction mode with the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original unencoded block that was encoded to produce the encoded block, in addition to the bit rate (e.g., number of bits) used to generate the encoded block. The intra-picture estimation component 215 calculates ratios from the distortions and rates of the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block. Furthermore, the intra-picture estimation component 215 may be configured to code depth blocks of a depth map using a depth modeling mode (DMM) based on rate-distortion optimization (RDO).
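One common way to combine distortion and rate into a single figure is a Lagrangian cost of the form J = D + λR. The following sketch uses that formulation as an assumption; it is not stated in the text above, which refers only to a ratio calculated from distortion and rate. The candidate mode names, distortion values, bit counts, and λ values are hypothetical.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def best_mode(candidates: dict, lam: float) -> str:
    """Pick the candidate mode with the lowest rate-distortion cost."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Hypothetical test results: mode -> (distortion, bits needed to signal it).
candidates = {
    "DC": (400, 10),
    "planar": (300, 14),
    "angular_26": (120, 40),
}

choice_low_lambda = best_mode(candidates, lam=5)     # quality weighs more
choice_mid_lambda = best_mode(candidates, lam=20)
choice_high_lambda = best_mode(candidates, lam=50)   # bit rate weighs more
```

With these numbers, a small λ selects the accurate but expensive angular mode, while larger values of λ shift the selection first to planar and then to DC. This reflects the balance between reconstruction quality and coding efficiency.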
[0060] The intra-picture prediction component 217, when implemented in an encoder, may generate residual blocks from prediction blocks based on the selected intra-prediction mode determined by the intra-picture estimation component 215, or, when implemented in a decoder, may read residual blocks from a bitstream. A residual block contains the difference in values between the prediction block and the original block, represented as a matrix. The residual blocks are then transferred to the transform scaling and quantization component 213. The intra-picture estimation component 215 and the intra-picture prediction component 217 may operate on both the luma and chroma components.
[0061] The transformation scaling and quantization component 213 is configured to further compress the residual blocks. The transformation scaling and quantization component 213 applies a transformation, such as a discrete cosine transform (DCT), a discrete sine transform (DST), or a conceptually similar transformation, to the residual blocks to generate video blocks with residual transformation coefficient values. Wavelet transforms, integer transforms, subband transforms, or other types of transformations may also be used. The transformation may convert the residual information from the pixel value domain to a transformation domain, such as the frequency domain. The transformation scaling and quantization component 213 is also configured to scale the transformed residual information, for example, based on frequency. Such scaling involves applying a scale factor to the residual information so that different frequency information is quantized at different granularities, which may affect the final visual quality of the reconstructed video. The transformation scaling and quantization component 213 is also configured to quantize the transformation coefficients to further reduce the bitrate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be changed by adjusting the quantization parameters. In some examples, the transformation scaling and quantization component 213 may then perform a scan of a matrix containing the quantized transformation coefficients. The quantized transformation coefficients are then transferred to the header formatting and CABAC component 231 to be encoded into the bitstream.
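The effect of a transform followed by quantization can be sketched on a single row of residual samples. The sketch uses a floating-point one-dimensional DCT-II and a uniform quantization step as assumptions; the function names are illustrative, and practical codecs use integer approximations of such transforms together with quantization parameters and scaling lists.

```python
import math


def dct_1d(samples):
    """Orthonormal one-dimensional DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def quantize(coeffs, step):
    """Map each coefficient to the nearest multiple of step (as a level)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Approximate the coefficients again from the quantized levels."""
    return [level * step for level in levels]


flat_residual = [4, 4, 4, 4]                         # uniform residual row
levels = quantize(dct_1d(flat_residual), step=2)     # only the DC level is non-zero
restored = dequantize(levels, step=2)
```

The four equal residual samples collapse into a single non-zero level, which is cheaper to entropy code. A larger step would reduce the levels further at the cost of precision, which is how adjusting the quantization trades bit rate against quality.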
[0062] The scaling and inverse transform component 229 applies the inverse operations of the transform scaling and quantization component 213 to support motion estimation. The scaling and inverse transform component 229 applies inverse scaling, inverse transformation, and / or inverse quantization to reconstruct the residual block in the pixel domain, for example, for later use as a reference block that may become a prediction block for another current block. The motion estimation component 221 and / or the motion compensation component 219 may compute the reference block by adding the residual block back to the corresponding prediction block for use in motion estimation of subsequent blocks / frames. Filters are applied to the reconstructed reference block to mitigate artifacts that occurred during scaling, quantization, and transformation. Such artifacts could otherwise lead to inaccurate predictions (and further artifacts) when subsequent blocks are predicted.
[0063] The filter control analysis component 227 and the in-loop filter component 225 apply filters to residual blocks and / or reconstructed image blocks. For example, a transformed residual block from the scaling and inverse transform component 229 may be combined with the corresponding prediction block from the intra-picture prediction component 217 and / or the motion compensation component 219 to reconstruct the original image block. The filters may then be applied to the reconstructed image block. In some examples, the filters may be applied to the residual block instead. Like the other components in Figure 2, the filter control analysis component 227 and the in-loop filter component 225 are highly integrated and may be implemented together, but are represented separately for conceptual purposes. A filter applied to a reconstructed reference block is applied to a specific spatial region and includes several parameters that adjust how the filter is applied. The filter control analysis component 227 analyzes the reconstructed reference block to determine where such filters should be applied and sets the corresponding parameters. Such data is transferred to the header formatting and CABAC component 231 as filter control data for encoding. The in-loop filter component 225 applies such filters based on the filter control data. These filters may include deblocking filters, noise suppression filters, SAO filters, and adaptive loop filters. Depending on the example, such filters may be applied in the spatial / pixel domain (e.g., to reconstructed pixel blocks) or in the frequency domain.
[0064] When operating as an encoder, the filtered and reconstructed image blocks, residual blocks, and / or prediction blocks are stored in the decoding picture buffer component 223 for later use in motion estimation as described above. When operating as a decoder, the decoding picture buffer component 223 stores the reconstructed and filtered blocks and transfers them to the display as part of the output video signal. The decoding picture buffer component 223 may be any memory device capable of storing the prediction blocks, residual blocks, and / or reconstructed image blocks.
[0065] The header formatting and CABAC component 231 receives data from various components of the codec system 200 and encodes such data into a coded bitstream for transmission to the decoder. Specifically, the header formatting and CABAC component 231 generates various headers for encoding control data such as general control data and filter control data. Furthermore, prediction data, including intra-prediction and motion data, in addition to residual data in the form of quantized transformation coefficient data, is also encoded into the bitstream. The final bitstream contains all the information desired by the decoder to reconstruct the original partitioned video signal 201. Such information may also include an intra-prediction mode index table (also called a codeword mapping table), definitions of encoding contexts for various blocks, indications of the most likely intra-prediction mode, indications of partition information, etc. Such data may be encoded using entropy coding. For example, information may be encoded using context adaptive variable length coding (CAVLC), CABAC, syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding techniques. Following entropy coding, the coded bitstream may be transmitted to another device (e.g., a video decoder) or archived for later transmission or retrieval.
[0066] Figure 3 is a block diagram representing an example video encoder 300. The video encoder 300 may be used to implement the encoding function of the codec system 200 and / or to implement steps 101, 103, 105, 107, and / or 109 of the operation method 100. The encoder 300 partitions the input video signal to produce a partitioned video signal 301 which is substantially similar to the partitioned video signal 201. The partitioned video signal 301 is then compressed and encoded into a bitstream by the components of the encoder 300.
[0067] Specifically, the partitioned video signal 301 is transferred to the intra-picture prediction component 317 for intra-prediction. The intra-picture prediction component 317 may be substantially similar to the intra-picture estimation component 215 and the intra-picture prediction component 217. The partitioned video signal 301 is also transferred to the motion compensation component 321 for inter-prediction based on a reference block in the decoding picture buffer component 323. The motion compensation component 321 may be substantially similar to the motion estimation component 221 and the motion compensation component 219. The prediction blocks and residual blocks from the intra-picture prediction component 317 and the motion compensation component 321 are transferred to the transform and quantization component 313 for transformation and quantization of the residual blocks. The transform and quantization component 313 may be substantially similar to the transform scaling and quantization component 213. The transformed and quantized residual blocks and the corresponding prediction blocks (along with associated control data) are transferred to the entropy coding component 331 for coding into the bitstream. The entropy coding component 331 may be substantially similar to the header formatting and CABAC component 231.
[0068] The transformed and quantized residual blocks and / or corresponding prediction blocks are also transferred from the transformation and quantization component 313 to the inverse transform and quantization component 329 for reconstruction into reference blocks used by the motion compensation component 321. The inverse transform and quantization component 329 may be substantially similar to the scaling and inverse transform component 229. Depending on the example, the in-loop filters in the in-loop filter component 325 are also applied to the residual blocks and / or the reconstructed reference blocks. The in-loop filter component 325 may be substantially similar to the filter control analysis component 227 and the in-loop filter component 225. The in-loop filter component 325 may include multiple filters as described with respect to the in-loop filter component 225. The filtered blocks are then stored in the decoding picture buffer component 323 for use as reference blocks by the motion compensation component 321. The decoding picture buffer component 323 may be substantially similar to the decoding picture buffer component 223.
[0069] Figure 4 is a block diagram representing an example video decoder 400. The video decoder 400 may be used to implement the decoding function of the codec system 200 and / or to implement steps 111, 113, 115, and / or 117 of the operation method 100. The decoder 400 receives a bitstream from the encoder 300, for example, and generates a reconstructed output video signal based on the bitstream for display to the end user.
[0070] The bitstream is received by the entropy decoding component 433. The entropy decoding component 433 is configured to implement an entropy decoding scheme such as CAVLC, CABAC, SBAC, PIPE coding, or other entropy coding techniques. For example, the entropy decoding component 433 may use header information to provide context for interpreting additional data encoded as codewords within the bitstream. The decoded information may include any desirable information for decoding the video signal from the residual block, such as general control data, filter control data, partition information, motion data, prediction data, and quantized transformation coefficients. The quantized transformation coefficients are transferred to the inverse transform and quantization component 429 for reconstruction into the residual block. The inverse transform and quantization component 429 may be analogous to the inverse transform and quantization component 329.
[0071] The reconstructed residual blocks and / or prediction blocks are transferred to the intra-picture prediction component 417 for reconstruction into image blocks based on intra-prediction operations. The intra-picture prediction component 417 may be similar to the intra-picture estimation component 215 and the intra-picture prediction component 217. Specifically, the intra-picture prediction component 417 uses the prediction mode to locate a reference block in the frame and applies the residual block to the result to reconstruct the intra-predicted image block. The reconstructed intra-predicted image block and / or residual block, along with the corresponding inter-prediction data, are transferred to the decoding picture buffer component 423 via the in-loop filter component 425. These may be substantially similar to the decoding picture buffer component 223 and the in-loop filter component 225, respectively. The in-loop filter component 425 filters the reconstructed image block, residual block, and / or prediction block, and this information is stored in the decoding picture buffer component 423. The reconstructed image blocks from the decoding picture buffer component 423 are transferred to the motion compensation component 421 for inter-prediction. The motion compensation component 421 may be substantially similar to the motion estimation component 221 and / or the motion compensation component 219. Specifically, the motion compensation component 421 uses motion vectors from a reference block to generate a prediction block, and applies a residual block to the result to reconstruct the image block. The resulting reconstructed block may also be transferred to the decoding picture buffer component 423 via the in-loop filter component 425. The decoding picture buffer component 423 continues to store further reconstructed image blocks, which can be reconstructed into frames via the partition information. Such frames may also be arranged in a sequence. The sequence is output to a display as a reconstructed output video signal.
[0072] Figure 5 is a schematic diagram representing an example bitstream 500 containing the encoded video sequence with HPS 513. For example, bitstream 500 may be generated by codec system 200 and / or encoder 300 for decoding by codec system 200 and / or decoder 400. As another example, bitstream 500 may be generated by an encoder in step 109 of method 100 for use by a decoder in step 111.
[0073] The bitstream 500 includes a sequence parameter set (SPS) 510, multiple picture parameter sets (PPS) 512, multiple slice headers 514, and image data 520. The SPS 510 contains sequence data common to all pictures in the video sequence contained in the bitstream 500. Such data may include picture sizing, bit depth, coding tool parameters, bitrate limits, etc. The PPS 512 contains parameters specific to one or more corresponding pictures. Thus, each picture in the video sequence may refer to one PPS 512. The PPS 512 may indicate coding tools available for the tiles in the corresponding picture, quantization parameters, offsets, picture-specific coding tool parameters (e.g., filter controls), etc. The slice headers 514 contain parameters specific to one or more corresponding slices in a picture. Thus, each slice in the video sequence may refer to a slice header 514. The slice header 514 may include slice type information, picture order count (POC), reference picture list, predicted weights, tile entry point, deblocking parameters, etc. In some examples, slices may be called tile groups. In such cases, the slice header 514 may be called a tile group header.
[0074] Image data 520 includes video data encoded according to inter-prediction and / or intra-prediction, along with corresponding transformed and quantized residual data. Such image data 520 is sorted according to the partitioning used to partition the image before encoding. For example, a video sequence is divided into pictures 521. Picture 521 is divided into slices 523. Slices 523 may be further divided into tiles and / or CTUs. CTUs are further divided into coding blocks based on a coding tree. Coding blocks can then be encoded / decoded according to the prediction mechanism. For example, picture 521 may contain one or more slices 523. Picture 521 refers to PPS 512, and slice 523 refers to slice header 514. Each slice 523 may contain one or more tiles. Each slice 523 and / or picture 521 may then contain multiple CTUs.
[0075] Each picture 521 may contain the entire set of visual data associated with the video sequence at the corresponding point in time. The VR system may display a user-selected region of picture 521, which creates the sense of being present in the scene represented by picture 521. The regions that the user may want to see are unknown when bitstream 500 is encoded. Thus, picture 521 may contain each possible region that the user might potentially see. However, in a VR context, the corresponding codec may be designed based on the assumption that the user will only see the selected region of picture 521, and the rest of picture 521 will be discarded.
[0076] Each slice 523 may be a rectangle defined by a CTU in the upper left corner and a CTU in the lower right corner. In some examples, slice 523 contains a sequence of tiles and / or CTUs in a raster scan order progressing from left to right and top to bottom. In other examples, slice 523 is a rectangular slice. A rectangular slice does not have to traverse the entire width of the picture according to the raster scan order. Instead, a rectangular slice may contain rectangular and / or square regions of picture 521 defined with respect to CTUs and / or tile rows and CTUs and / or tile columns. Slice 523 is the smallest unit that can be displayed separately by a decoder. Thus, slices 523 from picture 521 may be assigned to different sub-pictures 522 to separately represent desired regions of picture 521. For example, in a VR context, picture 521 may contain the entire visible sphere of data, but the user may only see a sub-picture 522 containing one or more slices 523 on a head-mounted display.
[0077] As described above, the video codec may assume that the unselected region of picture 521 should be discarded by the decoder. Therefore, a subbitstream 501 may be extracted from bitstream 500. The extracted subbitstream 501 may contain the selected subpicture 522 and associated syntax. The unselected region of picture 521 may be transmitted at a lower resolution or omitted to improve coding efficiency. Subpicture 522 is the selected region of picture 521 and may contain one or more associated slices 524. Slice 524 is a subset of slice 523 representing the selected region of picture 521 related to subpicture 522. Subbitstream 501 also contains SPS 510, PPS 512, slice header 514, and / or sub-parts thereof related to subpicture 522 and slice 524.
[0078] A sub-bitstream 501 may be extracted from bitstream 500. For example, a user using a decoder may view a segment of video. The user may select a corresponding area of picture 521. The decoder may request a subsequent sub-picture 522 related to the area the user is currently viewing. The encoder may then transfer the sub-picture 522 related to the selected area at a higher resolution, and the remaining area of picture 521 at a lower resolution. To enable such functionality, the decoder may extract 529 one or more sub-bitstreams 501 from bitstream 500. Extraction 529 includes placing the sub-picture 522 in the sub-bitstream 501, while including the slice 524 in the sub-picture 522. Extraction 529 also includes placing the associated SPS 510, PPS 512, and slice header 514 in the sub-bitstream as desired to assist in decoding the sub-picture 522 and slice 524.
[0079] One problem associated with the extraction 529 of subbitstream 501 is that the addressing associated with picture 521 may differ from the addressing associated with subpicture 522. The addressing problem will be discussed in more detail below. In some systems, the slice header 514 may be rewritten to accommodate such addressing mismatches. However, subbitstream 501 may contain a large number of slice headers 514 (e.g., one or two per picture 521), and such slice headers 514 are dynamically rewritten for each user. As such, rewriting the slice headers 514 in this manner can be very processor-intensive. The present disclosure includes a mechanism that allows the slice headers 514 to be extracted 529 into subbitstream 501 without rewriting the slice headers 514.
[0080] In a system that rewrites the slice header 514, slices 523 and 524 are addressed based on index values such as slice index, tile index, CTU index, etc. Such indices increase in value in the raster scan order. To correct addressing mismatches, the disclosed embodiments use a defined ID value for each slice, tile, and / or CTU. Such defined IDs may be default values and / or selected by the encoder. Defined IDs may increase in a consistent manner in the raster scan order, but such defined IDs do not have to be monotonically increasing. Thus, defined IDs may have gaps between values to allow address management. For example, indices may be monotonically increasing (e.g., 0, 1, 2, 3, etc.), while defined IDs may increase by some defined multiple (e.g., 0, 10, 20, 30, etc.). The encoder may include mappings 535 in bitstream 500 and sub-bitstream 501, which allows the decoder to map from defined IDs to indices that the decoder can interpret.
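The distinction between indices and defined IDs can be illustrated with a minimal Python sketch. The function name `build_id_to_index` and the sample values are hypothetical and are not taken from the disclosure; the sketch only assumes that the mapping associates each defined ID with its position in raster scan order.

```python
def build_id_to_index(defined_ids):
    """Map each defined ID to its raster-scan index (hypothetical helper)."""
    if len(set(defined_ids)) != len(defined_ids):
        raise ValueError("defined IDs must be unique")
    return {defined_id: index for index, defined_id in enumerate(defined_ids)}


# Indices are consecutive (0, 1, 2, 3); defined IDs may leave gaps (0, 10, 20, 30).
defined_ids = [0, 10, 20, 30]
id_to_index = build_id_to_index(defined_ids)

# A decoder that reads the defined ID 20 can recover the index it understands.
index_of_20 = id_to_index[20]
```

Here the ID 20 resolves to index 2. The gaps between IDs leave room for additional addresses without renumbering the existing ones.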
[0081] The parameter sets, such as SPS 510 and / or PPS 512, may include an ID flag 531. The ID flag 531 may be set to indicate that a mapping 535 is available to map slice addresses from a location based on picture 521 to a location based on sub-picture 522. Thus, the ID flag 531 may be set to indicate to the decoder that the disclosed mechanism is being used in bitstream 500 and sub-bitstream 501. For example, the ID flag 531 may be coded as explicit_tile_id_flag, sps_subpic_id_present_flag, or other syntax element. The ID flag 531 may be encoded in bitstream 500 and extracted into sub-bitstream 501.
[0082] Parameter sets such as SPS 510 and / or PPS 512 may include the syntax element ID 532. ID 532 may indicate a subpicture 522 within picture 521. For example, an array of IDs 532 may be included in the PPS 512 of bitstream 500. When subbitstream 501 is extracted 529, the ID 532 associated with the subpicture 522 to be sent to the decoder may be included in the PPS 512 of subbitstream 501. In other examples, an indication of the associated ID 532 may be inserted into the PPS 512 of subbitstream 501 to enable the decoder to determine the correct ID 532. For example, ID 532 may be coded as SubPicIdx, tile_id_val[i], or other syntax element indicating the boundary of subpicture 522.
[0083] Parameter sets such as SPS 510 and / or PPS 512 may include a slice address length 533 syntax element. Furthermore, the slice header 514 may include the slice address 534 of slice 523. The slice address 534 is included as a defined ID value. The slice address 534 may be extracted directly into the slice header 514 in the subbitstream 501 without modification, which avoids rewriting the slice header 514. For example, the slice address 534 may be coded as slice_address, tile_group_address, or other syntax element indicating the boundary between slices 523 and 524. The slice address length 533 may then be used to interpret the slice address 534. For example, the slice address 534 may be coded as a variable-length value that includes a value defined by the encoder, and is therefore followed by a byte alignment field. The slice address length 533 may indicate the number of bits contained in the corresponding slice address 534, and thus indicate the boundaries of slice address 534 to the decoder. In such a case, the decoder can use the slice address length 533 (e.g., from PPS 512) to interpret slice address 534, and the slice header 514 does not need to be rewritten to adjust the byte alignment field following slice address 534. For example, the slice address length 533 may be coded as subpic_id_len_minus1, tile_id_len_minus1, or other syntax element indicating the length of the slice address. The slice address length 533 may be contained in the PPS 512 of bitstream 500 and then extracted into the PPS 512 of subbitstream 501.
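As a non-limiting illustration, the use of a signaled length to interpret a slice address can be sketched in Python as follows. The `BitReader` class and `read_slice_address` function are hypothetical helpers. The sketch assumes that the address is the first field of the payload being read, which is a simplification of a real slice header, and that the length is signaled in "minus 1" form as in tile_id_len_minus1.

```python
class BitReader:
    """Reads unsigned values most-significant bit first (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_slice_address(reader: BitReader, len_minus1: int) -> int:
    """Read a slice address coded with (len_minus1 + 1) bits."""
    return reader.read_bits(len_minus1 + 1)


# Two 6-bit addresses packed back to back: 000101 (5), then 001000 (8).
reader = BitReader(bytes([0b00010100, 0b10000000]))
first_address = read_slice_address(reader, len_minus1=5)
second_address = read_slice_address(reader, len_minus1=5)
```

Because the number of bits comes from the parameter set and not from the value itself, the same header bits can be carried into a sub-bitstream unchanged, provided the extracted parameter set keeps the same length.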
[0084] The mapping 535 may also be transmitted in a parameter set such as SPS 510, PPS 512, and / or slice header 514. The mapping 535 indicates a mechanism for mapping slice addresses from a location based on picture 521 to a location based on subpicture 522. The mapping 535 may be encoded in bitstream 500 and extracted 529 into a corresponding parameter set in subbitstream 501. For example, the mapping 535 may be coded as a SliceSubpicToPicIdx[SubPicIdx][slice_address] syntax element, a TileIdToIdx[tile_group_address] syntax element, or other syntax element indicating a mechanism for mapping slice addresses from a location based on picture 521 to a location based on subpicture 522.
[0085] Therefore, the decoder can read the subbitstream 501 and obtain the ID flag 531 to determine that slice 524 is addressed by a defined address instead of an index. The decoder can obtain the ID 532 to determine the subpicture 522 contained in the subbitstream 501. The decoder can also obtain the slice address 534 and the length 533 of the slice address to interpret the slice address 534. The decoder can then obtain the mapping 535 to map the slice address 534 into a format that the decoder can interpret. The decoder can then use the slice address 534 when decoding and displaying the subpicture 522 and the corresponding slice 524.
[0086] Figure 6 is a schematic diagram representing an exemplary picture 600 partitioned for coding. For example, picture 600 may be encoded into and decoded from bitstream 500 by, for example, a codec system 200, an encoder 300, and / or a decoder 400. Furthermore, picture 600 may be partitioned and / or contained within subpictures in subbitstream 501 to support encoding and decoding according to method 100.
[0087] Picture 600 can be partitioned into slice 623, which may be substantially the same as slice 523. Slice 623 may be further partitioned into tile 625 and CTU 627. In Figure 6, slices 623 are represented by thick lines alternating with white backgrounds and shaded areas to graphically distinguish them from each other. Tiles 625 are shown by dashed lines. The boundaries of tiles 625 located on the boundaries of slice 623 are represented by thick dashed lines, while the boundaries of tiles 625 not located on the boundaries of slice 623 are represented by thin dashed lines. The boundaries of CTU 627 are represented by thin solid lines, except where the boundaries of CTU 627 are covered by the boundaries of tile 625 or slice 623. In this example, picture 600 contains nine slices 623, 24 tiles 625, and 216 CTU 627.
[0088] As shown, slice 623 is a rectangle with boundaries that may be defined by the contained tiles 625 and / or CTUs 627. Slice 623 does not have to extend across the entire width of picture 600. Tiles 625 may be generated in slice 623 according to rows and columns. CTUs 627 may be partitioned from tiles 625 and / or slice 623 to generate partitions of picture 600 suitable for subdividing into coding blocks for coding according to inter-prediction and / or intra-prediction. Picture 600 may be encoded in a bitstream such as bitstream 500. Regions of picture 600 may be contained in subpictures and extracted into subbitstreams such as subpicture 522 and subbitstream 501, respectively.
[0089] Figure 7 is a schematic diagram representing an exemplary sub-picture 722 extracted from picture 700. For example, picture 700 may be substantially the same as picture 600. Furthermore, picture 700 may be encoded into bitstream 500 by, for example, codec system 200 and / or encoder 300. Sub-picture 722 may be extracted into sub-bitstream 501 by, for example, codec system 200, encoder 300 and / or decoder 400, and then decoded. Furthermore, picture 700 may be used to support encoding and decoding according to method 100.
[0090] As shown, picture 700 includes the upper left corner 702 and the lower right corner 704. Subpicture 722 includes one or more slices 723 from picture 700. When using an index, the upper left corner 702 and the lower right corner 704 are associated with the first and last indexes, respectively. However, the decoder may display only subpicture 722 and not the entire picture 700. Furthermore, the slice address 734 of the first slice 723a may not align with the upper left corner 702, and the slice address 734 of the third slice 723c may not align with the lower right corner 704. As such, the slice address 734 for subpicture 722 does not align with the slice address 734 for picture 700. This disclosure uses a defined ID for slice address 734 instead of an index. The decoder can use the mapping to map the slice address 734 from a position based on picture 700 to a position based on subpicture 722. The decoder can then use the mapped slice address 734 to place the first slice 723a in the upper left corner 702 of the decoder display, the third slice 723c in the lower right corner 704 of the decoder display, and the second slice 723b between the first slice 723a and the third slice 723c.
[0091] As described herein, this disclosure describes an improvement for explicit tile ID signaling in video coding where tiles are used for picture partitioning. The description of the technique is based on VVC by ITU-T and ISO / IEC JVET. However, the technique is also applicable to other video codec standards. The following are exemplary embodiments described herein.
[0092] The concepts of tile index and tile ID can be distinguished. A tile's tile ID may or may not be equal to its tile index. When the tile ID is different from the tile index, a mapping between the tile ID and the tile index may be signaled in the PPS. The tile ID may be used to signal the tile group address in the tile group header instead of using the tile index. In this way, the value of the tile ID can remain the same when the tile group is extracted from the original bitstream. This can be achieved by updating the mapping between the tile ID and the tile index in the PPS referenced by the tile group. This approach addresses the case where the value of the tile index may change depending on the subpicture to be extracted. It should be noted that other parameter sets (e.g., other than the slice header) may still be rewritten when performing subbitstream extraction based on MCTS.
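The extraction behavior described above can be sketched in Python as follows. The `Pps` and `TileGroupHeader` classes and the `extract` function are hypothetical and greatly simplified. The sketch assumes that a tile group is kept exactly when its first tile is kept and that the original tile IDs equal the original tile indices.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pps:
    tile_id_val: tuple  # tile_id_val[i] is the tile ID of the tile with index i


@dataclass(frozen=True)
class TileGroupHeader:
    tile_group_address: int  # tile ID of the first tile in the tile group


def extract(pps, headers, kept_tile_ids):
    """Update only the PPS mapping; kept headers are passed through untouched."""
    kept = set(kept_tile_ids)
    new_pps = replace(pps, tile_id_val=tuple(t for t in pps.tile_id_val if t in kept))
    new_headers = [h for h in headers if h.tile_group_address in kept]
    return new_pps, new_headers


original_pps = Pps(tile_id_val=(0, 1, 2, 3))
original_headers = [TileGroupHeader(0), TileGroupHeader(2)]
sub_pps, sub_headers = extract(original_pps, original_headers, kept_tile_ids=[2, 3])

# Tile ID 2 keeps its value, but its tile index changes from 2 to 0.
new_index_of_tile_2 = sub_pps.tile_id_val.index(2)
```

The kept tile group header is the very same object before and after extraction. Only the tile ID list in the PPS changes, which is what allows the ID value 2 to resolve to index 0 in the sub-bitstream.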
[0093] The above can be achieved by using flags in the parameter set in which tile information is signaled. For example, the PPS can be used as the parameter set, and explicit_tile_id_flag may be used for this purpose. explicit_tile_id_flag may be signaled regardless of the number of tiles in the picture and may indicate that explicit tile ID signaling is being used. Syntax elements may also be used to specify the number of bits for signaling the tile ID value (e.g., the mapping between tile index and tile ID). Such syntax elements may also be used to signal the tile ID / address in the tile group header. For example, the tile_id_len_minus1 syntax element may be used for this purpose. tile_id_len_minus1 may not be present if explicit_tile_id_flag is equal to zero (e.g., if the tile ID is set equal to the tile index). If tile_id_len_minus1 is not present, its value may be inferred to be equal to Ceil(Log2(NumTilesInPic)). A further constraint may require that a bitstream resulting from MCTS subbitstream extraction has explicit_tile_id_flag set to 1 for the active PPS, unless the subbitstream contains the top-left corner tile of the original bitstream.
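The inference rule for an absent tile_id_len_minus1 can be sketched in Python as follows. The function names are hypothetical. The sketch follows the text literally, so the inferred value of tile_id_len_minus1 is Ceil(Log2(NumTilesInPic)), and it assumes that the element is always signaled when explicit_tile_id_flag is 1.

```python
def ceil_log2(value: int) -> int:
    """Exact Ceil(Log2(value)) for value >= 1, using integer arithmetic only."""
    if value < 1:
        raise ValueError("value must be at least 1")
    return (value - 1).bit_length()


def resolve_tile_id_len_minus1(explicit_tile_id_flag, num_tiles_in_pic, signaled=None):
    """Return the signaled value when present, otherwise the inferred one."""
    if explicit_tile_id_flag:
        if signaled is None:
            raise ValueError("assumed to be signaled when explicit_tile_id_flag is 1")
        return signaled
    return ceil_log2(num_tiles_in_pic)


# 24 tiles, as in the example picture 600: Ceil(Log2(24)) is 5.
inferred = resolve_tile_id_len_minus1(False, num_tiles_in_pic=24)
# A sub-bitstream with 4 tiles may keep a longer, explicitly signaled length.
explicit = resolve_tile_id_len_minus1(True, num_tiles_in_pic=4, signaled=5)
```

Integer bit-length arithmetic is used here instead of floating-point logarithms to avoid rounding errors at exact powers of two.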
[0094] In an exemplary embodiment, the video coding syntax may be modified as described below to achieve the functionality described herein. An exemplary CTB raster and tile scan process may be described as follows: the list TileId[ctbAddrTs], for ctbAddrTs in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a CTB address in the tile scan to a tile ID, and the list NumCtusInTile[tileIdx], for tileIdx in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a tile index to the number of CTUs in the tile, are derived as follows:
[Equation]
[0095] The list NumCtusInTile[tileIdx], for tileIdx in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a tile index to the number of CTUs in the tile, can be derived as follows:
[Equation]
[0096] The set TileIdToIdx[tileId], for a set of NumTilesInPic tileId values, which specifies the conversion from a tile ID to a tile index, can be derived as follows:
[Equation]
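The derivations themselves are not reproduced in this text. Purely as an illustrative assumption, the following Python sketch shows one way the lists named above could be computed from tile column widths and tile row heights given in CTBs. The function `derive_tile_tables` and its argument names are hypothetical, and the sketch is not the normative derivation.

```python
def derive_tile_tables(col_width, row_height, tile_id_val):
    """Derive per-tile and per-CTB tables for a tile grid (illustrative only).

    col_width[i] and row_height[j] are tile sizes in CTBs; tile_id_val[k] is
    the tile ID of the tile with index k, with tiles indexed in raster order.
    """
    num_ctus_in_tile = []   # NumCtusInTile[tileIdx]
    first_ctb_addr_ts = []  # FirstCtbAddrTs[tileIdx]
    tile_id = []            # TileId[ctbAddrTs]
    ctb_addr_ts = 0
    tile_idx = 0
    for height in row_height:
        for width in col_width:
            count = width * height
            num_ctus_in_tile.append(count)
            first_ctb_addr_ts.append(ctb_addr_ts)
            # In tile scan, all CTBs of one tile are contiguous.
            tile_id.extend([tile_id_val[tile_idx]] * count)
            ctb_addr_ts += count
            tile_idx += 1
    # TileIdToIdx[tileId]
    tile_id_to_idx = {tid: idx for idx, tid in enumerate(tile_id_val)}
    return num_ctus_in_tile, first_ctb_addr_ts, tile_id, tile_id_to_idx


# A 3x3-CTB picture split into 2 tile columns and 2 tile rows.
num_ctus, first_ctb, tile_id, tile_id_to_idx = derive_tile_tables(
    col_width=[2, 1], row_height=[1, 2], tile_id_val=[0, 10, 20, 30]
)
```

With these inputs the four tiles contain 2, 1, 4, and 2 CTBs, and the tile with ID 20 is found at tile index 2.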
[0097] An example picture parameter set RBSP syntax could be written as follows: [Table 1]
[0098] An example tile group header syntax could be written as follows: [Table 2]
[0099] An example tile group data syntax could be written as follows: [Table 3]
[0100] An example picture parameter set RBSP semantics may be described as follows: explicit_tile_id_flag set to 1 indicates that the tile ID of each tile is explicitly signaled. explicit_tile_id_flag set to zero indicates that the tile ID is not explicitly signaled. For bitstreams resulting from MCTS subbitstream extraction, the value of explicit_tile_id_flag may be set to 1 for the active PPS, unless the resulting bitstream includes the top-left corner tile of the original bitstream. tile_id_len_minus1 plus 1 specifies the number of bits used to represent the syntax element tile_id_val[i] and the syntax element tile_group_address in the tile group headers referencing the PPS. The value of tile_id_len_minus1 may be between Ceil(Log2(NumTilesInPic)) and 15. If it is not present, the value of tile_id_len_minus1 can be inferred to be equal to Ceil(Log2(NumTilesInPic)). It should be noted that in some cases the value of tile_id_len_minus1 may be greater than Ceil(Log2(NumTilesInPic)). This is because the current bitstream may be the result of MCTS subbitstream extraction. In that case, the tile ID, which can be the tile index in the original bitstream, may be represented by Ceil(Log2(OrgNumTilesInPic)) bits, where OrgNumTilesInPic is the NumTilesInPic of the original bitstream and is greater than the NumTilesInPic of the current bitstream. tile_id_val[i] identifies the tile ID of the i-th tile of the picture referencing the PPS. The length of tile_id_val[i] is tile_id_len_minus1+1 bits. For any integers m and n in the range of 0 to NumTilesInPic-1, tile_id_val[m] may not be equal to tile_id_val[n] if m is not equal to n, and tile_id_val[m] may be less than tile_id_val[n] if m is less than n.
[0101] The following variables are defined: ColWidth[i], a list for i in the range of 0 to num_tile_columns_minus1, which specifies the width of the i-th tile column in units of CTBs; RowHeight[j], a list for j in the range of 0 to num_tile_rows_minus1, which specifies the height of the j-th tile row in units of CTBs; ColBd[i], a list for i in the range of 0 to num_tile_columns_minus1+1, which specifies the position of the i-th tile column boundary in units of CTBs; RowBd[j], a list for j in the range of 0 to num_tile_rows_minus1+1, which specifies the position of the j-th tile row boundary in units of CTBs; CtbAddrRsToTs[ctbAddrRs], a list for ctbAddrRs in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a CTB address in the CTB raster scan of the picture to a CTB address in the tile scan; CtbAddrTsToRs[ctbAddrTs], a list for ctbAddrTs in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a CTB address in the tile scan to a CTB address in the CTB raster scan of the picture; TileId[ctbAddrTs], a list for ctbAddrTs in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a CTB address in the tile scan to a tile ID; NumCtusInTile[tileIdx], a list for tileIdx in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a tile index to the number of CTUs in the tile; and FirstCtbAddrTs[tileIdx], a list for tileIdx in the range of 0 to PicSizeInCtbsY-1, which specifies the conversion from a tile ID to the CTB address in the tile scan of the first CTB in the tile. The following can be derived by invoking the CTB raster and tile scan conversion process: TileIdToIdx[tileId], a set for NumTilesInPic tileId values, which specifies the conversion from a tile ID to a tile index; FirstCtbAddrTs[tileIdx], a list for tileIdx in the range of 0 to NumTilesInPic-1, which specifies the conversion from a tile ID to the CTB address in the tile scan of the first CTB in the tile; ColumnWidthInLumaSamples[i], a list for i in the range of 0 to num_tile_columns_minus1, which specifies the width of the i-th tile column in units of luma samples; and RowHeightInLumaSamples[j], a list for j in the range of 0 to num_tile_rows_minus1, which specifies the height of the j-th tile row in units of luma samples.
[0102] The tile_group_address identifies the tile ID of the first tile in a tile group. The length of the tile_group_address is tile_id_len_minus1+1 bits. The value of the tile_group_address may be between 0 and 2^(tile_id_len_minus1+1)-1, and the value of the tile_group_address may not be equal to the value of the tile_group_address of any other coded tile group NAL unit in the same coded picture.
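The two constraints on tile_group_address can be checked with a short Python sketch. The function name `validate_tile_group_addresses` is hypothetical, and the sketch assumes that the stated range is inclusive at both ends.

```python
def validate_tile_group_addresses(addresses, tile_id_len_minus1):
    """Check the range and uniqueness of tile group addresses within one picture."""
    # Largest value representable in (tile_id_len_minus1 + 1) bits.
    upper = (1 << (tile_id_len_minus1 + 1)) - 1
    seen = set()
    for address in addresses:
        if not 0 <= address <= upper:
            raise ValueError(f"address {address} is outside 0..{upper}")
        if address in seen:
            raise ValueError(f"address {address} is used by more than one tile group")
        seen.add(address)
    return upper


# With tile_id_len_minus1 equal to 4, addresses use 5 bits and may reach 31.
upper_bound = validate_tile_group_addresses([0, 10, 20], tile_id_len_minus1=4)
```

An address such as 40 would be rejected for a 5-bit length, and a repeated address would be rejected because two tile groups in the same coded picture may not share it.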
[0103] Figure 8 is a schematic diagram of an example video coding device 800. The video coding device 800 is suitable for implementing the examples / embodiments disclosed herein. The video coding device 800 has a transceiver unit (Tx / Rx) 810 including a downstream port 820, an upstream port 850, and / or a transmitter and / or receiver for communicating data upstream and / or downstream over a network. The video coding device 800 also includes a processor 830 including a logic unit and / or a central processing unit (CPU) for processing data, and a memory 832 for storing data. The video coding device 800 may also have optical-to-electrical (OE) components, electrical-to-optical (EO) components, and / or wireless communication components coupled to the upstream port 850 and / or downstream port 820 for communicating data over an electrical, optical, or wireless communication network. The video coding device 800 may also include input and / or output (I / O) devices 860 for communicating data to and from a user. The I / O device 860 may include output devices such as a display for showing video data and speakers for outputting audio data. The I / O device 860 may also include input devices such as a keyboard, mouse, and trackball, and / or corresponding interfaces for interacting with such output devices.
[0104] The processor 830 is implemented by hardware and software. The processor 830 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 830 communicates with the downstream port 820, Tx / Rx 810, upstream port 850, and memory 832. The processor 830 has a coding module 814. The coding module 814 implements the above-disclosed embodiments, such as methods 100, 900, and 1000, which may use bitstream 500, picture 600, and / or picture 700. The coding module 814 may also implement any other methods / mechanisms described herein. Furthermore, the coding module 814 may implement a codec system 200, an encoder 300, and / or a decoder 400. For example, when operating as an encoder, the coding module 814 can specify the flag, subpicture IDs, and length in the PPS. The coding module 814 can also encode the slice address in the slice header. The coding module 814 can then extract the subbitstreams of the subpictures from the bitstream of the picture without rewriting the slice header. When operating as a decoder, the coding module 814 can read the flag to determine whether an explicit slice address is being used instead of an index. The coding module 814 can also read the length and subpicture IDs from the PPS, as well as the slice address from the slice header. The coding module 814 can then interpret the slice address using the length and map the slice address from the picture-based address to the subpicture-based address using the subpicture ID. As such, the coding module 814 can determine the desired location of a slice regardless of the selected subpicture and without requiring the slice header to be rewritten to adapt to address changes based on the subpicture. The coding module 814 therefore provides the video coding device 800 with additional functionality when partitioning and coding video data, avoids certain processes to reduce processing overhead, and / or improves coding efficiency. Thus, in addition to addressing problems specific to video coding technology, the coding module 814 also improves the functionality of the video coding device 800. Furthermore, the coding module 814 achieves transformations of the video coding device 800 into different states. Alternatively, the coding module 814 can be implemented as instructions stored in memory 832 and executed by processor 830 (for example, as a computer program product stored on a non-transitory medium).
[0105] Memory 832 includes one or more memory types, such as disk, tape drive, solid-state drive, read-only memory (ROM), random-access memory (RAM), flash memory, ternary content-addressable memory (TCAM), and static random-access memory (SRAM). Memory 832 may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data read during program execution.
[0106] Figure 9 is a flowchart of an exemplary method 900 for encoding a bitstream of a picture, such as bitstream 500 and picture 600, so as to support extraction of a sub-bitstream of a subpicture, such as sub-bitstream 501 and subpicture 522, respectively, without rewriting the slice header, by using explicit address signaling. Method 900 may be used by an encoder, such as a codec system 200, encoder 300, and / or video coding device 800, when performing method 100.
[0107] Method 900 may be initiated when an encoder receives a video sequence containing multiple pictures and decides, for example, based on user input, to encode the video sequence into a bitstream. The video sequence is partitioned into pictures / images / frames for further partitioning before encoding. In step 901, the pictures of the video sequence are encoded into a bitstream. A picture may contain multiple slices, including a first slice. The first slice may be any slice within the picture, but is referred to as the first slice for clarity of discussion. For example, the top-left corner of the first slice does not have to be aligned with the top-left corner of the picture.
[0108] In step 903, the slice header associated with the first slice is encoded into the bitstream. The slice header contains the slice address of the first slice. The slice address may have a defined value, such as a number selected by the encoder. Such a value may be arbitrary, but may be incremented in raster scan order (e.g., left to right and top to bottom) to support consistent coding functionality. The slice address may be a value of this kind rather than an index. In some examples, the slice address may be the slice_address syntax element.
[0109] In step 905, the PPS is encoded into the bitstream. The identifier and the length of the slice address of the first slice may be encoded in the PPS within the bitstream. The identifier may be a subpicture identifier. The length of the slice address may indicate the number of bits contained in the slice address. For example, the length of the slice address in the PPS may contain enough data to interpret the slice address from the slice header coded in step 903. In some examples, the length may be the subpic_id_len_minus1 syntax element. Furthermore, the identifier may contain enough data to map the slice address from a picture-based location to a subpicture-based location. In some examples, the identifier may be the subPicIdx syntax element. For example, identifiers for multiple subpictures may be included in the PPS. When a subpicture is extracted, the corresponding subpicture ID may be indicated in the PPS, for example, by using a flag / pointer and / or by excluding unused subpicture IDs. In some examples, an explicit ID flag may also be coded in the parameter set. The flag may indicate to the decoder that a mapping is available to map a slice address from a picture-based location to a subpicture-based location. In some examples, the mapping may be a SliceSubpicToPicIdx[SubPicIdx][slice_address] syntax element. Thus, the flag may indicate that the slice address is not an index. In some examples, the flag may be sps_subpic_id_present_flag.
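As an illustration of steps 903 and 905, the following Python sketch shows how a length signaled in the PPS could fix the bit width of the slice address written into the slice header. The Pps class, its field names, and the helper functions are hypothetical and are not syntax defined in this disclosure. The sketch also assumes, for simplicity, that the address is written most significant bit first as a plain binary string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pps:
    """Hypothetical stand-in for the PPS fields discussed in step 905."""
    explicit_id_flag: bool   # signals that slice addresses are values, not indices
    addr_len_minus1: int     # slice address length in bits, minus one
    subpic_ids: List[int]    # subpicture identifiers carried in the PPS


def min_len_minus1(max_value: int) -> int:
    """Smallest 'length minus one' that can represent max_value."""
    return max(max_value.bit_length(), 1) - 1


def write_slice_address(slice_address: int, pps: Pps) -> str:
    """Serialize the slice address with the fixed width signaled in the PPS."""
    length = pps.addr_len_minus1 + 1
    if not 0 <= slice_address < (1 << length):
        raise ValueError("slice address does not fit in the signaled length")
    return format(slice_address, f"0{length}b")


# Assumed example: the largest address in the picture is 11, so 4 bits suffice.
pps = Pps(explicit_id_flag=True,
          addr_len_minus1=min_len_minus1(11),
          subpic_ids=[4, 5, 8, 9])
header_bits = write_slice_address(9, pps)
```

Because the width is carried in the parameter set rather than derived from the number of slices actually present, a decoder can still find the bit boundary of the address after some slices have been removed.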
[0110] In step 907, a sub-bitstream of the bitstream is extracted. For example, this may include extracting the first slice based on the slice address, length of the slice address, and identifier of the first slice, without rewriting the slice header. In a specific example, such extraction may also include extracting a sub-picture of a picture, in which case the sub-picture contains the first slice. A parameter set may also be included in the sub-bitstream. For example, the sub-bitstream may have a sub-picture, slice header, PPS, SPS, etc.
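As an illustration of step 907, the following Python sketch models extraction as a filter over slices, assuming each slice can be associated with a subpicture ID. The Slice and Pps classes and the extract_sub_bitstream function are hypothetical and only demonstrate that the slice headers can be copied unchanged, while the parameter set is narrowed to the extracted subpicture ID.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Slice:
    subpic_id: int     # subpicture that contains this slice (assumed known)
    header_bits: str   # slice header including the fixed-length slice address
    payload: bytes     # coded slice data


@dataclass(frozen=True)
class Pps:
    addr_len_minus1: int
    subpic_ids: Tuple[int, ...]


def extract_sub_bitstream(pps: Pps, slices: List[Slice],
                          wanted_id: int) -> Tuple[Pps, List[Slice]]:
    """Keep only the slices of one subpicture; headers are not rewritten."""
    if wanted_id not in pps.subpic_ids:
        raise KeyError(f"subpicture ID {wanted_id} is not listed in the PPS")
    kept = [s for s in slices if s.subpic_id == wanted_id]
    # One option mentioned in the text: exclude unused subpicture IDs.
    sub_pps = replace(pps, subpic_ids=(wanted_id,))
    return sub_pps, kept


pps = Pps(addr_len_minus1=3, subpic_ids=(4, 5, 8))
slices = [
    Slice(4, "0000", b"a"),
    Slice(5, "0010", b"b"),
    Slice(5, "0011", b"c"),
    Slice(8, "0100", b"d"),
]
sub_pps, kept = extract_sub_bitstream(pps, slices, wanted_id=5)
```

The frozen dataclasses make the point explicit: the extractor has no way to alter a slice header, so any address adjustment has to happen at the decoder.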
[0111] In step 909, the subbitstream is stored for communication to the decoder. The subbitstream may then be transmitted to the decoder, as desired.
[0112] Figure 10 is a flowchart of an exemplary method 1000 for decoding sub-bitstreams of subpictures, such as sub-bitstream 501 and sub-picture 522, extracted from the bitstream of a picture, such as bitstream 500 and picture 600, by using explicit address signaling. Method 1000 may be used by a decoder such as a codec system 200, a decoder 400, and / or a video coding device 800 when performing method 100.
[0113] Method 1000 may begin when the decoder begins receiving a sub-bitstream extracted from the bitstream, for example, as a result of method 900. In step 1001, the sub-bitstream is received. The sub-bitstream contains a sub-picture of the picture. For example, a bitstream encoded by an encoder may contain a picture, the sub-bitstream is extracted from the bitstream by the encoder and / or a slicer, and the sub-bitstream contains a sub-picture that includes one or more regions from the picture in the bitstream. The received sub-picture may be partitioned into multiple slices. The multiple slices may include a slice designated as the first slice. The first slice may be any slice in the picture, but is described as the first slice for clarity of discussion. As an example, the upper-left corner of the first slice does not have to be aligned with the upper-left corner of the picture. The sub-bitstream also includes a PPS that describes the syntax associated with the picture, and therefore also describes the syntax associated with the sub-picture. The sub-bitstream also includes a slice header that describes the syntax associated with the first slice.
[0114] In step 1003, a parameter set such as the PPS and / or the SPS may be parsed to obtain an explicit ID flag. The ID flag may indicate that a mapping is available to map slice addresses from picture-based locations to sub-picture-based locations. Thus, the flag may indicate that the corresponding slice address has a defined value and is not an index. In some examples, the flag may be sps_subpic_id_present_flag. Based on the value of the ID flag, the PPS may be parsed to obtain the identifier and the length of the slice address of the first slice. The identifier may be a sub-picture identifier. The length of the slice address may indicate the number of bits contained in the corresponding slice address. For example, the length of the slice address in the PPS may contain enough data to interpret the slice address from the slice header. In some examples, the length may be the subpic_id_len_minus1 syntax element. Furthermore, the identifier may contain enough data to map slice addresses from picture-based locations to sub-picture-based locations. In some examples, the identifier may be the subPicIdx syntax element. For example, identifiers for multiple subpictures may be included in the PPS. When a subpicture is extracted, the corresponding subpicture ID may be indicated in the PPS, for example, by using a flag / pointer and / or by excluding unused subpicture IDs.
[0115] In step 1005, the slice address of the first slice is determined from the slice header based on the identifier and the length of the slice address. For example, the length from the PPS may be used to determine the bit boundary for interpreting the slice address from the slice header. The identifier and slice address may then be used to map the slice address from the picture-based location to the sub-picture-based location. As an example, the mapping between the picture-based location and the sub-picture-based location may be used to align the slice header to the sub-picture. This allows the decoder to compensate for address mismatches between the slice header and the picture addressing scheme caused by sub-bitstream extraction without the need for the slice header to be rewritten by the encoder and / or slicer. In some examples, the mapping may be a SliceSubpicToPicIdx[SubPicIdx][slice_address] syntax element.
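As an illustration of step 1005, the following Python sketch reads the slice address using the length from the PPS and then maps it to a location inside the extracted sub-picture. The layout table, which lists the picture-based slice addresses of each sub-picture in raster scan order, is an assumed stand-in for the mapping described above and is not the SliceSubpicToPicIdx structure itself. The sketch also assumes that the address occupies the first bits of the slice header.

```python
from typing import Dict, List


def read_slice_address(header_bits: str, length: int) -> int:
    """Interpret the first `length` bits of the header as the slice address."""
    if len(header_bits) < length:
        raise ValueError("slice header is shorter than the signaled length")
    return int(header_bits[:length], 2)


def to_subpic_location(layout: Dict[int, List[int]],
                       subpic_id: int, slice_address: int) -> int:
    """Map a picture-based slice address to its position in the sub-picture."""
    return layout[subpic_id].index(slice_address)


# Assumed picture layout: two sub-pictures with four slices each.
layout = {4: [0, 1, 4, 5], 5: [2, 3, 6, 7]}

addr_len_minus1 = 3                 # length parameter parsed from the PPS
header_bits = "0110" + "101"        # slice address followed by other header bits

slice_address = read_slice_address(header_bits, addr_len_minus1 + 1)
local_position = to_subpic_location(layout, subpic_id=5,
                                    slice_address=slice_address)
```

Here the header still carries the picture-based value 6, and the decoder resolves it to the third slice of sub-picture 5 without any change to the header.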
[0116] In step 1007, the subbitstream may be decoded to generate a video sequence of a subpicture. The subpicture may include a first slice. Therefore, the first slice is also decoded. The video sequence of the subpicture, including the decoded first slice, may then be transmitted for display by, for example, a head-mounted display or other display device.
[0117] Figure 11 is a schematic diagram of an exemplary system 1100 that transmits sub-bitstreams of subpictures, such as sub-bitstream 501 and sub-picture 522, extracted from the bitstream of a picture, such as bitstream 500 and picture 600, by using explicit address signaling. System 1100 may be implemented by encoders and decoders such as a codec system 200, encoder 300, decoder 400, and / or video coding device 800. Furthermore, system 1100 may be used to implement methods 100, 900, and / or 1000.
[0118] System 1100 includes a video encoder 1102. The video encoder 1102 has an encoding module 1101 that encodes a picture having multiple slices, including a first slice, into a bitstream, encodes a slice header containing the slice address of the first slice into the bitstream, and encodes a PPS containing the identifier and length of the slice address of the first slice into the bitstream. The video encoder 1102 further has an extraction module 1103 that extracts a sub-bitstream of the bitstream by extracting the first slice based on the slice address, length of the slice address, and identifier of the first slice without rewriting the slice header. The video encoder 1102 further has a storage module 1105 that stores the sub-bitstream for communication toward a decoder. The video encoder 1102 further has a transmission module 1107 that transmits the sub-bitstream containing the slice header, PPS, first slice, and / or corresponding sub-picture toward a decoder. The video encoder 1102 may be further configured to perform any of the steps of Method 900.
[0119] System 1100 also includes a video decoder 1110. The video decoder 1110 has a receiving module 1111 that receives a sub-bitstream including a subpicture of a picture partitioned into multiple slices including a first slice, a PPS associated with the picture and the subpicture, and a slice header associated with the first slice. The video decoder 1110 further has a parsing module 1113 that parses the PPS to obtain the identifier and the length of the slice address of the first slice. The video decoder 1110 further has a determination module 1115 that determines the slice address of the first slice from the slice header based on the identifier and the length of the slice address. The video decoder 1110 further has a decoding module 1117 that decodes the sub-bitstream to generate a video sequence of the subpicture including the first slice. The video decoder 1110 further has a transfer module 1119 that transfers the video sequence of the subpicture for display. The video decoder 1110 may be further configured to perform any of the steps of method 1000.
[0120] A first component is directly coupled to a second component when there are no intervening components, other than lines, traces, or other media, between the first and second components. A first component is indirectly coupled to a second component when there are intervening components, other than lines, traces, or other media, between the first and second components. The term "coupled" and its variations include both direct and indirect coupling. The use of the term "about" means a range including ±10% of the following number unless otherwise stated.
[0121] It should be understood that the steps of the exemplary methods described herein do not necessarily have to be performed in the order described, and the order of the steps in such methods is merely illustrative. Similarly, such methods may include additional steps, and certain steps may be omitted or combined in methods according to various embodiments of this disclosure.
[0122] While several embodiments are provided in this disclosure, it should be understood that the disclosed systems and methods may be embodied in numerous other specific forms without departing from the spirit or scope of this disclosure. These examples should be considered illustrative rather than restrictive, and the intent should not be limited to the details given herein. For example, various elements or components may be combined or integrated into other systems, or certain features may be omitted or not implemented.
[0123] Furthermore, technologies, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, technologies, or methods without departing from the scope of this disclosure. Other examples of modifications, substitutions, and alterations are ascertainable by those skilled in the art and may be made without departing from the spirit and scope disclosed herein.
Claims
1. A method implemented by a decoder, the method comprising: receiving a bitstream comprising a subpicture of a picture partitioned into a plurality of slices, a slice header associated with a first slice of the plurality of slices, and a sequence parameter set (SPS), wherein the SPS comprises a subpicture identifier (ID) of the subpicture; parsing the bitstream to obtain a parameter for deriving a length of a slice address of the first slice, wherein, depending on the presence or absence of the parameter, the length of the slice address is inferred to be equal to Ceil(Log2(NumTilesInPic)), where NumTilesInPic denotes the number of tiles in the picture; determining the slice address of the first slice from the slice header based on the length of the slice address; and decoding the subpicture including the first slice based on the slice address and the subpicture ID.
2. The method according to claim 1, wherein the length of the slice address indicates the number of bits included in the slice address.
3. The method according to claim 1 or 2, wherein determining the slice address of the first slice comprises: using the length to determine a bit boundary for interpreting the slice address from the slice header; and using the subpicture ID and the slice address to map the slice address from a picture-based position to a subpicture-based position.
4. The method according to claim 3, wherein the mapping between the picture-based position and the subpicture-based position aligns the slice header to the subpicture without requiring the slice header to be rewritten.
5. A method performed by an encoder, the method comprising: encoding a picture into a bitstream, wherein the picture has a subpicture partitioned into a plurality of slices; encoding, into the bitstream, a parameter for deriving a length of a slice address of a first slice of the plurality of slices, wherein the length of the slice address is inferred to be equal to Ceil(Log2(NumTilesInPic)), where NumTilesInPic denotes the number of tiles in the picture; encoding, into a slice header of the bitstream, the slice address of the first slice based on the length of the slice address, wherein the slice header is associated with the first slice; and encoding a subpicture ID of the subpicture into a sequence parameter set (SPS) of the bitstream.
6. The method according to claim 5, wherein the length of the slice address indicates the number of bits included in the slice address.
7. The method according to claim 5 or 6, wherein the slice address has a defined value and does not have an index.
8. The method according to any one of claims 5 to 6, further comprising extracting the subpicture of the picture, wherein the subpicture includes the first slice, and wherein the bitstream comprises the subpicture, the slice header including the slice address, and a parameter set including the parameter.
9. A video coding device comprising: a processor; a memory; a receiver coupled to the processor; and a transmitter coupled to the processor, wherein the processor, the memory, the receiver, and the transmitter are configured to perform the method according to any one of claims 1 to 8.
10. A computer program for use by a video coding device, the computer program comprising computer-executable instructions that, when executed by a processor, cause the video coding device to perform the method according to any one of claims 1 to 8.
11. A decoder comprising: a receiving unit configured to receive a bitstream comprising a subpicture of a picture partitioned into a plurality of slices, a slice header associated with a first slice of the plurality of slices, and a sequence parameter set (SPS), wherein the SPS comprises a subpicture identifier (ID) of the subpicture; a parsing unit configured to parse the bitstream to obtain a parameter for deriving a length of a slice address of the first slice, wherein, depending on the presence or absence of the parameter, the length of the slice address is inferred to be equal to Ceil(Log2(NumTilesInPic)), where NumTilesInPic denotes the number of tiles in the picture; a determination unit configured to determine the slice address of the first slice from the slice header based on the length of the slice address; and a decoding unit configured to decode the subpicture including the first slice based on the slice address and the subpicture ID.
12. The decoder according to claim 11, further comprising a transfer unit configured to transfer the subpicture for display.
13. An encoder comprising an encoding unit, wherein the encoding unit is configured to: encode a picture into a bitstream, wherein the picture has a subpicture partitioned into a plurality of slices; encode, into the bitstream, a parameter for deriving a length of a slice address of a first slice of the plurality of slices, wherein the length of the slice address is inferred to be equal to Ceil(Log2(NumTilesInPic)), where NumTilesInPic denotes the number of tiles in the picture; encode, into a slice header of the bitstream, the slice address of the first slice based on the length of the slice address, wherein the slice header is associated with the first slice; and encode a subpicture ID of the subpicture into a sequence parameter set (SPS) of the bitstream.
14. The encoder according to claim 13, further comprising a storage unit configured to store the bitstream for communication toward a decoder.
15. A computer program that, when executed by a computer, causes the computer to parse a bitstream, the bitstream comprising: data representing a subpicture of a picture partitioned into a plurality of slices; a slice header associated with a first slice of the plurality of slices; a sequence parameter set (SPS) including a subpicture identifier (ID) of the subpicture; and a parameter for deriving a length of a slice address of the first slice, wherein, in the process in which the computer parses the bitstream: the parameter is used to infer, depending on the presence or absence of the parameter, that the length of the slice address is equal to Ceil(Log2(NumTilesInPic)), where NumTilesInPic denotes the number of tiles in the picture; the length of the slice address is used to determine the slice address of the first slice from the slice header; and the slice address and the subpicture ID are used to decode the subpicture including the first slice.
16. A method for transmitting an encoded bitstream of video data, the method comprising: obtaining, from a storage medium, a bitstream generated by the method according to any one of claims 5 to 8; and transmitting the bitstream.
17. A system for transmitting an encoded bitstream of video data, the system comprising: a receiver configured to obtain, from a storage medium, a bitstream generated by the method according to any one of claims 5 to 8; and a transfer unit configured to transmit the bitstream.
18. A method for storing an encoded bitstream of video data, the method comprising: receiving a bitstream generated by the method according to any one of claims 5 to 8; and storing the bitstream in a storage medium.
19. A system for storing an encoded bitstream of video data, the system comprising: a receiver configured to receive a bitstream generated by the method according to any one of claims 5 to 8; and a storage medium configured to store the bitstream.
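As an illustration of the length inference recited in claims 1, 5, 11, 13, and 15, the following Python sketch computes the slice address length in bits. It assumes that the inference to Ceil(Log2(NumTilesInPic)) applies when the length parameter is absent, and that a present parameter is coded as the length minus one; both are assumptions made for this example, and the function and argument names are hypothetical.

```python
from typing import Optional


def ceil_log2(n: int) -> int:
    """Ceil(Log2(n)) for n >= 1, computed with integers to avoid rounding."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def slice_address_length(num_tiles_in_pic: int,
                         signaled_len_minus1: Optional[int] = None) -> int:
    """Length of the slice address in bits.

    If the length parameter is present (assumed to be coded minus one),
    use it; otherwise infer Ceil(Log2(NumTilesInPic)).
    """
    if signaled_len_minus1 is not None:
        return signaled_len_minus1 + 1
    return ceil_log2(num_tiles_in_pic)


inferred = slice_address_length(num_tiles_in_pic=12)
explicit = slice_address_length(num_tiles_in_pic=12, signaled_len_minus1=7)
```

For a picture with 12 tiles, the inferred length is 4 bits, since 4 bits are the fewest that can distinguish 12 values.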