Scaled intra-reference picture
By reconstructing scaled and full-scale video pictures using intra-prediction and weighted averaging, the method addresses inefficiencies in existing video coding technologies, enhancing compression efficiency and quality across different spatial resolutions.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- TENCENT AMERICA LLC
- Filing Date
- 2023-10-04
- Publication Date
- 2026-07-01
AI Technical Summary
Existing video coding technologies struggle to compress video efficiently while maintaining quality, particularly when exploiting spatial and temporal redundancies across pictures of different spatial resolutions.
The method reconstructs a scaled version of a current picture from a first sub-bitstream and the full-scale current picture from a second sub-bitstream. Blocks of the full-scale picture are reconstructed using partitioning information or intra-prediction information from the scaled version. Samples can be predicted from co-located scaled samples plus interpolation, or by blending intra-predicted samples with upsampled scaled samples using position-dependent weights. The distribution of non-zero residuals and the partition flags of the scaled version can serve as context for entropy decoding.
Reusing information from the scaled version in this way is intended to improve coding efficiency for the full-scale picture, particularly in scenarios with multiple spatial resolutions.
Abstract
Description
Technical Field
[0001] [Related Applications] This application claims the benefit of priority of U.S. Patent Application No. 18/376,333, filed October 3, 2023, which claims the benefit of priority of U.S. Provisional Patent Application No. 63/425,557, filed November 15, 2022, entitled "Scaled Intra Reference Picture". The disclosures of the foregoing applications are hereby incorporated by reference in their entirety.
[0002] [Technical Field] The present disclosure generally describes embodiments related to video coding.
Background Art
[0003] The background description provided here is for the purpose of generally presenting the background of the present disclosure. The research of the presently named inventors, to the extent it is described in this background section, is not admitted as prior art to the present disclosure, either explicitly or implicitly, any more than is the description of aspects that may not be regarded as prior art at the time of filing.
[0004] Image/video compression helps transfer image/video data among various devices, storage, and networks while minimizing quality degradation. In some examples, video codec technology can compress video based on spatial and temporal redundancies. For example, a video codec can use a technique called intra prediction, which compresses an image based on spatial redundancy: reference data from the current picture being reconstructed is used for sample prediction. In another example, a video codec can use a technique called inter prediction, which compresses an image based on temporal redundancy: motion compensation is used to predict samples in the current picture from previously reconstructed pictures. The motion compensation is indicated by a motion vector (MV).
Summary of the Invention
[0005] Aspects of the disclosure provide methods and apparatuses for video encoding/decoding. In some examples, an apparatus for video decoding includes a processing circuit. The processing circuit receives a bitstream that includes a first sub-bitstream corresponding to a scaled version of a current picture having a first spatial resolution and a second sub-bitstream corresponding to the full-scale current picture, which has a second spatial resolution higher than the first spatial resolution. The processing circuit reconstructs the scaled version of the current picture from the first sub-bitstream and reconstructs a second block of the full-scale current picture based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra-prediction information of the one or more first blocks in the scaled version.
[0006] In an example, a first region in the scaled version of the current picture is co-located with the second block in the full-scale current picture.
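The co-location relationship can be illustrated with a small coordinate mapping. This is only a sketch under stated assumptions: the function name `colocated_region`, the single `scale` ratio applied to both axes, and the floor-rounding rule are hypothetical and are not taken from the disclosure.

```python
from fractions import Fraction


def colocated_region(x, y, width, height, scale):
    """Map a full-scale block to its co-located region in the scaled picture.

    `scale` is the assumed ratio of scaled to full-scale resolution,
    e.g. Fraction(1, 2) for a half-resolution picture.
    """
    sx = int(x * scale)  # floor of the scaled left edge
    sy = int(y * scale)  # floor of the scaled top edge
    # Keep at least one sample so that very small blocks still map to a region.
    sw = max(1, int(width * scale))
    sh = max(1, int(height * scale))
    return sx, sy, sw, sh


# A 16x8 block at (32, 16) in the full-scale picture, with 2:1 scaling.
region = colocated_region(32, 16, 16, 8, Fraction(1, 2))
```

Using `Fraction` keeps the mapping exact for non-integer ratios such as 2/3; a real codec would define its own rounding and alignment rules.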
[0007] In an example, a first sample in the second block is predicted based on a reconstructed sample in the first region of the scaled version of the current picture, the first sample being co-located with that reconstructed sample. A second sample in the second block is predicted by interpolation based on at least the predicted first sample in the second block and a reconstructed sample at the upper left of the second block. The predictor for the second block includes (i) the predicted first sample in the second block, (ii) the reconstructed sample at the upper left of the second block, and (iii) the predicted second sample in the second block. The second block is reconstructed from the predictor for the second block.
[0008] In an example, samples in the second block are predicted using intra-prediction. The predicted samples in the second block and the corresponding upsampled reconstructed samples in the first region of the scaled version of the current picture are then blended using a weighted average. The weight applied to each blended sample depends on the position of that sample in the second block.
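A position-dependent weighted average of this kind could look like the following sketch. The weighting rule (intra prediction weighted more heavily near the top-left edge, decaying with distance) and the names `blend_block` and `shift` are assumptions made for illustration; the disclosure only states that the weights depend on sample position.

```python
def blend_block(intra_pred, upsampled, shift=3):
    """Blend two equally sized prediction blocks with position-dependent weights."""
    height, width = len(intra_pred), len(intra_pred[0])
    total = 1 << shift  # the two weights always sum to this value
    blended = []
    for y in range(height):
        row = []
        for x in range(width):
            # Hypothetical rule: trust intra prediction near the top/left
            # reference samples and let its weight decay with distance.
            dist = min(x, y)
            w_intra = max(1, total - 1 - dist)
            w_up = total - w_intra
            value = (w_intra * intra_pred[y][x]
                     + w_up * upsampled[y][x]
                     + total // 2) >> shift  # rounded integer average
            row.append(value)
        blended.append(row)
    return blended


intra = [[80] * 4 for _ in range(4)]
up = [[160] * 4 for _ in range(4)]
blended = blend_block(intra, up)
```

Because the two weights sum to a power of two, the division reduces to a shift, which is a common choice in integer codec arithmetic.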
[0009] In an example, the reconstructed samples within the first region of the scaled version of the current picture are upsampled, and the upsampled samples are filtered. The second block can then be reconstructed using the filtered upsampled samples of the first region as the predictor for the second block.
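The upsample-then-filter step might be sketched as below. Nearest-neighbour 2x upsampling and a horizontal [1, 2, 1] / 4 smoothing filter are assumptions chosen for brevity; the disclosure does not specify the upsampling ratio or the filter taps.

```python
def upsample_2x(region):
    """Nearest-neighbour 2x upsampling: each sample becomes a 2x2 patch."""
    out = []
    for row in region:
        wide = [sample for sample in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def smooth_horizontal(block):
    """Apply a [1, 2, 1] / 4 filter along each row, repeating edge samples."""
    out = []
    for row in block:
        last = len(row) - 1
        out.append([
            (row[max(x - 1, 0)] + 2 * row[x] + row[min(x + 1, last)] + 2) >> 2
            for x in range(len(row))
        ])
    return out


# The filtered, upsampled region serves as the predictor for the second block.
predictor = smooth_horizontal(upsample_2x([[10, 20]]))
```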
[0010] In an example, the residuals of the samples in the second block can be predicted based on the residuals of the reconstructed samples in the first region. Whether samples in the second block have non-zero residuals can be predicted based on the distribution of non-zero residuals in the first region. In an example, the distribution of non-zero residuals in the second block is arithmetic-decoded, with the distribution of non-zero residuals in the first region used as context for the arithmetic decoding.
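One way the first region's non-zero residual distribution could select a context is sketched below. The three-sample template (co-located sample plus its right and lower neighbours), the 2:1 scale, and the name `nonzero_context` are hypothetical; an actual arithmetic decoder would use the returned index to pick a probability model.

```python
def nonzero_context(first_region_residuals, x, y, scale=2):
    """Context index (0..3) for the non-zero-residual flag of full-scale sample (x, y).

    Counts non-zero residuals among the co-located scaled sample and its
    right and lower neighbours (an assumed template).
    """
    height = len(first_region_residuals)
    width = len(first_region_residuals[0])
    cx, cy = x // scale, y // scale  # co-located position in the first region
    context = 0
    for nx, ny in ((cx, cy), (cx + 1, cy), (cx, cy + 1)):
        if nx < width and ny < height and first_region_residuals[ny][nx] != 0:
            context += 1
    return context


residuals = [[0, 5],
             [0, 0]]
ctx = nonzero_context(residuals, 2, 0)
```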
[0011] In an embodiment, the partitioning information of the one or more first blocks indicates whether each of the one or more first blocks is divided into smaller blocks. The second block can be reconstructed based on the partitioning information of the one or more first blocks by deciding, based on that partitioning information, whether to partition the second block in the full-scale current picture. The processing circuit reconstructs the second block based on the decision whether to partition the second block.
[0012] In an example, the second sub-bitstream includes a flag for the second block. In response to a decision to partition the second block, the flag indicates whether to apply partitioning to the blocks partitioned from the second block. In response to a decision not to partition the second block, the flag indicates whether to apply partitioning to the second block.
[0013] In an example, the partitioning information includes a flag for each of the one or more first blocks, indicating whether that first block is divided into smaller blocks. The partitioning information for the second block can be entropy-decoded, with the flags of the first blocks used as context for the entropy decoding.
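The partition inference and context selection described above can be sketched as follows. Both rules are assumptions for illustration: the disclosure does not say how the first-block flags are combined, so `infer_split` (split if any co-located first block was split) and `split_flag_context` (count of set flags, capped) are hypothetical helpers.

```python
def infer_split(first_block_split_flags):
    """Assumed rule: partition the second block if any co-located first block was split."""
    return any(first_block_split_flags)


def split_flag_context(first_block_split_flags, max_context=2):
    """Context index for entropy-decoding the second block's partitioning information."""
    set_flags = sum(1 for flag in first_block_split_flags if flag)
    return min(set_flags, max_context)


flags = [False, True, True]
decision = infer_split(flags)
context = split_flag_context(flags)
```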
[0014] In an example, the intra-prediction information includes intra-prediction mode (IPM) information of the one or more first blocks. A most probable mode (MPM) list for the second block can be constructed based on the IPM information of the one or more first blocks, and the second block can be reconstructed based on the MPM list.
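A minimal sketch of building an MPM list from the modes of co-located first blocks is shown below. The mode numbering (planar 0, DC 1, angular modes 2 to 66), the list size of 6, the neighbour-mode derivation, and the default fill order are borrowed from VVC-style conventions as assumptions; the disclosure does not specify any of them.

```python
PLANAR, DC = 0, 1  # assumed VVC-style numbering; angular modes are 2..66


def build_mpm_list(colocated_modes, list_size=6, num_modes=67):
    """Build an MPM list seeded with the IPMs of co-located first blocks."""
    mpm = []

    def push(mode):
        # Keep entries unique and stop once the list is full.
        if mode not in mpm and len(mpm) < list_size:
            mpm.append(mode)

    for mode in colocated_modes:
        push(mode)
    # Add the two adjacent angular modes of each angular candidate.
    num_angular = num_modes - 2
    for mode in list(mpm):
        if mode > DC:
            push(2 + (mode - 2 - 1) % num_angular)
            push(2 + (mode - 2 + 1) % num_angular)
    # Fill any remaining slots with assumed default modes.
    for mode in (PLANAR, DC, 50, 18, 2, 34):
        push(mode)
    return mpm


mpm_list = build_mpm_list([50, 50, 18])
```

Seeding the list with co-located modes first gives them the shortest codewords when the mode index is signaled.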
[0015] In an example, the intra-prediction information includes reference line index information of the one or more first blocks. A reference line index for the second block can be determined based on the reference line index information of the one or more first blocks, and the second block can be reconstructed based on the reference line index for the second block.
[0016] In an example, the second sub-bitstream indicates that an intra-skip mode is used for the second block. The intra-prediction information indicates a prediction mode of one of the one or more first blocks. That prediction mode is used for the second block, and the second block is reconstructed based on it.
[0017] Aspects of the present disclosure also provide a non-transitory computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform a method for video encoding/decoding.
Brief Description of the Drawings
[0018] Further features, characteristics, and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings.
[0019] [Figure 1] A schematic illustration of an exemplary block diagram of a communication system (100).
[0020] [Figure 2] A schematic illustration of an exemplary block diagram of a decoder.
[0021] [Figure 3] A schematic illustration of an exemplary block diagram of an encoder.
[0022] [Figure 4] Positions of spatial merge candidates according to an embodiment of the present disclosure.
[0023] [Figure 5] Candidate pairs considered for the redundancy check of spatial merge candidates according to an embodiment of the present disclosure.
[0024] [Figure 6] An exemplary scaling of motion vectors for temporal merge candidates.
[0025] [Figure 7] Exemplary candidate positions for temporal merge candidates of the current CU.
[0026] [Figure 8] An example of the intra template matching prediction (IntraTMP) mode according to an embodiment of the present disclosure.
[0027] [Figure 9] An example of a reference area for coding CTU(m,n).
[0028] [Figure 10] A full-scale current picture and a scaled version of the current picture according to embodiments of the present disclosure.
[0029] [Figure 11] A flowchart illustrating a decoding process according to an embodiment of the present disclosure.
[0030] [Figure 12] A flowchart illustrating an encoding process according to an embodiment of the present disclosure.
[0031] [Figure 13] A schematic illustration of a computer system according to an embodiment.
Modes for Carrying Out the Invention
[0032] Figure 1 shows a block diagram of a video processing system (100) in some examples. The video processing system (100) is an example of an application of the disclosed subject matter: a video encoder and a video decoder in a streaming environment. The disclosed subject matter is equally applicable to other video-enabled applications, including, for example, video conferencing, digital TV, streaming services, and storage of compressed video on digital media such as CDs, DVDs, and memory sticks.
[0033] The video processing system (100) includes a capture subsystem (113), which may include a video source (101), such as a digital camera, that generates, for example, an uncompressed video picture stream (102). In one example, the video picture stream (102) includes samples captured by the digital camera. The video picture stream (102), shown as a thick line to emphasize its high data volume compared to the encoded video data (104) (or encoded video bitstream), may be processed by an electronic device (120) that includes a video encoder (103) coupled to the video source (101). The video encoder (103) may include hardware, software, or a combination thereof, and may enable or implement aspects of the disclosed subject matter as detailed below. The encoded video data (104) (or encoded video bitstream), shown as a thin line to emphasize its low data volume compared to the video picture stream (102), may be stored on a streaming server (105) for future use. One or more streaming client subsystems, such as the client subsystems (106) and (108) in Figure 1, can access the streaming server (105) to retrieve copies (107) and (109) of the encoded video data (104). The client subsystem (106) may include a video decoder (110), for example, in an electronic device (130). The video decoder (110) decodes the incoming copy (107) of the encoded video data and generates an output video picture stream (111) that can be rendered on a display (112) (e.g., a display screen) or another rendering device (not shown). In some streaming systems, the encoded video data (104), (107), and (109) (e.g., video bitstreams) may be encoded according to a specific video coding/compression standard. An example of such a standard is ITU-T Recommendation H.265. In another example, a video coding standard under development is informally known as VVC (Versatile Video Coding). The disclosed subject matter may be used in the context of VVC.
[0034] It should be noted that electronic devices (120) and (130) may include other components (not shown). For example, electronic device (120) may include a video decoder (not shown), and electronic device (130) may also include a video encoder (not shown).
[0035] Figure 2 shows an exemplary block diagram of a video decoder (210). The video decoder (210) may be included in an electronic device (230). The electronic device (230) may include a receiver (231) (e.g., a receiving circuit). In the example of Figure 1, the video decoder (210) can be used instead of the video decoder (110).
[0036] The receiver (231) can receive one or more coded video sequences, for example in a bitstream, to be decoded by the video decoder (210). In an embodiment, one coded video sequence is received at a time, and the decoding of each coded video sequence is independent of the decoding of other coded video sequences. The coded video sequences may be received from a channel (201), which may be a hardware/software link to a storage device that stores coded video data. The receiver (231) may receive the coded video data together with other data, for example, coded audio data and/or auxiliary data streams, which may be forwarded to their respective using entities (not shown). The receiver (231) may separate the coded video sequences from the other data. To combat network jitter, a buffer memory (215) may be coupled between the receiver (231) and the entropy decoder / parser (220) (hereinafter, "parser (220)"). In certain applications, the buffer memory (215) is part of the video decoder (210). In others, it may be outside the video decoder (210) (not shown). In still others, there may be a buffer memory (not shown) outside the video decoder (210), for example, to combat network jitter, in addition to another buffer memory (215) inside the video decoder (210), for example, to handle playback timing. When the receiver (231) receives data from a storage/transmission device with sufficient bandwidth and controllability, or from an isosynchronous network, the buffer memory (215) may not be necessary or can be small. For use in best-effort packet networks such as the Internet, the buffer memory (215) may be required, may be relatively large, may advantageously be adaptive in size, and may be implemented at least partially in an operating system or a similar element (not shown) outside the video decoder (210).
[0037] The video decoder (210) may include the parser (220) to reconstruct symbols (221) from the coded video sequence. Categories of these symbols include information used to manage the operation of the video decoder (210) and, potentially, information for controlling a rendering device such as a renderer (212) (e.g., a display screen), which may not be an integral part of the electronic device (230) but may be coupled to it, as shown in Figure 2. The control information for the rendering device may be in the form of SEI (Supplemental Enhancement Information) messages or VUI (Video Usability Information) parameter set fragments (not shown). The parser (220) may parse / entropy decode the received coded video sequence. The coding of the coded video sequence may follow a video coding technology or standard and may follow various principles, including variable-length coding, Huffman coding, and arithmetic coding with or without context sensitivity. The parser (220) may extract from the coded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the subgroup. Subgroups may include groups of pictures (GOPs), pictures, tiles, slices, macroblocks, coding units (CUs), blocks, transform units (TUs), prediction units (PUs), and so on. The parser (220) may also extract information such as transform coefficients, quantization parameter values, and motion vectors from the coded video sequence.
[0038] The parser (220) may perform an entropy decoding / parse operation on the video sequence received from the buffer memory (215) to generate a symbol (221).
[0039] Reconstruction of the symbols (221) may involve multiple different units, depending on the type of the coded video picture or parts thereof (e.g., inter and intra pictures, inter and intra blocks) and other factors. Which units are involved, and how, can be controlled by subgroup control information parsed from the coded video sequence by the parser (220). The flow of such subgroup control information between the parser (220) and the multiple units below is not shown for clarity.
[0040] Beyond the functional blocks already mentioned, the video decoder (210) can be conceptually subdivided into a number of functional units, as described below. In a practical implementation operating under commercial constraints, many of these units interact closely with each other and may be at least partially integrated. However, for the purpose of describing the disclosed subject matter, the following conceptual subdivision into functional units is appropriate.
[0041] The first unit is the scaler / inverse transform unit (251). The scaler / inverse transform unit (251) receives quantized transform coefficients as well as control information, including which transform to use, block size, quantization factor, and quantization scaling matrices, as symbols (221) from the parser (220). The scaler / inverse transform unit (251) can output blocks of sample values that can be input into the aggregator (255).
[0042] In some cases, the output samples of the scaler / inverse transform unit (251) may pertain to an intra-coded block, that is, a block that does not use prediction information from previously reconstructed pictures but can use prediction information from previously reconstructed parts of the current picture. Such prediction information can be provided by the intra-picture prediction unit (252). In some cases, the intra-picture prediction unit (252) generates a block of the same size and shape as the block under reconstruction, using surrounding, already reconstructed information fetched from the current picture buffer (258). The current picture buffer (258) buffers, for example, the partly reconstructed current picture and/or the fully reconstructed current picture. In some cases, the aggregator (255) adds, on a per-sample basis, the prediction information generated by the intra-picture prediction unit (252) to the output sample information provided by the scaler / inverse transform unit (251).
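The per-sample addition performed by the aggregator (255) can be sketched as below. Clipping the sum to the valid sample range is an assumption (it is common practice but is not stated in the text), and the name `aggregate` and the `bit_depth` parameter are hypothetical.

```python
def aggregate(prediction, residual, bit_depth=8):
    """Add a residual block to a prediction block, sample by sample.

    The result is clipped to [0, 2**bit_depth - 1]; the clipping is an
    assumption and is not described in the source text.
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


reconstructed = aggregate([[250, 10], [100, 128]], [[10, -20], [5, 0]])
```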
[0043] In other cases, the output samples of the scaler / inverse transform unit (251) may pertain to an inter-coded, and potentially motion-compensated, block. In such cases, the motion-compensated prediction unit (253) can access the reference picture memory (257) to fetch samples used for prediction. After motion-compensating the fetched samples in accordance with the symbols (221) pertaining to the block, the aggregator (255) may add these samples to the output of the scaler / inverse transform unit (251) (in this case called the residual samples or residual signal) to generate output sample information. The addresses within the reference picture memory (257) from which the motion-compensated prediction unit (253) fetches prediction samples can be controlled by motion vectors, available to the motion-compensated prediction unit (253) in the form of symbols (221) that may have, for example, X, Y, and reference picture components. Motion compensation may also include interpolation of sample values fetched from the reference picture memory (257) when sub-sample-exact motion vectors are in use, motion vector prediction mechanisms, and so forth.
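Fetching a prediction block with a fractional (sub-sample) motion vector can be sketched as follows. Half-sample motion vector units, bilinear averaging, and edge clamping are simplifying assumptions; practical codecs use finer precision and longer interpolation filters. The name `motion_compensate` is hypothetical.

```python
def motion_compensate(reference, x0, y0, width, height, mv_x, mv_y):
    """Fetch a width x height prediction block for the block at (x0, y0).

    Motion vectors are in half-sample units (assumed). Positions outside
    the reference picture are clamped to the nearest edge sample.
    """
    ref_h, ref_w = len(reference), len(reference[0])

    def sample(x, y):
        x = min(max(x, 0), ref_w - 1)
        y = min(max(y, 0), ref_h - 1)
        return reference[y][x]

    block = []
    for y in range(height):
        row = []
        for x in range(width):
            px = 2 * (x0 + x) + mv_x  # position in half-sample units
            py = 2 * (y0 + y) + mv_y
            ix, iy = px >> 1, py >> 1  # integer part
            fx, fy = px & 1, py & 1    # half-sample fraction (0 or 1)
            # Bilinear average of the surrounding integer samples.
            a = sample(ix, iy)
            b = sample(ix + fx, iy)
            c = sample(ix, iy + fy)
            d = sample(ix + fx, iy + fy)
            row.append((a + b + c + d + 2) >> 2)
        block.append(row)
    return block


ref = [[0, 10, 20, 30],
       [40, 50, 60, 70]]
full_pel = motion_compensate(ref, 1, 0, 2, 1, 2, 0)  # shift right by one sample
half_pel = motion_compensate(ref, 1, 0, 2, 1, 1, 0)  # shift right by half a sample
```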
[0044] The output samples of the aggregator (255) can be subject to various loop filtering techniques in the loop filter unit (256). Video compression technologies may include in-loop filter technologies that are controlled by parameters included in the coded video sequence (also called the coded video bitstream) and made available to the loop filter unit (256) as symbols (221) from the parser (220). The in-loop filtering may also be responsive to meta-information obtained during decoding of earlier (in decoding order) parts of the coded picture or coded video sequence, as well as to previously reconstructed and loop-filtered sample values.
[0045] The output of the loop filter unit (256) may be a sample stream that can be output to the renderer (212) and also stored in the reference picture memory (257) for use in future inter-picture prediction.
[0046] Certain coded pictures, once fully reconstructed, can be used as reference pictures for future prediction. For example, once the coded picture corresponding to the current picture is fully reconstructed and has been identified as a reference picture (e.g., by the parser (220)), the current picture buffer (258) can become part of the reference picture memory (257), and a fresh current picture buffer can be reallocated before the reconstruction of the following coded picture begins.
[0047] The video decoder (210) may perform decoding operations according to a predetermined video compression technology or a standard such as ITU-T Rec. H.265. The coded video sequence may conform to the syntax specified by the video compression technology or standard in use, in the sense that it adheres to both the syntax of the video compression technology or standard and the profiles documented therein. Specifically, a profile may select certain tools from all the tools available in the video compression technology or standard as the only tools available for use under that profile. Compliance may also require that the complexity of the coded video sequence be within the bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, maximum frame rate, maximum reconstruction sample rate (e.g., measured in megasamples per second), maximum reference picture size, and so on. The limits set by levels may, in some cases, be further restricted through HRD (Hypothetical Reference Decoder) specifications and metadata for HRD buffer management signaled in the coded video sequence.
[0048] In an embodiment, the receiver (231) may receive additional (redundant) data along with the encoded video. The additional data may be included as part of the coded video sequence. The video decoder (210) may use the additional data to decode the data properly and/or to reconstruct the original video data more accurately. The additional data may take the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and so on.
[0049] Figure 3 shows an exemplary block diagram of a video encoder (303). The video encoder (303) is included in the electronic device (320). The electronic device (320) includes a transmitter (340) (e.g., a transmitting circuit). The video encoder (303) can be used in place of the video encoder (103) in the example of Figure 1.
[0050] The video encoder (303) may receive video samples from a video source (301) (not part of the electronic device (320) in the example in Figure 3) that can capture video images to be coded by the video encoder (303). In another example, the video source (301) is part of the electronic device (320).
[0051] The video source (301) may provide the source video sequence to be coded by the video encoder (303) in the form of a digital video sample stream of any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, ...), any color space (e.g., BT.601 YCrCb, RGB, ...), and any suitable sampling structure (e.g., YCrCb 4:2:0, YCrCb 4:4:4). In a media delivery system, the video source (301) may be a storage device that stores previously prepared video. In a video conferencing system, the video source (301) may be a camera that captures local image information as a video sequence. The video data may be provided as a series of individual pictures that impart motion when viewed in sequence. The pictures themselves may be organized as a spatial array of pixels. Each pixel may contain one or more samples, depending on the sampling structure, color space, and so on in use. The following description focuses on samples.
[0052] According to an embodiment, the video encoder (303) may code and compress the pictures of the source video sequence into a coded video sequence (343) in real time or under any other required time constraints. Enforcing the appropriate coding speed is one function of the control unit (350). In some embodiments, the control unit (350) controls, and is functionally coupled to, the other functional units described below. The couplings are not shown for clarity. Parameters set by the control unit (350) may include rate-control-related parameters (picture skip, quantizer, lambda value of rate-distortion optimization techniques, ...), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so on. The control unit (350) may be configured to have other suitable functions that pertain to the video encoder (303) optimized for a particular system design.
[0053] In some embodiments, the video encoder (303) is configured to operate in a coding loop. As a greatly simplified description, in an example, the coding loop can include a source coder (330) (responsible for creating symbols, such as a symbol stream, based on an input picture to be coded and reference pictures) and a (local) decoder (333) embedded in the video encoder (303). The decoder (333) reconstructs the symbols to create sample data in the same manner as a (remote) decoder would. The reconstructed sample stream (sample data) is input to the reference picture memory (334). Because decoding a symbol stream leads to bit-exact results independent of the decoder location (local or remote), the content of the reference picture memory (334) is also bit-exact between the local encoder and the remote decoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values as a decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the resulting drift if synchronicity cannot be maintained, for example, because of channel errors) is used in some related arts as well.
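The idea of an encoder that embeds a local decoder can be illustrated with a one-dimensional toy loop. This is a deliberately simplified sketch: each sample is predicted from the previous reconstructed sample (DPCM-style) and the residual is uniformly quantized. The functions `encode_sequence` and `decode_sequence` and the `step` parameter are hypothetical and do not model the actual source coder (330) or decoder (333).

```python
def encode_sequence(samples, step=4):
    """Toy coding loop: predict each sample from the previous reconstruction."""
    symbols, local_reconstruction = [], []
    reference = 0  # plays the role of the reference memory
    for sample in samples:
        # Lossy step: quantize the prediction residual (floor-based rounding).
        symbol = (sample - reference + step // 2) // step
        symbols.append(symbol)
        # Local decoder: rebuild exactly what a remote decoder would rebuild.
        reference = reference + symbol * step
        local_reconstruction.append(reference)
    return symbols, local_reconstruction


def decode_sequence(symbols, step=4):
    """Remote decoder: the same reconstruction rule, driven only by the symbols."""
    reconstruction, reference = [], 0
    for symbol in symbols:
        reference = reference + symbol * step
        reconstruction.append(reference)
    return reconstruction


symbols, local = encode_sequence([10, 13, 19, 18])
remote = decode_sequence(symbols)
```

Because the encoder predicts from its own reconstruction rather than from the original samples, the local and remote references stay identical and quantization errors do not accumulate as drift.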
[0054] The operation of a “local” decoder (333) may be the same as that of a “remote” decoder, such as the video decoder (210) detailed above in relation to Figure 2. However, also briefly referring to Figure 2, since symbols are available and the encoding / decoding of symbols to the coded video sequence by the entropy coder (345) and parser (220) may be lossless, the entropy decoding unit of the video decoder (210), including the buffer memory (215) and parser (220), may not be fully implemented in the local decoder (333).
[0055] In an embodiment, any decoder technology present in a decoder, except for the parsing / entropy decoding, is also present, in identical or substantially identical functional form, in the corresponding encoder. Accordingly, the disclosed subject matter focuses on decoder operation. The description of encoder technologies can be abbreviated because they are the inverse of the comprehensively described decoder technologies. More detailed descriptions are provided below only in certain areas.
[0056] During operation, in some examples, the source coder (330) may perform motion-compensated predictive coding, which codes an input picture predictively with reference to one or more previously coded pictures from the video sequence that are designated as “reference pictures”. In this manner, the coding engine (332) codes the differences between pixel blocks of the input picture and pixel blocks of the reference picture(s) that may be selected as prediction reference(s) for the input picture.
[0057] The local video decoder (333) may decode the coded video data of pictures that may be designated as reference pictures, based on the symbols created by the source coder (330). The operations of the coding engine (332) may advantageously be lossy processes. When the coded video data is decoded at a video decoder (not shown in Figure 3), the reconstructed video sequence is typically a replica of the source video sequence with some errors. The local video decoder (333) replicates the decoding processes that the video decoder may perform on reference pictures and may cause the reconstructed reference pictures to be stored in the reference picture memory (334). In this manner, the video encoder (303) may locally store copies of reconstructed reference pictures that have the same content as the reconstructed reference pictures obtained by the far-end video decoder (absent transmission errors).
[0058] The predictor (335) may perform prediction searches for the coding engine (332). That is, for a new picture to be coded, the predictor (335) may search the reference picture memory (334) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors and block shapes, that may serve as an appropriate prediction reference for the new picture. The predictor (335) may operate on a sample-block-by-pixel-block basis to find appropriate prediction references. In some cases, as determined by the search results obtained by the predictor (335), an input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (334).
[0059] The control unit (350) may manage the coding operations of the source coder (330), including, for example, setting parameters and subgroup parameters used for encoding video data.
[0060] The outputs of all the aforementioned functional units may undergo entropy coding in the entropy coder (345). The entropy coder (345) converts the symbols generated by the various functional units into coded video sequences by applying lossless compression to the symbols according to techniques such as Huffman coding, variable-length coding, arithmetic coding, etc.
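As a small illustration of lossless variable-length coding of symbols, the sketch below uses the unsigned order-0 Exp-Golomb code, a well-known variable-length code used for syntax elements in H.264 and H.265. The source does not say which code the entropy coder (345) uses, so this choice is an assumption, and bits are modeled as a string of '0' and '1' characters for readability.

```python
def exp_golomb_encode(value):
    """Unsigned order-0 Exp-Golomb codeword for a non-negative integer."""
    code = bin(value + 1)[2:]  # binary representation of value + 1
    return "0" * (len(code) - 1) + code  # prefix of leading zeros, then the code


def exp_golomb_decode(bits):
    """Decode a concatenation of Exp-Golomb codewords back into integers."""
    values, pos = [], 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":  # count the leading zeros of this codeword
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2) - 1)
        pos += zeros + 1
    return values


symbols = [3, 0, 2, 0]
bitstring = "".join(exp_golomb_encode(s) for s in symbols)
decoded = exp_golomb_decode(bitstring)
```

Small values get short codewords, so the code is efficient when small symbols are the most frequent, and decoding recovers the symbols exactly.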
[0061] The transmitter (340) may buffer the coded video sequence generated by the entropy coder (345) in preparation for transmission over a communication channel (360), which may be a hardware / software link to a storage device capable of storing coded video data. The transmitter (340) may merge the coded video data from the video encoder (303) with other data to be transmitted, such as coded audio data and / or auxiliary data streams (sources not shown).
[0062] The control unit (350) may manage the operation of the video encoder (303). During coding, the control unit (350) may assign each coded picture a specific coded picture type that may affect the coding techniques that can be applied to each picture. For example, a picture may often be assigned as one of the following picture types:
[0063] An intra-picture (I-picture) may be encoded and decoded without using any other pictures in the sequence as a source for prediction. Some video codecs allow different types of intra-pictures, including, for example, IDR (Independent Decoder Refresh) pictures.
[0064] A predictive picture (P-picture) may be a picture that can be coded and decoded using intra-prediction or inter-prediction with at most one motion vector and reference index to predict the sample values of each block.
[0065] A bidirectionally predictive picture (B-picture) may be coded and decoded using intra-prediction or inter-prediction with at most two motion vectors and reference indices to predict the sample values of each block. Similarly, a multi-predictive picture may use more than two reference pictures and associated metadata for the reconstruction of a single block.
[0066] A source picture may generally be spatially subdivided into multiple sample blocks (e.g., blocks of 4x4, 8x8, 4x8, or 16x16 samples each) and coded block by block. Blocks may be coded predictively with reference to other (already coded) blocks, as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of I-pictures may be coded non-predictively, or they may be coded predictively by referencing already coded blocks of the same picture (spatial prediction or intra-prediction). Pixel blocks of P-pictures may be coded predictively via spatial prediction or temporal prediction by referencing one previously coded reference picture. Blocks of B-pictures may be coded predictively via spatial prediction or temporal prediction by referencing one or two previously coded reference pictures.
[0067] The video encoder (303) may perform coding operations in accordance with a predetermined video coding technique or standard, such as ITU-T Rec.H.265. In these operations, the video encoder (303) may perform various compression operations, including predictive coding operations that utilize temporal and spatial redundancy in the input video sequence. The coded video data may, therefore, conform to the syntax specified by the video coding technique or standard being used.
[0068] In one embodiment, the transmitter (340) may transmit additional data along with the encoded video. The source coder (330) may include such data as part of the encoded video sequence. The additional data may include temporal / spatial / SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, etc.
[0069] Video may be captured as multiple source pictures (video pictures) in a time series. Intra-picture prediction (sometimes abbreviated as intra-prediction) utilizes spatial correlations within a given picture, while inter-picture prediction utilizes (temporal or other) correlations between pictures. In one example, a particular picture being encoded / decoded is called the current picture and is partitioned into blocks. When a block in the current picture is similar to a reference block in a previously coded and still buffered reference picture in the video, the block in the current picture can be coded by a vector called a motion vector. The motion vector points to a reference block in the reference picture and may have a third dimension to identify the reference picture if multiple reference pictures are in use.
[0070] In some embodiments, bi-prediction techniques can be used for inter-picture prediction. According to bi-prediction techniques, two reference pictures are used, such as a first reference picture and a second reference picture, both of which precede the current picture in decoding order (but may be earlier and later in display order, respectively). Blocks in the current picture can be coded by a first motion vector pointing to a first reference block in the first reference picture and a second motion vector pointing to a second reference block in the second reference picture. Blocks can be predicted by combining the first and second reference blocks.
[0071] Furthermore, merge mode techniques can be used in inter-picture prediction to improve coding efficiency.
[0072] According to some embodiments of this disclosure, predictions such as inter-picture prediction and intra-picture prediction are performed in units of blocks. For example, according to the HEVC standard, pictures in a sequence of video pictures are partitioned into coding tree units (CTUs) for compression. CTUs within a picture have the same size, such as 64x64 pixels, 32x32 pixels, or 16x16 pixels. Typically, a CTU contains three coding tree blocks (CTBs), i.e., one luma CTB and two chroma CTBs. Each CTU can be recursively quad-tree partitioned into one or more coding units (CUs). For example, a 64x64 pixel CTU can be partitioned into one 64x64 pixel CU, or four 32x32 pixel CUs, or sixteen 16x16 pixel CUs. In one example, each CU is analyzed to determine the prediction type of the CU, such as inter-prediction type or intra-prediction type. A CU is divided into one or more prediction units (PUs) depending on its temporal and / or spatial predictability. Typically, each PU contains one luma prediction block (PB) and two chroma PBs. In one embodiment, the prediction operation in coding (encoding / decoding) is performed in units of prediction blocks. Using a luma prediction block as an example of a prediction block, the prediction block contains a matrix of values (e.g., luma values) for pixels such as 8x8 pixels, 16x16 pixels, 8x16 pixels, 16x8 pixels, etc.
[0073] It should be noted that the video encoders (103) and (303), and the video decoders (110) and (210) can be implemented using any suitable technique. In one embodiment, the video encoders (103) and (303), and the video decoders (110) and (210) can be implemented using one or more integrated circuits. In another embodiment, the video encoders (103) and (303), and the video decoders (110) and (210) can be implemented using one or more processors that execute software instructions.
[0074] VVC allows the use of various inter-prediction modes. For inter-predicted CUs, motion parameters can include the MV(s), one or more reference picture indices, a reference picture list usage index, and additional information for specific coding features used to generate inter-predicted samples. Motion parameters can be signaled explicitly or implicitly. If a CU is coded in skip mode, it can be associated with one PU and may have no significant residual coefficients, no coded motion vector delta or MV difference (e.g., MVD), and no reference picture index. Merge mode can be specified when the motion parameters of the current CU are obtained from neighboring CUs, including spatial and / or temporal candidates, and optionally additional candidates such as those introduced in VVC. Merge mode can be applied to any inter-predicted CU, not only to CUs coded in skip mode. In an example, the alternative to merge mode is the explicit transmission of motion parameters, where the MV, the corresponding reference picture index for each reference picture list, the reference picture list usage flag, and other information are explicitly signaled for each CU.
[0075] In embodiments such as VVC, the VVC Test Model (VTM) reference software includes one or more refined inter-prediction coding tools, such as extended merge prediction, merge motion vector difference (MMVD) mode, advanced motion vector prediction (AMVP) mode with symmetric MVD signaling, affine motion compensated prediction, subblock-based temporal motion vector prediction (SbTMVP), adaptive motion vector resolution (AMVR), motion field storage (1 / 16 luma sample MV storage and 8x8 motion field compression), bi-prediction with CU-level weights (BCW), bi-directional optical flow (BDOF), prediction refinement using optical flow (PROF), decoder-side motion vector refinement (DMVR), combined inter and intra prediction (CIIP), and geometric partitioning mode (GPM). The following describes in detail the methods associated with inter-prediction coding.
[0076] In some cases, extended merge prediction can be used. For example, in VTM4, the merge candidate list is constructed from the following five types of candidates, in order: spatial MVPs from spatially neighboring CUs, temporal MVPs from collocated CUs, history-based MVPs (HMVPs) from a first-in-first-out (FIFO) table, pairwise average MVPs, and zero MVs.
[0077] The size of the merge candidate list can be signaled in the slice header. In an example, the maximum allowed size of the merge candidate list is 6 in VTM4. For each CU coded in merge mode, the index of the best merge candidate (e.g., the merge index) can be coded using truncated unary (TU) binarization. The first bin of the merge index can be context coded (e.g., using context-adaptive binary arithmetic coding (CABAC)), and bypass coding can be used for the other bins.
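As an illustration of the truncated unary binarization mentioned above, the following is a minimal sketch and not the VTM implementation. The function name `truncated_unary` and the bin polarity (ones terminated by a zero) are assumptions made for this example; with a list size of 6, the largest merge index is 5.

```python
def truncated_unary(index: int, max_index: int) -> str:
    """Return a truncated unary bin string for index in [0, max_index].

    Assumed polarity: 'index' ones followed by a terminating zero.
    The terminating zero is dropped when index == max_index, because
    the decoder can already infer that the string has ended.
    """
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    bins = "1" * index
    if index < max_index:
        bins += "0"
    return bins


if __name__ == "__main__":
    # Merge candidate list of size 6 -> merge index in 0..5
    for idx in range(6):
        print(idx, truncated_unary(idx, 5))
```

In such a scheme only the first bin would be context coded and the remaining bins bypass coded, as described in paragraph [0077].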
[0078] The following are some examples of the generation process for each category of merge candidates. In an embodiment, the spatial candidates are derived as follows. The derivation of spatial merge candidates in VVC can be the same as the derivation in HEVC. In an example, up to four merge candidates are selected from the candidates located in the positions shown in Figure 4. Figure 4 shows the positions of the spatial merge candidates according to an embodiment of the present disclosure. Referring to Figure 4, the derivation order is B1, A1, B0, A0, B2. Position B2 is considered only if one or more of the CUs at positions A0, B0, B1, and A1 are unavailable (e.g., because the CU belongs to a different slice or tile) or are intra-coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to redundancy checks, which ensure that candidates with the same motion information are excluded from the candidate list, thus improving coding efficiency.
[0079] To reduce computational complexity, not all possible candidate pairs are considered in the redundancy check described above. Instead, only the pairs linked by the arrows in Figure 5 are considered, and candidates are added to the candidate list only if the corresponding candidates used in the redundancy check do not have the same motion information. Figure 5 shows candidate pairs considered for redundancy checking of spatial merge candidates according to one embodiment of the present disclosure. Referring to Figure 5, the pairs linked by each arrow include A1 and B1, A1 and A0, A1 and B2, B1 and B0, and B1 and B2. This allows for comparison of candidates at positions B1, A0, and / or B2 with candidates at position A1, and candidates at positions B0 and / or B2 with candidates at position B1.
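The spatial candidate derivation and the limited redundancy check described in paragraphs [0078] and [0079] can be sketched as follows. This is a simplified illustration under stated assumptions, not the VVC or VTM implementation: the `Motion` tuple, the dictionary-based neighbor lookup, and the convention that `None` marks an unavailable or intra-coded neighbor are all hypothetical, and position B2 is assumed to be considered only when at least one of A0, B0, B1, and A1 is missing.

```python
from typing import Dict, List, Optional, Tuple

Motion = Tuple[int, int, int]  # hypothetical (mv_x, mv_y, ref_idx)

# Derivation order stated for Figure 4.
ORDER = ["B1", "A1", "B0", "A0", "B2"]

# Pairs linked by arrows in Figure 5, expressed as
# "position -> earlier positions it is compared against".
CHECK_AGAINST = {
    "B1": [],
    "A1": ["B1"],
    "B0": ["B1"],
    "A0": ["A1"],
    "B2": ["A1", "B1"],
}


def spatial_merge_candidates(
    neighbors: Dict[str, Optional[Motion]], max_cands: int = 4
) -> List[Motion]:
    """Collect up to max_cands spatial merge candidates.

    neighbors maps a position name to its motion, or None when the CU
    is unavailable or intra-coded (assumed convention).
    """
    cands: List[Motion] = []
    for pos in ORDER:
        if len(cands) == max_cands:
            break
        motion = neighbors.get(pos)
        if motion is None:
            continue
        # B2 is only considered when one of the other four is missing.
        if pos == "B2" and all(
            neighbors.get(p) is not None for p in ("A0", "B0", "B1", "A1")
        ):
            continue
        # Limited redundancy check: compare only the linked pairs.
        if any(neighbors.get(p) == motion for p in CHECK_AGAINST[pos]):
            continue
        cands.append(motion)
    return cands


if __name__ == "__main__":
    example = {
        "B1": (1, 0, 0),
        "A1": (1, 0, 0),  # duplicate of B1, pruned
        "B0": (2, 0, 0),
        "A0": None,       # unavailable, so B2 is considered
        "B2": (3, 1, 0),
    }
    print(spatial_merge_candidates(example))
```

Restricting the comparison to the linked pairs keeps the number of motion comparisons small, at the cost of occasionally admitting a duplicate between positions that are not linked.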
[0080] In an embodiment, the temporal candidate is derived as follows. In an example, only one temporal merge candidate is added to the candidate list. Figure 6 shows exemplary motion vector scaling for the temporal merge candidate. To derive the temporal merge candidate for the current CU (611) in the current picture (601), a scaled MV (621) (e.g., shown by the dotted line in Figure 6) can be derived based on the collocated CU (612) belonging to the collocated reference picture (604). In an example, the collocated reference picture (604) (also called the collocated picture (604)) is a specific reference picture used for temporal motion vector prediction. The collocated reference picture (604) can be indicated by a reference index in syntax such as high-level syntax (e.g., a picture header or slice header).
[0081] The reference picture list used to derive the collocated CU (612) can be explicitly signaled in the slice header. The scaled MV (621) for the temporal merge candidate can be obtained as shown by the dotted line in Figure 6. The scaled MV (621) can be scaled from the MV of the collocated CU (612) using picture order count (POC) distances tb and td. The POC distance tb can be defined as the POC difference between the current reference picture (602) of the current picture (601) and the current picture (601). The POC distance td can be defined as the POC difference between the collocated reference picture (603) of the collocated picture (604) and the collocated picture (604). The reference picture index of the temporal merge candidate can be set to 0.
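The scaling by the POC distances tb and td can be illustrated with the sketch below. It assumes the scaled MV is the collocated MV multiplied by tb / td; actual codecs use fixed-point arithmetic with clipping, which is omitted here. The function name `scale_mv` is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def scale_mv(col_mv: Tuple[int, int], tb: int, td: int) -> Tuple[int, int]:
    """Scale the collocated MV by the POC distance ratio tb / td.

    Uses exact rational arithmetic and rounds to the nearest integer.
    This is a simplification of the fixed-point scaling in real codecs.
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    ratio = Fraction(tb, td)
    return (round(col_mv[0] * ratio), round(col_mv[1] * ratio))


if __name__ == "__main__":
    # Collocated MV spans td = 2 pictures, current reference is tb = 1 away.
    print(scale_mv((8, -4), tb=1, td=2))
```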
[0082] Figure 7 shows exemplary candidate positions (e.g., C0 and C1) for the temporal merge candidate of the current CU. The position of the temporal merge candidate can be selected from candidate positions C0 and C1. Candidate position C0 is located at the lower right corner of the collocated CU (710) of the current CU. Candidate position C1 is located at the center of the collocated CU (710). If the CU at candidate position C0 is unavailable, intra-coded, or outside the current CTU row, the temporal merge candidate is derived using candidate position C1. Otherwise (that is, if the CU at candidate position C0 is available, not intra-coded, and inside the current CTU row), the temporal merge candidate is derived using candidate position C0.
[0083] Intra-block copy (IBC) mode can be used in video coding such as HEVC and VVC. In an example, as in HEVC, the IBC concept requires additional memory in the DPB, for which hardware implementations use external memory. The additional external memory access increases memory bandwidth. In an example, as in VVC, IBC mode uses a fixed memory that can be implemented on-chip, significantly reducing memory bandwidth requirements and hardware complexity. A reference sample memory (RSM) can be used to hold the samples of a single CTU. A special feature of the RSM is its continuous update mechanism, which replaces the reconstructed samples of the left neighboring CTU with the reconstructed samples of the current CTU. Block vector (BV) coding in IBC employs the merge list concept of inter prediction. The IBC list construction process considers two spatial neighbor BVs and five history-based BVs (HBVPs). In an example, only the first HBVP is compared with the spatial candidates when it is added to the candidate list. Conventional inter prediction uses two different candidate lists, one for merge mode and the other for regular mode, whereas the IBC candidate list serves both cases. In merge mode, up to six candidates from the list may be used, while in regular mode, only the first two candidates are used. Block vector difference (BVD) coding employs the motion vector difference (MVD) process, resulting in a final BV of any magnitude. The reconstructed BV may therefore point to a region outside the reference sample region, which requires a correction that removes the absolute offset in each direction using modulo operations with the width and height of the RSM.
[0084] Figure 8 shows an example of an Intra-Template Matching Prediction (IntraTMP) mode according to one embodiment of the present disclosure. In this embodiment, as in ECM software, IntraTMP is a special intra-prediction mode that can copy a best-prediction block (821) from the reconstructed portion of the current frame (or current picture), and the template (e.g., an L-shaped template) (820) of the best-prediction block (821) may match the current template (810) of the current block (811). For a given search range, the encoder can search for the template (820) that is most similar to the current template (810) in the reconstructed portion of the current frame, and the corresponding block (821) can be used as the prediction block. The encoder can signal the use of the IntraTMP mode, and the same prediction operation can be performed on the decoder side.
[0085] The prediction signal can be generated by matching a current template (810), such as an L-shaped causal neighborhood of the current block (811), with a template (e.g., (820)) of another block (e.g., (821)) within a given search region. The exemplary search region shown in Figure 8 can include multiple CTUs (or SBs). Referring to Figure 8, the search region can include the current CTU R1 (e.g., a portion of the current CTU R1), the upper left CTU R2, the upper CTU R3, and the left CTU R4. The cost function can include any appropriate cost function, such as the sum of absolute differences (SAD).
[0086] Within each region, the decoder can search for the template (e.g., (820)) that has the minimum cost (e.g., minimum SAD) for the current template (810), and use the block (e.g., (821)) associated with the template with the minimum SAD as the prediction block.
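The template search of paragraphs [0084] to [0086] can be sketched as below. This is a minimal illustration, not the ECM implementation: the picture is a plain list of rows, the template is one sample thick, and the caller is assumed to supply only candidate positions whose templates and blocks lie in the already reconstructed area. The names `l_template`, `sad`, and `intra_tmp_search` are hypothetical.

```python
from typing import List, Sequence, Tuple

Picture = List[List[int]]


def l_template(pic: Picture, x: int, y: int, w: int, h: int, t: int = 1) -> List[int]:
    """Collect the L-shaped template of the w x h block at (x, y):
    t rows above (including the corner) and t columns to the left."""
    above = [pic[r][c] for r in range(y - t, y) for c in range(x - t, x + w)]
    left = [pic[r][c] for r in range(y, y + h) for c in range(x - t, x)]
    return above + left


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equally sized templates."""
    return sum(abs(p - q) for p, q in zip(a, b))


def intra_tmp_search(
    pic: Picture, x: int, y: int, w: int, h: int,
    candidates: Sequence[Tuple[int, int]], t: int = 1,
) -> Tuple[int, int]:
    """Return the candidate top-left position (cx, cy) whose template
    has the minimum SAD against the template of the current block."""
    cur = l_template(pic, x, y, w, h, t)
    return min(
        candidates,
        key=lambda c: sad(cur, l_template(pic, c[0], c[1], w, h, t)),
    )


if __name__ == "__main__":
    pic = [[0] * 8 for _ in range(8)]
    # Template of the current 2x2 block at (5, 5).
    for c in (4, 5, 6):
        pic[4][c] = 10
    pic[5][4] = pic[6][4] = 10
    # A similar template around the candidate block at (1, 1).
    for c in (0, 1, 2):
        pic[0][c] = 10
    pic[1][0] = pic[2][0] = 9
    print(intra_tmp_search(pic, 5, 5, 2, 2, [(1, 5), (1, 1)]))
```

Because the decoder can run the same search on the same reconstructed samples, only the use of the mode needs to be signaled, not the position of the selected block.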
[0087] The dimensions of the search region (SearchRange_w, SearchRange_h) can be set to be proportional to the block dimensions (BlkW, BlkH) so that there is a fixed number of SAD comparisons per pixel. For example,
SearchRange_w = a × BlkW
SearchRange_h = a × BlkH
[0088] Parameter a can be a constant that controls the trade-off between gain and complexity. In the example, a is 5.
[0089] The intra-template matching tool can be enabled for CUs of specific sizes, such as those with a width and height of 64 or less. The maximum CU size in IntraTMP mode is configurable.
[0090] IntraTMP mode can be signaled at the CU level via a dedicated flag, for example, when decoder-side intra-mode derivation (DIMD) is not currently being used by the CU.
[0091] In an example, as in ECM5, IntraTMP mode accesses 320 top samples and 320 left samples to support a 64x64 block. A memory size such as 320 top samples and 320 left samples per block can improve the coding efficiency of IBC mode. The reference area or search range of IBC mode can be extended. In an example, the reference area of IBC mode is extended to the top two CTU rows. Figure 9 shows an example of a reference area for coding CTU(m,n). The integers m and n are indices representing the location of the CTU. To code CTU(m,n), the reference area can contain CTUs with indices (m-2,n-2), ..., (W-1,n-2), (0,n-1), ..., (W-1,n-1), (0,n), ..., and (m,n). Here, W represents the maximum horizontal index of the CTU in the current tile, slice, picture, etc. The settings (for example, accessing 320 upper samples and 320 left samples to predict a block) can ensure that the IBC mode does not require extra memory in current tests of the Model of Essential Video Coding (ETM) platform when the CTU size is 128x128. The per-sample block vector search range (or local search range) can be restricted horizontally to [-(C<<1), C>>2] (or [-2C, C / 4]) and vertically to [-C, C>>2] (or [-C, C / 4]) to adapt to the expansion of the reference area, where C represents the CTU size, such as 128. For example, the BV of a block is restricted horizontally to [-2C, C / 4] and vertically to [-C, C / 4].
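The local search range restriction can be expressed as a simple range check. The sketch below assumes inclusive bounds and integer-sample BV components; the function name `bv_in_local_range` is hypothetical.

```python
def bv_in_local_range(bv_x: int, bv_y: int, ctu_size: int = 128) -> bool:
    """Check a block vector against the local search range:
    horizontal [-(C << 1), C >> 2] and vertical [-C, C >> 2],
    where C is the CTU size. Bounds are assumed to be inclusive."""
    c = ctu_size
    in_x = -(c << 1) <= bv_x <= (c >> 2)
    in_y = -c <= bv_y <= (c >> 2)
    return in_x and in_y


if __name__ == "__main__":
    # With C = 128 the range is [-256, 32] horizontally, [-128, 32] vertically.
    print(bv_in_local_range(-256, -128))
    print(bv_in_local_range(33, 0))
```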
[0092] Scalable video coding can be applied to intra-pictures. In various examples, such as the AVC and HEVC scalable coding standards, scalable video coding can include one or more of the following: temporal scalability, spatial scalability, and SNR scalability. In some examples of scalable video coding techniques, both a base layer and an enhancement layer are required for display purposes.
[0093] In intra-prediction, in some examples, there is a limitation that the predicted samples generated from the upper-left reconstruction region have little correlation with the lower-right region of the current block. More accurate predicted samples can be generated using the disclosed method (e.g., the embodiments described in Figures 10-12). Furthermore, in some examples, blocks of higher-resolution pictures (e.g., (1001)) can be generated based on blocks of lower-resolution pictures (e.g., (1002)), thus enabling the transmission of fewer bits and improving coding efficiency.
[0094] The disclosed method may involve using a scaled, intra-coded picture of the same content as a reference to predict the current picture. A scaled version of the current picture (also called a scaled picture) may represent a downsampled version of the current picture. A full-scale current picture (also called a current picture with full scale) may represent the current picture at its original resolution.
[0095] A scaled version of the current picture can be compressed as part of the intra-coding framework. In the bitstream, the coded representation of the current picture has two sub-bitstreams. For example, the sub-bitstream representing the scaled version of the current picture (e.g., the first sub-bitstream) is sent first, followed by the sub-bitstream representing the full-scale current picture (e.g., the second sub-bitstream).
[0096] In one embodiment, the bitstream may include a first sub-bitstream corresponding to a scaled version of the current picture and a second sub-bitstream corresponding to a full-scale current picture, as shown in Figure 10. Figure 10 shows a full-scale current picture (1001) and a scaled version of the current picture (1002) according to an embodiment of the present disclosure. The full-scale current picture (1001) and the scaled version of the current picture (1002) may correspond to the same current picture and may have different spatial resolutions. The scaled version (1002) may have a first spatial resolution, such as Ws × Hs samples, where Ws and Hs are the width and height of the scaled version (1002). The full-scale current picture (1001) may have a second spatial resolution, such as Wf × Hf samples, where Wf and Hf are the width and height of the full-scale current picture (1001). The second spatial resolution may be higher than the first spatial resolution. The original resolution of the scaled version (1002) can represent the second spatial resolution.
[0097] In this framework, a scaled version of the current picture is upsampled to its original resolution after decoding and can be used as a reference to predict the full-scale current picture. The upscaling (or upsampling) procedure applied to the decoded scaled version of the current picture can be specified so as to produce the same resampled reconstructed image. In one embodiment, referring to Figure 10, a scaled version of the current picture (1002) is upsampled to the original resolution of the full-scale current picture (1001) after decoding and can then be used as a reference or predictor to predict the full-scale current picture (1001). In another embodiment, a scaled version of the current picture (1002) is not upsampled and can be used as a reference or predictor to predict the full-scale current picture (1001).
[0098] A scaled version of the current picture may have one of the following characteristics: 1) smaller width and the same height; 2) smaller height and the same width; or 3) smaller width and smaller height. A scaled version of the current picture may have a smaller width than the full-scale current picture and the same height as the full-scale current picture. A scaled version of the current picture may have a smaller height than the full-scale current picture and the same width as the full-scale current picture. As shown in Figure 10, a scaled version of the current picture may have a smaller width than the full-scale current picture and a smaller height than the full-scale current picture.
[0099] The scaled version of the current picture and the full-scale current picture can have the same partitioning structure; therefore, each block in the full-scale current picture is in the same position as each block in the scaled version.
[0100] The scaled version of the current picture and the full-scale current picture can have separate partitioning structures; therefore, blocks in the full-scale current picture do not have to be in the same location as blocks in the scaled version. Referring to Figure 10, the full-scale current picture (1001) contains a second block (1010). The second block (1010) can be in the same location as the first region (1020) in the scaled version of the current picture (1002). For example, the second block (1010) in the full-scale current picture (1001) and the first region (1020) in the scaled version (1002) correspond to the same physical region in the current picture. The second block (1010) in the full-scale current picture (1001) and the first region (1020) in the scaled version (1002) can have the same shape. The size of the first region (1020) can be scaled to the size of the second block (1010) based on, for example, Ws, Hs, Wf, and Hf.
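The mapping from a block in the full-scale picture to its collocated region in the scaled version, based on Ws, Hs, Wf, and Hf, can be sketched as follows. The integer (floor) rounding and the function name `collocated_region` are assumptions made for illustration; the disclosure does not specify a rounding rule.

```python
from typing import Tuple


def collocated_region(
    x: int, y: int, w: int, h: int,
    wf: int, hf: int, ws: int, hs: int,
) -> Tuple[int, int, int, int]:
    """Map a block (x, y, w, h) of the full-scale picture (wf x hf)
    to its collocated region in the scaled picture (ws x hs).

    Positions and sizes are scaled by ws / wf horizontally and
    hs / hf vertically, using floor division (assumed rounding).
    """
    sx = x * ws // wf
    sy = y * hs // hf
    sw = max(1, w * ws // wf)  # keep at least one sample
    sh = max(1, h * hs // hf)
    return sx, sy, sw, sh


if __name__ == "__main__":
    # 1920x1080 full-scale picture, 960x540 scaled version.
    print(collocated_region(32, 16, 16, 16, 1920, 1080, 960, 540))
```

The same function covers the case where only the width or only the height is reduced, since the two directions are scaled independently.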
[0101] According to embodiments of the present disclosure, a first region (1020) in the scaled version (1002) of the current picture can be in the same position as a second block (1010) in the full-scale current picture (1001). The first region (1020) can overlap with one or more first blocks in the scaled version (1002) of the current picture. In the example shown in Figure 10, the first region (1020) overlaps with the first blocks (1021) to (1022) in the scaled version (1002) of the current picture. In the example shown in Figure 10, the first blocks (1021) to (1022) completely overlap with the first region (1020), and the first region (1020) includes the first blocks (1021) to (1022).
[0102] In some embodiments, due to different partitioning structures, one or more first blocks partially overlap with the first region (1020). For example, one of the one or more first blocks includes a sample located outside the first region (1020).
[0103] A scaled version (1002) of the current picture can be reconstructed from the first sub-bitstream, for example, by reconstructing the samples within the scaled version (1002). The second block (1010) in the full-scale current picture (1001) can be reconstructed based on either (i) partitioning information of one or more first blocks (1021)-(1022) in the scaled version (1002), or (ii) intra-prediction information of one or more first blocks (1021)-(1022) in the scaled version (1002).
[0104] In one embodiment, for example, after expanding the block size as indicated by a scaling ratio, a partitioning prediction using the partitioning results of the scaled picture can be used as a basis for predicting the partitioning of the full-scale current picture. For example, the bitstream may include a first sub-bitstream corresponding to a scaled version of the current picture having a first spatial resolution and a second sub-bitstream corresponding to the full-scale current picture, where the full-scale has a second spatial resolution higher than the first spatial resolution. Furthermore, one or more first blocks in the scaled version of the current picture are in the same position as second blocks in the full-scale current picture. An example method can reconstruct one or more first blocks in the scaled version of the current picture from the first sub-bitstream. The disclosed method can then reconstruct second blocks of the full-scale current picture based on (i) partitioning information of one or more first blocks in the scaled version, or (ii) intra-prediction information of one or more first blocks in the scaled version.
[0105] In one example, the scaling ratio is 2, and a block in the current full-scale picture (e.g., a 16x16 block) is in the same position as and corresponds to a block in the scaled picture (e.g., an 8x8 block). If the 8x8 block in the scaled picture is coded as an 8x8 block without being divided, then the 16x16 block in the current full-scale picture can be assumed to be undivided and can be coded as a 16x16 block. If the 8x8 block in the scaled picture is divided and coded as four 4x4 blocks, then the 16x16 block in the current full-scale picture can be assumed to be divided into four 8x8 blocks, each of which is coded separately.
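The example with a scaling ratio of 2 can be sketched as a recursive mapping of the scaled picture's partitioning onto the full-scale picture. The tree representation used here (an integer block size for an undivided block, or a list of four sub-trees for a quad split) and the function name `predict_partition` are hypothetical and chosen only for illustration.

```python
from typing import List, Union

# Hypothetical partition tree: an int is the size of an undivided
# square block; a list of four trees is a quad split.
Tree = Union[int, List["Tree"]]


def predict_partition(scaled_tree: Tree, ratio: int = 2) -> Tree:
    """Predict the full-scale partitioning from the scaled picture's
    partitioning by enlarging every block size by the scaling ratio
    while keeping the split structure unchanged."""
    if isinstance(scaled_tree, int):
        return scaled_tree * ratio
    return [predict_partition(sub, ratio) for sub in scaled_tree]


if __name__ == "__main__":
    print(predict_partition(8))             # undivided 8x8 -> 16x16
    print(predict_partition([4, 4, 4, 4]))  # four 4x4 -> four 8x8
```

A flag signaled for the full-scale block, as described in the following paragraphs, could then indicate whether the predicted blocks are divided further.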
[0106] Referring to Figure 10, the partitioning information of the scaled version (1002) can be used to predict the partitioning information of the full-scale current picture (1001). The partitioning information of the first block (1021) can indicate whether the first block (1021) is divided into smaller blocks. The partitioning information of the first block (1022) can indicate whether the first block (1022) is divided into smaller blocks. Therefore, whether to partition the second block (1010) of the full-scale current picture (1001) can be decided based on the partitioning information of the first blocks (1021) and (1022). Based on the decision of whether to partition the second block (1010), the second block (1010) can be reconstructed.
[0107] In one embodiment, a single flag can be signaled for each block size to indicate whether further division is necessary.
[0108] In one embodiment, a flag is signaled for the second block (1010). If it is determined that the second block (1010) is partitioned based on the partitioning information of the first blocks (1021) to (1022), the flag may indicate whether the partition is applied to the blocks partitioned from the second block (1010). If it is determined that the second block (1010) is not partitioned based on the partitioning information of the first blocks (1021) to (1022), the flag may indicate whether the partition is applied to the second block (1010).
[0109] In the example, a flag is signaled for each of the first blocks (1021) to (1022), indicating whether the first block (1021) to (1022) will be further divided.
[0110] In one embodiment, a flag indicating the partitioning mode of a scaled picture can be used as context for entropy coding of the partitioning mode of the full-scale current picture.
[0111] In an example, the partitioning information includes one or more flags for each of the first blocks (e.g., (1021) or (1022)) indicating whether each first block is divided into smaller blocks. The partitioning information for the second block (e.g., (1010)) can be entropy coded (e.g., entropy encoded or entropy decoded), and the one or more flags for each of the first blocks (1021) or (1022) can be used as context for the entropy coding.
[0112] Reconstructed samples of the scaled picture can be used in the intra-prediction process of the full-scale current picture.
[0113] The bottom row, right column, or bottom right corner of a current block in the full-scale current picture can be predicted using the reconstructed samples at the same positions in the scaled picture. The remaining samples in the current block can be predicted by interpolation, together with the reconstructed samples above and to the left of the current block in the full-resolution picture. Examples of such interpolation include planar mode and bilinear interpolation.
[0114] In one embodiment, referring to Figure 10, a first sample (e.g., sample (1043)) in the second block (1010) is predicted based on a reconstructed sample (e.g., (1073)) in the first region (1020) of the scaled version (1002) of the current picture. The first sample in the second block (1010) can be in the same position as the reconstructed sample in the first region (1020). Referring to Figure 10, sample (1043) is located in the lower right corner of the second block (1010) and is in the same position as the reconstructed sample (1073) located in the lower right corner of the first region (1020). A second sample within the second block (1010) (e.g., a sample at location (1051) or a sample at location (1052)) can be predicted by interpolation based on at least a predicted first sample (1043) within the second block (1010) and a reconstructed sample within the second block (1010) (e.g., a top-left reconstructed sample (1044)). The predictor of the second block (1010) may include (i) a predicted first sample (1043) within the second block (1010), (ii) a reconstructed sample within the second block (1010) (e.g., a top-left reconstructed sample (1044)), and (iii) a predicted second sample within the second block (1010) (e.g., a predicted sample at location (1051) or a predicted sample at location (1052)). The second block (1010) can be reconstructed from the predictor of the second block (1010).
[0115] In one embodiment, referring to Figure 10, the bottom row (1041), right column (1042), or bottom right corner (1043) of the second block (1010) can be predicted based on the reconstructed sample at each of the same locations within the first region (1020) of the scaled version (1002). For example, the bottom row (1041) is predicted based on the reconstructed sample in the bottom row (1071) within the first region (1020). The right column (1042) is predicted based on the reconstructed sample in the right column (1072) within the first region (1020).
[0116] The predicted lower row (1041), predicted right column (1042), and / or predicted lower right corner (1043) of the second block (1010) can be used together with the upper and left reconstructed samples of the second block (1010) in the full-scale current picture (1001) (e.g., sample (1044)) to predict the remaining samples of the second block (1010) by interpolation. As mentioned above, in some examples of intra-prediction, there is a limitation that the predicted samples generated from the upper left reconstructed region have a low correlation with the lower right region of the current block, and therefore the prediction of the lower right region may not be accurate. The described method can make the predicted sample of the second block (1010) more accurate by using the reconstructed lower right region (e.g., (1071), (1072), and / or (1073)) of the scaled version (1002) to predict the lower right region (e.g., (1041), (1042), and / or (1043)) of the second block (1010).
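The interpolation described in [0115] and [0116] can be sketched roughly as follows. This is a minimal illustration, not the method defined by the disclosure: the function name `predict_block_from_borders`, the argument layout, and the equal averaging of a vertical and a horizontal linear interpolation (in the spirit of planar mode) are all assumptions made for this example. `top` and `left` stand for the reconstructed full-resolution neighbors, while `bottom` and `right` stand for the samples taken from the same positions in the upsampled scaled version.

```python
def predict_block_from_borders(top, left, bottom, right):
    """Planar-style interpolation sketch (hypothetical helper).

    top[x], left[y]: reconstructed samples above / left of the block.
    bottom[x], right[y]: bottom row / right column predicted from the
    co-located samples of the scaled picture (assumed already upsampled).
    """
    w, h = len(top), len(left)
    assert len(bottom) == w and len(right) == h
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if y == h - 1:
                pred[y][x] = bottom[x]      # bottom row copied from scaled picture
            elif x == w - 1:
                pred[y][x] = right[y]       # right column copied from scaled picture
            else:
                # Linear interpolation between the row above the block and the bottom row.
                vert = ((h - 1 - y) * top[x] + (y + 1) * bottom[x]) / h
                # Linear interpolation between the column left of the block and the right column.
                horz = ((w - 1 - x) * left[y] + (x + 1) * right[y]) / w
                pred[y][x] = int((vert + horz) / 2 + 0.5)
    return pred
```

When all four borders hold the same constant value, every predicted sample equals that constant, which is a quick sanity check on the weights.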
[0117] A normal intra-prediction block of a full-scale current picture can be blended with an upsampled reconstruction block at the same location, with more weight given to the prediction from the scaled picture at the bottom right corner of the current block and more weight given to the prediction from the full-resolution picture at the top left corner of the current block.
[0118] Referring to Figure 10, the samples in the second block (1010) can be predicted using intra-prediction to obtain an intra-predicted block of the second block (1010) in the full-scale current picture (1001). (i) The predicted samples of the second block (1010) obtained from the intra-prediction (e.g., the intra-predicted samples) and (ii) the corresponding upsampled reconstructed samples in the first region (1020) can be blended using a weighted average. The weight applied at a blended sample in the second block (1010) can depend on the position of the blended sample within the second block (1010). For example, the closer the blended sample is to the lower right corner (1043) of the second block (1010), the greater the weight given to the reconstructed sample in the first region (1020). Conversely, the closer the blended sample is to the upper left corner of the second block (1010), the greater the weight given to the intra-predicted sample in the second block (1010). For example, position (1051) is associated with a first weight for the intra-predicted sample in the second block (1010), and position (1052) is associated with a second weight for the intra-predicted sample in the second block (1010). The first weight is greater than the second weight.
[0119] Because the described method uses position-dependent weights, it assigns more weight to the samples that are likely to be more accurate, and the predicted samples in the second block (1010) may therefore be more accurate.
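A position-dependent blend of this kind might look like the sketch below. The disclosure states only that the weight depends on the sample position; the linear ramp on `x + y` used here, the function name `blend_position_dependent`, and the rounding are assumptions chosen for illustration.

```python
def blend_position_dependent(intra_pred, upsampled_ref):
    """Blend two predictors with weights that vary across the block.

    intra_pred: intra-predicted samples of the full-scale block.
    upsampled_ref: co-located upsampled reconstructed samples of the scaled picture.
    Assumption: the scaled-picture weight ramps linearly from 0 at the
    top-left sample to 1 at the bottom-right sample.
    """
    h, w = len(intra_pred), len(intra_pred[0])
    denom = (w - 1) + (h - 1)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            w_ref = (x + y) / denom if denom else 0.5   # weight of scaled-picture predictor
            w_intra = 1.0 - w_ref                       # weight of intra predictor
            value = w_intra * intra_pred[y][x] + w_ref * upsampled_ref[y][x]
            row.append(int(value + 0.5))
        out.append(row)
    return out
```

With this ramp, the top-left sample is taken entirely from the intra predictor and the bottom-right sample entirely from the scaled picture; a real codec would more likely use integer weights from a table.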
[0120] Intra prediction mode (IPM) predictions from a scaled picture to a full-resolution picture can be performed for the same location. The prediction can be used as an additional candidate in a list, such as a most probable mode (MPM) list. Alternatively, the prediction can be used together with IPMs from spatial neighbor blocks to derive an MPM list. For chroma components, the direct mode (DM) may be from an IPM associated with a chroma block at the same location from either a full-scale picture or a scaled picture.
[0121] In one example, the intra-prediction information includes IPM information for one or more first blocks (1021)-(1022), such as the IPMs used to code the one or more first blocks (1021)-(1022). The IPM for the second block (1010) may be predicted based on the IPM information for the one or more first blocks (1021)-(1022). The MPM list for the second block (1010) can be constructed based on the IPM information for the one or more first blocks (1021)-(1022). In one example, the predicted IPM is added as a candidate to the MPM list for the second block (1010). In another example, the predicted IPM can be used together with IPMs from spatially neighboring blocks of the second block (1010) to derive the MPM list. For chroma components, the direct mode (DM) may be derived from the IPM associated with the chroma block at the same location in either the full-scale current picture (1001) or the scaled version (1002) of the current picture. The second block (1010) can be reconstructed based on the MPM list.
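One way to picture the MPM construction is the sketch below, which places the IPM of the co-located block in the scaled version ahead of the spatial-neighbor IPMs. The candidate ordering, the list size of 6, the default fill modes, and the name `build_mpm_list` are all assumptions for illustration; the disclosure does not specify them.

```python
def build_mpm_list(colocated_ipm, neighbor_ipms, size=6, defaults=(0, 1, 50, 18)):
    """Build a most-probable-mode list (hypothetical sketch).

    colocated_ipm: IPM of the co-located first block in the scaled picture,
                   or None if unavailable.
    neighbor_ipms: IPMs of spatially neighboring blocks in the full-scale picture.
    defaults: fill modes (illustrative mode numbers, assumed for this example).
    """
    candidates = []
    if colocated_ipm is not None:
        candidates.append(colocated_ipm)    # candidate predicted from the scaled picture
    candidates.extend(neighbor_ipms)        # candidates from spatial neighbors
    candidates.extend(defaults)             # fill with default modes
    mpm = []
    for mode in candidates:
        if mode not in mpm:                 # keep the first occurrence only
            mpm.append(mode)
        if len(mpm) == size:
            break
    return mpm
```

Placing the co-located mode first gives it the shortest index if the list is coded with a truncated code, which is one plausible design choice among several.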
[0122] In Multiple Reference Lines (MRL) mode, in addition to the directly adjacent line of neighboring samples, one of two non-adjacent reference lines can be used as the reference line for intra-picture prediction of luma samples. The non-adjacent reference line may be only two or three lines away from the current block. The reference line index of a block in the full-resolution picture can be predicted from the same location in the scaled picture. That is, the reference line used at the same location in the scaled picture can be used to predict the reference line of the current block in the full-scale picture.
[0123] In one embodiment, the intra-prediction information includes reference line index information for the one or more first blocks (1021)-(1022), for example, the reference line or reference line index used by each of the one or more first blocks (1021)-(1022). Based on the reference line index information for the one or more first blocks (1021)-(1022), the reference line index or reference line of the second block (1010) can be predicted. Based on the reference line index (or reference line) of the second block (1010), the second block (1010) can be reconstructed.
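A simple predictor for the reference line index could take the index used most often by the first blocks that overlap the co-located region. The majority rule, the tie-break toward the smaller index, and the name `predict_ref_line_index` are assumptions for this sketch; the disclosure says only that the index is predicted from the first blocks' information.

```python
from collections import Counter


def predict_ref_line_index(first_block_indices):
    """Predict the MRL reference line index of the full-scale block.

    first_block_indices: reference line indices used by the co-located
    first blocks in the scaled picture (0 = adjacent line).
    Assumption: most frequent index wins, ties go to the smaller index,
    and the adjacent line is used when no information is available.
    """
    if not first_block_indices:
        return 0
    counts = Counter(first_block_indices)
    best = max(counts.values())
    return min(idx for idx, count in counts.items() if count == best)
```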
[0124] An intra-skip mode can be used to code the current block in a full-scale picture. In one example, when intra-skip mode is used, all prediction modes are inherited from the prediction modes associated with the corresponding samples in the scaled picture. Furthermore, when intra-skip mode is applied, the residual samples can be assumed to be 0, and none of the syntax associated with residual coding is signaled.
[0125] In one embodiment, the intra-skip mode is used to code the second block (1010) in the full-scale current picture (1001). The prediction mode of one of the one or more first blocks (1021)-(1022) in the scaled version (1002) (e.g., (1021) or (1022)) can be inherited by the second block (1010). The second block (1010) can be reconstructed based on at least the inherited prediction mode. When the intra-skip mode is applied to the second block (1010) in the full-scale current picture (1001), the residual samples of the second block (1010) can be assumed to be zero, and none of the syntax related to residual coding is signaled for the second block (1010).
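The decoder-side effect of intra-skip can be sketched as a small helper that decides where the prediction mode and the residual come from. The `BlockSyntax` container and the function `resolve_mode_and_residual` are hypothetical names introduced for this example and do not correspond to syntax elements defined in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlockSyntax:
    """Parsed syntax of one full-scale block (hypothetical container)."""
    intra_skip: bool
    mode: Optional[int] = None                  # signaled only when not skipped
    residual: Optional[List[List[int]]] = None  # signaled only when not skipped


def resolve_mode_and_residual(
    syntax: BlockSyntax, colocated_mode: int, width: int, height: int
) -> Tuple[Optional[int], Optional[List[List[int]]]]:
    """Return the prediction mode and residual to use for reconstruction."""
    if syntax.intra_skip:
        # Mode is inherited from the scaled picture; residual is inferred as zero,
        # so no residual-coding syntax has to be parsed.
        zero = [[0] * width for _ in range(height)]
        return colocated_mode, zero
    return syntax.mode, syntax.residual
```

Because the residual is inferred as zero, the reconstructed block in the skip case is simply the prediction produced with the inherited mode.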
[0126] Reconstructed samples of scaled pictures can be used as predictors in the inter-layer prediction process for the full-scale current picture. In one example, the scaled picture is in the base layer, and the full-scale current picture is in the enhancement layer.
[0127] An upsampled reconstructed sample at the same location within a scaled picture can be used as a predictor for the current block in the full-scale current picture. In this case, the displacement vector pointing from the current block location to the reference block location in the scaled picture is 0. For example, referring to Figure 10, a first region (1020), which is an upsampled reconstructed sample at the same location, can be a predictor for a second block (1010) in the full-scale current picture (1001). The vector (e.g., displacement vector) (1003) points from the second block (1010) to the reference block (the upsampled and reconstructed first region (1020) in the scaled version (1002)), and can be 0, for example, when the second block (1010) and the reference block are at the same location.
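Fetching the co-located predictor with a zero displacement vector can be illustrated as follows. Nearest-neighbor upsampling by an integer factor is an assumption made to keep the sketch short (the disclosure does not name an upsampling filter), and `colocated_predictor` is a hypothetical helper name.

```python
def colocated_predictor(scaled_recon, x0, y0, w, h, factor=2, dv=(0, 0)):
    """Return a w x h predictor for the full-scale block at (x0, y0).

    scaled_recon: reconstructed samples of the scaled picture (rows of ints).
    factor: integer ratio between full-scale and scaled resolution (assumed).
    dv: displacement vector in full-scale samples; (0, 0) selects the
        co-located region.
    """
    pred = []
    for y in range(y0 + dv[1], y0 + dv[1] + h):
        row = []
        for x in range(x0 + dv[0], x0 + dv[0] + w):
            # Nearest-neighbor upsampling: map the full-scale position
            # back to the scaled sample that covers it.
            row.append(scaled_recon[y // factor][x // factor])
        pred.append(row)
    return pred
```

A practical decoder would typically use an interpolation filter rather than sample repetition, and could filter the result before using it as a predictor.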
[0128] This prediction block can be filtered before being applied as a predictor. The reference samples above and to the left of the current block in the full-scale current picture can be used to extend this prediction block.
[0129] In one embodiment, referring to Figure 10, the upsampled and reconstructed first region (1020) located in the same position as the second block (1010) can be filtered before being used as a predictor for the second block (1010). Reference samples (1061)-(1062) above and / or to the left of the second block (1010) can be used to extend (e.g., filter) the upsampled and reconstructed first region (1020).
[0130] This prediction block can be combined in the blending process with another predictor generated from the intra-prediction process within the full-scale current picture, as described above.
[0131] In one example, each sample within a block is assigned an equal weight.
[0132] In another example, different weightings can be applied depending on the sample position within the block. For instance, if the current position is far from the top and / or left reference sample, more weight can be assigned to the predictor from the scaled picture.
[0133] Referring to Figure 10, the first predictor is the upsampled and reconstructed first region (1020) at the same location as the second block (1010). The second predictor can be generated using intra-prediction from the second block (1010) in the full-scale current picture (1001). The first and second predictors can be combined using a weighted average. In one example, equal weighting is applied to each sample in the second block (1010). In another example, different weightings can be applied depending on the location within the second block (1010). For example, if the current location is further away from the upper reference sample and / or left reference sample, more weight can be assigned to the first predictor from the scaled version (1002).
[0134] The residual signals of a scaled picture can be used as predictors to code the residual signals of the full-scale current picture. For example, the residuals of a sample in the second block (1010) within the full-scale current picture (1001) can be predicted based on the residuals of a reconstructed sample in the first region (1020) within the scaled version (1002).
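One reading of this residual prediction is second-order residual coding: the decoder upsamples the scaled-picture residual, uses it as a prediction, and adds a signaled correction. The sketch below follows that reading; nearest-neighbor upsampling, the absence of any amplitude scaling, and the names `reconstruct_residual` and `delta` are assumptions, not details taken from the disclosure.

```python
def reconstruct_residual(scaled_residual, delta, factor=2):
    """Rebuild the full-scale residual from a predicted residual plus a correction.

    scaled_residual: residual samples of the co-located region in the scaled picture.
    delta: signaled difference between the true full-scale residual and its
           prediction (hypothetical syntax, same size as the full-scale block).
    factor: integer resolution ratio (assumed).
    """
    h, w = len(delta), len(delta[0])
    return [
        [scaled_residual[y // factor][x // factor] + delta[y][x] for x in range(w)]
        for y in range(h)
    ]
```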
[0135] A significance map, which is the distribution of non-zero residuals within a block, can have a correlation between the same location in the scaled picture and the full-scale picture. Here, the distribution of non-zero residuals within a block can be either sample-based 0 or 1 signaling or sub-block-based signaling, for example, 1 bit is used for each 4x4 sub-block to indicate whether there are non-zero residuals within that region. Referring to Figure 10, the distribution of non-zero residuals within the first region (1020) of the scaled version (1002) can be correlated with the distribution of non-zero residuals within the second block (1010) in the full-scale current picture (1001), where the second block (1010) is in the same location as the first region (1020).
[0136] In one embodiment, a significance map of blocks at the same location within a scaled picture (or a distribution of non-zero residuals within a block) can be used to predict whether the same location within a block in the full-scale current picture has non-zero residuals. For example, whether a sample in a second block (1010) has non-zero residuals is predicted based on the distribution of non-zero residuals within a first region (1020).
[0137] In another embodiment, the significance map of the block at the same location in a scaled picture (the distribution of non-zero residuals within that block) can be used as context for arithmetic coding of the significance map of the corresponding block in the full-scale current picture. In an example, the distribution of non-zero residuals within the second block (1010) is arithmetic-decoded. The distribution of non-zero residuals within the first region (1020) can be used as context for the arithmetic decoding.
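The sub-block-based significance map and its use as a context selector can be sketched as follows. The 4x4 sub-block size comes from the example in [0135]; the 2:1 resolution ratio, the two-context scheme, and the function names are assumptions. The arithmetic decoder itself is outside the scope of this sketch.

```python
def subblock_significance(residual, sub=4):
    """One flag per sub x sub region: 1 if it contains any non-zero residual."""
    h, w = len(residual), len(residual[0])
    return [
        [
            int(any(
                residual[y][x] != 0
                for y in range(by, min(by + sub, h))
                for x in range(bx, min(bx + sub, w))
            ))
            for bx in range(0, w, sub)
        ]
        for by in range(0, h, sub)
    ]


def context_from_scaled(scaled_sig, sb_row, sb_col, factor=2):
    """Context index for the full-scale sub-block at (sb_row, sb_col).

    With equal sub-block sizes in both pictures, the full-scale sub-block
    maps to the scaled sub-block at (sb_row // factor, sb_col // factor).
    Assumption: the co-located flag itself is used as the context index.
    """
    return scaled_sig[sb_row // factor][sb_col // factor]
```

A decoder following this idea would keep a separate probability model per context index, so that full-scale sub-blocks whose co-located region was significant are coded with a higher expected probability of being significant.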
[0138] The above methods may not require the scaled picture in order to code every block / CTU in the full-scale current picture. If some blocks in the scaled picture are not used for prediction, their reconstructed values may not provide useful information, and therefore these blocks may be skipped or coarsely processed during the coding of the scaled picture. In one example, unused blocks are coded using a fixed sample value. In another example, syntax can be designed at the block level or CTU level to signal that the block / CTU is skipped because it has no content. In embodiments described in the disclosure, the scaled version (1002) may not be needed to code all blocks and / or all CTUs in the full-scale current picture (1001). If block (1030) in the scaled version (1002) is not used for prediction, the reconstructed samples within block (1030) may not provide useful information. Therefore, block (1030) may be skipped or coarsely processed during the coding of the scaled version (1002). In one example, unused blocks within the scaled version (1002) (e.g., block (1030)) are coded using a fixed sample value. In another example, syntax can be designed at the block level or CTU level to signal that a corresponding block or CTU within the scaled version (1002) is skipped because it has no content.
[0139] In one example, the first sub-bitstream contains the entire scaled version (1002), and each block within the scaled version (1002) is coded and included in the first sub-bitstream. In another example, the first sub-bitstream contains a portion of the scaled version (1002): a first block within the scaled version (1002) is coded and included in the first sub-bitstream, while a second block within the scaled version (1002) is not coded and is not included in the first sub-bitstream. In yet another example, the second block is coarsely processed (e.g., using a constant sample value).
[0140] In one example, the second sub-bitstream contains the entire full-scale current picture (1001), and each block of the full-scale current picture (1001) is coded and included in the second sub-bitstream. In another example, the second sub-bitstream contains a portion of the full-scale current picture (1001): first samples of the full-scale current picture (1001) are coded and included in the second sub-bitstream, while second samples of the full-scale current picture (1001) are not coded and are not included in the second sub-bitstream.
[0141] Figure 11 shows a flowchart outlining a process (1100) according to one embodiment of the present disclosure. Process (1100) can be used in a video decoder. In various embodiments, process (1100) is performed by processing circuitry, such as processing circuitry that performs the functions of a video decoder (110), processing circuitry that performs the functions of a video decoder (210), and so on. In some embodiments, process (1100) is implemented in software instructions, so that the processing circuitry performs process (1100) when it executes the software instructions. Process (1100) starts at (S1101) and proceeds to (S1110).
[0142] In (S1110), a bitstream is received, which may include a first sub-bitstream corresponding to a scaled version of the current picture having a first spatial resolution and a second sub-bitstream corresponding to the full-scale current picture, the full-scale version having a second spatial resolution higher than the first spatial resolution.
[0143] In (S1120), the scaled version of the current picture can be reconstructed from the first sub-bitstream.
[0144] In (S1130), the second block in the full-scale current picture can be reconstructed based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra-prediction information of one or more first blocks in the scaled version.
[0145] In an example, as described with reference to Figure 10, the first region in the scaled version of the current picture is in the same position as the second block in the full-scale current picture.
[0146] In an example, a first sample in the second block can be predicted based on a reconstructed sample in the first region of the scaled version of the current picture. The first sample in the second block is in the same position as the reconstructed sample in the first region. A second sample in the second block can be predicted by interpolation based on at least the predicted first sample in the second block and the reconstructed sample in the upper left of the second block. The predictor for the second block includes (i) the predicted first sample in the second block, (ii) the reconstructed sample in the upper left of the second block, and (iii) the predicted second sample in the second block. The second block is reconstructed from the predictor for the second block.
[0147] In an example, the samples in the second block are predicted using intra-prediction. (i) The predicted samples in the second block and (ii) the corresponding upsampled reconstructed samples in the first region of the scaled version of the current picture can be blended using a weighted average. The weights of the blended samples in the second block depend on the positions of the blended samples in the second block.
[0148] In an example, the reconstructed samples within the first region of the scaled version of the current picture are upsampled. The upsampled samples within the first region of the scaled version are filtered. The second block can be reconstructed using the filtered upsampled samples within the first region as predictors for the second block.
[0149] In an example, the residuals of the samples in the second block can be predicted based on the residuals of the reconstructed samples in the first region. Whether the samples in the second block have non-zero residuals can be predicted based on the distribution of non-zero residuals in the first region. In an example, the distribution of non-zero residuals in the second block is arithmetic-decoded, and the distribution of non-zero residuals in the first region is used as the context for the arithmetic decoding.
[0150] Next, the process proceeds to (S1199) and terminates.
[0151] Process (1100) can be appropriately adapted. The steps of process (1100) can be changed and / or omitted. Additional steps can be added. Any appropriate implementation order can be used.
[0152] In an embodiment, the partitioning information of the one or more first blocks indicates whether each of the one or more first blocks is divided into smaller blocks. The second block can be reconstructed based on the partitioning information of the one or more first blocks by determining whether to partition the second block in the full-scale current picture based on that partitioning information, and by reconstructing the second block based on the determination.
[0153] In an example, the second sub-bitstream contains a flag for the second block. In response to a determination to partition the second block, the flag indicates whether to apply a further partition to a block partitioned from the second block. In response to a determination not to partition the second block, the flag indicates whether to apply a partition to the second block.
[0154] In an example, the partitioning information includes a flag for each of the first blocks, indicating whether that first block is divided into smaller blocks. The partitioning information for the second block can be entropy-decoded, and the flags of the first blocks can be used as context for the entropy decoding.
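Using the first blocks' split flags as context amounts to selecting a probability model according to how many co-located first blocks were split. The sketch below shows only that selection and a simple counting model; the three-context scheme, the count-based probability estimate, and all names are assumptions, and no actual arithmetic decoder is implemented.

```python
class ContextModel:
    """Adaptive estimate of P(split flag = 1) for one context (illustrative)."""

    def __init__(self):
        self.counts = [1, 1]        # Laplace-style initial counts for bits 0 and 1

    def p_one(self):
        return self.counts[1] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1


def split_context(first_block_split_flags, max_ctx=2):
    """Context index = number of co-located first blocks that were split, clipped."""
    return min(sum(1 for flag in first_block_split_flags if flag), max_ctx)


def observe(models, first_block_split_flags, split_bit):
    """Update the model selected by the scaled-picture flags with a decoded bit."""
    models[split_context(first_block_split_flags)].update(split_bit)
```

For instance, after observing a few unsplit blocks under context 0 and a few split blocks under context 2, the two models diverge, which is what lets an entropy coder spend fewer bits on the likelier outcome in each context.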
[0155] In an example, the intra-prediction information includes intra-prediction mode (IPM) information of the one or more first blocks. The most probable mode (MPM) list of the second block can be constructed based on the IPM information of the one or more first blocks. The second block can be reconstructed based on the MPM list.
[0156] In an example, the intra-prediction information includes reference line index information for the one or more first blocks. The reference line index for the second block can be determined based on the reference line index information for the one or more first blocks, and the second block can be reconstructed based on the reference line index for the second block.
[0157] In an example, the second sub-bitstream indicates that the intra-skip mode is used for the second block. The intra-prediction information indicates a prediction mode of one of the one or more first blocks. That prediction mode can be used for the second block, and the second block can be reconstructed based on the prediction mode.
[0158] Figure 12 shows a flowchart outlining a process (1200) according to one embodiment of the present disclosure. Process (1200) can be used in a video encoder. In various embodiments, process (1200) is performed by processing circuitry, such as processing circuitry that performs the functions of a video encoder (103), processing circuitry that performs the functions of a video encoder (303), and so on. In some embodiments, process (1200) is implemented in software instructions, so that the processing circuitry performs process (1200) when it executes the software instructions. The process starts at (S1201) and proceeds to (S1210).
[0159] In (S1210), one or more first blocks in the scaled version of the current picture having a first spatial resolution are encoded. The first region in the scaled version overlaps with one or more first blocks.
[0160] In (S1220), a second block in the full-scale current picture can be encoded based on (i) partitioning information of the one or more first blocks in the scaled version, or (ii) intra-prediction information of the one or more first blocks in the scaled version. The full-scale current picture has a second spatial resolution that is higher than the first spatial resolution. The first region in the scaled version is in the same position as the second block in the full-scale current picture.
[0161] In (S1230), a bitstream is generated that may include the encoded scaled version containing the one or more encoded first blocks, and the encoded full-scale current picture containing the encoded second block.
[0162] Next, the process proceeds to (S1299) and terminates.
[0163] Process (1200) can be appropriately adapted. The steps of process (1200) can be changed and / or omitted. Additional steps can be added. Any suitable implementation order can be used. The embodiments shown in Figures 10-11 can be appropriately adapted and used for process (1200).
[0164] Embodiments of this disclosure may be used separately or combined in any order. Furthermore, each of the methods (or embodiments), encoders, and decoders may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program stored on a non-transitory computer-readable medium.
[0165] The techniques described above can be implemented as computer software using computer-readable instructions and can be physically stored on one or more computer-readable media. For example, Figure 13 shows a computer system (1300) suitable for implementing a particular embodiment of the subject matter of this disclosure.
[0166] Computer software can be coded using any suitable machine code or computer language that can be processed by mechanisms such as assembly, compilation, and linking to generate code containing instructions that can be executed directly or through interpretation, microcode execution, etc., by one or more computer central processing units (CPUs), graphics processing units (GPUs), etc.
[0167] The instructions can be executed on various computers or their components, including, for example, personal computers, tablet computers, servers, smartphones, game consoles, Internet of Things devices, etc.
[0168] The components shown in Figure 13 of the computer system (1300) are illustrative and do not imply any limitation on the scope of use or functionality of computer software implementing embodiments of the present disclosure. Furthermore, the configuration of the components should not be construed as having any dependencies or requirements relating to any one or combination of the components shown in the exemplary embodiments of the computer system (1300).
[0169] The computer system (1300) may include certain human interface input devices. Such human interface input devices may respond to input from one or more human users through, for example, tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), or olfactory input (not shown). The human interface devices may also be used to capture certain media that are not necessarily directly related to conscious human input, such as audio (e.g., speech, music, ambient sounds), images (e.g., scanned images, photographic images taken with a digital camera), or video (e.g., two-dimensional video, or three-dimensional video including stereoscopic video).
[0170] The human interface input devices may include one or more of the following (only one of each is shown): keyboard (1301), mouse (1302), trackpad (1303), touchscreen (1310), data glove (not shown), joystick (1305), microphone (1306), scanner (1307), and camera (1308).
[0171] The computer system (1300) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell / taste. Such human interface output devices may include tactile output devices (e.g., tactile feedback by the touchscreen (1310), data glove (not shown), or joystick (1305), although there may also be tactile feedback devices that do not function as input devices), audio output devices (e.g., speakers (1309), headphones (not shown)), visual output devices (e.g., screens (1310), including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touchscreen input capability and each with or without tactile feedback capability, some of which may be capable of two-dimensional visual output or output in three or more dimensions through means such as stereoscopic output; virtual reality glasses (not shown); holographic displays; and smoke tanks (not shown)), and printers (not shown).
[0172] The computer system (1300) may also include human-accessible storage devices and their associated media, such as optical media including a CD / DVD ROM / RW drive (1320) with CD / DVD or similar media (1321), thumb drives (1322), removable hard drives or solid-state drives (1323), legacy magnetic media such as tapes and floppy disks (not shown), and specialized ROM / ASIC / PLD based devices such as security dongles (not shown).
[0173] Those skilled in the art should also understand that the term “computer-readable medium” as used in connection with the subject matter of this disclosure does not include transmission media, carrier waves, or other transitory signals.
[0174] The computer system (1300) may also include an interface (1354) to one or more communication networks (1355). The networks may be, for example, wireless, wired, or optical. The networks may further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet and wireless LANs; cellular networks including GSM, 3G, 4G, 5G, LTE, and the like; wired or wireless wide-area digital networks for television, including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANbus. Certain networks generally require an external network interface adapter attached to a general-purpose data port or peripheral bus (1349) (e.g., a USB port of the computer system (1300)). Others are generally integrated into the core of the computer system (1300) by attachment to a system bus, as described below (e.g., an Ethernet interface in a PC computer system, or a cellular network interface in a smartphone computer system). Using any of these networks, the computer system (1300) can communicate with other entities. Such communication may be unidirectional and receive-only (e.g., broadcast television), unidirectional and send-only (e.g., CANbus to certain CANbus devices), or bidirectional (e.g., over a local or wide-area digital network). Specific protocols and protocol stacks may be used on each of the aforementioned networks and network interfaces.
[0175] The aforementioned human interface devices, human-accessible storage devices, and network interfaces can be attached to the core (1340) of the computer system (1300).
[0176] The core (1340) may include one or more central processing units (CPUs) (1341), graphics processing units (GPUs) (1342), dedicated programmable processing units in the form of field-programmable gate arrays (FPGAs) (1343), hardware accelerators for specific tasks (1344), graphics adapters (1350), etc. These devices may be connected via a system bus (1348) along with read-only memory (ROM) (1345), random-access memory (RAM) (1346), and internal mass storage (1347) such as internal, non-user-accessible hard drives, SSDs, etc. In some computer systems, the system bus (1348) is accessible in the form of one or more physical plugs to allow expansion with additional CPUs, GPUs, etc. Peripherals can be attached directly to the core's system bus (1348) or via a peripheral bus (1349). In an example, a screen (1310) can be connected to the graphics adapter (1350). Peripheral bus architectures include PCI, USB, etc.
[0177] The CPU (1341), GPU (1342), FPGA (1343), and accelerator (1344) can execute certain instructions that, in combination, make up the aforementioned computer code. This computer code can be stored in ROM (1345) or RAM (1346). Transitional data can also be stored in RAM (1346), while permanent data can be stored, for example, in the internal mass storage (1347). Fast storage to and retrieval from any of the memory devices can be enabled through the use of cache memory, which may be closely associated with one or more of the CPU (1341), GPU (1342), mass storage (1347), ROM (1345), RAM (1346), etc.
[0178] Computer-readable media may contain computer code for performing actions performed by various computers. The media and computer code may be specifically designed and configured for the purposes of this disclosure, or they may be of a type well known and available to those skilled in the computer software field.
[0179] As an example and not by way of limitation, a computer system (1300) having the architecture described above, and specifically the core (1340), can provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be the media associated with the user-accessible mass storage devices described above, as well as certain non-transitory storage of the core (1340), such as the core-internal mass storage (1347) or ROM (1345). Software implementing various embodiments of the present disclosure can be stored in such devices and executed by the core (1340). The computer-readable media may include one or more memory devices or chips, depending on the specific needs. The software can cause the core (1340), and specifically the processors therein (including CPUs, GPUs, FPGAs, etc.), to execute specific processes or specific parts of specific processes described herein, including defining data structures stored in RAM (1346) and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in a circuit (e.g., an accelerator (1344)), which can operate in place of or together with software to perform the specific processes or specific parts of the specific processes described herein. References to software include logic, and vice versa, where appropriate. References to computer-readable media may include, where appropriate, circuits (such as integrated circuits (ICs)) that store software for execution, circuits that embody logic for execution, or both. This disclosure encompasses any appropriate combination of hardware and software.
[0180] The use of “at least one of” or “one of” in this disclosure is intended to include any one or any combination of the recited elements. For example, references to at least one of A, B, or C; at least one of A, B, and C; at least one of A, B, and/or C; and at least one of A to C are intended to include only A, only B, only C, or any combination thereof. References to one of A or B and one of A and B are intended to include A or B or (A and B). The use of “one of” does not exclude any combination of the recited elements where applicable, such as when the elements are not mutually exclusive.
[0181] While this disclosure describes several exemplary embodiments, alternatives, substitutions, and various equivalents exist and are included within the scope of this disclosure. As will be apparent to those skilled in the art, numerous systems and methods can be devised to implement the principles of this disclosure and thus fall within the spirit and scope of this disclosure, although these are not expressly shown or described herein.
Claims
1. A method for video decoding, comprising: receiving a bitstream including a first sub-bitstream corresponding to a scaled version of a current picture having a first spatial resolution and a second sub-bitstream corresponding to a full-scale current picture having a second spatial resolution higher than the first spatial resolution; reconstructing the scaled version of the current picture from the first sub-bitstream; and reconstructing a second block in the full-scale current picture based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra prediction information of the one or more first blocks in the scaled version, wherein a first region in the scaled version of the current picture is in the same position as the second block in the full-scale current picture, and the first region overlaps with the one or more first blocks in the scaled version of the current picture, and wherein the step of reconstructing the second block further includes: predicting a first sample in the second block based on a reconstructed sample in the first region in the scaled version of the current picture, the first sample in the second block being in the same position as the reconstructed sample in the first region; predicting a second sample in the second block by interpolation based at least on the predicted first sample in the second block and an upper-left reconstructed sample in the second block, a predictor of the second block including (i) the predicted first sample in the second block, (ii) the upper-left reconstructed sample in the second block, and (iii) the predicted second sample in the second block; and reconstructing the second block from the predictor of the second block.
2. The method according to claim 1, wherein the partitioning information of the one or more first blocks indicates whether each of the one or more first blocks is divided into smaller blocks, and the step of reconstructing the second block includes reconstructing the second block based on the partitioning information of the one or more first blocks by: determining, based on the partitioning information of the one or more first blocks, whether to partition the second block in the full-scale current picture; and reconstructing the second block based on the determination of whether to partition the second block.
3. The method according to claim 2, wherein the second sub-bitstream includes a flag for the second block; in response to a determination to partition the second block, the flag indicates whether partitioning is applied to a block partitioned from the second block; and in response to a determination not to partition the second block, the flag indicates whether partitioning is applied to the second block.
4. The method according to claim 2, wherein the partitioning information includes, for each of the one or more first blocks, a flag indicating whether the respective first block is divided into smaller blocks, and the method further comprises entropy decoding partitioning information of the second block, the flag for each of the one or more first blocks being used as a context for the entropy decoding.
5. The method according to claim 1, wherein the intra prediction information includes intra prediction mode (IPM) information of the one or more first blocks, and the step of reconstructing the second block includes: constructing a most probable mode (MPM) list for the second block based on the IPM information of the one or more first blocks; and reconstructing the second block based on the MPM list.
6. The method according to claim 1, wherein the intra prediction information includes reference line index information of the one or more first blocks, and the step of reconstructing the second block includes: determining a reference line index of the second block based on the reference line index information of the one or more first blocks; and reconstructing the second block based on the reference line index of the second block.
7. The method according to claim 1, wherein the second sub-bitstream indicates that an intra skip mode is used for the second block, the intra prediction information indicates a prediction mode of one of the one or more first blocks, and the step of reconstructing the second block includes: using the prediction mode of the one of the one or more first blocks as a prediction mode of the second block; and reconstructing the second block based on the prediction mode.
8. A method for video decoding, comprising: receiving a bitstream including a first sub-bitstream corresponding to a scaled version of a current picture having a first spatial resolution and a second sub-bitstream corresponding to a full-scale current picture having a second spatial resolution higher than the first spatial resolution; reconstructing the scaled version of the current picture from the first sub-bitstream; and reconstructing a second block in the full-scale current picture based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra prediction information of the one or more first blocks in the scaled version, wherein a first region in the scaled version of the current picture is in the same position as the second block in the full-scale current picture, and the first region overlaps with the one or more first blocks in the scaled version of the current picture, and wherein the step of reconstructing the second block includes: predicting samples in the second block using intra prediction; and blending, using a weighted average, (i) the predicted samples in the second block and (ii) corresponding upsampled reconstructed samples in the first region of the scaled version of the current picture.
9. The method according to claim 8, wherein the weight of the blended sample in the second block depends on the position of the blended sample in the second block.
10. A method for video decoding, comprising: receiving a bitstream including a first sub-bitstream corresponding to a scaled version of a current picture having a first spatial resolution and a second sub-bitstream corresponding to a full-scale current picture having a second spatial resolution higher than the first spatial resolution; reconstructing the scaled version of the current picture from the first sub-bitstream; and reconstructing a second block in the full-scale current picture based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra prediction information of the one or more first blocks in the scaled version, wherein a first region in the scaled version of the current picture is in the same position as the second block in the full-scale current picture, and the first region overlaps with the one or more first blocks in the scaled version of the current picture, and wherein the step of reconstructing the second block includes: upsampling reconstructed samples in the first region of the scaled version of the current picture; filtering the upsampled samples in the first region of the scaled version; and reconstructing the second block using the filtered upsampled samples in the first region as a predictor of the second block.
11. A method for video decoding, comprising: receiving a bitstream including a first sub-bitstream corresponding to a scaled version of a current picture having a first spatial resolution and a second sub-bitstream corresponding to a full-scale current picture having a second spatial resolution higher than the first spatial resolution; reconstructing the scaled version of the current picture from the first sub-bitstream; and reconstructing a second block in the full-scale current picture based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra prediction information of the one or more first blocks in the scaled version, wherein a first region in the scaled version of the current picture is in the same position as the second block in the full-scale current picture, and the first region overlaps with the one or more first blocks in the scaled version of the current picture, wherein the step of reconstructing the second block includes predicting residuals of samples in the second block based on residuals of reconstructed samples in the first region, and wherein the step of predicting the residuals includes: predicting whether a sample in the second block has a non-zero residual based on a distribution of non-zero residuals in the first region; or arithmetic decoding a distribution of non-zero residuals in the second block, the distribution of non-zero residuals in the first region being used as a context for the arithmetic decoding.
12. The method according to claim 1, wherein the first region includes the one or more first blocks.
13. A device for video decoding, comprising a processing circuit configured to perform the method according to any one of claims 1 to 12.
14. A method for video encoding, comprising: including, in a bitstream, a first sub-bitstream corresponding to a scaled version of a current picture having a first spatial resolution and a second sub-bitstream corresponding to a full-scale current picture having a second spatial resolution higher than the first spatial resolution, wherein a second block of the full-scale current picture is encoded based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra prediction information of the one or more first blocks in the scaled version; a first region in the scaled version of the current picture is in the same position as the second block in the full-scale current picture, and the first region overlaps with the one or more first blocks in the scaled version of the current picture; a first sample in the second block is predicted based on a reconstructed sample in the first region in the scaled version of the current picture, the first sample in the second block being in the same position as the reconstructed sample in the first region; a second sample in the second block is predicted by interpolation based at least on the predicted first sample in the second block and an upper-left reconstructed sample in the second block; a predictor of the second block includes (i) the predicted first sample in the second block, (ii) the upper-left reconstructed sample in the second block, and (iii) the predicted second sample in the second block; and the second block is reconstructed from the predictor of the second block.
15. A method for video encoding, comprising: transmitting a bitstream that includes a first sub-bitstream corresponding to a scaled version of a current picture having a first spatial resolution and a second sub-bitstream corresponding to a full-scale current picture having a second spatial resolution higher than the first spatial resolution, wherein a second block of the full-scale current picture is encoded based on (i) partitioning information of one or more first blocks in the scaled version of the current picture, or (ii) intra prediction information of the one or more first blocks in the scaled version; a first region in the scaled version of the current picture is in the same position as the second block in the full-scale current picture, and the first region overlaps with the one or more first blocks in the scaled version of the current picture; a first sample in the second block is predicted based on a reconstructed sample in the first region in the scaled version of the current picture, the first sample in the second block being in the same position as the reconstructed sample in the first region; a second sample in the second block is predicted by interpolation based at least on the predicted first sample in the second block and an upper-left reconstructed sample in the second block; a predictor of the second block includes (i) the predicted first sample in the second block, (ii) the upper-left reconstructed sample in the second block, and (iii) the predicted second sample in the second block; and the second block is reconstructed from the predictor of the second block.
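Illustrative note (not part of the claims): the position-dependent blending recited in claims 8 and 9 can be pictured with the minimal Python sketch below. It is offered only as one possible reading under stated assumptions. The claims do not specify the upsampling filter, the scaling factor, or the weight function; the sketch assumes sample-repetition upsampling by a factor of 2 and a weight that falls linearly with distance from the upper-left corner of the block. The names upsample_nearest, position_weight, and blend are hypothetical and do not appear in the disclosure.

```python
from typing import List

Block = List[List[float]]


def upsample_nearest(region: Block, factor: int) -> Block:
    """Upsample a low-resolution region by sample repetition (assumed filter)."""
    return [
        [region[y // factor][x // factor] for x in range(len(region[0]) * factor)]
        for y in range(len(region) * factor)
    ]


def position_weight(x: int, y: int, width: int, height: int) -> float:
    """Hypothetical weight of the intra-predicted sample at position (x, y).

    Assumption: intra prediction is trusted most near the upper-left corner,
    close to the reference samples, so the weight falls linearly from 1.0 at
    (0, 0) to 0.0 at the bottom-right corner of the block.
    """
    max_dist = (width - 1) + (height - 1)
    if max_dist == 0:
        return 0.5  # single-sample block: arbitrary equal weighting
    return 1.0 - (x + y) / max_dist


def blend(intra_pred: Block, upsampled: Block) -> Block:
    """Weighted average of intra-predicted and upsampled samples, per position."""
    height, width = len(intra_pred), len(intra_pred[0])
    out: Block = []
    for y in range(height):
        row = []
        for x in range(width):
            w = position_weight(x, y, width, height)
            row.append(w * intra_pred[y][x] + (1.0 - w) * upsampled[y][x])
        out.append(row)
    return out


# Example: a 2x2 reconstructed region of the scaled picture and a flat
# 4x4 intra prediction of the full-scale block at the same position.
region = [[100.0, 120.0], [140.0, 160.0]]
intra_pred = [[110.0] * 4 for _ in range(4)]
blended = blend(intra_pred, upsample_nearest(region, 2))
```

In this sketch the upper-left sample of the blended block equals the intra-predicted value and the bottom-right sample equals the upsampled value, with intermediate positions mixing the two. Any other position-dependent weighting would serve equally well as an illustration of claim 9.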