Restrictions on the parameter set for adaptive loop filtering

In-loop reshaping and adaptive loop filtering techniques enhance video coding efficiency in HEVC and future standards by optimizing video block construction and filtering, addressing challenges in high-resolution video compression and decoding.

JP7875892B2 · Inactive · Publication Date: 2026-06-18 · DOUYIN VISION CO LTD +1


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: DOUYIN VISION CO LTD
Filing Date: 2024-01-04
Publication Date: 2026-06-18
Estimated Expiration: Not applicable · inactive patent

Application Information


IPC: H04N19/82; H04N19/70; H04N19/117; H04N19/30

CPC: H04N19/186; H04N19/176; H04N19/136; H04N19/117; H04N19/463; H04N19/70; H04N19/82; H04N19/98

AI Tagging

Application Domain

Digital video signal modification


Abstract

PROBLEM TO BE SOLVED: To provide devices, systems, and methods for digital video coding, including in-loop reshaping for video coding.

SOLUTION: A video processing method includes performing a conversion between a video comprising one or more video data units and a bitstream representation of the video. The bitstream representation conforms to a format rule that specifies the inclusion of side information indicating default parameters for a coding mode that is applicable to a video block of the one or more video data units for which the coding mode is enabled. The side information provides parameters for constructing the video block based on a representation of the video block in an original domain and a reshaped domain and/or a luma-dependent scaling of a chroma residual of a chroma video block.

SELECTED DRAWING: Figure 31B


Description

[Technical Field]

[0001] [Cross-Reference to Related Applications] This application is a divisional application of Japanese Patent Application No. 2021-555615, filed on September 15, 2021, which is based on International Patent Application No. PCT/CN2020/080600, filed on March 23, 2020, which claims priority to and the benefit of International Patent Application No. PCT/CN2019/079393, filed on March 23, 2019. All of the above patent applications are incorporated herein by reference in their entirety.

[0002] This patent document relates to video coding techniques, devices, and systems.

[Background Art]

[0003] Despite advances in video compression, digital video still accounts for the largest share of bandwidth usage on the Internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video is expected to continue growing.

[Summary of the Invention]

[0004] This document describes devices, systems, and methods for digital video coding, specifically in-loop reshaping (ILR) for video coding. The methods described may be applicable to both existing video processing standards (e.g., High Efficiency Video Coding (HEVC)) and future video processing standards or video codecs.

[0005] In one representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes performing a conversion between a video including one or more video data units and a bitstream representation of the video, wherein the bitstream representation conforms to a format rule that specifies the inclusion of side information indicating default parameters for a coding mode that is applicable to a video block of the one or more video data units for which the coding mode is enabled, and the side information provides parameters for constructing the video block based on a representation of the video block in an original domain and a reshaped domain and/or a luma-dependent scaling of a chroma residual of a chroma video block.

[0006] In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes performing a conversion between a video including one or more video data units and a bitstream representation of the video, wherein the bitstream representation conforms to a format rule that specifies the inclusion of side information indicating default parameters for a coding mode that is applicable to a video block of the one or more video data units for which the coding mode is enabled, the default parameters are used for the coding mode when no explicitly signaled parameters are present in the bitstream representation, and the coding mode includes constructing the video block based on a representation of the video block in an original domain and a reshaped domain and/or a luma-dependent scaling of a chroma residual of a chroma video block.

[0007] In yet another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes, for a conversion between a video including one or more video data units and a bitstream representation of the video, constructing the bitstream representation to include a syntax element that signals temporal layer information and parameters for a coding mode that is applicable to a video block of the one or more video data units, and performing the conversion based on the constructing, wherein the coding mode includes constructing the video block based on an original domain and a reshaped domain and/or a luma-dependent scaling of a chroma residual of a chroma video block.

[0008] In yet another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes, for a conversion between a video including one or more video data units and a bitstream representation of the video, parsing the bitstream representation that includes a syntax element that signals temporal layer information and parameters for a coding mode that is applicable to a video block of the one or more video data units, and performing the conversion based on the parsing, wherein the coding mode includes constructing the video block based on an original domain and a reshaped domain and/or a luma-dependent scaling of a chroma residual of a chroma video block.

[0009] In yet another representative aspect, the disclosed technology may be used to provide a method for video processing. The method includes performing a conversion between a first video data unit of a video and a bitstream representation of the video, wherein a coding mode is applicable to a video block of the first video data unit, the coding mode includes constructing the video block, based on side information related to the coding mode, from an original domain and a reshaped domain and/or a luma-dependent scaling of a chroma residual of a chroma video block, and the side information is determined according to a rule that is based on a temporal layer index.

[0010] In yet another representative aspect, the disclosed technology may be used to provide a method for video processing. The method includes, for a conversion between a video including one or more video data units and a bitstream representation of the video, constructing the bitstream representation to include a syntax element that signals temporal layer information and parameters for a coding mode that is applicable to a video block of the one or more video data units, and performing the conversion based on the constructing, wherein the coding mode includes constructing the current block of the video based on a filtering process that uses adaptive loop filter (ALF) coefficients.

[0011] In yet another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes, for a conversion between a video including one or more video data units and a bitstream representation of the video, parsing the bitstream representation that includes a syntax element that signals temporal layer information and parameters for a coding mode that is applicable to a video block of the one or more video data units, and performing the conversion based on the parsing, wherein the coding mode includes constructing the current block of the video based on a filtering process that uses adaptive loop filter (ALF) coefficients.

[0012] In yet another representative aspect, the disclosed technology may be used to provide a method for video processing. The method includes performing a conversion between a first video data unit of a video and a bitstream representation of the video, wherein a coding mode is applicable to a video block of the first video data unit, the coding mode includes constructing the current block of the video, based on side information related to the coding mode, using a filtering process that uses adaptive loop filter (ALF) coefficients, and the side information is determined according to a rule that is based on a temporal layer index.

[0013] In yet another representative embodiment, the above method is embodied in the form of processor-executable code and stored in a computer-readable program medium.

[0014] In further other representative embodiments, a device configured or operable to perform the above method is disclosed. The device may include a processor programmed to implement this method.

[0015] In yet another representative embodiment, a video decoder device may implement the method described herein.

[0016] The above and other aspects and features of the disclosed technology are described in further detail in the drawings, the specification, and the claims.

[Brief Description of the Drawings]

[0017]
[Figure 1] An example of constructing a merge candidate list.
[Figure 2] An example of positions of spatial candidates.
[Figure 3] An example of candidate pairs subject to the redundancy check for spatial merge candidates.
[Figure 4A] An example of the position of the second prediction unit (PU) based on the size and shape of the current block.
[Figure 4B] An example of the position of the second prediction unit (PU) based on the size and shape of the current block.
[Figure 5] An example of motion vector scaling for temporal merge candidates.
[Figure 6] An example of candidate positions for temporal merge candidates.
[Figure 7] An example of generating a combined bi-predictive merge candidate.
[Figure 8] An example of constructing motion vector prediction candidates.
[Figure 9] An example of motion vector scaling for spatial motion vector candidates.
[Figure 10] An example of motion prediction using alternative temporal motion vector prediction (ATMVP) for a coding unit (CU).
[Figure 11] An example of a coding unit (CU) with sub-blocks and neighboring blocks used by the spatial-temporal motion vector prediction (STMVP) algorithm.
[Figure 12] An example of neighboring samples used for deriving illumination compensation (IC) parameters.
[Figure 13A] An example of a simplified 4-parameter affine model.
[Figure 13B] An example of a simplified 6-parameter affine model.
[Figure 14] An example of an affine motion vector field (MVF) per sub-block.
[Figure 15A] An example of a 4-parameter affine model.
[Figure 15B] An example of a 6-parameter affine model.
[Figure 16] An example of motion vector prediction for AF_INTER for inherited affine candidates.
[Figure 17] An example of motion vector prediction for AF_INTER for constructed affine candidates.
[Figure 18A] An example of candidate blocks for AF_MERGE mode.
[Figure 18B] An example of CPMV predictor derivation for AF_MERGE mode.
[Figure 19] An example of candidate positions for affine merge mode.
[Figure 20] An example of the UMVE search process.
[Figure 21] An example of a UMVE search point.
[Figure 22] An example of decoder-side motion vector refinement (DMVR) based on bilateral template matching.
[Figure 23] An example flowchart of a decoding flow with reshaping.
[Figure 24] An example of neighboring samples used in a bilateral filter.
[Figure 25] An example of windows covering two samples used in weight calculation.
[Figure 26] An example of a scan pattern.
[Figure 27] An example of an inter-mode decoding process.
[Figure 28] Another example of an inter-mode decoding process.
[Figure 29] An example of an inter-mode decoding process with a post-reconstruction filter.
[Figure 30] Another example of an inter-mode decoding process with a post-reconstruction filter.
[Figure 31A] A flowchart of an example video processing method.
[Figure 31B] A flowchart of an example video processing method.
[Figure 31C] A flowchart of an example video processing method.
[Figure 31D] A flowchart of an example video processing method.
[Figure 32] A block diagram of an example hardware platform for implementing the visual media decoding or visual media encoding techniques described in this disclosure.
[Figure 33] A block diagram of an example video processing system in which the disclosed technology may be implemented.

[Modes for Carrying Out the Invention]

[0018] With the increasing demand for higher resolution video, video coding methods and techniques are ubiquitous in current technology. Video codecs typically consist of electronic circuits or software that compress or decompress digital video and are constantly being improved to provide greater coding efficiency. Video codecs convert uncompressed video to a compressed format, and vice versa. There is a complex relationship between video quality, the amount of data (determined by the bitrate) used to represent the video, the complexity of the encoding and decoding algorithms, sensitivity to data loss and errors, ease of editing, random access, and end-to-end delay (latency). Compressed formats typically adhere to standard video compression specifications, such as the HEVC (High Efficiency Video Coding) standard (also known as H.265 or MPEG-H Part 2), the VVC (Versatile Video Coding) standard (to be completed), or other current and / or future video coding standards.

[0019] Embodiments of the disclosed technology may be applied to existing video coding standards (e.g., HEVC, H.265) and future standards to improve compression performance. Section headings are used herein to improve the readability of the description and are not intended to limit this disclosure or embodiments (and / or practices) to any particular section.

[0020] [1. Examples of inter prediction in HEVC/H.265] Video coding standards have improved significantly over the years, and now offer high coding efficiency and support for higher resolutions. Recent standards such as HEVC (H.265) are based on a hybrid video coding structure that uses temporal prediction plus transform coding.

[0021] [1.1 Examples of prediction modes] Each inter-predicted prediction unit (PU) has motion parameters for one or two reference picture lists. In some embodiments, the motion parameters include a motion vector and a reference picture index. In other embodiments, the usage of one of the two reference picture lists may be signaled using inter_pred_idc. In yet other embodiments, motion vectors may be explicitly coded as deltas relative to predictors.

[0022] When a CU is coded in skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta, and no reference picture index. A merge mode is specified, in which the motion parameters of the current PU are obtained from neighboring PUs, including spatial and temporal candidates. The merge mode can be applied to any inter-predicted PU, not only to skip mode. The alternative to merge mode is the explicit transmission of motion parameters, in which the motion vector (more precisely, the motion vector difference (MVD) relative to a motion vector predictor), the corresponding reference picture index for each reference picture list, and the reference picture list usage are signaled explicitly for each PU. This type of mode is referred to herein as advanced motion vector prediction (AMVP).

[0023] The PU is generated from one block of samples when the signaling indicates that one of the two reference picture lists should be used. This is called "uni-prediction." Uni-prediction is available for both P-slice and B-slice.

[0024] When the signaling indicates that both reference picture lists should be used, the PU is generated from two blocks of samples. This is called "bi-prediction." Bi-prediction is only available for B-slice.

[0025] Reference picture list: In HEVC, the term inter prediction is used to denote prediction derived from data elements (e.g., sample values or motion vectors) of reference pictures other than the currently decoded picture. As in H.264/AVC, a picture can be predicted from multiple reference pictures. The reference pictures used for inter prediction are organized into one or more reference picture lists. A reference index identifies which reference picture in the list should be used to generate the prediction signal.

[0026] A single reference picture list, List0, is used for the P slice, while two reference lists, List0 and List1, are used for the B slice. It should be noted that the reference pictures included in List0 / 1 may be past and future pictures in terms of capture / display order.

[0027] [1.1.1 Embodiments of constructing candidates for merge mode] When a PU is predicted using merge mode, an index pointing to an entry in the merge candidate list is parsed from the bitstream and used to retrieve the motion information. The construction of this list can be summarized by the following sequence of steps:

Step 1: Derivation of initial candidates
  Step 1.1: Derivation of spatial candidates
  Step 1.2: Redundancy check for spatial candidates
  Step 1.3: Derivation of temporal candidates
Step 2: Insertion of additional candidates
  Step 2.1: Creation of bi-predictive candidates
  Step 2.2: Insertion of zero motion candidates

[0028] Figure 1 shows an example of constructing a merge candidate list based on the sequence of steps described above. For spatial merge candidate derivation, up to four merge candidates are selected from candidates located at five different positions. For temporal merge candidate derivation, at most one merge candidate is selected from two candidates. Since a constant number of candidates per PU is assumed at the decoder, additional candidates are generated when the number of candidates does not reach the maximum number of merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary (TU) binarization. When the size of the CU is equal to 8, all PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2N×2N prediction unit.

[0029] [1.1.2 Construction of spatial merge candidates] In the derivation of spatial merge candidates, up to four merge candidates are selected from the candidates located at the positions shown in Figure 2. The order of derivation is A1, B1, B0, A0, and B2. Position B2 is considered only when the PU at any of positions A1, B1, B0, or A0 is unavailable (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved.

[0030] To reduce computational complexity, not all possible candidate pairs are considered in the redundancy check described above. Instead, only the pairs linked by arrows in Figure 3 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, Figures 4A and 4B depict the second PU for the N×2N and 2N×N cases, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In some embodiments, adding this candidate could lead to two prediction units having the same motion information, which is redundant because the same result could be obtained with a single PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
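The spatial derivation in paragraphs [0029] and [0030] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `CHECK_AGAINST` table is an assumed stand-in for the arrow pairs of Figure 3 (which is not reproduced in this text), motion information is reduced to a hypothetical `(mv_x, mv_y, ref_idx)` tuple, and the partition-dependent exclusions of A1 and B1 are omitted.

```python
from typing import Dict, List, Optional, Tuple

Motion = Tuple[int, int, int]  # hypothetical (mv_x, mv_y, ref_idx) stand-in

# Assumed comparison pairs: each candidate is compared only with the listed
# earlier positions, not with every candidate already in the list.
CHECK_AGAINST = {
    "A1": [],
    "B1": ["A1"],
    "B0": ["B1"],
    "A0": ["A1"],
    "B2": ["A1", "B1"],
}


def spatial_merge_candidates(
    neighbors: Dict[str, Optional[Motion]], max_spatial: int = 4
) -> List[Tuple[str, Motion]]:
    """Return (position, motion) pairs in derivation order.

    `neighbors` maps a position name to its motion info, or to None when
    the PU there is unavailable or intra coded.
    """
    first_four = ["A1", "B1", "B0", "A0"]
    order = list(first_four)
    # B2 is considered only when one of the first four positions is unusable.
    if any(neighbors.get(p) is None for p in first_four):
        order.append("B2")

    selected: List[Tuple[str, Motion]] = []
    for pos in order:
        motion = neighbors.get(pos)
        if motion is None:
            continue
        # Limited redundancy check against the paired positions only.
        if any(neighbors.get(q) == motion for q in CHECK_AGAINST[pos]):
            continue
        selected.append((pos, motion))
        if len(selected) == max_spatial:
            break
    return selected
```

For example, with A0 unavailable and B1 carrying the same motion as A1, the sketch keeps A1, B0, and B2.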

[0031] [1.1.3 Construction of temporal merge candidates] In this step, only one candidate is added to the list. Specifically, in the derivation of this temporal merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture that has the smallest POC difference from the current picture within the given reference picture list. The reference picture list to be used for the derivation of the co-located PU is explicitly signaled in the slice header.

[0032] Figure 5 shows an example of the derivation of the scaled motion vector for a temporal merge candidate (drawn as a dashed line), which is scaled from the motion vector of the co-located PU using the POC distances tb and td. Here, tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal merge candidate is set to zero. For a B slice, two motion vectors (one for reference picture list 0 and the other for reference picture list 1) are obtained and combined to form the bi-predictive merge candidate.
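Conceptually, the scaling above multiplies the co-located motion vector by the ratio tb/td. The sketch below shows one fixed-point way to do this for a single vector component, modeled on HEVC-style integer arithmetic. The constants, clipping ranges, and helper names are assumptions for illustration; the text above defines only tb and td.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def div_trunc(a: int, b: int) -> int:
    """Integer division that truncates toward zero (C-style)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def scale_mv_component(mv_col: int, tb: int, td: int) -> int:
    """Scale one component of the co-located MV by roughly tb / td.

    Assumed fixed-point scheme: an 8-bit fractional scale factor with
    rounding, clipped to a 16-bit motion vector range. td must be nonzero.
    """
    tb = clip3(-128, 127, tb)
    td = clip3(-128, 127, td)
    tx = div_trunc(16384 + abs(td) // 2, td)
    factor = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = factor * mv_col
    sign = -1 if prod < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))
```

With equal distances (tb == td) the vector is returned unchanged, and with tb equal to half of td it is halved, which matches the intuition that the motion is assumed to be linear over time.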

[0033] In the co-located PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as shown in Figure 6. If the PU at position C0 is unavailable, is intra coded, or is outside the current CTU, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.

[0034] [1.1.4 Construction of additional types of merge candidates] In addition to spatial and temporal merge candidates, there are two additional types of merge candidates: the combined bi-predictive merge candidate and the zero merge candidate. Combined bi-predictive merge candidates are generated from the spatial and temporal merge candidates and are used only for B slices. A combined bi-predictive candidate is generated by combining the first reference picture list motion parameters of one initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate.

[0035] Figure 7 illustrates an example of this process, where two candidates in the original list (710, on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate that is added to the final list (720, on the right). Numerous rules govern which combinations are considered when generating these additional merge candidates.

[0036] Zero motion candidates are inserted to fill the remaining entries in the merge candidate list and thereby reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts at zero and increases each time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one for uni-directional prediction and two for bi-directional prediction. In some embodiments, no redundancy check is performed on these candidates.
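The two filling steps described in paragraphs [0034] to [0036] can be sketched as below. This is an illustrative simplification under stated assumptions: candidates are hypothetical dictionaries with `"L0"` and `"L1"` entries of the form `(mv_x, mv_y, ref_idx)` or `None`, the pair order comes from `itertools.permutations` rather than from the codec's own combination rules, and "different motion hypotheses" is approximated by comparing the tuples directly instead of comparing reference pictures by POC.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Motion = Optional[Tuple[int, int, int]]  # (mv_x, mv_y, ref_idx) or None
Candidate = Dict[str, Motion]


def fill_merge_list(
    cands: List[Candidate], max_num: int, is_b_slice: bool, num_ref: int = 1
) -> List[Candidate]:
    """Pad a merge list up to max_num entries (a sketch, see assumptions above)."""
    out = list(cands)

    # Combined bi-predictive candidates: B slices only, built from pairs of
    # the original candidates (L0 part of one, L1 part of another).
    if is_b_slice:
        for i, j in permutations(range(len(cands)), 2):
            if len(out) >= max_num:
                break
            l0, l1 = cands[i]["L0"], cands[j]["L1"]
            if l0 is not None and l1 is not None and l0 != l1:
                out.append({"L0": l0, "L1": l1})

    # Zero motion candidates: zero displacement, reference index counting up
    # from zero (falling back to 0 once it exceeds the available references).
    r = 0
    while len(out) < max_num:
        idx = r if r < num_ref else 0
        zero = (0, 0, idx)
        out.append({"L0": zero, "L1": zero if is_b_slice else None})
        r += 1
    return out
```

Note that no redundancy check is applied to the zero candidates, in line with paragraph [0036].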

[0037] [1.2 Embodiments of advanced motion vector prediction (AMVP)] AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, and is used for the explicit transmission of motion parameters. It constructs a motion vector candidate list by first checking the availability of the left, above, and temporally neighboring PU positions, then removing redundant candidates and adding zero vectors so that the candidate list has a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. As with merge index signaling, the index of the best motion vector candidate is encoded using truncated unary binarization. The maximum value to be encoded in this case is 2 (see Figure 8). The following sections provide details on the derivation process for motion vector prediction candidates.
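Truncated unary binarization, used here for the candidate index, can be illustrated with a short sketch. The function name is hypothetical and the polarity (ones terminated by a zero) is a common convention assumed for illustration; the key property is that the largest value needs no terminating bit.

```python
def truncated_unary(value: int, c_max: int) -> str:
    """Binarize value in [0, c_max] as `value` ones followed by a zero.

    The terminating zero is dropped when value == c_max, because the
    decoder already knows that no larger value can follow.
    """
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bits = "1" * value
    if value < c_max:
        bits += "0"
    return bits


# With a maximum value of 2, the three possible indices map to "0", "10", "11".
codes = [truncated_unary(v, 2) for v in range(3)]
```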

[0038] [1.2.1 Example of deriving AMVP candidates] Figure 8 summarizes the derivation process for motion vector prediction candidates, which may be performed using refidx as input for each reference picture list.

[0039] In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For the derivation of spatial motion vector candidates, the two motion vector candidates are ultimately derived based on the motion vectors of each of the five different PUs located at different positions, as shown in Figure 2.

[0040] For the derivation of temporal motion vector candidates, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is built, duplicate motion vector candidates in the list are removed. If the number of potential candidates is greater than two, motion vector candidates whose reference picture index within the associated reference picture list is greater than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
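The list finalization described in paragraph [0040] can be sketched as follows. This is a simplified illustration under assumptions: candidates are hypothetical `(mv_x, mv_y, ref_idx)` tuples, the spatial and temporal candidates are taken as already derived, and the final truncation to the list size is added here so that the result always has a constant length.

```python
from typing import List, Optional, Tuple

MvCand = Tuple[int, int, int]  # hypothetical (mv_x, mv_y, ref_idx)


def build_amvp_list(
    spatial: List[MvCand], temporal: Optional[MvCand], list_size: int = 2
) -> List[MvCand]:
    """Finalize an AMVP candidate list of constant length."""
    pool = list(spatial)
    if temporal is not None:
        pool.append(temporal)

    # Remove duplicated candidates while keeping the derivation order.
    unique: List[MvCand] = []
    for cand in pool:
        if cand not in unique:
            unique.append(cand)

    # With more than two candidates, drop those whose reference index is > 1.
    if len(unique) > list_size:
        unique = [c for c in unique if c[2] <= 1]
    unique = unique[:list_size]  # assumed truncation to the constant length

    # Pad with zero motion vectors when too few candidates remain.
    while len(unique) < list_size:
        unique.append((0, 0, 0))
    return unique
```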

[0041] [1.2.2 Construction of spatial motion vector candidates] In the derivation of spatial motion vector candidates, at most two candidates are considered among five potential candidates, which are derived from PUs located at the positions shown in Figure 2 (the same positions as those used for motion merge). The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, and scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, and scaled B2. For each side, there are therefore four cases that can be used as a motion vector candidate: two that do not require spatial scaling and two that do. The four cases are summarized as follows:

• No spatial scaling
  (1) Same reference picture list and same reference picture index (same POC)
  (2) Different reference picture list, but same reference picture (same POC)
• Spatial scaling
  (3) Same reference picture list, but different reference picture (different POC)
  (4) Different reference picture list and different reference picture (different POC)
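The four cases listed above reduce to two independent comparisons: whether the neighboring PU uses the same reference picture list, and whether its reference picture has the same POC. A minimal sketch, with hypothetical parameter names, is:

```python
from typing import Tuple


def spatial_scaling_case(
    nb_list: int, nb_ref_poc: int, cur_list: int, cur_ref_poc: int
) -> Tuple[int, bool]:
    """Classify a neighboring PU's motion vector into cases (1) to (4).

    Returns (case_number, needs_spatial_scaling). Scaling is needed exactly
    when the reference POCs differ, regardless of the reference picture list.
    """
    same_list = nb_list == cur_list
    same_pic = nb_ref_poc == cur_ref_poc
    if same_pic:
        case = 1 if same_list else 2
    else:
        case = 3 if same_list else 4
    return case, not same_pic
```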

[0042] The cases without spatial scaling are checked first, followed by the cases that allow spatial scaling. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are unavailable or are intra coded, scaling of the above motion vector is allowed in order to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.

[0043] As shown in Figure 9, in the spatial scaling case the motion vector of the neighboring PU is scaled in a similar manner to temporal scaling. One difference is that the reference picture list and index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.

[0044] [1.2.3 Construction of temporal motion vector candidates] Apart from the derivation of the reference picture index, all processes for deriving temporal merge candidates are the same as those for deriving temporal motion vector candidates (as shown in the example in Figure 6). In some embodiments, the reference picture index is signaled to the decoder.

[0045] [2. Examples of inter prediction methods in the Joint Exploration Model (JEM)] In some embodiments, future video coding technologies are explored using reference software known as the Joint Exploration Model (JEM). In JEM, sub-block based prediction is adopted in several coding tools, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), frame-rate up conversion (FRUC), locally adaptive motion vector resolution (LAMVR), overlapped block motion compensation (OBMC), local illumination compensation (LIC), and decoder-side motion vector refinement (DMVR).

[0046] [2.1 Examples of sub-CU based motion vector prediction] In the JEM with quadtrees plus binary trees (QTBT), each CU can have at most one set of motion parameters for each prediction direction. In some embodiments, two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, the motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighboring motion vectors. In some embodiments, motion compression for the reference frames may be disabled to preserve a more accurate motion field for sub-CU motion prediction.

[0047] [2.1.1 Examples of ATMVP] The alternative temporal motion vector prediction (ATMVP) method modifies the temporal motion vector prediction (TMVP) method by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current coding unit (CU).

[0048] Figure 10 shows an example of the ATMVP motion prediction process for CU1000. The ATMVP method predicts the motion vector of sub-CU1001 within CU1000 in two steps. The first step is to identify the corresponding block 1051 in reference picture 1050 by its time vector. Reference picture 1050 is also called the motion source picture. The second step is to divide the current CU1000 into sub-CU1001 and obtain the motion vector and reference index of each sub-CU from the block corresponding to each sub-CU.

[0049] In the first step, the reference picture 1050 and its corresponding block are determined by the motion information of the spatially adjacent blocks in the current CU1000. To avoid the iterative process of scanning adjacent blocks, the first merge candidate in the merge candidate list of the current CU1000 is used. The first available motion vector and its associated reference index are set to be the time vector and the index to the motion source picture. In this way, the corresponding block can be identified more accurately than with the TMVP. Here, the corresponding block (sometimes called the co-position block) is always located in the lower right or center position relative to the current CU.

[0050] In the second step, a corresponding block of the sub-CU 1051 is identified by the temporal vector in the motion source picture 1050, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (e.g., the smallest motion grid that covers the center sample) is used to derive the motion information of that sub-CU. After the motion information of the corresponding N×N block is identified, it is converted into the motion vectors and reference indices of the current sub-CU in the same way as in the TMVP of HEVC, where motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition (e.g., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) is fulfilled and, if so, may use the motion vector MVx (e.g., the motion vector corresponding to reference picture list X) to predict the motion vector MVy for each sub-CU (e.g., with X equal to 0 or 1 and Y equal to 1-X).

[0051] [2.1.2 Example of STMVP] In the STMVP (spatial-temporal motion vector prediction) method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Figure 11 shows a CU with four sub-blocks, along with its neighboring blocks. Consider an 8x8 CU that contains four 4x4 sub-CUs A(1101), B(1102), C(1103), and D(1104). The neighboring 4x4 blocks in the current frame are denoted as a(1111), b(1112), c(1113), and d(1114).

[0052] The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A(1101) (block c 1113). If block c(1113) is unavailable or intra coded, the other N×N blocks above sub-CU A(1101) are checked (from left to right, starting at block c 1113). The second neighbor is a block to the left of sub-CU A(1101) (block b 1112). If block b(1112) is unavailable or intra coded, other blocks to the left of sub-CU A(1101) are checked (from top to bottom, starting at block b 1112). For each list, the motion information obtained from the neighboring blocks is scaled to the first reference frame of that list. Next, the temporal motion vector predictor (TMVP) of sub-block A(1101) is derived by following the same TMVP derivation procedure as specified in HEVC. The motion information of the collocated block at position D(1104) is fetched and scaled accordingly. Finally, after the motion information has been retrieved and scaled, all available motion vectors are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
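The final averaging step of the STMVP derivation can be illustrated with a minimal sketch. It assumes that the spatial and temporal candidates of one reference list have already been fetched and scaled, that an unavailable candidate is represented by None, and that the integer average rounds half away from zero; the function names and the rounding convention are illustrative assumptions, not part of the described method.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components


def _div_round(total: int, n: int) -> int:
    """Integer division rounding half away from zero (assumed convention)."""
    sign = -1 if total < 0 else 1
    return sign * ((abs(total) + n // 2) // n)


def average_available_mvs(candidates: List[Optional[MV]]) -> Optional[MV]:
    """Average the available (non-None) candidates of one reference list."""
    available = [mv for mv in candidates if mv is not None]
    if not available:
        return None  # no motion vector can be derived for this list
    n = len(available)
    sum_x = sum(mv[0] for mv in available)
    sum_y = sum(mv[1] for mv in available)
    return (_div_round(sum_x, n), _div_round(sum_y, n))
```

For example, the above neighbor, the left neighbor, and the TMVP would be passed as the three list entries, with None for any that could not be obtained.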

[0053] [2.1.3 Example of sub-CU motion prediction mode signaling] In some embodiments, the sub-CU modes are enabled as additional merge candidates, and no additional syntax element is required to signal the modes. Two additional merge candidates are added to the merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. In other embodiments, up to seven merge candidates may be used if the sequence parameter set indicates that ATMVP and STMVP are enabled. The encoding logic of the additional merge candidates is the same as for the merge candidates in the HM. This means that, for each CU in a P or B slice, two more RD checks may be needed for the two additional merge candidates. In some embodiments, e.g., JEM, all bins of the merge index are context coded by CABAC (Context-based Adaptive Binary Arithmetic Coding). In other embodiments, e.g., HEVC, only the first bin is context coded, and the remaining bins are context-bypass coded.

[0054] [2.2 Examples of LIC in JEM] Local illumination compensation (LIC) is based on a linear model for illumination changes, using a scaling factor a and an offset b. It is enabled or disabled adaptively for each inter-mode coded coding unit (CU).

[0055] When LIC applies to a CU, a least-squares error method is employed to derive the parameters a and b by using the neighboring samples of the current CU and their corresponding reference samples. More specifically, as shown in Figure 12, the subsampled (2:1 subsampling) neighboring samples of the CU and the corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used.
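The least-squares derivation of a and b can be sketched as follows. This is a floating-point illustration under stated assumptions: the model is taken to be cur ≈ a × ref + b, the function names are hypothetical, and the fallback for a degenerate fit (a = 1 with a pure offset) is an assumption. A practical codec would use integer arithmetic.

```python
from typing import List, Sequence, Tuple


def subsample_2to1(samples: Sequence[int]) -> List[int]:
    """Keep every second neighboring sample (2:1 subsampling)."""
    return list(samples[::2])


def derive_lic_params(ref: Sequence[int], cur: Sequence[int]) -> Tuple[float, float]:
    """Least-squares fit of cur ~= a * ref + b over paired neighboring samples."""
    n = len(ref)
    if n == 0 or n != len(cur):
        return 1.0, 0.0  # identity model when no usable samples exist (assumption)
    sum_x = sum(ref)
    sum_y = sum(cur)
    sum_xx = sum(x * x for x in ref)
    sum_xy = sum(x * y for x, y in zip(ref, cur))
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        # All reference samples are equal: fall back to a pure offset (assumption).
        return 1.0, (sum_y - sum_x) / n
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a, b


def apply_lic(pred_sample: int, a: float, b: float) -> float:
    """Apply the linear illumination model to one prediction sample."""
    return a * pred_sample + b
```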

[0056] [2.2.1 Derivation of Prediction Blocks] The illumination compensation (IC) parameters are derived and applied separately for each prediction direction. For each prediction direction, a first prediction block is generated with the decoded motion information, and a temporary prediction block is then obtained by applying the LIC model. Subsequently, the two temporary prediction blocks are used to derive the final prediction block.

[0057] If a CU is coded in merge mode, the LIC flag is copied from the neighboring block in the same way as motion information is copied in merge mode. Otherwise, an LIC flag is signaled for the CU to indicate whether or not LIC applies.

[0058] If LIC is enabled for a picture, an additional CU-level RD check is needed to determine whether LIC applies to a CU. If LIC is enabled for a CU, the Mean-Removed Sum of Absolute Difference (MR-SAD) and the Mean-Removed Sum of Absolute Hadamard-Transformed Difference (MR-SATD) are used instead of SAD and SATD for the integer-pel motion search and the fractional-pel motion search, respectively.

[0059] To reduce encoding complexity, the following encoding scheme is applied in JEM: If there is no obvious change in illumination between the current picture and its reference pictures, LIC is disabled for the entire picture. To identify this situation, the encoder calculates histograms of the current picture and all of its reference pictures. If the histogram difference between the current picture and all of its reference pictures is less than a given threshold, LIC is disabled for the current picture; otherwise, LIC is enabled for the current picture.
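The picture-level decision can be sketched as follows. The text does not specify how the histogram difference is measured; the sum of absolute bin differences used here, the function names, and the flat-list picture representation are assumptions made for illustration only.

```python
from typing import List, Sequence


def histogram(samples: Sequence[int], bit_depth: int = 8) -> List[int]:
    """Count how often each sample value occurs in a picture."""
    hist = [0] * (1 << bit_depth)
    for s in samples:
        hist[s] += 1
    return hist


def histogram_difference(h1: Sequence[int], h2: Sequence[int]) -> int:
    """Sum of absolute bin differences (assumed difference measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def lic_enabled_for_picture(cur: Sequence[int],
                            refs: Sequence[Sequence[int]],
                            threshold: int,
                            bit_depth: int = 8) -> bool:
    """LIC stays enabled unless every reference differs by less than the threshold."""
    cur_hist = histogram(cur, bit_depth)
    return any(
        histogram_difference(cur_hist, histogram(ref, bit_depth)) >= threshold
        for ref in refs
    )
```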

[0060] [2.3 Example of inter prediction methods in VVC] Several new coding tools exist for improving inter prediction, including Adaptive Motion Vector Difference Resolution (AMVR) for signaling the MVD, affine prediction mode, Triangular Prediction Mode (TPM), ATMVP, Generalized Bi-Prediction (GBI), and Bi-directional Optical Flow (BIO).

[0061] [2.3.1 Example of a coding block structure in VVC] In VVC, a quadtree / binary tree / ternary tree (QT / BT / TT) structure is used to split a picture into square or rectangular blocks. In addition to QT / BT / TT, a separate tree (also known as a dual coding tree) is used in VVC for I-frames. With the separate tree, the coding block structure is signaled separately for the luma and chroma components.

[0062] [2.3.2 Example of Adaptive Motion Vector Difference Resolution] In some embodiments, the motion vector difference (MVD) (between the motion vector of a PU and its predicted motion vector) is signaled in units of quarter luma samples when use_integer_mv_flag is equal to 0 in the slice header. JEM introduces Locally Adaptive Motion Vector Resolution (LAMVR). In JEM, the MVD can be coded in units of quarter luma samples, integer luma samples, or four luma samples. The MVD resolution is controlled at the coding unit (CU) level, and the MVD resolution flags are conditionally signaled for each CU that has at least one non-zero MVD component.

[0063] For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter-luma-sample MV precision is used for the CU. If the first flag (equal to 1) indicates that quarter-luma-sample MV precision is not used, another flag is signaled to indicate whether integer-luma-sample MV precision or four-luma-sample MV precision is used.

[0064] If the first MVD resolution flag of a CU is zero, or if it is not coded for the CU (i.e., all MVDs within the CU are zero), quarter-luma-sample MV resolution is used for that CU. If the CU uses integer-luma-sample MV precision or four-luma-sample MV precision, the MVPs in the AMVP candidate list for that CU are rounded to the corresponding precision.

[0065] [2.3.3 Example of Affine Motion Compensation Prediction] In HEVC, only a translational motion model is applied for motion compensation prediction (MCP). However, cameras and objects can exhibit many kinds of motion, such as zoom in / out, rotation, perspective motion, and other irregular motions. In VVC, a simplified affine transform motion compensation prediction is applied with a 4-parameter affine model and a 6-parameter affine model. As shown in Figures 13A and 13B, the affine motion field of a block is described by two control point motion vectors (in the 4-parameter affine model, which uses the variables a, b, e, and f) or three control point motion vectors (in the 6-parameter affine model, which uses the variables a, b, c, d, e, and f).

[0066] The motion vector field (MVF) of a block is described by the following equations for the 4-parameter affine model and the 6-parameter affine model, respectively:

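$$\begin{cases} mv^h(x,y) = ax - by + e = \dfrac{mv_1^h - mv_0^h}{w}\,x - \dfrac{mv_1^v - mv_0^v}{w}\,y + mv_0^h \\ mv^v(x,y) = bx + ay + f = \dfrac{mv_1^v - mv_0^v}{w}\,x + \dfrac{mv_1^h - mv_0^h}{w}\,y + mv_0^v \end{cases} \tag{1}$$

$$\begin{cases} mv^h(x,y) = ax + cy + e = \dfrac{mv_1^h - mv_0^h}{w}\,x + \dfrac{mv_2^h - mv_0^h}{h}\,y + mv_0^h \\ mv^v(x,y) = bx + dy + f = \dfrac{mv_1^v - mv_0^v}{w}\,x + \dfrac{mv_2^v - mv_0^v}{h}\,y + mv_0^v \end{cases} \tag{2}$$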

[0067] Here, $(mv_0^h, mv_0^v)$ is the motion vector of the control point (CP) at the top-left corner, $(mv_1^h, mv_1^v)$ is the motion vector of the control point at the top-right corner, and $(mv_2^h, mv_2^v)$ is the motion vector of the control point at the bottom-left corner, and (x,y) represents the coordinates of a representative point relative to the top-left sample of the current block. The CP motion vectors may be signaled (as in affine AMVP mode) or derived on the fly (as in affine merge mode). w and h are the width and height of the current block. In practice, the division is implemented by a right shift with rounding. In VTM, the representative point is defined to be the center position of a sub-block; for example, if the coordinates of the top-left corner of a sub-block relative to the top-left sample of the current block are (xs,ys), the coordinates of the representative point are defined to be (xs+2,ys+2). For each sub-block (e.g., 4x4 in VTM), the representative point is used to derive the motion vector for the entire sub-block.

[0068] Figure 14 shows an example of the affine MVF for each sub-block of a block 1300, where sub-block based affine transform prediction is applied to further simplify motion compensation prediction. To derive the motion vector of each M×N sub-block, the motion vector of the center sample of each sub-block is calculated according to equations (1) and (2) and may be rounded to the motion vector fractional accuracy (e.g., 1 / 16 in JEM). A motion compensation interpolation filter is then applied to generate the prediction of each sub-block with the derived motion vector. The interpolation filters for 1 / 16-pel are introduced by the affine mode. After the MCP, the high-accuracy motion vector of each sub-block is rounded and saved with the same accuracy as a normal motion vector.
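The sub-block derivation can be sketched as follows. The sketch assumes the usual form of the two models, in which a = (mv1h − mv0h) / w and b = (mv1v − mv0v) / w, the 6-parameter model adds c = (mv2h − mv0h) / h and d = (mv2v − mv0v) / h, and the 4-parameter model uses c = −b and d = a. It further assumes that the control point motion vectors are already expressed in the target fractional unit, and it uses floating-point division with rounding to nearest in place of the shift-based division mentioned in the text. All names are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

MV = Tuple[float, float]  # (horizontal, vertical)


def affine_mv(x: float, y: float, w: int, h: int,
              cpmv: Sequence[MV], six_param: bool) -> MV:
    """Evaluate the affine motion field at (x, y) inside a w x h block."""
    (mv0h, mv0v), (mv1h, mv1v) = cpmv[0], cpmv[1]
    a = (mv1h - mv0h) / w
    b = (mv1v - mv0v) / w
    if six_param:
        mv2h, mv2v = cpmv[2]
        c = (mv2h - mv0h) / h
        d = (mv2v - mv0v) / h
    else:
        c, d = -b, a  # the 4-parameter model reuses a and b
    return (a * x + c * y + mv0h, b * x + d * y + mv0v)


def subblock_mv_field(w: int, h: int, cpmv: Sequence[MV],
                      six_param: bool, sub: int = 4) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """One rounded MV per sub-block, evaluated at the sub-block center."""
    field = {}
    for ys in range(0, h, sub):
        for xs in range(0, w, sub):
            # Representative point (xs + 2, ys + 2) for 4x4 sub-blocks.
            mvh, mvv = affine_mv(xs + sub // 2, ys + sub // 2, w, h, cpmv, six_param)
            field[(xs, ys)] = (math.floor(mvh + 0.5), math.floor(mvv + 0.5))
    return field
```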

[0069] [2.3.3.1 Example of signaling for affine prediction] Similar to the translational motion model, there are also two modes for signaling side information using the affine model: AFFINE_INTER mode and AFFINE_MERGE mode.

[0070] [2.3.3.2 Example of AF_INTER mode] For CUs with both width and height larger than 8, AF_INTER mode may be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used. In this mode, for each reference picture list (List0 or List1), an affine AMVP candidate list is constructed with three types of affine motion predictors in the order given below, where each candidate contains the estimated CPMVs of the current block. The differences between the best CPMVs found on the encoder side (e.g., mv0, mv1, mv2 in Figure 17) and the estimated CPMVs are signaled. In addition, the index of the affine AMVP candidate from which the estimated CPMVs are derived is also signaled.

[0071] 1) Inherited affine motion predictors. The checking order is similar to that of the spatial MVPs in the HEVC AMVP list construction. First, a left inherited affine motion predictor is derived from the first block in {A1,A0} that is affine coded and has the same reference picture as the current block. Second, an above inherited affine motion predictor is derived from the first block in {B1,B0,B2} that is affine coded and has the same reference picture as the current block. The five blocks A1, A0, B1, B0, and B2 are shown in Figure 16. Once a neighboring block is found to be coded in affine mode, the CPMVs of the coding unit covering that neighboring block are used to derive the predictors of the CPMVs of the current block. For example, when A1 is coded in a non-affine mode and A0 is coded in 4-parameter affine mode, the left inherited affine MV predictor is derived from A0. In this case, the CPMVs of the CU covering A0, denoted in FIG. 18B by $MV_0^N$ for the top-left CPMV and $MV_1^N$ for the top-right CPMV, are used to derive the estimated CPMVs of the current block, denoted by $MV_0^C$, $MV_1^C$, and $MV_2^C$ for the top-left position (with coordinates (x0, y0)), the top-right position (with coordinates (x1, y1)), and the bottom-right position (with coordinates (x2, y2)) of the current block.

[0072] 2) Constructed affine motion predictor. A constructed affine motion predictor consists of control point motion vectors (CPMVs) that are derived from neighboring inter-coded blocks having the same reference picture, as shown in FIG. 17. When the current affine motion model is 4-parameter affine, the number of CPMVs is 2, and when the current affine motion model is 6-parameter affine, the number of CPMVs is 3. The top-left CPMV $\overline{mv}_0$ (hereinafter, bar mv0) is derived from the MV of the first block in the group {A, B, C} that is inter coded and has the same reference picture as the current block. The top-right CPMV $\overline{mv}_1$ (hereinafter, bar mv1) is derived from the MV of the first block in the group {D, E} that is inter coded and has the same reference picture as the current block. The bottom-left CPMV $\overline{mv}_2$ (hereinafter, bar mv2) is derived from the MV of the first block in the group {F, G} that is inter coded and has the same reference picture as the current block. - If the current affine motion model is 4-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if both bar mv0 and bar mv1 are found, i.e., bar mv0 and bar mv1 are used as the estimated CPMVs for the top-left position (with coordinates (x0, y0)) and the top-right position (with coordinates (x1, y1)) of the current block. - If the current affine motion model is 6-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if bar mv0, bar mv1, and bar mv2 are all found, i.e., bar mv0, bar mv1, and bar mv2 are used as the estimated CPMVs for the top-left position (with coordinates (x0, y0)), the top-right position (with coordinates (x1, y1)), and the bottom-right position (with coordinates (x2, y2)) of the current block. No pruning process is applied when a constructed affine motion predictor is inserted into the candidate list.

[0073] 3) Normal AMVP motion predictors. The following applies until the number of affine motion predictors reaches the maximum. 1) Derive an affine motion predictor by setting all CPMVs equal to bar mv2, if available. 2) Derive an affine motion predictor by setting all CPMVs equal to bar mv1, if available. 3) Derive an affine motion predictor by setting all CPMVs equal to bar mv0, if available. 4) Derive an affine motion predictor by setting all CPMVs equal to the HEVC TMVP, if available. 5) Derive an affine motion predictor by setting all CPMVs to the zero MV. Note that $\overline{mv}_i$ (hereinafter, bar mv_i) has already been derived for the constructed affine motion predictor.

[0074] In AF_INTER mode, two or three control points are required when the 4-parameter or 6-parameter affine mode is used, so two or three MVDs need to be coded for those control points, as shown in Figures 15A and 15B. In existing implementations, the MVs may be derived as follows, i.e., mvd1 and mvd2 are predicted from mvd0.

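$$\begin{aligned} mv_0 &= \overline{mv}_0 + mvd_0 \\ mv_1 &= \overline{mv}_1 + mvd_1 + mvd_0 \\ mv_2 &= \overline{mv}_2 + mvd_2 + mvd_0 \end{aligned}$$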

[0075] Here, $\overline{mv}_i$, $mvd_i$, and $mv_i$ are the predicted motion vector, the motion vector difference, and the motion vector of the top-left pixel (i=0), the top-right pixel (i=1), or the bottom-left pixel (i=2), respectively, as shown in Figure 15B. In some embodiments, the addition of two motion vectors (e.g., mvA(xA,yA) and mvB(xB,yB)) is the component-wise sum. For example, newMV=mvA+mvB means that the two components of newMV are set to (xA+xB) and (yA+yB), respectively.
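The reconstruction of the control point motion vectors from their predictors and coded differences can be sketched as follows. The sketch assumes the prediction scheme named in the text, in which mvd0 also serves as a predictor for mvd1 and mvd2, so that mv_i = bar mv_i + mvd_i + mvd0 for i = 1, 2 while mv0 = bar mv0 + mvd0. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

MV = Tuple[int, int]


def add_mv(mv_a: MV, mv_b: MV) -> MV:
    """Component-wise sum of two motion vectors."""
    return (mv_a[0] + mv_b[0], mv_a[1] + mv_b[1])


def reconstruct_cpmvs(pred: Sequence[MV], mvd: Sequence[MV]) -> List[MV]:
    """Rebuild 2 (4-parameter) or 3 (6-parameter) CPMVs from predictors and MVDs."""
    if len(pred) != len(mvd) or len(pred) not in (2, 3):
        raise ValueError("expected two or three predictor/MVD pairs")
    cpmvs = [add_mv(pred[0], mvd[0])]
    for i in range(1, len(pred)):
        # mvd[0] is added again because mvd[i] was coded relative to it.
        cpmvs.append(add_mv(add_mv(pred[i], mvd[i]), mvd[0]))
    return cpmvs
```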

[0076] [2.3.3.3 Example of AF_MERGE mode] When a CU is coded in AF_MERGE mode, it obtains the first block coded in affine mode from the valid neighboring reconstructed blocks. The selection order of the candidate blocks is left, above, above right, bottom left, and above left (denoted by A, B, C, D, and E, in that order), as shown in Figure 18A. For example, when the neighboring bottom-left block is coded in affine mode, as denoted by A0 in Figure 18B, the control point (CP) motion vectors $mv_0^N$, $mv_1^N$, and $mv_2^N$ of the top-left, top-right, and bottom-left corners of the neighboring CU / PU that contains block A are fetched. The motion vectors $mv_0^C$, $mv_1^C$, and $mv_2^C$ (the last of which is used only for the 6-parameter affine model) of the top-left, top-right, and bottom-left corners of the current CU / PU are then calculated based on $mv_0^N$, $mv_1^N$, and $mv_2^N$. Note that in VTM-2.0, if the current block is affine coded, the sub-block located at the top-left corner (e.g., a 4x4 block in VTM) stores mv0, and the sub-block at the top-right corner stores mv1. If the current block is coded with the 6-parameter affine model, the sub-block at the bottom-left corner (LB) stores mv2; otherwise (with the 4-parameter affine model), LB stores mv2'. The other sub-blocks store the MVs used for MC.

[0077] After the CPMV v0 and v1 of the current CU are calculated according to the affine motion model of equations (1) and (2), the MVF of the current CU may be generated. To identify whether the current CU is coded in AF_MERGE mode, an affine flag may be signaled in the bitstream if there is at least one adjacent block coded in affine mode.

[0078] In some embodiments (for example, JVET-L0142 and JVET-L0632), the affine merge candidate list may be constructed with the following steps.

[0079] 1) Insertion of inherited affine candidates. An inherited affine candidate is a candidate derived from the affine motion model of a valid neighboring affine-coded block. Up to two inherited affine candidates are derived from the affine motion models of the neighboring blocks and inserted into the candidate list. For the left predictor, the scan order is {A0, A1}, and for the above predictor, the scan order is {B0, B1, B2}.

[0080] 2) Insertion of constructed affine candidates. If the number of candidates in the affine merge candidate list is less than MaxNumAffineCand (which is set to 5 in this configuration), constructed affine candidates are inserted into the candidate list. A constructed affine candidate is a candidate formed by combining the neighboring motion information of each control point.

[0081] a) The motion information of the control points is first derived from the specified spatial and temporal neighbors shown in Figure 19. CPk (k=1,2,3,4) represents the k-th control point. A0, A1, A2, B0, B1, B2, and B3 are the spatial positions for predicting CPk (k=1,2,3). T is the temporal position for predicting CP4. The coordinates of CP1, CP2, CP3, and CP4 are (0,0), (W,0), (0,H), and (W,H), respectively, where W and H are the width and height of the current block. The motion information of each control point is obtained according to the following priority order: For CP1, the checking priority is B2 → B3 → A2. B2 is used if it is available. Otherwise, if B3 is available, B3 is used. If both B2 and B3 are unavailable, A2 is used. If all three candidates are unavailable, the motion information of CP1 cannot be obtained. For CP2, the checking priority is B1 → B0. For CP3, the checking priority is A1 → A0. For CP4, T is used.

[0082] b) Second, combinations of the control points are used to construct the affine merge candidates. I. Motion information of three control points is required to construct a 6-parameter affine candidate. The three control points can be selected from one of the following four combinations ({CP1,CP2,CP4}, {CP1,CP2,CP3}, {CP2,CP3,CP4}, {CP1,CP3,CP4}). The combinations {CP1,CP2,CP3}, {CP2,CP3,CP4}, and {CP1,CP3,CP4} are converted into a 6-parameter motion model represented by the top-left, top-right, and bottom-left control points. II. Motion vectors of two control points are required to construct a 4-parameter affine candidate. The two control points can be selected from one of the following six combinations ({CP1,CP4},{CP2,CP3},{CP1,CP2},{CP2,CP4},{CP1,CP3},{CP3,CP4}). The combinations {CP1,CP4}, {CP2,CP3}, {CP2,CP4}, {CP1,CP3}, and {CP3,CP4} are converted into a 4-parameter motion model represented by the top-left and top-right control points. III. The combinations of constructed affine candidates are inserted into the candidate list in the following order: {CP1,CP2,CP3},{CP1,CP2,CP4},{CP1,CP3,CP4},{CP2,CP3,CP4},{CP1,CP2},{CP1,CP3},{CP2,CP3},{CP1,CP4},{CP2,CP4},{CP3,CP4}. i. For reference list X (X being 0 or 1) of a combination, the reference index with the highest usage ratio among the control points is selected as the reference index of list X, and motion vectors pointing to a different reference picture are scaled accordingly. After a candidate is derived, a full pruning process is performed to check whether the same candidate has already been inserted into the list. If the same candidate exists, the derived candidate is discarded.

[0083] 3) Padding with zero motion vectors. If the number of candidates in the affine merge candidate list is less than 5, zero motion vectors with zero reference indices are inserted into the candidate list until the list is full. More specifically, for the sub-block merge candidate list, a 4-parameter merge candidate is used with its MVs set to (0,0) and its prediction direction set to uni-prediction from list 0 (for P slices) or bi-prediction (for B slices).
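The control point priority order of step a) above can be sketched as a simple first-available lookup. The dictionary-based representation of the neighboring positions and the function name are assumptions made for illustration; a position that is unavailable is simply absent from the dictionary.

```python
from typing import Dict, Optional, Tuple

MV = Tuple[int, int]

# Checking priority for each control point, as listed in step a).
CP_PRIORITY = {
    1: ("B2", "B3", "A2"),
    2: ("B1", "B0"),
    3: ("A1", "A0"),
    4: ("T",),
}


def control_point_motion(k: int, available: Dict[str, MV]) -> Optional[MV]:
    """Return the motion of the first available position for control point k."""
    for position in CP_PRIORITY[k]:
        if position in available:
            return available[position]
    return None  # the motion information of CPk cannot be obtained
```

A combination such as {CP1,CP2,CP3} would then be usable only if control_point_motion returns a value for each of its three control points.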

[0084] [2.3.4 Example of Merge with Motion Vector Differences (MMVD)] JVET-L0054 presents the Ultimate Motion Vector Expression (UMVE), also known as Merge mode with Motion Vector Differences (MMVD). UMVE is used for either skip or merge mode with the proposed motion vector expression method.

[0085] UMVE reuses the same merge candidates as those included in the regular merge candidate list in VVC. Among these merge candidates, a base candidate can be selected and is further expanded by the proposed motion vector expression method.

[0086] UMVE provides a new motion vector difference (MVD) representation in which the starting point, magnitude of motion, and direction of motion are used to represent the MVD.

[0087] This proposed technique uses the merge candidate list as is. However, only candidates with the default merge type (MRG_TYPE_DEFAULT_N) are considered for UMVE extension.

[0088] The base candidate index defines the starting point. The base candidate index indicates the best candidate among the candidates in the list, as follows: [Table 1]

[0089] If the number of base candidates is equal to 1, the base candidate IDX is not signaled.

[0090] The distance index represents information about the magnitude of the movement. The distance index indicates a predefined distance from the starting point. The predefined distances are as follows: [Table 2]

[0091] The direction index represents the direction of the MVD relative to the starting point. The direction index can represent four directions, as shown below. [Table 3]

[0092] In some embodiments, the UMVE flag is signaled immediately after the skip and merge flags are sent. The UMVE flag is parsed if the skip and merge flags are true. The UMVE syntax is parsed if the UMVE flag is equal to 1. However, if it is not 1, the AFFINE flag is parsed. If the AFFINE flag is equal to 1, it is affine mode. However, if it is not 1, the skip / merge index is parsed for the VTM's skip / merge mode.
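The parsing order described above can be sketched as a small decision function. It assumes that the skip and merge flags have already been parsed and reduced to a single boolean, and that next_flag is a hypothetical callable returning the next flag value from the bitstream; neither is an actual decoder interface. Flags are read lazily, so the AFFINE flag is read only when the UMVE flag is not 1.

```python
from typing import Callable


def parse_merge_syntax(skip_or_merge: bool, next_flag: Callable[[], int]) -> str:
    """Decide which syntax is parsed next, following the described flag order."""
    if not skip_or_merge:
        return "not_skip_or_merge"  # the UMVE flag is not parsed at all
    if next_flag() == 1:            # UMVE flag
        return "umve_syntax"
    if next_flag() == 1:            # AFFINE flag, read only if UMVE flag is not 1
        return "affine_mode"
    return "skip_merge_index"       # regular skip / merge index of the VTM
```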

[0093] An additional line buffer is not required for UMVE candidates, as the software's skip / merge candidates are used directly as the base candidates. Using the input UMVE index, MV interpolation is determined immediately before motion compensation. Therefore, there is no need to maintain a long line buffer.

[0094] Under the current common test conditions, either the first or the second merge candidate in the merge candidate list may be selected as the base candidate.

[0095] [2.3.5 Example of Decoder-Side Motion Vector Refinement (DMVR)] In a bi-prediction operation, two prediction blocks, formed using a motion vector (MV) of List 0 and an MV of List 1, respectively, are combined to form a single prediction signal for predicting one block region. In the decoder-side motion vector refinement (DMVR) method, the two motion vectors of the bi-prediction are further refined.

[0096] In the JEM design, the motion vectors are refined by a bilateral template matching process. Bilateral template matching is applied in the decoder to perform a distortion-based search between a bilateral template and the reconstructed samples in the reference pictures, in order to obtain a refined MV without transmitting additional motion information. An example is shown in Figure 22. The bilateral template is generated as the weighted combination (i.e., average) of the two prediction blocks obtained from the initial MV0 of List 0 and MV1 of List 1, respectively, as shown in Figure 22. The template matching operation consists of calculating cost measures between the generated template and the sample region (around the initial prediction block) in the reference picture. For each of the two reference pictures, the MV that yields the minimum template cost is considered the updated MV of that list and replaces the original MV. In JEM, nine MV candidates are searched for each list. The nine MV candidates include the original MV and eight surrounding MVs offset from the original MV by one luma sample in the horizontal direction, the vertical direction, or both. Finally, the two new MVs, i.e., MV0' and MV1' shown in Figure 22, are used to generate the final bi-prediction results. The sum of absolute differences (SAD) is used as the cost measure. Note that when calculating the cost of a prediction block generated by one surrounding MV, the rounded MV (to integer pel) is actually used to obtain the prediction block instead of the real MV.
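The nine-candidate search for one reference list can be sketched as follows. The sketch assumes integer-pel motion vectors, pictures stored as lists of rows, a rounded average as the bilateral template, and a caller that guarantees all candidate blocks lie inside the picture; the function names are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]
MV = Tuple[int, int]  # (horizontal, vertical), integer pel


def get_block(picture: Block, x: int, y: int, w: int, h: int) -> Block:
    """Copy the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in picture[y:y + h]]


def bilateral_template(pred0: Block, pred1: Block) -> Block:
    """Rounded average of the two initial prediction blocks."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(pred0, pred1)]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def refine_mv(ref: Block, x: int, y: int, w: int, h: int,
              init_mv: MV, template: Block) -> MV:
    """Pick the best of the original MV and its eight one-sample neighbors."""
    best_mv, best_cost = init_mv, None
    for dy in (0, -1, 1):          # (0, 0) is tested first so ties keep the original MV
        for dx in (0, -1, 1):
            cand = (init_mv[0] + dx, init_mv[1] + dy)
            cost = sad(template, get_block(ref, x + cand[0], y + cand[1], w, h))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = cand, cost
    return best_mv
```

The same function would be called once per reference list with the shared template, yielding MV0' and MV1'.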

[0097] To further simplify the DMVR process, JVET-M0147 proposed several changes to the design in JEM. More specifically, the DMVR design adopted for VTM-4.0 (to be released soon) has the following main features:
○ Early termination with the (0,0) position SAD between List 0 and List 1
○ DMVR block sizes W×H >= 64 && H >= 8
○ For DMVR of CU sizes > 16x16, the CU is split into multiple 16x16 sub-blocks
○ Reference block size (W+7)×(H+7) (for luma)
○ 25-point SAD-based integer-pel search (i.e., (+-)2 refinement search range, single stage)
○ Bilinear-interpolation based DMVR
○ MVD mirroring between List 0 and List 1 to enable bilateral matching
○ Sub-pel refinement based on the "parametric error surface equation"
○ Luma / chroma MC with reference block padding (if needed)
○ Refined MVs used only for MC and TMVPs

[0098] [2.3.6 Example of Combined Intra and Inter Prediction (CIIP)] JVET-L0100 proposes multi-hypothesis prediction, in which combined intra and inter prediction is one method for generating multiple hypotheses.

[0099] When multi-hypothesis prediction is applied to improve intra mode, it combines one intra prediction and one merge-indexed prediction. In a merge CU, one flag is signaled for merge mode, and an intra mode is selected from an intra candidate list if the flag is true. For the luma component, the intra candidate list is derived from four intra prediction modes, namely DC, planar, horizontal, and vertical, and the size of the intra candidate list can be 3 or 4 depending on the block shape. If the CU width is larger than twice the CU height, the horizontal mode is excluded from the intra mode list, and if the CU height is larger than twice the CU width, the vertical mode is excluded from the intra mode list. The intra prediction mode selected by the intra mode index and the merge-indexed prediction selected by the merge index are combined using a weighted average. For the chroma component, DM is always applied without additional signaling. The weights for combining the predictions are as follows. Equal weights are applied when DC or planar mode is selected, or when the CB width or height is smaller than 4. For a CB with width and height greater than or equal to 4, when horizontal / vertical mode is selected, the CB is first split vertically / horizontally into four equal-area regions. Each weight set, denoted as (w_intra_i, w_inter_i), where i is from 1 to 4 and (w_intra1,w_inter1)=(6,2), (w_intra2,w_inter2)=(5,3), (w_intra3,w_inter3)=(3,5), and (w_intra4,w_inter4)=(2,6), is applied to the corresponding region. (w_intra1,w_inter1) is for the region closest to the reference samples, and (w_intra4,w_inter4) is for the region farthest from the reference samples. The combined prediction can then be calculated by summing the two weighted predictions and right-shifting by 3 bits. Furthermore, the intra prediction mode of the intra hypothesis of the predictors can be saved for reference by subsequent neighboring CUs.
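The region-based weighting can be sketched as follows. The sketch assumes that the reference samples lie above the block for vertical mode (so regions are indexed by row) and to the left for horizontal mode (so regions are indexed by column), and that "equal weights" means (4, 4), so that every weight pair sums to 8 and matches the 3-bit right shift. These choices and the function name are illustrative assumptions.

```python
from typing import List

Block = List[List[int]]

# (w_intra, w_inter) for regions 1..4, region 1 being closest to the reference samples.
REGION_WEIGHTS = [(6, 2), (5, 3), (3, 5), (2, 6)]


def ciip_combine(intra: Block, inter: Block, mode: str) -> Block:
    """Weighted combination of an intra and an inter prediction block."""
    h, w = len(intra), len(intra[0])
    equal = mode in ("DC", "PLANAR") or w < 4 or h < 4
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            if equal:
                w_intra, w_inter = 4, 4
            elif mode == "VERTICAL":
                w_intra, w_inter = REGION_WEIGHTS[y * 4 // h]  # rows, top is closest
            else:  # "HORIZONTAL"
                w_intra, w_inter = REGION_WEIGHTS[x * 4 // w]  # columns, left is closest
            row.append((w_intra * intra[y][x] + w_inter * inter[y][x]) >> 3)
        out.append(row)
    return out
```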

[0100] [2.4 In-loop reshaping (ILR) in JVET-M0427] The basic idea of in-loop reshaping (ILR) is to transform the original signal (predicted / reconstructed signal) from the first region into a second region (the reshaped region).

[0101] The in-loop luma reshaper is implemented as a pair of look-up tables (LUTs), but only one of the two LUTs needs to be signaled, since the other can be computed from the signaled LUT. Each LUT is a one-dimensional, 10-bit, 1024-entry mapping table (1D-LUT). One LUT is the forward LUT (FwdLUT), which maps an input luma code value $Y_i$ to an altered value $Y_r$: $Y_r = \mathrm{FwdLUT}[Y_i]$. The other LUT is the inverse LUT (InvLUT), which maps an altered code value $Y_r$ to $\hat{Y}_i$: $\hat{Y}_i = \mathrm{InvLUT}[Y_r]$, where $\hat{Y}_i$ represents the reconstructed value of $Y_i$.

[0102] [2.4.1 Piecewise Linear (PWL) Model] In some embodiments, piece-wise linear (PWL) is implemented as follows:

[0103] Let x1 and x2 be the two input pivot points, and let y1 and y2 be their corresponding output pivot points in a single segment. The output value y for any input value x between x1 and x2 can be interpolated by the following equation: y=((y2-y1) / (x2-x1))×(x-x1)+y1

[0104] For a fixed-point implementation, the equation can be rewritten as follows: y=((m×x+2^(FP_PREC-1))>>FP_PREC)+c Here, m is a scalar, c is an offset, and FP_PREC is a constant value specifying the precision.
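A fixed-point evaluation of the piece-wise linear model can be sketched as follows. The sketch makes several assumptions that are not stated in the text: FP_PREC is set to 14 purely for illustration, the scalar m is applied to the distance from the segment's input pivot (x − x1) so that the offset c is simply the output pivot y1, and the slope is obtained by integer division. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

FP_PREC = 14  # illustrative precision, not taken from the text


def segment_params(x1: int, y1: int, x2: int, y2: int) -> Tuple[int, int]:
    """Fixed-point slope m and offset c for the segment from (x1, y1) to (x2, y2)."""
    m = ((y2 - y1) << FP_PREC) // (x2 - x1)
    return m, y1


def pwl_map(x: int, x1: int, m: int, c: int) -> int:
    """y = ((m * (x - x1) + 2^(FP_PREC-1)) >> FP_PREC) + c."""
    return ((m * (x - x1) + (1 << (FP_PREC - 1))) >> FP_PREC) + c


def build_lut(pivots_x: Sequence[int], pivots_y: Sequence[int]) -> List[int]:
    """Precompute the mapping for every x in [pivots_x[0], pivots_x[-1])."""
    lut = []
    for k in range(len(pivots_x) - 1):
        x1, x2 = pivots_x[k], pivots_x[k + 1]
        m, c = segment_params(x1, pivots_y[k], x2, pivots_y[k + 1])
        lut.extend(pwl_map(x, x1, m, c) for x in range(x1, x2))
    return lut
```

The same pwl_map call could be evaluated per sample on the fly instead of building the table, which yields identical values by construction.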

[0105] In the CE-12 software, the PWL model is used to precompute the 1024-entry FwdLUT and InvLUT mapping tables, but the PWL model also allows implementations to calculate identical mapping values on the fly without precomputing the LUTs.

[0106] [2.4.2 Test CE12-2] [2.4.2.1 Luma Reshaping] Test 2 of the in-loop luma reshaping (i.e., CE12-2 in the proposal) provides a lower-complexity pipeline that also eliminates the decoding latency of block-wise intra prediction in inter slice reconstruction. Intra prediction is performed in the reshaping region for both inter slices and intra slices.

[0107] Intra prediction is always performed in the reshaping region, regardless of the slice type. With such an arrangement, intra prediction can start immediately after the previous TU reconstruction is done. Such an arrangement can also provide a unified process for intra mode instead of a slice-dependent one. Figure 23 shows a block diagram of the mode-based CE12-2 decoding process.

[0108] CE12-2 also tests a 16-segment piecewise linear (PWL) model for luma and chroma residual scaling, instead of the 32-segment PWL model used in CE12-1.

[0109] Inter slice reconstruction with the in-loop luma reshaper in CE12-2 (light green shaded blocks indicate signals in the reshaped region: luma residual, intra luma predicted, and intra luma reconstructed).

[0110] [2.4.2.2 Luma-dependent Chroma Residual Scaling (LCRS)] Luma-dependent chroma residual scaling (LCRS) is a multiplicative process implemented using fixed-point integer arithmetic. Chroma residual scaling compensates for the interaction between the luma signal and the chroma signal. Chroma residual scaling is applied at the TU level. More specifically, the following is applied: ○ For intra, the reconstructed luma is averaged. ○ For inter, the predicted luma is averaged.

[0111] The mean is used to identify the index within the PWL model. The index identifies the scaling factor cScaleInv. The chroma residual is multiplied by this number.

[0112] The chroma scaling coefficient is calculated from the forward-mapped predicted luma values rather than from the reconstructed luma values.
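The averaging, index lookup, and multiplication described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the precision CSCALE_FP_PREC = 11, the equal-width segments over the 10-bit range, and the function name are hypothetical, and the contents of the cScaleInv table are supplied by the caller.

```python
CSCALE_FP_PREC = 11  # assumed fixed-point precision of cScaleInv


def lcrs_scale_residual(chroma_resid, luma_samples, c_scale_inv, bit_depth=10):
    """Scale a TU's chroma residual by a factor selected from the average luma.

    luma_samples: reconstructed luma (intra) or predicted luma (inter).
    c_scale_inv:  one fixed-point scaling factor per PWL segment.
    """
    # Rounded integer average of the luma samples.
    avg_luma = (sum(luma_samples) + len(luma_samples) // 2) // len(luma_samples)
    # Assumption: segments have equal width, so the index is a simple division.
    seg_width = (1 << bit_depth) // len(c_scale_inv)
    idx = min(avg_luma // seg_width, len(c_scale_inv) - 1)
    scale = c_scale_inv[idx]
    rounding = 1 << (CSCALE_FP_PREC - 1)
    scaled = []
    for r in chroma_resid:
        # Multiply the magnitude and restore the sign, so rounding is symmetric.
        mag = (abs(r) * scale + rounding) >> CSCALE_FP_PREC
        scaled.append(mag if r >= 0 else -mag)
    return scaled
```

With every table entry set to 1 << CSCALE_FP_PREC the residual passes through unchanged, which is a convenient sanity check for the fixed-point arithmetic.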

[0113] [2.4.2.3 Signaling of ILR side information] Parameters are (currently) sent in the tile group header (similar to ALF). They reportedly require 40-100 bits. The following table is based on version 9 of JVET-L1001. Added syntax is shown in large bold font. Deletions are indicated using bold double square brackets [[deleted portion]]. [Table 4] [Table 5] [Table 6] [Table 7]

[0114] [2.4.2.4 Using ILR] On the encoder side, each picture (or tile group) is first converted into a reshaped region. All coding processes are then performed within this reshaped region. For intra predictions, adjacent blocks reside in the reshaped region. For inter predictions, the reference block (generated from the original region via the decoded picture buffer) is first converted into a reshaped region. The residual is then generated and coded into the bitstream.

[0115] After the entire picture (or tile group) has finished encoding / decoding, the samples within the reshaped region are converted back to their original region, and then deblocking filters and other filters are applied.

[0116] Forward reshaping of the prediction signal is disabled in the following cases: ○ The current block is intra coded. ○ The current block is coded as CPR (current picture referencing), also known as intra block copy (IBC). ○ The current block is coded in combined inter-intra prediction (CIIP) mode, and forward reshaping is disabled for the intra prediction block.
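The decoder-side flow described in the preceding paragraphs can be sketched as follows. This is a simplified illustration: the mode strings, the function name, and the flat sample lists are assumptions, and CIIP is not modeled because it mixes a reshaped intra part with a forward-reshaped inter part.

```python
def reconstruct_block(pred, resid, fwd_lut, inv_lut, mode, bit_depth=10):
    """Return (recon in reshaped region, recon in original region).

    pred:  prediction samples (original region for inter, reshaped otherwise)
    resid: decoded residual, which lives in the reshaped region
    mode:  "inter", "intra" or "ibc" (hypothetical labels)
    """
    max_val = (1 << bit_depth) - 1
    if mode == "inter":
        # Inter prediction comes from the decoded picture buffer (original
        # region), so it is forward reshaped before the residual is added.
        pred_r = [fwd_lut[p] for p in pred]
    elif mode in ("intra", "ibc"):
        # Forward reshaping is disabled: the prediction is already reshaped.
        pred_r = list(pred)
    else:
        raise ValueError("mode not covered by this sketch: " + mode)
    recon_r = [min(max(p + r, 0), max_val) for p, r in zip(pred_r, resid)]
    # Convert back to the original region before the loop filters.
    recon = [inv_lut[v] for v in recon_r]
    return recon_r, recon
```

The reshaped reconstruction is what later intra prediction in the same picture would read, while the original-region reconstruction is what the deblocking and other filters receive.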

[0117] [2.4.2.5 JVET-N0805] To avoid signaling ILR side information in the tile group header, JVET-N0805 proposes signaling it in the APS. This involves the following main ideas: - Optionally send LMCS parameters in the SPS. - Define APS types for ALF and LMCS parameters. Each APS has only one type. - When the LMCS tool is enabled, the tile group header (TGH) carries a flag indicating the presence or absence of an LMCS aps_id. If it is not signaled, the SPS parameters are used. *A semantic constraint needs to be added to ensure that there is always a valid reference when the tool is enabled.

[0118] [2.4.2.5.1 Implementation of the proposed design on JVET-M1001 (VVC Working Draft 4)] [Table 8]

[0119] [2.4.2.6 JVET-N0138] This contribution proposes the extended use of the adaptation parameter set (APS) for carrying reshaper model parameters in addition to ALF parameters. At the last meeting, it was decided that ALF parameters would be carried by the APS instead of the tile group header to improve coding efficiency by avoiding unnecessary redundant signaling of parameters across multiple tile groups. For the same reason, it is proposed that reshaper model parameters be carried by the APS instead of the tile group header. APS type information, along with the APS ID, is required in the APS syntax to identify the type of parameters within the APS (whether they are ALF parameters or a reshaper model).

[0120] [Table 9]
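The role of the APS type information can be sketched with a small lookup table. This is only an illustration: the type codes, class names, and the choice to raise an error on a type mismatch are assumptions, not syntax or semantics taken from JVET-N0138.

```python
from dataclasses import dataclass

ALF_APS = 0       # hypothetical type code for ALF parameters
RESHAPER_APS = 1  # hypothetical type code for reshaper model parameters


@dataclass(frozen=True)
class Aps:
    aps_id: int
    aps_type: int
    payload: tuple  # the ALF or reshaper parameters, opaque here


class ApsTable:
    """Stores received APSs and resolves references from tile group headers."""

    def __init__(self):
        self._by_id = {}

    def store(self, aps):
        self._by_id[aps.aps_id] = aps

    def lookup(self, aps_id, expected_type):
        aps = self._by_id.get(aps_id)
        if aps is None:
            raise LookupError("no APS with id %d" % aps_id)
        if aps.aps_type != expected_type:
            # The referenced APS exists but carries the other kind of data.
            raise LookupError("APS %d has the wrong type" % aps_id)
        return aps
```

A decoder that resolves a reshaper reference would call lookup(aps_id, RESHAPER_APS), so a reference to an APS that only carries ALF data is caught rather than silently misread.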

[0121] [2.5 Virtual Pipeline Data Unit (VPDU)] A virtual pipeline data unit (VPDU) is defined as a non-overlapping M×M-luma (L) / N×N-chroma (C) unit within a picture. In a hardware decoder, consecutive VPDUs are processed by multiple pipeline stages at the same time, with different stages processing different VPDUs concurrently. Since the VPDU size is roughly proportional to the buffer size in most pipeline stages, keeping the VPDU size small is extremely important. In an HEVC hardware decoder, the VPDU size is set to the maximum transform block (TB) size. Expanding the maximum TB size from 32×32-L / 16×16-C (as in HEVC) to 64×64-L / 32×32-C (as in the current VVC) can bring coding gains, but is expected to result in four times the VPDU size (64×64-L / 32×32-C) compared to HEVC. However, in addition to quadtree (QT) coding unit (CU) partitioning, ternary trees (TT) and binary trees (BT) are employed in VVC to achieve further coding gains, and TT and BT partitioning can be recursively applied to 128×128-L / 64×64-C coding tree blocks (CTUs). This is said to result in a VPDU size 16 times larger (128×128-L / 64×64-C) compared to HEVC.

[0122] In the current VVC design, VPDU is defined as 64×64-L / 32×32-C.

[0123] [2.6 Adaptation Parameter Set] The Adaptation Parameter Set (APS) is used in VVC to carry ALF parameters. The tile group header contains an aps_id, which is conditionally present when ALF is enabled. The APS contains the aps_id and the ALF parameters. A new NUT (NAL unit type, as in AVC and HEVC) value is assigned to the APS (from JVET-M0132). For the common test conditions in VTM-4.0 (as shown), it is proposed to send the APS with each picture using aps_id=0. For the time being, the range of APS ID values is 0 to 31, and APSs can be shared across pictures (and can be different in different tile groups within a picture). The ID value should be fixed-length coded when present. An ID value cannot be reused with different content within the same picture.
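The ID constraints in this paragraph can be sketched as follows. The 5-bit width follows directly from the stated range of 0 to 31; the bit-string representation, the function names, and the tracker class are assumptions made for illustration.

```python
APS_ID_BITS = 5  # IDs 0..31 fit in a 5-bit fixed-length code


def encode_aps_id(aps_id):
    """Fixed-length code for an APS ID, returned as a string of '0'/'1'."""
    if not 0 <= aps_id < (1 << APS_ID_BITS):
        raise ValueError("aps_id out of range")
    return format(aps_id, "0%db" % APS_ID_BITS)


def decode_aps_id(bits):
    """Read the fixed-length APS ID from the start of a bit string."""
    return int(bits[:APS_ID_BITS], 2)


class PictureApsTracker:
    """Rejects reuse of an ID with different content within one picture."""

    def __init__(self):
        self._seen = {}

    def start_new_picture(self):
        self._seen.clear()

    def register(self, aps_id, content):
        previous = self._seen.setdefault(aps_id, content)
        if previous != content:
            raise ValueError("aps_id %d reused with different content" % aps_id)
```

Sending the same ID with identical content again in the same picture is accepted, and the same ID may carry new content once start_new_picture has been called.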

[0124] [2.7 Related Tools] [2.7.1 Diffusion Filter (DF)] JVET-L0157 proposes a diffusion filter, and the intra / inter prediction signal of the CU may be further modified by the diffusion filter.

[0125] Uniform diffusion filter. The uniform diffusion filter is realized by convolving the prediction signal with a fixed mask that is given as either h^I or h^IV, as defined below. In addition to the prediction signal itself, one line of reconstructed samples to the left of and above the block is used as input for the filtered signal, and the use of these reconstructed samples can be avoided for inter blocks.

[0126] Let pred be the prediction signal in a given block, obtained by intra or motion-compensated prediction. To handle the boundary values for the filter, the prediction signal needs to be extended to a prediction signal pred_ext. This extended prediction can be formed in two ways.

[0127] As an intermediate step, one line of reconstructed samples to the left of and above the block is added to the prediction signal, and then the resulting signal is mirrored in all directions. Alternatively, only the prediction signal itself is mirrored in all directions. The latter extension is used for inter blocks. In this case, the prediction signal itself is the only input for the extended prediction signal pred_ext.

[0128] If the filter h^I is to be used, it is proposed to replace the prediction signal pred with h^I * pred (where * denotes convolution), using the boundary extension described above. Here, the filter mask h^I is given as:

[Equation: filter mask h^I]

[0129] If the filter h^IV is to be used, it is proposed to replace the prediction signal pred with h^IV * pred. Here, the filter h^IV is given as h^IV = h^I * h^I * h^I * h^I.

[0130] Directional diffusion filter. Instead of signal-adaptive diffusion filters, directional filters are used, namely a horizontal filter h^hor and a vertical filter h^ver, which still have a fixed mask. More precisely, the uniform diffusion filtering corresponding to the mask h^I from the previous section is simply restricted to being applied only along the vertical or the horizontal direction. Vertical filtering is achieved by applying the following fixed filter mask to the prediction signal:

[Equation: vertical filter mask h^ver]

[0131] [2.7.2 Bilateral Filter (BF)] The bilateral filter proposed in JVET-L0406 is always applied to luma blocks with non-zero transform coefficients and a slice quantization parameter greater than 17. Therefore, there is no need to signal the use of the bilateral filter. The bilateral filter, when applied, is performed on the decoded samples immediately after the inverse transform. Furthermore, the filter parameters, i.e., the weights, are explicitly derived from the coded information.

[0132] The filtering process is:

[Equation: bilateral filtering process]

[0133] More specifically, the weight W_k(x) associated with the k-th neighboring sample is defined as follows:

[Equation: weight W_k(x)]

[0134] To better capture the statistical characteristics of the video signal and improve filter performance, the weight function obtained from equation (2) is adjusted by the σ_d parameter, which depends on the coding mode and the block partitioning (minimum size) parameter, as listed in Table 4. [Table 10]

[0135] To further improve coding performance, for inter-coded blocks where the TU is not split, the intensity difference between the current sample and one of its neighboring samples is replaced by a representative intensity difference between two windows covering the current sample and the neighboring sample. Thus, the formula for the filtering process is revised as:

[Equation: filtering process with window-based intensity difference]

[0136] [2.7.3 Hadamard transform region filter (HF)] JVET-K0068 proposes an in-loop filter in the one-dimensional Hadamard transform domain that is applied at the CU level after reconstruction and can be implemented without multiplication. The proposed filter is applied to all CU blocks that satisfy predefined conditions, and the filter parameters are derived from the coded information.

[0137] The proposed filtering is always applied to luma reconstructed blocks with non-zero transform coefficients when the slice quantization parameter is greater than 17, except for 4x4 blocks. The filter parameters are explicitly derived from the coded information. The proposed filter, if applied, is performed on the decoded samples immediately after the inverse transform.

[0138] For each pixel from the reconstructed block, the pixel processing involves the following steps: ○ Scan four neighboring pixels around the pixel being processed, including the current one, according to the scan pattern. ○ Apply a 4-point Hadamard transform to the read pixels. ○ Apply spectrum filtering based on the following formula:

[Equation: spectrum filtering]

[0139] Here, (i) is the index of the spectral component in the Hadamard spectrum, R(i) is the spectral component of the reconstructed pixel corresponding to the index, and σ is given by the following equation:

[Equation: σ]

[0140] An example of a scan pattern is shown in Figure 26, where A is the current pixel and {B, C, D} are within the current CU.
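The three per-pixel steps (scan, 4-point Hadamard transform, spectrum filtering) can be sketched as follows. The filtering formula is not reproduced in this text, so the gain used here, R(i)^3 / (R(i)^2 + σ^2) with the first spectral component passed through unchanged, is an assumed Wiener-like stand-in, and the floating-point arithmetic is for readability only (the proposal itself targets a multiplication-free implementation).

```python
def hadamard4(v):
    """Unnormalized 4-point Hadamard transform (applying it twice gives 4*v)."""
    a, b, c, d = v
    return [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]


def filter_group(pixels, sigma):
    """Filter one group of four scanned pixels in the Hadamard domain."""
    spectrum = hadamard4(pixels)
    # Assumption: the first (DC) component is kept as is.
    filtered = [float(spectrum[0])]
    for r in spectrum[1:]:
        denom = r * r + sigma * sigma
        # Assumed gain: small components are attenuated, large ones preserved.
        filtered.append(r * r * r / denom if denom else 0.0)
    # Inverse transform: the same butterfly followed by division by 4.
    return [x / 4 for x in hadamard4(filtered)]
```

With sigma set to 0 the group is returned unchanged, and with a very large sigma all four outputs approach the group mean, which shows the smoothing behavior of the stand-in gain.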

[0141] [3. Disadvantages of existing implementations] Existing implementations of in-loop reshaping (ILR) may have the following drawbacks:

[0142] 1) Signaling ILR side information in the tile group header is inappropriate because it requires too many bits. Furthermore, prediction between different picture / tile groups is not permitted. Therefore, ILR side information needs to be transmitted for each tile group, which can lead to coding loss at low bitrates, especially at low resolutions.

[0143] 2) The interaction between ILR and DMVR (or other newly introduced coding tools) is unclear. For example, ILR is applied to the inter prediction signal to convert it from the original region to the reshaped region, and the decoded residual is in the reshaped region. On the other hand, DMVR also relies on the prediction signal to refine the motion vector of a block. It is unclear whether DMVR should be applied in the original region or the reshaped region.

[0144] 3) The interaction between ILR and screen content coding tools, such as palette, B-DPCM, IBC, transform skip, transquant bypass, and I-PCM mode, is unclear.

[0145] 4) Luma-dependent chroma residual scaling is used in ILR. This introduces an additional delay (due to the dependency between luma and chroma) that is not beneficial to the hardware design.

[0146] 5) The goal of the VPDU is to ensure that processing of one 64x64 square region is completed before processing of other 64x64 square regions begins. However, the ILR design places no restriction on the use of ILR, which may cause a violation of the VPDU because chroma depends on the luma prediction signal.

[0147] 6) When all coefficients of a CU are zero, the forward and reverse reshaping processes are still performed on the prediction block and the reconstructed block. This wastes computation.

[0148] 7) JVET-N0138 proposes signaling ILR information in the APS. Several new problems may arise from this solution. For example, two types of APS are designed, but the adaptation_parameter_set_id signaling for ILR may refer to an APS that does not contain ILR information. Similarly, the adaptation_parameter_set_id signaling for Adaptive Loop Filtering (ALF) may refer to an APS that does not contain ALF information.

[0149] [4. Exemplary Method of In-Loop Reshaping for Video Coding] The currently disclosed embodiments of the technology address the shortcomings of existing implementations, thereby bringing greater coding efficiency to video coding. In-loop reshaping methods based on the disclosed technology can enhance both existing and future video coding standards and are illustrated in the following examples, which describe various implementations. The examples of the disclosed technology provided below illustrate general concepts and are not intended to be construed as limiting. In the examples, various features described in those examples may be combined unless expressly indicated otherwise. It should be noted that some of the proposed technologies may be applied to existing candidate list construction processes.

[0150] In this specification, decoder-side motion vector derivation (DMVD) includes methods such as DMVR and FRUC, which perform motion estimation to derive or refine block / subblock motion information, and BIO, which performs sample-by-sample motion refinement.

[0151] 1. The motion information refinement process in DMVD technologies such as DMVR may rely on information in the reshaping domain. a. For example, the prediction blocks generated from the reference picture in the original region may first be converted into a reshaped region before being used for motion information refinement. i. Alternatively, cost calculations (e.g., SAD, MR-SAD) / gradient calculations are performed in the reshaping region. ii. Alternatively, and furthermore, after the motion information has been refined, the reshaping process is disabled for the predicted blocks generated by the refined motion information. b. Alternatively, the motion information refinement process in DMVD technologies such as DMVR may rely on information from the original domain. i. The DMVD process may be invoked by the prediction block in the original region. ii. For example, after motion information refinement, the predicted blocks obtained from the refined motion information, or the final predicted blocks (e.g., a weighted average of two predicted blocks), may be further transformed into reshaping regions to generate the final reconstructed blocks. iii. Alternatively, and furthermore, after the motion information has been refined, the reshaping process is disabled for the predicted blocks generated by the refined motion information.
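Option 1.a, where the prediction blocks are converted before the matching cost is computed, can be sketched as follows. The function names and the use of plain SAD between two prediction blocks are assumptions for illustration; a real DMVR search evaluates this cost over several candidate motion vector offsets.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def refinement_cost(pred0, pred1, fwd_lut=None):
    """Matching cost between two prediction blocks.

    With fwd_lut given, both blocks (generated from reference pictures in
    the original region) are first converted to the reshaped region, as in
    option 1.a. With fwd_lut=None the cost is computed in the original
    region, as in option 1.b.
    """
    if fwd_lut is not None:
        pred0 = [fwd_lut[v] for v in pred0]
        pred1 = [fwd_lut[v] for v in pred1]
    return sad(pred0, pred1)
```

Because the forward mapping is generally nonlinear, the two options can rank candidate offsets differently, which is why the choice of region matters for the refinement result.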

[0152] 2. It is proposed to align the sample region (either both in the original region or both in the reshaped region) of the samples in the current tile / tile group / picture with that of the samples derived from reference pictures, where these samples are used to derive local illumination compensation (LIC) parameters. a. For example, the reshaped region is used to derive the LIC parameters. i. Alternatively, furthermore, the samples (e.g., the reference samples in the reference picture (with or without interpolation) and the adjacent / non-adjacent samples of the reference samples (with or without interpolation)) may first be converted into the reshaped region before being used to derive the LIC parameters. b. For example, the original region is used to derive the LIC parameters. i. Alternatively, furthermore, spatially adjacent / non-adjacent samples of the current block (e.g., within the current tile group / picture / tile) may first be converted back to the original region before being used to derive the LIC parameters. c. If the LIC parameters are derived in one region, the prediction block should be in the same region when the LIC parameters are applied to it. i. For example, when a. above is invoked, the reference block may be converted into the reshaped region, and the LIC model is applied to the reshaped reference block. ii. For example, when b. above is invoked, the reference block remains in the original region, and the LIC model is applied to the reference block in the original region. d. For example, the LIC model is applied to prediction blocks in the reshaped region (for instance, the prediction block is first converted into the reshaped region by forward reshaping). e. For example, the LIC model may first be applied to the prediction block in the original region, and then the final prediction block, which depends on the prediction block to which LIC has been applied, may be converted into the reshaped region (e.g., by forward reshaping) and used to derive the reconstructed block. f. The above method may be extended to other coding tools that rely on both spatially adjacent / non-adjacent samples and reference samples within reference pictures.

[0153] 3. For filters applied to prediction blocks (e.g., diffusion filters (DF)), the filters are applied to the prediction blocks in the original region. a. Alternatively, and subsequently, reshaping is applied to the filtered prediction signal to generate a reconstructed block. b. An example of the process for intercoding is shown in Figure 27. c. Alternatively, the filter is applied to the predictive signal in the reshaping region. i. Alternatively, furthermore, reshaping may be applied first to the prediction block, and then a filtering method may be further applied to the reshaped prediction block to generate a reconstructed block. ii. An example of the process for intercoding is shown in Figure 28. d. The filter parameters may depend on whether ILR is enabled or not.

[0154] 4. For filters applied to the reconstructed block (e.g., bilateral filter (BF), Hadamard transform region filter (HF)), the filter is applied to the reconstructed block in the original region, not the reshaped region. a. Alternatively, the reconstructed block in the reshaping region may first be converted back to the original region, and then a filter may be applied and used to generate the reconstructed block. b. An example of the process for intercoding is shown in Figure 29. c. Alternatively, the filter may be applied to the reconstructed block in the reshaping region. i. Alternatively, and even before applying reverse reshaping, a filter may be applied first. The filtered reconstructed block may then be converted back to the original region. ii. An example of the process for intercoding is shown in Figure 30. d. The filter parameters may depend on whether ILR is enabled or not.

[0155] 5. It is proposed to apply a filtering process to the reconstructed block in the reshaped region (for example, after intra / inter or other types of prediction methods). a. For example, the deblocking filter (DBF) process is performed in the reshaped region. In this case, reverse reshaping is not applied before the DBF. i. In this case, the DBF parameters may differ depending on whether or not reshaping is applied. ii. In one example, the DBF process may depend on whether reshaping is enabled. 1. In one example, this method is applied when DBF is invoked in the original region. 2. Alternatively, this method is applied when DBF is invoked in the reshaped region. b. In one example, the sample adaptive offset (SAO) filtering process is performed in the reshaped region. In this case, reverse reshaping is not applied before SAO. c. In one example, the adaptive loop filter (ALF) filtering process is performed in the reshaped region. In this case, reverse reshaping is not applied before ALF. d. Alternatively, furthermore, reverse reshaping may be applied to the block after DBF. e. Alternatively, furthermore, reverse reshaping may be applied to the block after SAO. f. Alternatively, furthermore, reverse reshaping may be applied to the block after ALF. g. The above filtering method may be replaced with other types of filtering methods.

[0156] 6. It is proposed to signal the ILR parameters in a new parameter set (e.g., ILR APS) instead of the tile group header. a. In one example, the tile group header may include an aps_id. Alternatively, furthermore, the aps_id may be conditionally present when ILR is enabled. b. In one example, the ILR APS includes an aps_id and the ILR parameters. c. In one example, a new NUT (NAL unit type, as in AVC and HEVC) value is assigned to the ILR APS. d. In one example, the range of the ILR APS ID value is from 0 to M (e.g., M = 2^K - 1). e. In one example, the ILR APS may be shared across pictures (and may be different for different tile groups within a picture). f. In one example, the ID value, if present, may be fixed-length coded. Alternatively, it may be coded with exponential-Golomb (EG) coding, truncated unary, or other binarization methods. g. In one example, the ID value cannot be reused with different content within the same picture. h. In one example, the ILR APS and the APS for the ALF parameters may share the same NUT. i. Alternatively, the ILR parameters may be carried in the current APS for the ALF parameters. In this case, the ILR APS in the methods described above may be replaced by the current APS. j. Alternatively, the ILR parameters may be carried in the SPS / VPS / PPS / sequence header / picture header. k. In one example, the ILR parameters may include reshaper model information, usage of the ILR method, and chroma residual scaling factors. l. Alternatively, furthermore, the ILR parameters may be signaled at a first level (e.g., in the APS), and / or the usage of ILR may be further signaled at a second level (e.g., in the tile group header). m. Alternatively, furthermore, predictive coding may be applied to code the ILR parameters with different APS indexes.
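As an alternative to a K-bit fixed-length code, item 6.f mentions exponential-Golomb coding of the ID. A minimal sketch of the standard order-0 unsigned exp-Golomb code is given below; representing the bitstream as a string of '0' and '1' characters and the function names are assumptions made for readability.

```python
def encode_ue(value):
    """Order-0 unsigned exp-Golomb code as a string of '0'/'1' characters."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]             # binary form of value + 1
    return "0" * (len(code) - 1) + code   # prefix of leading zeros


def decode_ue(bits):
    """Decode one codeword; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2) - 1, bits[end:]
```

Small IDs get short codewords (ID 0 costs one bit), whereas a fixed-length code spends K bits on every ID, so which binarization is cheaper depends on how the IDs are distributed.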

[0157] 7. Instead of applying luma-dependent chroma residual scaling (LCRS) to chroma blocks, it is proposed to apply a forward / reverse reshaping process to chroma blocks to remove the dependency between luma and chroma. a. In one example, one piecewise linear (PWL) model and / or one forward / reverse look-up table may be used for one chroma component. Alternatively, two PWL models and / or forward / reverse look-up tables may be used for coding the two chroma components respectively. b. For example, the PWL model and / or forward / reverse look-up tables of chroma may be derived from the PWL model and / or forward / reverse look-up tables of luma. i. For example, it is not necessary to further signal the PWL model / look-up tables of chroma. c. For example, the PWL model and / or forward / reverse look-up tables of chroma may be signaled in the SPS / VPS / APS / PPS / sequence header / picture header / tile group header / tile header / CTU row / group of CTUs / region.

[0158] 8. For example, the way in which the ILR parameters of a picture / tile group are signaled may depend on the ILR parameters of the picture / tile group that were coded previously. a. For example, the ILR parameters of one picture / tile group may be predicted by the ILR parameters of one or more previously coded picture / tile groups.

[0159] 9. It is proposed to disable luma-dependent chroma residual scaling (LCRS) for specific block dimensions / temporal layers / tile group types / picture types / coding modes / certain types of motion information. a. For example, even when a forward / reverse reshaping process is applied to a luma block, LCRS does not necessarily have to be applied to the corresponding chroma block. b. Alternatively, LCRS may still be applied to the corresponding chroma block even if the forward / reverse reshaping process is not applied to the luma block. c. For example, LCRS is not used when a cross-component linear model (CCLM) mode is applied. The CCLM modes include LM, LM-A, and LM-L. d. For example, LCRS is not used when a cross-component linear model (CCLM) mode is not applied. The CCLM modes include LM, LM-A, and LM-L. e. In one example, when the coded luma block exceeds one VPDU (e.g., 64×64), LCRS is not allowed. i. In one example, when the luma block contains fewer than M×H samples, e.g., 16 or 32 or 64 luma samples, LCRS is not allowed. ii. Alternatively, when the minimum of the width and / or height of the luma block is smaller than or not larger than X, LCRS is not allowed. In one example, X is set to 8. iii. Alternatively, when the minimum of the width and / or height of the luma block is not smaller than X, LCRS is not allowed. In one example, X is set to 8. iv. Alternatively, when the width of the luma block > th1 or >= th1 and / or the height of the luma block > th2 or >= th2, LCRS is not allowed. In one example, th1 and / or th2 are set to 8. 1. In one example, th1 and / or th2 are set to 128. 2. In one example, th1 and / or th2 are set to 64. v. Alternatively, when the width of the luma block < th1 or <= th1 and / or the height of the luma block < th2 or <= th2, LCRS is not allowed. In one example, th1 and / or th2 are set to 8.
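One possible combination of the conditions in item 9 can be sketched as a predicate. Item 9 lists alternatives rather than a single rule, so the particular combination below (CCLM disables LCRS, blocks exceeding one VPDU disable LCRS, and an optional minimum sample count) as well as the function and parameter names are assumptions chosen for illustration.

```python
def lcrs_allowed(width, height, uses_cclm=False, vpdu_size=64, min_samples=0):
    """Decide whether LCRS may be applied for a luma block of the given size."""
    if uses_cclm:
        return False              # variant 9.c: no LCRS together with CCLM
    if width > vpdu_size or height > vpdu_size:
        return False              # variant 9.e: block exceeds one VPDU
    if width * height < min_samples:
        return False              # variant 9.e.i: too few luma samples
    return True
```

For example, lcrs_allowed(128, 64) is false because the block is wider than one 64×64 VPDU, while lcrs_allowed(64, 64) is true under the default thresholds.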

[0160] 10. Whether to disable ILR (forward reshaping process and / or reverse reshaping process) may depend on the coefficient. a. In one example, when one block is coded with all-zero coefficients, the forward reshaping process applied to the predicted block is skipped. b. In one example, when one block is coded with all-zero coefficients, the reverse reshaping process applied to the reconstructed block is skipped. c. In one example, when one block is coded with only one non-zero coefficient at a specific position (e.g., the DC coefficient at the upper left position of one block, the coefficient in the upper left coding group within one block), the process of forward reshaping applied to the predicted block and / or the reverse reshaping applied to the reconstructed block is skipped. d. In one example, when one block is coded with only M (e.g., M = 1) non-zero coefficients, the process of forward reshaping applied to the predicted block and / or the reverse reshaping applied to the reconstructed block is skipped.
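Items 10.a and 10.b can be sketched as an early exit in the reconstruction of an inter block. The function name, the flag has_nonzero_coeff, and the flat sample lists are assumptions; the sketch returns the reconstruction in the original region.

```python
def reconstruct_inter(pred, resid, has_nonzero_coeff, fwd_lut, inv_lut,
                      bit_depth=10):
    """Reconstruct an inter block, skipping ILR when all coefficients are zero.

    pred:  inter prediction in the original region
    resid: decoded residual in the reshaped region (all zero if no coefficients)
    """
    if not has_nonzero_coeff:
        # No residual to add: forward reshaping of the prediction and reverse
        # reshaping of the reconstruction are both skipped.
        return list(pred)
    max_val = (1 << bit_depth) - 1
    out = []
    for p, r in zip(pred, resid):
        v = min(max(fwd_lut[p] + r, 0), max_val)  # reconstruct in reshaped region
        out.append(inv_lut[v])                    # convert back to original region
    return out
```

Skipping both steps saves two table lookups per sample. Note that the skipped path can differ slightly from the full path when the forward and inverse tables do not invert each other exactly, so encoder and decoder must apply the same rule.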

[0161] 11. It is proposed to divide the ILR application region into virtual pipeline data units (VPDUs) when the coded block exceeds one VPDU. Each application region (e.g., having a maximum size of 64×64) is regarded as an individual CU for the ILR operation. a. In one example, when the width of the block > th1 or >= th1 and / or the height of the block > th2 or >= th2, it may be divided into sub-blocks having a width < th1 or <= th1 and / or a height < th2 or <= th2, and ILR can be executed for each sub-block. i. In one example, the sub-blocks may have the same width and / or height. ii. In one example, the sub-blocks except those at the right boundary and / or the lower boundary may have the same width and / or height. iii. In one example, the sub-blocks except those at the left boundary and / or the upper boundary may have the same width and / or height. b. In one example, when the size of the block (i.e., width × height) > th3 or >= th3, it may be divided into sub-blocks having a size < th3 or <= th3, and ILR can be executed for each sub-block. i. In one example, the sub-blocks may have the same size. ii. In one example, the sub-blocks except those at the right boundary and / or the lower boundary may have the same size. iii. In one example, the sub-blocks except those at the left boundary and / or the upper boundary may have the same size. c. Alternatively, the use of ILR is restricted to specific block dimensions. i. In one example, when the coded block exceeds one VPDU (e.g., 64×64), ILR is not allowed. ii. In one example, when the block contains fewer than M×H samples, e.g., 16 or 32 or 64 luma samples, ILR is not allowed. iii. Alternatively, when the minimum of the width and / or height of the block is smaller than or not larger than X, ILR is not allowed. In one example, X is set to 8. iv. Alternatively, when the minimum of the width and / or height of the block is not smaller than X, ILR is not allowed. In one example, X is set to 8. v. Alternatively, when the width of the block > th1 or >= th1 and / or the height of the block > th2 or >= th2, ILR is not allowed. In one example, th1 and / or th2 are set to 8. 1. In one example, th1 and / or th2 are set to 128. 2. In one example, th1 and / or th2 are set to 64. vi. Alternatively, when the width of the block < th1 or <= th1 and / or the height of the block < th2 or <= th2, ILR is not allowed. In one example, th1 and / or th2 are set to 8.
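The division of item 11.a can be sketched as follows, using the variant in which all sub-blocks have the same size except those at the right and lower boundaries (11.a.ii). The function name and the (x, y, width, height) tuple layout are assumptions for illustration.

```python
def split_into_vpdus(x0, y0, width, height, vpdu_size=64):
    """Split a block into sub-blocks no larger than vpdu_size x vpdu_size.

    Returns (x, y, w, h) tuples in raster order. Sub-blocks at the right and
    lower boundaries may be smaller than the others.
    """
    regions = []
    for y in range(y0, y0 + height, vpdu_size):
        for x in range(x0, x0 + width, vpdu_size):
            w = min(vpdu_size, x0 + width - x)
            h = min(vpdu_size, y0 + height - y)
            regions.append((x, y, w, h))
    return regions
```

Each returned region can then be treated as an individual CU for the ILR operation, for instance by averaging luma per region, so that processing of one 64×64 area does not depend on samples from another.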

[0162] 12. The above methods (e.g., whether to disable ILR and / or whether to disable LCRS and / or whether to signal the PWL model / look-up table for chroma coding) may depend on the color format, such as 4:4:4 / 4:2:0.

[0163] 13. The indication to enable ILR (e.g., tile_group_reshaper_enable_flag) may be coded under the condition of the indication of the presence of the reshaper model (e.g., tile_group_reshaper_model_present_flag). a. Alternatively, tile_group_reshaper_model_present_flag may be coded under the condition of tile_group_reshaper_enable_flag. b. Alternatively, only one of the two syntax elements, tile_group_reshaper_model_present_flag and tile_group_reshaper_enable_flag, may be coded. The value of the other syntax element is set equal to the one that is signaled.
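The conditional coding of the two flags can be sketched as a small parser. The list-of-bits input, the function name, the enable_first switch, and in particular the rule that an absent flag is inferred to be 0 are assumptions; the text does not state the inferred value.

```python
def parse_reshaper_flags(bits, enable_first=False):
    """Parse the two tile-group-level reshaper flags from an iterable of bits.

    Default order (item 13): the model-present flag is read first and the
    enable flag is read only if it equals 1.
    enable_first=True (item 13.a): the roles are swapped.
    An absent second flag is inferred to be 0 (assumption).
    """
    it = iter(bits)
    first = next(it)
    second = next(it) if first else 0
    if enable_first:
        enable, model_present = first, second
    else:
        model_present, enable = first, second
    return {
        "tile_group_reshaper_model_present_flag": model_present,
        "tile_group_reshaper_enable_flag": enable,
    }
```

In both orders one bit is saved whenever the first flag is 0, at the cost of making the second flag unavailable in that case.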

[0164] 14. Different clipping methods may be applied to the prediction signal and reconstruction process. a. For example, an adaptive clipping method may be applied, in which the maximum and minimum values to be clipped can be defined in the reshaping region. b. For example, an adaptive clipping method may be applied to the predictive signal in the reshaping region. c. Alternatively, fixed clipping (e.g., according to bit depth) may be applied to the reconstructed block.

[0165] 15. Filter parameters (e.g., those used for DF, BF, HF) may depend on whether ILR is enabled or not.

[0166] 16. For blocks coded in palette mode, it is proposed that ILR be disabled or applied differently. a. For example, if a block is coded in palette mode, reshaping and reverse reshaping are skipped. b. Alternatively, different reshaping and reverse reshaping may be applied when the block is coded in palette mode.

[0167] 17. Alternatively, when ILR is applied, the palette mode may be coded differently. a. For example, when ILR is applied, the palette mode may be coded in the original region. b. Alternatively, when ILR is applied, the palette mode may be coded in the reshaping region. c. For example, when ILR is applied, the palette predictor may be signaled in the original region. d. Alternatively, the palette predictor may be signaled in the reshaping region.

[0168] 18. For blocks coded in IBC mode, it is proposed that ILR be disabled or applied differently. a. For example, if a block is coded in IBC mode, reshaping and reverse reshaping are skipped. b. Alternatively, different reshaping and reverse reshaping may be applied when the block is coded in IBC mode.

[0169] 19. Alternatively, when ILR is applied, IBC may be coded differently. a. For example, when ILR is applied, IBC mode may run in the original domain. b. Alternatively, if ILR is applied, the IBC mode may be performed in the reshaping region.

[0170] 20. For blocks coded in B-DPCM mode, it is proposed that ILR be disabled or applied differently. a. For example, if a block is coded in B-DPCM mode, reshaping and reverse reshaping are skipped. b. Alternatively, different reshaping and reverse reshaping are applied when the block is coded in B-DPCM mode.

[0171] 21. Alternatively, when ILR is applied, the B-DPCM mode may be coded differently. a. For example, if ILR is applied, B-DPCM may run in the original region. b. Alternatively, if ILR is applied, B-DPCM may be performed in the reshaping region.

[0172] 22. For blocks coded in transform skip mode, it is proposed that ILR be disabled or applied differently. a. For example, if a block is coded in transform skip mode, reshaping and reverse reshaping are skipped. b. Alternatively, different reshaping and reverse reshaping may be applied when the block is coded in transform skip mode.

[0173] 23. Alternatively, when ILR is applied, the transform skip mode may be coded differently. a. For example, when ILR is applied, transform skip may be performed in the original region. b. Alternatively, when ILR is applied, transform skip may be performed in the reshaping region.

[0174] 24. For blocks coded in I-PCM mode, it is proposed that ILR be disabled or applied differently. a. For example, if a block is coded in I-PCM mode, reshaping and reverse reshaping are skipped. b. Alternatively, different reshaping and reverse reshaping may be applied when the block is coded in I-PCM mode.

[0175] 25. Alternatively, when ILR is applied, the I-PCM mode may be coded differently. a. For example, when ILR is applied, the I-PCM mode may be coded in the original domain. b. Alternatively, when ILR is applied, the I-PCM mode may be coded in the reshaping region.

[0176] 26. For blocks coded in transform quantization bypass mode, it is proposed that ILR be disabled or applied differently. a. For example, if a block is coded in transform quantization bypass mode, reshaping and reverse reshaping are skipped.

[0177] 27. Alternatively, different reshaping and reverse reshaping methods are applied when the block is coded in transform quantization bypass mode.

[0178] 28. With respect to the above bullet points, if ILR is disabled, the forward reshaping and / or reverse reshaping process may be skipped. a. Alternatively, the predicted and / or reconstructed and / or residual signals are in the original region. b. Alternatively, the predicted and / or reconstructed and / or residual signals are in the reshaping region.

[0179] 29. Multiple reshaping / reverse reshaping functions (e.g., multiple PWL models) may be enabled to code one picture / one tile group / one VPDU / one region / one CTU row / multiple CUs. a. The method of selecting from the multiple functions may depend on block dimensions / coding mode / picture type / low-delay check flag / motion information / reference pictures / video content, etc. b. For example, multiple sets of ILR side information (e.g., reshaping / reverse reshaping functions) may be signaled for each SPS / VPS / PPS / sequence header / picture header / tile group header / tile header / region / VPDU, etc. i. Alternatively, predictive coding of ILR side information may be used. c. For example, more than one aps_id may be signaled in the PPS / picture header / tile group header / tile header / region / VPDU, etc.

[0180] 30. For example, reshaping information may be signaled in a new syntax set other than the VPS, SPS, PPS, or APS. For instance, reshaping information may be signaled in a set denoted inloop_reshaping_parameter_set() (IRPS, or any other name). a. An example syntax design is as follows: [Table 11] b. An example syntax design is as follows: [Table 12]

[0181] 31. For example, ILR information is signaled together with ALF information in the APS. a. An example syntax design is as follows: [Table 13] b. For example, one tile_group_aps_id is signaled in the tile group header to indicate the adaptation_paramter_set_id of the APS referenced by the tile group. Both ALF and ILR information for the current tile group are signaled in the specified APS. i. An example syntax design is as follows: [Table 14]

[0182] 32. For example, ILR information and ALF information are signaled by different APSs. a. The first ID (which may be called tile_group_aps_id_alf) is signaled in the tile group header to indicate the first adaptation_paramter_set_id of the first APS referenced by the tile group. ALF information for the current tile group is signaled in the specified first APS. b. The second ID (which may be called tile_group_aps_id_irps) is signaled in the tile group header to indicate the second adaptation_paramter_set_id of the second APS referenced by the tile group. ILR information for the current tile group is signaled in the specified second APS. c. For example, the first APS should contain ALF information in a conformance bitstream. d. For example, the second APS should contain ILR information in a conformance bitstream. e. An example syntax design is as follows: [Table 15]

[0183] 33. As an example, some APS that have an adaptation_paramter_set_id specified should have ALF information. As another example, some APS that have an adaptation_paramter_set_id specified should have ILR information. a. For example, an APS with an adaptation_paramter_set_id equal to 2N should have ALF information, where N is any integer. b. For example, an APS with an adaptation_paramter_set_id equal to 2N+1 should have ILR information, where N is any integer. c. An example syntax design is as follows: [Table 16] i. For example, 2×tile_group_aps_id_alf indicates the first adaptation_paramter_set_id of the first APS referenced by the tile group. ALF information for the current tile group is signaled by the specified first APS. ii. For example, 2 × tile_group_aps_id_irps + 1 indicates the second adaptation_paramter_set_id of the second APS referenced by the tile group. ILR information for the current tile group is signaled by the specified second APS.
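The even / odd identifier split in item 33 can be sketched as follows. The helper names are hypothetical; the arithmetic (2N for ALF, 2N+1 for ILR) is taken from items 33.a to 33.c.

```python
def alf_aps_id(tile_group_aps_id_alf: int) -> int:
    """APS identifier referenced for ALF information: 2 * tile_group_aps_id_alf."""
    return 2 * tile_group_aps_id_alf


def ilr_aps_id(tile_group_aps_id_irps: int) -> int:
    """APS identifier referenced for ILR information: 2 * tile_group_aps_id_irps + 1."""
    return 2 * tile_group_aps_id_irps + 1


def aps_payload_kind(aps_id: int) -> str:
    """Classify an APS by the parity of its identifier (even: ALF, odd: ILR)."""
    return "ALF" if aps_id % 2 == 0 else "ILR"
```

Under this convention, the two identifier spaces never collide, so a decoder can tell which kind of information an APS carries from its identifier alone.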

[0184] 34. For example, a tile group cannot reference an APS (or IRPS) that was signaled before a Network Abstraction Layer (NAL) unit of a specified type that was signaled before the current tile group. a. For example, a tile group cannot reference an APS (or IRPS) that was signaled before a tile group of a specified type that was signaled before the current tile group. b. For example, a tile group cannot reference an APS (or IRPS) that was signaled before an SPS that was signaled before the current tile group. c. For example, a tile group cannot reference an APS (or IRPS) that was signaled before a PPS that was signaled before the current tile group. d. For example, a tile group cannot reference an APS (or IRPS) that was signaled before an access unit delimiter (AUD) NAL that was signaled before the current tile group. e. For example, a tile group cannot reference an APS (or IRPS) that was signaled before an end-of-bitstream (EoB) NAL that was signaled before the current tile group. f. For example, a tile group cannot reference an APS (or IRPS) that was signaled before an end-of-sequence (EoS) NAL that was signaled before the current tile group. g. For example, a tile group cannot reference an APS (or IRPS) that was signaled before an Instantaneous Decoding Refresh (IDR) NAL that was signaled before the current tile group. h. For example, a tile group cannot reference an APS (or IRPS) that was signaled before a Clean Random Access (CRA) NAL that was signaled before the current tile group. i. For example, a tile group cannot reference an APS (or IRPS) that was signaled before an intra random access point (IRAP) that was signaled before the current tile group. j. For example, a tile group cannot reference an APS (or IRPS) that was signaled before a tile group (or picture, or slice) that was signaled before the current tile group. k. The methods disclosed in IDF-P1903237401H and IDF-P1903234501H are also applicable when ILR information is carried by an APS or IRPS.
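The referencing restriction in item 34 can be sketched as a single pass over the NAL units that precede the current tile group. The tuple representation of a NAL unit, the kind strings, and the function name are hypothetical; the only behavior taken from the text is that an APS signaled before a NAL unit of a specified type becomes unavailable for reference.

```python
from typing import List, Set, Tuple

# Hypothetical model of a NAL unit: (kind, aps_id). aps_id is used only when kind == "APS".
Nal = Tuple[str, int]


def referenceable_aps_ids(nals_before_tile_group: List[Nal],
                          barrier_kinds: Set[str]) -> Set[int]:
    """Return the APS identifiers that the current tile group may still reference.

    Every NAL unit whose kind is in barrier_kinds (e.g. "IDR", "SPS", "EoS")
    invalidates all APSs that were signaled before it.
    """
    available: Set[int] = set()
    for kind, aps_id in nals_before_tile_group:
        if kind in barrier_kinds:
            available.clear()       # APSs signaled earlier can no longer be referenced
        elif kind == "APS":
            available.add(aps_id)   # signaled after the last barrier, so referenceable
    return available
```

Choosing barrier_kinds selects among sub-items a to j; an empty set corresponds to no restriction.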

[0185] 35. A conformance bitstream should satisfy the requirement that default ILR parameters, such as a default model, are defined when the in-loop reshaping method is enabled for one video data unit (e.g., a sequence). a. sps_lmcs_default_model_present_flag should be set to 1 if sps_lmcs_enabled_flag is set to 1. b. Default parameters may be signaled under the condition of the ILR enable flag instead of the default model presence flag (e.g., sps_lmcs_default_model_present_flag). c. For each tile group, the default model use flag (e.g., tile_group_lmcs_use_default_model_flag) may be signaled without referring to the SPS default model use flag. d. A conformance bitstream should satisfy the requirement that the default model be used when the corresponding APS type for ILR carries no ILR information and one video data unit (e.g., a tile group) is required to use the ILR technology. e. Alternatively, in a conformance bitstream, the indication to use the default model should be true when the corresponding APS type for ILR carries no ILR information and one video data unit (e.g., a tile group) is required to use the ILR technology (e.g., tile_group_lmcs_enable_flag is equal to 1), in which case tile_group_lmcs_use_default_model_flag should be 1. f. It is a constraint that the default ILR parameters (e.g., the default model) should be transmitted in the video data unit (e.g., the SPS). i. Alternatively, and furthermore, the default ILR parameters should be sent when the SPS flag that indicates the use of ILR is true. g. It is a constraint that at least one ILR APS is transmitted in a video data unit (e.g., the SPS). i. For example, at least one ILR APS includes the default ILR parameters (e.g., the default model).
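Two of the constraints in item 35 (35.a and 35.e) can be written as simple predicates. This is a sketch under the assumption that each flag is available as an integer 0 or 1 and that the availability of ILR information in an APS is known as a boolean; the function names and the ilr_info_in_aps parameter are hypothetical.

```python
def sps_flags_conform(sps_lmcs_enabled_flag: int,
                      sps_lmcs_default_model_present_flag: int) -> bool:
    """Item 35.a: if ILR is enabled in the SPS, the default model must be present."""
    return sps_lmcs_enabled_flag == 0 or sps_lmcs_default_model_present_flag == 1


def tile_group_flags_conform(tile_group_lmcs_enable_flag: int,
                             ilr_info_in_aps: bool,
                             tile_group_lmcs_use_default_model_flag: int) -> bool:
    """Item 35.e: a tile group that uses ILR without ILR information in an APS
    must set the use-default-model flag to 1."""
    if tile_group_lmcs_enable_flag == 1 and not ilr_info_in_aps:
        return tile_group_lmcs_use_default_model_flag == 1
    return True
```

A bitstream checker could apply the first predicate once per SPS and the second once per tile group header.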

[0186] 36. Default ILR parameters may be indicated by a single flag. If this flag indicates that default ILR parameters are to be used, no further ILR data needs to be sent.

[0187] 37. Default ILR parameters may be predefined if they are not signaled. For example, the default ILR parameters may correspond to an identity mapping.
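Solution A13 names identity mapping as one possible predefined default. The sketch below shows how a piecewise-linear (PWL) model with equal-size bins reduces to the identity when every bin is assigned as many output codewords as it has input values. The bin count of 16, the lookup-table construction, and the function names are illustrative assumptions; the number of bins is assumed to divide 2^bit_depth evenly.

```python
from typing import List


def build_forward_lut(bin_codewords: List[int], bit_depth: int) -> List[int]:
    """Build a forward PWL lookup table from per-bin output codeword counts.

    The input range [0, 2^bit_depth) is split into len(bin_codewords) equal bins;
    bin i is mapped linearly onto bin_codewords[i] consecutive output values.
    """
    bin_size = (1 << bit_depth) // len(bin_codewords)
    lut: List[int] = []
    out_start = 0
    for codewords in bin_codewords:
        for offset in range(bin_size):
            lut.append(out_start + (offset * codewords) // bin_size)
        out_start += codewords
    return lut


def default_identity_codewords(num_bins: int, bit_depth: int) -> List[int]:
    """Hypothetical default model: each bin gets exactly bin_size codewords."""
    return [(1 << bit_depth) // num_bins] * num_bins
```

With such a default, enabling ILR without signaling a model leaves sample values unchanged by the forward mapping.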

[0188] 38. Time layer information may be signaled together with ILR parameters, for example, in an ILR APS. a. For example, the time layer index may be signaled in lmcs_data(). b. For example, the time layer index minus 1 may be signaled in lmcs_data(). c. Alternatively, and furthermore, when encoding / decoding a single tile group / tile, references are restricted to ILR APSs associated with a smaller or equal time layer index. d. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ILR APSs associated with a smaller time layer index. e. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ILR APSs associated with a larger time layer index. f. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ILR APSs associated with a larger or equal time layer index. g. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ILR APSs associated with the same time layer index. h. For example, whether the above restrictions apply may depend on a piece of information that is signaled to, or can be inferred by, the decoder.
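The alternative restrictions in items 38.c to 38.g can be sketched as a filter over candidate APSs. The sketch reads each restriction as "only APSs whose time layer index satisfies the stated relation may be referenced", which is an interpretation consistent with solutions B3 to B6; the rule names, the (aps_id, time layer index) pair representation, and the function name are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Each rule compares an APS's time layer index with that of the current tile group.
RULES: Dict[str, Callable[[int, int], bool]] = {
    "smaller_or_equal": operator.le,  # item 38.c
    "smaller": operator.lt,           # item 38.d
    "larger": operator.gt,            # item 38.e
    "larger_or_equal": operator.ge,   # item 38.f
    "same": operator.eq,              # item 38.g
}


def allowed_aps_ids(aps_list: List[Tuple[int, int]],
                    current_time_layer: int,
                    rule: str) -> List[int]:
    """Return identifiers of APSs that the current tile group may reference under the rule."""
    compare = RULES[rule]
    return [aps_id for aps_id, time_layer in aps_list
            if compare(time_layer, current_time_layer)]
```

The same filter applies unchanged to ALF APSs when the time layer index is carried with ALF information.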

[0189] 39. Time layer information may be signaled together with ALF information, for example, in an ALF APS. a. For example, the time layer index may be signaled in alf_data(). b. For example, the time layer index minus 1 may be signaled in alf_data(). c. Alternatively, and furthermore, when encoding / decoding a single tile group / tile, or a single CTU within a single tile group / tile, references are restricted to ALF APSs associated with a smaller or equal time layer index. d. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ALF APSs associated with a smaller time layer index. e. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ALF APSs associated with a larger time layer index. f. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ALF APSs associated with a larger or equal time layer index. g. Alternatively, when encoding / decoding a single tile group / tile, references are restricted to ALF APSs associated with the same time layer index. h. For example, whether the above restrictions apply may depend on a piece of information that is signaled to, or can be inferred by, the decoder.

[0190] 40. For example, the reshaping mapping between an original sample and a reshaped sample need not be a positive relationship, that is, a relationship in which a larger value is never mapped to a smaller value. a. For example, the reshaping mapping between an original sample and a reshaped sample may be a negative relationship, in which, of two values, the larger value in the original region may be mapped to the smaller value in the reshaped region.
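A negative relationship of the kind described in item 40.a can be illustrated with the simplest decreasing mapping, a reflection of the sample range. This particular mapping is chosen only for illustration and is not taken from the text; the function names are hypothetical.

```python
def negative_forward_map(sample: int, bit_depth: int) -> int:
    """Illustrative decreasing mapping: larger original values map to smaller reshaped values."""
    return (1 << bit_depth) - 1 - sample


def negative_inverse_map(reshaped: int, bit_depth: int) -> int:
    """Inverse of the reflection above (the reflection is its own inverse)."""
    return (1 << bit_depth) - 1 - reshaped
```

Because the mapping is strictly decreasing, it remains invertible, so the reverse reshaping step can still recover the original sample.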

[0191] 41. In conformance bitstreams, the syntax element aps_params_type is only allowed to be one of several predefined values, such as 0 and 1. a. In other examples, only 0 and 7 are allowed.

[0192] 42. For example, default ILR information should be signaled when ILR is applied (e.g., sps_lmcs_enabled_flag is true).

[0193] The above example may be incorporated in connection with the methods described later, such as method 3100, which can be implemented in a video decoder or video encoder.

[0194] Figure 31A shows a flowchart of an example method for video processing. Method 3110 includes, in operation 3112, performing a conversion between a video containing one or more video data units and a bitstream representation of that video.

[0195] In some embodiments, the bitstream representation follows format rules specifying the inclusion of side information indicating default parameters for a coding mode applicable to video blocks of one or more video data units in which the coding mode is enabled, the side information providing parameters for structuring the video blocks based on the representation of the video blocks in the original and reshaped regions and / or luma-dependent scaling of the chroma residuals of the chroma video blocks.

[0196] In other embodiments, the bitstream representation conforms to format rules specifying the inclusion of side information indicating default parameters for a coding mode applicable to video blocks of one or more video data units in which the coding mode is enabled, the default parameters being used for the coding mode when no parameters are explicitly signaled in the bitstream representation, and the coding mode comprising configuring the video block based on the representation of the video block in the original and reshaped regions and / or the luma-dependent scaling of the chroma residuals of the chroma video block.

[0197] Figure 31B shows a flowchart of an example method for video processing. Method 3120 includes, in operation 3122, configuring, for a conversion between a video containing one or more video data units and a bitstream representation of the video, the bitstream representation so that it includes a syntax element that signals time layer information and a parameter for a coding mode applicable to the video blocks of the one or more video data units. Method 3120 further includes, in operation 3124, performing the conversion based on the configuration.

[0198] Figure 31C shows a flowchart of an example method for video processing. Method 3130 includes, in operation 3132, parsing, for a conversion between a video containing one or more video data units and a bitstream representation of the video, the bitstream representation, which includes a syntax element that signals time layer information and a parameter for a coding mode applicable to the video blocks of the one or more video data units. Method 3130 further includes, in operation 3134, performing the conversion based on the parsing.
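The configuring and parsing operations of methods 3120 and 3130 are mirror images of each other. The sketch below shows this for one syntax element, assuming the signaled value is the time layer index minus 1 (as in solution B17) and assuming a fixed 3-bit field; the field width, the bit-list representation, and the function names are hypothetical and not specified in the text.

```python
from typing import List

TID_BITS = 3  # assumed field width; the source does not specify one


def write_time_layer_minus1(time_layer_index: int) -> List[int]:
    """Encoder side: code (time_layer_index - 1) as TID_BITS bits, most significant first."""
    value = time_layer_index - 1
    if not 0 <= value < (1 << TID_BITS):
        raise ValueError("time layer index out of range for the assumed field width")
    return [(value >> shift) & 1 for shift in range(TID_BITS - 1, -1, -1)]


def read_time_layer_minus1(bits: List[int]) -> int:
    """Decoder side: read TID_BITS bits and add 1 to recover the time layer index."""
    value = 0
    for bit in bits[:TID_BITS]:
        value = (value << 1) | bit
    return value + 1
```

Signaling the index minus 1 presupposes that the index is at least 1 in this sketch; signaling the index itself (solution B16) would drop the offset on both sides.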

[0199] In some embodiments, the coding mode comprises configuring the video block based on the original region and the reshaped region and / or luma-dependent scaling of the chroma residual of the chroma video block.

[0200] In other embodiments, the coding mode comprises configuring the current block of video based on a filtering process that uses adaptive loop filter (ALF) coefficients.

[0201] Figure 31D shows a flowchart of an example method for video processing. Method 3140 includes, in operation 3142, performing a conversion between a first video data unit of a video and a bitstream representation of the video.

[0202] In some embodiments, the coding mode is applicable to video blocks of the first video data unit, the coding mode comprises configuring the video block based on the original region and the reshaped region and / or luma-dependent scaling of the chroma residual of the chroma video block, based on side information associated with the coding mode, and the side information is determined according to a rule based on a time layer index.

[0203] In other embodiments, the coding mode is applicable to a video block of a first video data unit, and the coding mode comprises configuring the current block of video based on a filtering process using adaptive loop filter (ALF) coefficients, based on side information associated with the coding mode, and the side information is determined according to a rule based on a time layer index.

[0204] [5. Examples of the disclosed technology] In some embodiments, tile_group_reshaper_enable_flag is conditionally present if tile_group_reshaper_model_present_flag is enabled. [Table 17]

[0205] Embodiment #2 on JVET-N0805 [Table 18]

[0206] Data syntax for luma mapping with chroma scaling (LMCS) [Table 19]

[0207] Figure 32 is a block diagram of a video processing device 3200. The device 3200 may be used to implement one or more of the methods described herein. The device 3200 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, etc. The device 3200 may include one or more processors 3202, one or more memories 3204, and video processing hardware 3206. The processor(s) 3202 may be configured to implement one or more of the methods described herein (including, but not limited to, method 3100). The memory (memories) 3204 may be used to store data and code used to implement the methods and techniques described herein. The video processing hardware 3206 is a hardware circuit that may be used to implement some of the techniques described herein.

[0208] In some embodiments, the video coding method may be implemented using a device implemented on a hardware platform as described with respect to Figure 32.

[0209] Figure 33 is a block diagram showing an example video processing system 3300 in which various technologies described herein may be implemented. Various implementations may include some or all of the components of system 3300. System 3300 may include an input unit 3302 that receives video content. The video content may be received in a raw or uncompressed format, for example, as 8-bit or 10-bit multi-component pixel values, or in a compressed or encoded format. The input unit 3302 may correspond to a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet® and Passive Optical Network (PON), and wireless interfaces such as Wi-Fi or cellular interfaces.

[0210] System 3300 may include a coding component 3304 that can implement various coding or encoding methods described herein. The coding component 3304 may reduce the average bitrate of the video from the input 3302 to the output of the coding component 3304 in order to produce a coded representation of the video. The coding techniques are therefore sometimes referred to as video compression or video transcoding techniques. The output of the coding component 3304 may be either stored or transmitted over a communication connection, as represented by component 3306. The stored or transmitted bitstream (or coded) representation of the video received at the input 3302 may be used by component 3308 to generate pixel values or a viewable video to be sent to the display interface 3310. The process of generating a user-viewable video from the bitstream representation is sometimes referred to as video decompression. Furthermore, although certain video processing operations are referred to as "coding" operations or tools, the coding tools or operations are used in the encoder, and the corresponding decoding tools or operations that reverse the result of the coding are performed in the decoder.

[0211] Examples of peripheral bus interfaces or display interfaces may include Universal Serial Bus (USB) or High Definition Multimedia Interface (HDMI®). Examples of storage interfaces include SATA (Serial Advanced Technology Attachment), PCI, and IDE interfaces. The technologies described herein may be embodied in a variety of electronic devices, such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and / or video display.

[0212] In some embodiments, the following technical solutions may be implemented.

[0213] A1. A method of video processing, comprising: performing a conversion between a video containing one or more video data units and a bitstream representation of the video, wherein the bitstream representation conforms to a format rule that specifies the inclusion of side information indicating default parameters for a coding mode applicable to a video block of the one or more video data units for which the coding mode is enabled, and wherein the side information provides parameters for configuring the video block based on a representation of the video block in the original and reshaped regions and / or luma-dependent scaling of the chroma residual of a chroma video block.

[0214] A2. The method of solution A1, wherein, upon a determination that the bitstream representation includes an indication of the luma-dependent scaling of the chroma residual (sps_lmcs_enabled_flag) set to 1, the bitstream representation further includes an indication of using the default model (sps_lmcs_default_model_present_flag) set to 1.

[0215] A3. The method of solution A1, wherein an adaptation parameter set (APS) in the bitstream representation excludes parameters related to the coding mode, and wherein the coding mode is applied to the video block using the default parameters.

[0216] A4. The method of solution A3, wherein the one or more video data units comprise a tile group, and wherein the bitstream representation further includes an indication of using a default model for the tile group (tile_group_lmcs_use_default_model_flag) that is set to 1.

[0217] A5. The method of solution A1, wherein the bitstream representation further includes a sequence parameter set (SPS) containing the default parameters.

[0218] A6. The method of solution A5, wherein the bitstream representation further includes a flag in the SPS indicating that the coding mode is enabled.

[0219] A7. The method of solution A1, wherein the bitstream representation further includes a sequence parameter set (SPS) that includes at least one adaptation parameter set (APS) for the coding mode.

[0220] A8. The method of solution A7, wherein the at least one APS includes the default parameters.

[0221] A9. The method of solution A1, wherein an indication to enable the default parameters for the coding mode is signaled in the bitstream representation using a single flag.

[0222] A10. The method of solution A1, wherein the bitstream representation includes the default parameters upon a determination that the bitstream representation includes the indication of the luma-dependent scaling of the chroma residual (sps_lmcs_enabled_flag).

[0223] A11. A method of video coding, comprising: performing a conversion between a video containing one or more video data units and a bitstream representation of the video, wherein the bitstream representation conforms to a format rule that specifies the inclusion of side information indicating default parameters for a coding mode applicable to a video block of the one or more video data units for which the coding mode is enabled, wherein the default parameters are used for the coding mode when no parameters are explicitly signaled in the bitstream representation, and wherein the coding mode comprises configuring the video block based on a representation of the video block in the original and reshaped regions and / or luma-dependent scaling of the chroma residual of a chroma video block.

[0224] A12. The method of solution A11, wherein the side information includes an index to a predefined value for the default parameters.

[0225] A13. The method of solution A12, wherein the predefined value for the default parameters corresponds to an identity mapping.

[0226] A14. The method of solution A11, wherein the side information includes the default parameters.

[0227] A15. The method of any of solutions A1 to A14, wherein the coding mode is an in-loop reshaping (ILR) mode.

[0228] A16. The method of any of solutions A1 to A15, wherein the conversion generates at least one of the one or more video data units from the bitstream representation.

[0229] A17. The method of any of solutions A1 to A15, wherein the conversion generates the bitstream representation from at least one of the one or more video data units.

[0230] A18. A device in a video system, comprising a processor and a non-transitory memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to implement the method of any one of solutions A1 to A17.

[0231] A19. A computer program product stored on a non-transitory computer-readable medium, the computer program product comprising program code for performing the method of any one of solutions A1 to A17.

[0232] In some embodiments, the following technical solutions may be implemented.

[0233] B1. A method of video processing, comprising: performing a conversion between a first video data unit of a video and a bitstream representation of the video, wherein a coding mode is applicable to a video block of the first video data unit, wherein the coding mode comprises configuring the video block based on the original region and the reshaped region and / or luma-dependent scaling of the chroma residual of a chroma video block, based on side information associated with the coding mode, and wherein the side information is determined according to a rule based on a time layer index.

[0234] B2. The method of solution B1, wherein the side information is determined based on side information for the coding mode of a second video data unit of the video.

[0235] B3. The method of solution B2, wherein the time layer index of the second video data unit is less than or equal to the time layer index of the first video data unit.

[0236] B4. The method of solution B2, wherein the time layer index of the second video data unit is smaller than the time layer index of the first video data unit.

[0237] B5. The method of solution B2, wherein the time layer index of the second video data unit is equal to the time layer index of the first video data unit.

[0238] B6. The method of solution B2, wherein the time layer index of the second video data unit is greater than the time layer index of the first video data unit.

[0239] B7. The method of solution B1, wherein the side information is determined based on one or more adaptation parameter sets (APSs) for the coding mode.

[0240] B8. The method of solution B7, wherein the one or more APSs are associated with a time layer index that is below a threshold.

[0241] B9. The method of solution B7, wherein the one or more APSs are associated with a time layer index that is greater than the threshold.

[0242] B10. The method of solution B7, wherein the one or more APSs are associated with a time layer index that is above a threshold.

[0243] B11. The method of solution B7, wherein the one or more APSs are associated with a time layer index equal to the threshold.

[0244] B12. The method of any of solutions B8 to B11, wherein the threshold is the time layer index of the first video data unit.

[0245] B13. A method of video processing, comprising: configuring, for a conversion between a video containing one or more video data units and a bitstream representation of the video, the bitstream representation so that it includes a syntax element that signals time layer information and a parameter for a coding mode applicable to a video block of the one or more video data units; and performing the conversion based on the configuring, wherein the coding mode comprises configuring the video block based on the original region and the reshaped region and / or luma-dependent scaling of the chroma residual of a chroma video block.

[0246] B14. A method of video processing, comprising: parsing, for a conversion between a video containing one or more video data units and a bitstream representation of the video, the bitstream representation, which includes a syntax element that signals time layer information and a parameter for a coding mode applicable to a video block of the one or more video data units; and performing the conversion based on the parsing, wherein the coding mode comprises configuring the video block based on the original region and the reshaped region and / or luma-dependent scaling of the chroma residual of a chroma video block.

[0247] B15. The method of solution B13 or B14, wherein the time layer information includes a signaled value that is based on a time layer index and is signaled in the data associated with the luma-dependent scaling of the chroma residual (lmcs_data()).

[0248] B16. The method of solution B15, wherein the signaled value is the time layer index.

[0249] B17. The method of solution B15, wherein the signaled value is the time layer index minus 1.

[0250] B18. The method of solution B1, B13, or B14, wherein performing the conversion is further based on information signaled in the bitstream representation.

[0251] B19. The method of solution B1, B13, or B14, wherein performing the conversion is further based on information inferred from information signaled in the bitstream representation.

[0252] B20. The method of any of solutions B1 to B19, wherein the coding mode is a luma mapping with chroma scaling (LMCS) mode, and the APS is an LMCS APS.

[0253] B21. The method of any of solutions B1 to B20, wherein the conversion generates the current block from the bitstream representation.

[0254] B22. The method of any of solutions B1 to B20, wherein the conversion generates the bitstream representation from the current block.

[0255] B23. A device in a video system, comprising a processor and a non-transitory memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to implement the method of any one of solutions B1 to B22.

[0256] B24. A computer program product stored on a non-transitory computer-readable medium, the computer program product comprising program code for performing the method of any one of solutions B1 to B22.

[0257] In some embodiments, the following technical solutions may be implemented.

[0258] C1. A video processing method comprising: performing a conversion between a first video data unit of a video and a bitstream representation of the video, wherein a coding mode is applicable to a video block of the first video data unit, the coding mode comprises constructing a current block of the video, based on side information associated with the coding mode, by a filtering process that uses adaptive loop filter (ALF) coefficients, and the side information is determined according to a rule based on a temporal layer index.

[0259] C2. The method of solution C1, wherein the side information is determined based on side information for the coding mode of a second video data unit of the video.

[0260] C3. The method of solution C2, wherein the temporal layer index of the second video data unit is less than or equal to the temporal layer index of the first video data unit.

[0261] C4. The method of solution C2, wherein the temporal layer index of the second video data unit is less than the temporal layer index of the first video data unit.

[0262] C5. The method of solution C2, wherein the temporal layer index of the second video data unit is equal to the temporal layer index of the first video data unit.

[0263] C6. The method of solution C2, wherein the temporal layer index of the second video data unit is greater than the temporal layer index of the first video data unit.

[0264] C7. The method of solution C1, wherein the side information is determined based on one or more adaptation parameter sets (APSs) for the coding mode.

[0265] C8. The method of solution C7, wherein the one or more APSs are associated with a temporal layer index that is less than or equal to a threshold.

[0266] C9. The method of solution C7, wherein the one or more APSs are associated with a temporal layer index that is greater than the threshold.

[0267] C10. The method of solution C7, wherein the one or more APSs are associated with a temporal layer index that is greater than or equal to the threshold.

[0268] C11. The method of solution C7, wherein the one or more APSs are associated with a temporal layer index that is equal to the threshold.

[0269] C12. The method of any one of solutions C8 to C11, wherein the threshold is the temporal layer index of the first video data unit.

[0270] C13. A video processing method comprising: for a conversion between a video comprising one or more video data units and a bitstream representation of the video, constructing the bitstream representation so that it includes syntax elements that signal temporal layer information and parameters for a coding mode applicable to a video block of the one or more video data units; and performing the conversion based on the constructing, wherein the coding mode comprises constructing a current block of the video based on a filtering process that uses adaptive loop filter (ALF) coefficients.

[0271] C14. A video processing method comprising: for a conversion between a video comprising one or more video data units and a bitstream representation of the video, parsing the bitstream representation, which includes syntax elements that signal temporal layer information and parameters for a coding mode applicable to a video block of the one or more video data units; and performing the conversion based on the parsing, wherein the coding mode comprises constructing a current block of the video based on a filtering process that uses adaptive loop filter (ALF) coefficients.

[0272] C15. The method of solution C13 or C14, wherein the temporal layer information includes a signaled value based on a temporal layer index, the value being signaled in the data (alf_data()) associated with the coding mode.

[0273] C16. The method of solution C15, wherein the signaled value is the temporal layer index.

[0274] C17. The method of solution C15, wherein the signaled value is the temporal layer index minus 1.

[0275] C18. The method of solution C1, C13, or C14, wherein performing the conversion is further based on information signaled in the bitstream representation.

[0276] C19. The method of solution C1, C13, or C14, wherein performing the conversion is further based on information inferred from information signaled in the bitstream representation.

[0277] C20. The method of any one of solutions C1 to C19, wherein the coding mode is an adaptive loop filtering (ALF) mode, and the APS is an ALF APS.

[0278] C21. The method of any one of solutions C1 to C20, wherein the conversion generates the current block from the bitstream representation.

[0279] C22. The method of any one of solutions C1 to C20, wherein the conversion generates the bitstream representation from the current block.

[0280] C23. An apparatus in a video system, comprising a processor and a non-transitory memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to implement the method of any one of solutions C1 to C22.

[0281] C24. A computer program product stored on a non-transitory computer-readable medium, the computer program product comprising program code for carrying out the method of any one of solutions C1 to C22.
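
The temporal layer rules in the C solutions can be sketched as follows. This is a simplified illustration under stated assumptions, not a decoder implementation: the Aps record, its field names, and the helper functions are hypothetical, and the selection rule shown is the "less than or equal to" variant, with the threshold taken to be the temporal layer index of the current video data unit. The two coding helpers illustrate the alternatives in which the signaled value is the temporal layer index itself or the index minus 1.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Aps:
    aps_id: int        # hypothetical identifier of the parameter set
    aps_type: str      # hypothetical type tag, e.g. "ALF" or "LMCS"
    temporal_id: int   # temporal layer index associated with the parameter set

def signaled_value(temporal_id: int, minus1: bool) -> int:
    """Value written to the bitstream: the index itself, or the index minus 1."""
    value = temporal_id - 1 if minus1 else temporal_id
    if value < 0:
        raise ValueError("signaled value must be non-negative")
    return value

def temporal_id_from_signaled(value: int, minus1: bool) -> int:
    """Recover the temporal layer index from the signaled value."""
    return value + 1 if minus1 else value

def usable_aps(candidates: List[Aps], aps_type: str, current_temporal_id: int) -> List[Aps]:
    """Keep parameter sets of the requested type whose temporal layer index
    does not exceed that of the current video data unit."""
    return [a for a in candidates
            if a.aps_type == aps_type and a.temporal_id <= current_temporal_id]
```

With this variant of the rule, a video data unit in a lower temporal layer never depends on a parameter set carried in a higher layer, so the higher layers can be discarded without removing parameter sets that the remaining layers need.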

[0282] Some embodiments of the disclosed technology include making judgments or decisions to enable a video processing tool or mode. In one example, when a video processing tool or mode is enabled, the encoder will use or implement that tool or mode in processing blocks of video, but will not necessarily modify the resulting bitstream based on the use of that tool or mode. That is, the conversion from blocks of video to a bitstream representation of video will use the video processing tool or mode if it is enabled based on the judgment or decision. In another example, when a video processing tool or mode is enabled, the decoder will process the bitstream knowing that the bitstream has been modified based on the video processing tool or mode. That is, the conversion from a bitstream representation of video to blocks of video will be performed using the video processing tool or mode that was enabled based on the judgment or decision.

[0283] Some embodiments of the disclosed technology include making a judgment or decision to disable a video processing tool or mode. In one example, when a video processing tool or mode is disabled, the encoder does not use the tool or mode in converting the video blocks to a bitstream representation of the video. In another example, when a video processing tool or mode is disabled, the decoder processes the bitstream knowing that the bitstream has not been modified using the video processing tool or mode that was disabled based on the judgment or decision.

[0284] In this specification, the term “video processing” may refer to video encoding, video decoding, video transcoding, video compression, or video decompression. For example, a video compression algorithm may be applied during the conversion from a pixel representation of video to a corresponding bitstream representation, or vice versa. The bitstream representation of the current video block may correspond to bits that are either co-located or spread across different places within the bitstream, as defined, for example, by the syntax. For example, a macroblock may be encoded in terms of transformed and coded error residual values, and also using bits in headers and other fields in the bitstream. Furthermore, during the conversion, the decoder may parse the bitstream knowing, based on its determinations, that certain fields may be present or absent, as described in the solutions above. Similarly, the encoder may determine whether certain syntax fields are to be included and generate the coded representation (bitstream representation) accordingly, by including or excluding those syntax fields.
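
The conditional presence of syntax fields described above can be illustrated with a toy example. The sketch below models the bitstream as a plain list of integers and uses hypothetical field names (tool_enabled, aps_id); it only shows the principle that the encoder omits a field when a tool is disabled and that the decoder, applying the same condition, knows whether to read it.

```python
from typing import List, Optional, Tuple

def write_unit(tool_enabled: bool, aps_id: Optional[int]) -> List[int]:
    """Encoder side: always write the flag, write aps_id only if the tool is enabled."""
    fields = [1 if tool_enabled else 0]
    if tool_enabled:
        if aps_id is None:
            raise ValueError("aps_id is required when the tool is enabled")
        fields.append(aps_id)
    return fields

def parse_unit(fields: List[int]) -> Tuple[bool, Optional[int]]:
    """Decoder side: read the flag, then read aps_id only if the flag says it is present."""
    it = iter(fields)
    tool_enabled = next(it) == 1
    aps_id = next(it) if tool_enabled else None
    return tool_enabled, aps_id
```

For instance, write_unit(False, None) produces a single field, and parse_unit reads back only the flag without attempting to read an identifier that was never written.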

[0285] From the foregoing, it is recognized that while specific embodiments of the currently disclosed technology are described herein for illustrative purposes, various modifications may be made without departing from the scope of the invention. Accordingly, the currently disclosed technology is not limited except as provided for by the appended claims.

[0286] The subject matter and functional operations described herein can be implemented in various systems and digital electronic circuits, or in computer software, firmware, or hardware, including the structures disclosed herein and their structural equivalents, or in combinations of one or more of them. The subject matter described herein can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory computer-readable medium for execution by, or to control the operation of, a data processing apparatus. The computer-readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The terms “data processing unit” and “data processing apparatus” encompass all apparatus, devices, and machines for processing data, including, for example, a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the apparatus may include code that creates an execution environment for the computer program in question, such as code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

[0287] Computer programs (also known as programs, software, software applications, scripts, or code) can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as standalone programs or as modules, components, subroutines, or other units suitable for use in a computing environment. Computer programs do not necessarily correspond to files in a file system. A program can be stored in a single file dedicated to the program in question, or in multiple coordinated files (e.g., a file storing one or more modules, subprograms, or parts of code), or in parts of a file that hold other programs or data (e.g., one or more scripts stored in markup language documents). Computer programs can be deployed to run on one computer, or on multiple computers located in one place or distributed across multiple locations and interconnected by a communication network.

[0288] The processes and logic flows described herein are executable by one or more programmable processors that execute one or more computer programs to perform a function by acting on input data and producing an output. The processes and logic flows are also executable by dedicated logic circuits, such as FPGAs (Field Programmable Gate Arrays) or ASICs (Application-Specific Integrated Circuits), and the device can also be implemented as such.

[0289] Processors suitable for executing a computer program include, for example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory, a random-access memory, or both. The essential elements of a computer are a processor that executes instructions and one or more memory devices that store instructions and data. Generally, a computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or will be operatively coupled to receive data from, or transfer data to, one or more such mass storage devices, or both. However, a computer is not required to have such devices. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

[0290] This specification contains many specifics, which should be construed not as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features described herein in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations, and may even be initially claimed as such, one or more features from a claimed combination can in some cases be removed from that combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

[0291] Similarly, although operations are shown in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all of the illustrated operations be performed, to achieve the desired results. Furthermore, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.

[0292] Only a few implementations and examples are described, and other implementations, enhancements, and modifications may be made based on those described and illustrated herein.

Claims

1. A method of processing video data, comprising: performing a conversion between a first video data unit of a video, the first video data unit having a first temporal layer index, and a bitstream of the video, wherein a first coding tool is applicable to a first video block of the first video data unit, and the first coding tool processes the first video block, based on first side information relating to the first coding tool, by a filtering process that uses adaptive loop filter coefficients; the first side information is determined based on one or more first adaptation parameter sets for the first coding tool, the one or more first adaptation parameter sets being associated with a second temporal layer index that is less than or equal to the first temporal layer index; a second coding tool is conditionally applied to the first video data unit, and, in response to the second coding tool being applied during the conversion, at least one of a mapping process or a scaling process is applied to a second video block of the first video data unit based on second side information relating to the second coding tool; the second side information is determined based on one or more second adaptation parameter sets for the second coding tool, the one or more second adaptation parameter sets being associated with a third temporal layer index that is less than or equal to the first temporal layer index; in the mapping process, a piecewise linear model is used for a luma component of the second video block, and, in the scaling process, residual samples of a chroma component of the second video block are scaled; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets include a syntax element indicating a type of adaptation parameter set parameters; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets are permitted to be shared across different pictures of the video; and information on the third temporal layer index and the second side information are signaled together and are present in the same information unit.

2. The method of claim 1, wherein the first video data unit comprises a tile group, a tile, a picture, a slice, or a coding tree unit.

3. The method of claim 1, wherein the type of adaptation parameter set parameters has a predefined value.

4. The method of claim 1, wherein a scale coefficient of the piecewise linear model is determined based on a first variable, which is determined based on a syntax element included in the one or more second adaptation parameter sets, and a second variable, which is determined based on a bit depth.

5. The method of claim 1, wherein identifiers of the one or more second adaptation parameter sets to be used for the second video block are conditionally included in a header of the first video data unit.

6. The method of claim 1, wherein the mapping process comprises at least one of: a first mapping operation in which predicted luma samples of the luma component are converted from an original region to a reshaped region to generate modified predicted luma samples; or a second mapping operation, which is the inverse of the first mapping operation, in which reconstructed samples of the luma component in the reshaped region are converted back to the original region.

7. The method of any one of claims 1 to 6, wherein the conversion includes encoding the first video data unit into the bitstream.

8. The method of any one of claims 1 to 6, wherein the conversion includes decoding the first video data unit from the bitstream.

9. An apparatus for processing video data, comprising a processor and a non-transitory memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to perform a conversion between a first video data unit of a video, the first video data unit having a first temporal layer index, and a bitstream of the video, wherein a first coding tool is applicable to a first video block of the first video data unit, and the first coding tool processes the first video block, based on first side information relating to the first coding tool, by a filtering process that uses adaptive loop filter coefficients; the first side information is determined based on one or more first adaptation parameter sets for the first coding tool, the one or more first adaptation parameter sets being associated with a second temporal layer index that is less than or equal to the first temporal layer index; a second coding tool is conditionally applied to the first video data unit, and, in response to the second coding tool being applied during the conversion, at least one of a mapping process or a scaling process is applied to a second video block of the first video data unit based on second side information relating to the second coding tool; the second side information is determined based on one or more second adaptation parameter sets for the second coding tool, the one or more second adaptation parameter sets being associated with a third temporal layer index that is less than or equal to the first temporal layer index; in the mapping process, a piecewise linear model is used for a luma component of the second video block, and, in the scaling process, residual samples of a chroma component of the second video block are scaled; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets include a syntax element indicating a type of adaptation parameter set parameters; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets are permitted to be shared across different pictures of the video; and information on the third temporal layer index and the second side information are signaled together and are present in the same information unit.

10. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform a conversion between a first video data unit of a video, the first video data unit having a first temporal layer index, and a bitstream of the video, wherein a first coding tool is applicable to a first video block of the first video data unit, and the first coding tool processes the first video block, based on first side information relating to the first coding tool, by a filtering process that uses adaptive loop filter coefficients; the first side information is determined based on one or more first adaptation parameter sets for the first coding tool, the one or more first adaptation parameter sets being associated with a second temporal layer index that is less than or equal to the first temporal layer index; a second coding tool is conditionally applied to the first video data unit, and, in response to the second coding tool being applied during the conversion, at least one of a mapping process or a scaling process is applied to a second video block of the first video data unit based on second side information relating to the second coding tool; the second side information is determined based on one or more second adaptation parameter sets for the second coding tool, the one or more second adaptation parameter sets being associated with a third temporal layer index that is less than or equal to the first temporal layer index; in the mapping process, a piecewise linear model is used for a luma component of the second video block, and, in the scaling process, residual samples of a chroma component of the second video block are scaled; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets include a syntax element indicating a type of adaptation parameter set parameters; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets are permitted to be shared across different pictures of the video; and information on the third temporal layer index and the second side information are signaled together and are present in the same information unit.

11. A method of storing a bitstream of a video, comprising: generating the bitstream based on a first video data unit of the video, the first video data unit having a first temporal layer index; and storing the bitstream on a non-transitory computer-readable recording medium, wherein a first coding tool is applicable to a first video block of the first video data unit, and the first coding tool processes the first video block, based on first side information relating to the first coding tool, by a filtering process that uses adaptive loop filter coefficients; the first side information is determined based on one or more first adaptation parameter sets for the first coding tool, the one or more first adaptation parameter sets being associated with a second temporal layer index that is less than or equal to the first temporal layer index; a second coding tool is conditionally applied to the first video data unit, and, in response to the second coding tool being applied to the first video data unit, at least one of a mapping process or a scaling process is applied to a second video block of the first video data unit based on second side information relating to the second coding tool; the second side information is determined based on one or more second adaptation parameter sets for the second coding tool, the one or more second adaptation parameter sets being associated with a third temporal layer index that is less than or equal to the first temporal layer index; in the mapping process, a piecewise linear model is used for a luma component of the second video block, and, in the scaling process, residual samples of a chroma component of the second video block are scaled; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets include a syntax element indicating a type of adaptation parameter set parameters; the one or more first adaptation parameter sets and the one or more second adaptation parameter sets are permitted to be shared across different pictures of the video; and information on the third temporal layer index and the second side information are signaled together and are present in the same information unit.
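
Claim 4 states only that the scale coefficient of the piecewise linear model depends on one variable derived from signaled syntax elements and another derived from the bit depth; it does not give a formula. The sketch below shows one way such a derivation could look in fixed-point arithmetic. The number of pieces (16), the precision (11 bits), and all names are assumptions introduced for illustration and are not taken from the claims.

```python
def scale_coeff(codewords: int, bit_depth: int, pieces: int = 16, precision: int = 11) -> int:
    """Fixed-point slope of one piece of the model.

    codewords: number of codewords assigned to the piece (the first variable,
               assumed here to be derived from signaled syntax elements).
    bit_depth: sample bit depth, from which the default piece width (the
               second variable) is derived.
    """
    org_cw = (1 << bit_depth) // pieces            # default width of one piece
    if org_cw < 2 or org_cw & (org_cw - 1):
        raise ValueError("piece width must be a power of two of at least 2")
    shift = org_cw.bit_length() - 1                # log2 of the piece width
    # Rounded division of codewords * 2**precision by the piece width.
    return (codewords * (1 << precision) + (1 << (shift - 1))) >> shift
```

Under these assumptions, a piece that is assigned exactly its default width at 10-bit depth (64 codewords) gets a coefficient of 2048, which represents a slope of 1.0 at 11-bit precision.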