A method for encoding / decoding video, a method for transmitting a bitstream, and a recording medium for storing a bitstream.

The video encoding/decoding method addresses high-resolution video efficiency challenges by utilizing NNPFC SEI messages to enhance encoding/decoding efficiency and reduce decoder errors, improving coding quality and reducing costs.

JP7876078B2 · Active · Publication Date: 2026-06-18 · LG ELECTRONICS INC


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: LG ELECTRONICS INC
Filing Date: 2024-04-05
Publication Date: 2026-06-18

IPC: H04N19/70; H04N19/117; H04N19/85

CPC: H04N19/117; H04N19/172; H04N19/70; H04N19/85

AI Tagging

Application Domain

Digital video signal modification

AI Technical Summary

Technical Problem

The increasing demand for high-resolution, high-quality video leads to higher transmission and storage costs due to the increased amount of information, necessitating highly efficient video compression technology.

Method used

A video encoding/decoding method that includes obtaining and signaling neural-network post-filter characteristics (NNPFC) SEI messages to improve encoding/decoding efficiency, clarify output picture information, and reduce decoder errors by adjusting the signaling order of information related to the NNPF output picture.

Benefits of technology

This method enhances encoding/decoding efficiency, improves coding quality, and reduces decoder errors by clarifying the meaning of NNPF output picture information through SEI messages.

Patent Text Reader

Abstract

A video encoding / decoding method, a bitstream transmission method, and a computer-readable recording medium for storing a bitstream are provided. The video decoding method according to this disclosure includes the steps of obtaining post-filter-based corresponding output picture information for an input picture from an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, and obtaining a corresponding output picture for the input picture based on the corresponding output picture information, wherein the corresponding output picture information may include output picture generation information indicating whether or not a post-filter corresponding output picture has been generated for the input picture.

Description

[Technical Field]

[0001] This disclosure relates to a video encoding / decoding method, a bitstream transmission method, and a recording medium storing a bitstream, and more particularly to a video encoding / decoding method, a bitstream transmission method, and a recording medium storing a bitstream related to a neural network post-processing filter.

[Background Art]

[0002] In recent years, demand for high-resolution, high-quality video, such as HD (High Definition) and UHD (Ultra High Definition) video, has been increasing in various fields. The higher the resolution and quality of video data, the greater the amount of information or bits transmitted compared to existing video data. This increase in the amount of information or bits transmitted leads to increased transmission and storage costs.

[0003] Therefore, highly efficient video compression technology is desired to effectively transmit, store, and play back high-resolution, high-quality video information.

[Summary of the Invention]

[Problems to be Solved by the Invention]

[0004] The purpose of this disclosure is to provide a video encoding / decoding method and apparatus with improved encoding / decoding efficiency.

[0005] Furthermore, this disclosure aims to provide a method for processing NNPF-related SEI messages (NNPFC SEI and NNPFA SEI).

[0006] Furthermore, this disclosure aims to more clearly identify the NNPF output picture through NNPF-related SEI messages.

[0007] Furthermore, this disclosure aims to clarify the meaning of information related to the output picture of NNPF.

[0008] Furthermore, this disclosure aims to reduce decoder errors by clarifying the meaning of information related to the output picture of the NNPF.

[0009] Furthermore, this disclosure aims to improve coding quality and efficiency by clarifying the meaning of information related to the output picture in NNPF.

[0010] Furthermore, this disclosure aims to improve efficiency by adjusting the signaling order of information related to the output picture of NNPF.

[0011] Furthermore, this disclosure aims to provide a non-transitory computer-readable recording medium for storing a bitstream generated by the video encoding method according to this disclosure.

[0012] Furthermore, this disclosure aims to provide a non-transitory computer-readable recording medium that stores a bitstream that is received and decoded by the video decoding device according to this disclosure and used for restoring video.

[0013] Furthermore, this disclosure aims to provide a method for transmitting a bitstream generated by the video encoding method relating to this disclosure.

[0014] The technical challenges addressed in this disclosure are not limited to those mentioned above, and other technical challenges not mentioned above will be clearly understood by those with ordinary skill in the art to which this disclosure pertains from the following description.

[Means for Solving the Problem]

[0015] The video decoding method performed by a video decoding apparatus according to an embodiment of the present disclosure includes obtaining, from an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, post-filter-based corresponding output picture information for an input picture, and obtaining a corresponding output picture for the input picture based on the corresponding output picture information, where the corresponding output picture information may include output picture generation information regarding the presence or absence of generation of a corresponding output picture of a post-filter for the input picture.

[0016] Meanwhile, according to an embodiment of the present disclosure, the value of the output picture generation information may be restricted to a specific value based on a specific condition.

[0017] Meanwhile, according to an embodiment of the present disclosure, the specific condition may be associated with the purpose of the post-filter.

[0018] Meanwhile, according to an embodiment of the present disclosure, the specific condition may be further associated with whether the purpose of the post-filter is picture rate upsampling.

[0019] Meanwhile, according to an embodiment of the present disclosure, the specific condition may be further associated with the picture index of the input picture.

[0020] Meanwhile, according to an embodiment of the present disclosure, based on the specific condition, the value of the output picture generation information for at least one of the input pictures within a specific range may be restricted to 1.

[0021] Meanwhile, according to an embodiment of the present disclosure, the specific range may be associated with the number of the input pictures, and information associated with the number of the input pictures may be signaled.

[0022] Meanwhile, according to an embodiment of the present disclosure, if the picture index of the input picture is 0, the value of the output picture generation information for the input picture may be restricted to 1.

[0023] Meanwhile, according to an embodiment of the present disclosure, the specific condition may be associated with the number of input pictures.

[0024] Meanwhile, according to an embodiment of the present disclosure, based on the existence of the corresponding output picture for the input picture, the input picture may be replaced by the corresponding output picture in the bitstream.

[0025] Meanwhile, according to an embodiment of the present disclosure, the output picture generation information may be signaled regardless of whether the purpose of the post-filter includes picture rate upsampling.
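
The restrictions in paragraphs [0016] to [0022] can be illustrated with a minimal sketch. The names `check_output_flags`, `output_flags`, `condition_holds`, and `require_first_picture` are hypothetical and are not syntax elements named in this disclosure. The sketch assumes that the "specific range" is the full list of input pictures for one filter invocation, and it leaves the evaluation of the "specific condition" to the caller, because the text does not state it precisely. It is an illustration of the idea, not a normative conformance check.

```python
from typing import List


def check_output_flags(output_flags: List[int],
                       condition_holds: bool,
                       require_first_picture: bool = False) -> bool:
    """Return True if the per-input-picture flags satisfy the sketched restrictions.

    output_flags[i] == 1 means a corresponding output picture is generated
    for input picture i (hypothetical representation of the signaled values).
    """
    # Every flag must be a single-bit value, and there must be at least one picture.
    if not output_flags or any(flag not in (0, 1) for flag in output_flags):
        return False
    # Optional restriction in the style of [0022]: picture index 0 must have flag 1.
    if require_first_picture and output_flags[0] != 1:
        return False
    # Restriction in the style of [0020]: when the condition holds,
    # at least one input picture in the range must have flag 1.
    if condition_holds and 1 not in output_flags:
        return False
    return True
```

A decoder-side parser could call such a check after reading the flags and treat a failed check as a non-conforming message; an encoder could call it before signaling.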

[0026] A video encoding method performed by a video encoding apparatus according to an embodiment of the present disclosure includes a step of determining post-filter-based corresponding output picture information for an input picture, and

[0027] a step of signaling the corresponding output picture information in an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, wherein the corresponding output picture information may include output picture generation information regarding the presence or absence of generation of the corresponding output picture of the post-filter for the input picture.

[0028] Also, according to the present disclosure, there may be provided a non-transitory computer-readable recording medium for storing a bitstream generated by the video encoding method according to the present disclosure.

[0029] Furthermore, this disclosure may provide a non-transitory computer-readable recording medium for storing a bitstream that is received and decoded by the video decoding device according to this disclosure and used for restoring the video.

[0030] Furthermore, this disclosure may provide a method for transmitting a bitstream generated by a video encoding method.

[0031] The features of this disclosure briefly summarized above are merely illustrative examples of the detailed description of this disclosure described below, and do not limit the scope of this disclosure.

[Effects of the Invention]

[0032] According to this disclosure, it is possible to provide a video encoding / decoding method and apparatus with improved encoding / decoding efficiency.

[0033] Furthermore, according to this disclosure, modifying the semantics of information within NNPFC SEI messages enables clearer communication of meaning.

[0034] Furthermore, according to this disclosure, decoder errors can be reduced by correcting the semantics of the information within the NNPFC SEI message.

[0035] Furthermore, according to this disclosure, efficiency can be improved by adjusting the information signaling order within NNPFC SEI messages.

[0036] Furthermore, according to this disclosure, efficiency can be improved by more clearly identifying information about the NNPF output picture through NNPF-related SEI messages.

[0037] Furthermore, according to this disclosure, coding quality and efficiency can be improved by clarifying the meaning of information related to the output picture of the NNPF.

[0038] Furthermore, this disclosure makes it possible to provide a non-transitory computer-readable recording medium for storing a bitstream generated by the video encoding method according to this disclosure.

[0039] Furthermore, this disclosure provides a non-transitory computer-readable recording medium for storing a bitstream that is received and decoded by the video decoding device according to this disclosure and used for restoring video.

[0040] Furthermore, this disclosure provides a method for transmitting a bitstream generated by a video encoding method.

[0041] The effects obtained from this disclosure are not limited to those mentioned above, and any other effects not mentioned above will be clearly understood by a person with ordinary skill in the art to which this disclosure pertains from the following description.

[Brief Description of the Drawings]

[0042]
[Figure 1] A schematic diagram showing a video coding system to which the embodiments of this disclosure can be applied.
[Figure 2] A schematic diagram showing a video encoding device to which the embodiments of this disclosure can be applied.
[Figure 3] A schematic diagram showing a video decoding device to which the embodiments of this disclosure can be applied.
[Figure 4] A diagram illustrating an interleaving method for luma channel derivation.
[Figure 5] A diagram illustrating the output picture of an NNPF (neural-network post-filter).
[Figure 6] A flowchart illustrating a video decoding method to which the embodiments of this disclosure can be applied.
[Figure 7] A flowchart illustrating a video encoding method to which the embodiments of this disclosure can be applied.
[Figure 8] A diagram illustrating a content streaming system to which the embodiments of this disclosure can be applied.

[Modes for Carrying Out the Invention]

[0043] Hereafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings, so that they can be easily implemented by a person with ordinary skill in the art to which the present disclosure pertains. However, the present disclosure may be embodied in various other forms and is not limited to the embodiments described herein.

[0044] In describing embodiments of this disclosure, if a specific description of a known configuration or function is deemed to obscure the gist of this disclosure, such detailed description will be omitted. In the figures, parts unrelated to the description of this disclosure will be omitted, and similar parts will be denoted by similar reference numerals.

[0045] In this disclosure, when one component is described as being “linked,” “joined,” or “connected” to another component, this may include not only direct linkages but also indirect linkages where other components exist in between. Furthermore, when one component is described as “containing” or “having” another component, this means, unless otherwise specified, that it may contain further other components rather than excluding them.

[0046] In this disclosure, terms such as "first," "second," etc., are used solely to distinguish one component from another, and do not limit the order or importance of the components unless otherwise specified. Therefore, within the scope of this disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and similarly, a second component in one embodiment may be referred to as a first component in another embodiment.

[0047] In this disclosure, components are distinguished from each other solely to clearly describe their respective characteristics, and this does not necessarily mean that these components are separate. That is, multiple components may be integrated to constitute a single hardware or software unit, or a single component may be distributed to constitute multiple hardware or software units. Therefore, such integrated or distributed embodiments are also included in the scope of this disclosure, even without specific mention.

[0048] In this disclosure, the components described in various embodiments are not necessarily essential components, and some may be optional components. Therefore, embodiments consisting of a subset of the components described in one embodiment are also included in the scope of this disclosure. Furthermore, embodiments that further include other components in addition to the components described in various embodiments are also included in the scope of this disclosure.

[0049] This disclosure relates to the encoding and decoding of video, and unless otherwise defined herein, the terms used herein may have their ordinary meanings in the art to which this disclosure pertains.

[0050] In this disclosure, "picture" generally refers to a unit representing a single video image for a specific time period, and "slice / tile" is an encoding unit that constitutes a part of a picture. A single picture may consist of one or more slices / tiles. A slice / tile may also contain one or more CTUs (coding tree units).

[0051] In this disclosure, “pixel” or “pel” can mean the smallest unit that constitutes a picture (or video). The term “sample” may also be used as a counterpart to pixel. A sample may generally represent a pixel or a pixel value, or it may represent only the pixel / pixel value of the luma component, or only the pixel / pixel value of the chroma component.

[0052] In this disclosure, “unit” may represent a basic unit of image processing. A unit may include at least one of a specific region of a picture and information associated with that region. Depending on the case, “unit” may be used interchangeably with terms such as “sample array,” “block,” or “area.” In general, an MxN block may include a set (or array) of samples or transform coefficients consisting of M columns and N rows.

[0053] In this disclosure, “current block” can mean one of the following: “current coding block,” “current coding unit,” “block to encode,” “block to decode,” or “block to process.” When prediction is performed, “current block” can mean “current prediction block” or “block to predict.” When transformation (inverse transformation) / quantization (inverse quantization) is performed, “current block” can mean “current transformation block” or “block to transform.” When filtering is performed, “current block” can mean “block to filter.”

[0054] In this disclosure, "current block" may mean a block containing both a luma component block and a chroma component block, or "the luma block of the current block," unless otherwise explicitly stated as a chroma block. The luma component block of the current block may be expressed with an explicit mention of a luma component block, such as "luma block" or "current luma block." Similarly, the chroma component block of the current block may be expressed with an explicit mention of a chroma component block, such as "chroma block" or "current chroma block."

[0055] In this disclosure, " / " and "," may be interpreted as "and / or." For example, "A / B" and "A, B" may be interpreted as "A and / or B." Also, "A / B / C" and "A, B, C" may mean "at least one of A, B and / or C."

[0056] In this disclosure, “or” may be interpreted as “and / or.” For example, “A or B” may mean 1) “A” only, 2) “B” only, or 3) “A and B.” Alternatively, in this disclosure, “or” may mean “additionally or alternatively.”

[0057] Overview of the video coding system

[0058] Figure 1 is a schematic diagram showing a video coding system to which the embodiments of this disclosure can be applied.

[0059] A video coding system according to one embodiment may include an encoding device 10 and a decoding device 20. The encoding device 10 can transmit encoded video and / or image information or data to the decoding device 20 in file or streaming form via a digital storage medium or network.

[0060] An encoding device 10 according to one embodiment may include a video source generation unit 11, an encoding unit 12, and a transmission unit 13. A decoding device 20 according to one embodiment may include a receiving unit 21, a decoding unit 22, and a rendering unit 23. The encoding unit 12 may be called a video / image encoding unit, and the decoding unit 22 may be called a video / image decoding unit. The transmission unit 13 may be included in the encoding unit 12. The receiving unit 21 may be included in the decoding unit 22. The rendering unit 23 may include a display unit, and the display unit may be composed of a separate device or external component.

[0061] The video source generation unit 11 can acquire video / images through video / image capture, synthesis, or generation processes. The video source generation unit 11 may include a video / image capture device and / or a video / image generation device. The video / image capture device may include, for example, one or more cameras, or a video / image archive containing previously captured video / images. The video / image generation device may include, for example, a computer, a tablet, and a smartphone, and can generate video / images (electronically). For example, virtual video / images may be generated by a computer, in which case the video / image capture process may be replaced by a process in which related data is generated.

[0062] The encoding unit 12 can encode the input video / image data. The encoding unit 12 can perform a series of procedures such as prediction, transformation, and quantization for compression and encoding efficiency. The encoding unit 12 can output the encoded data (encoded video / image information) in the form of a bitstream.

[0063] The transmission unit 13 can acquire encoded video / image information or data output in bitstream form and transmit it in file or streaming form to the receiving unit 21 of the decoding device 20 or another external entity via a digital storage medium or network. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray (registered trademark; the same applies hereinafter), HDD, SSD, etc. The transmission unit 13 may include elements for generating media files in a predetermined file format and may include elements for transmission via a broadcast / communication network. The transmission unit 13 may be provided as a transmission device separate from the encoding unit 12, in which case the transmission device may include at least one processor that acquires encoded video / image information or data output in bitstream form and a transmitting unit that transmits it in file or streaming form. The receiving unit 21 can extract / receive the bitstream from the storage medium or network and transmit it to the decoding unit 22.

[0064] The decoding unit 22 can decode the video / image by performing a series of procedures such as inverse quantization, inverse transform, and prediction, which correspond to the operation of the encoding unit 12.

[0065] The rendering unit 23 can render the decoded video / image. The rendered video / image may be displayed through the display unit.

[0066] Overview of the video encoding device

[0067] Figure 2 is a schematic diagram showing a video encoding device to which the embodiments of this disclosure can be applied.

[0068] As shown in Figure 2, the video encoding device 100 may include a video splitting unit 110, a subtraction unit 115, a transformation unit 120, a quantization unit 130, an inverse quantization unit 140, an inverse transformation unit 150, an addition unit 155, a filtering unit 160, a memory 170, an inter-prediction unit 180, an intra-prediction unit 185, and an entropy encoding unit 190. The inter-prediction unit 180 and the intra-prediction unit 185 may be collectively called the "prediction unit". The transformation unit 120, the quantization unit 130, the inverse quantization unit 140, and the inverse transformation unit 150 may be included in the residual processing unit. The residual processing unit may further include the subtraction unit 115.

[0069] Depending on the embodiment, all or at least some of the multiple components constituting the video encoding device 100 may be embodied as a single hardware component (e.g., an encoder or a processor). Furthermore, the memory 170 may include a DPB (decoded picture buffer) and may be embodied by a digital storage medium.

[0070] The video splitting unit 110 can split the input video (or picture, frame) input to the video encoding device 100 into one or more processing units. For example, the processing units may be called coding units (CUs). Coding units can be obtained by recursively splitting a coding tree unit (CTU) or the largest coding unit (LCU) using a QT / BT / TT (quad-tree / binary-tree / ternary-tree) structure. For example, one coding unit may be split into multiple coding units of deeper depth based on a quad-tree structure, a binary-tree structure, and / or a ternary-tree structure. For the splitting of coding units, a quad-tree structure may be applied first, followed by a binary-tree structure and / or a ternary-tree structure. The coding procedure according to this disclosure may be performed based on the final coding unit that is not further split. The largest coding unit may be used directly as the final coding unit, or a deeper-depth coding unit obtained by splitting the largest coding unit may be used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transformation, and / or restoration, which will be described later. As another example, the processing unit of the coding procedure may be a prediction unit (PU) or a transformation unit (TU). The prediction unit and the transformation unit may be divided or partitioned from the final coding unit, respectively. The prediction unit may be a unit of sample prediction, and the transformation unit may be a unit that derives transformation coefficients and / or a unit that derives a residual signal from transformation coefficients.
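
The recursive splitting described in [0070] can be sketched as follows. This is a simplified illustration that covers only the quad-tree stage (binary-tree and ternary-tree splits are omitted); the function name `quadtree_split`, the `should_split` callback, and the `min_size` parameter are assumptions made for the example and do not come from this disclosure. In a real encoder the split decision would typically come from a rate-distortion search.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in samples


def quadtree_split(x: int, y: int, size: int, min_size: int,
                   should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square CTU into final coding units (quad-tree only)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        # Visit the four quadrants in raster order: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_split(x + dx, y + dy, half, min_size, should_split))
        return leaves
    # A block that is not split further is a final coding unit.
    return [(x, y, size)]


# Example: a 64x64 CTU in which every block larger than 32 samples is split.
cus = quadtree_split(0, 0, 64, 8, lambda x, y, size: size > 32)
```

The leaves returned by the recursion tile the CTU without overlap, which is the property the coding procedure relies on when it processes each final coding unit in turn.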

[0071] The prediction unit (inter-prediction unit 180 or intra-prediction unit 185) can make predictions for the block to be processed (current block) and generate a predicted block that includes prediction samples for the current block. The prediction unit can determine whether intra-prediction or inter-prediction is applied to the current block or on a CU basis. The prediction unit can generate various information regarding the prediction of the current block and transmit it to the entropy encoding unit 190. The prediction information may be encoded by the entropy encoding unit 190 and output in the form of a bitstream.

[0072] The intra-prediction unit 185 can predict the current block by referring to a sample in the current picture. The referenced sample may be located in the vicinity of the current block or at a distance from it, depending on the intra-prediction mode and / or intra-prediction method. The intra-prediction mode may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes, depending on the accuracy of the prediction direction. However, this is an example, and more or fewer directional prediction modes may be used depending on the settings. The intra-prediction unit 185 can also determine the prediction mode to be applied to the current block using the prediction modes applied to the surrounding blocks.
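
As a rough illustration of a non-directional mode, the DC mode mentioned in [0072] can be sketched as filling the block with the mean of the neighboring reference samples. The function name `dc_predict` and its arguments are hypothetical; the sketch assumes that the row above and the column to the left of the block are both available, and it ignores the special handling that actual codecs apply to non-square blocks and unavailable references.

```python
from typing import List


def dc_predict(above: List[int], left: List[int]) -> List[List[int]]:
    """Fill a block with the rounded mean of the neighboring reference samples.

    `above` holds the reconstructed samples directly above the block
    (one per column); `left` holds those directly to its left (one per row).
    """
    width, height = len(above), len(left)
    total = sum(above) + sum(left)
    count = width + height
    dc = (total + count // 2) // count  # integer mean with rounding
    return [[dc] * width for _ in range(height)]


# Example: a 4x4 block predicted from its top and left neighbors.
pred = dc_predict(above=[100, 102, 104, 106], left=[98, 98, 100, 100])
```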

[0073] The inter-prediction unit 180 can derive a predicted block for the current block based on a reference block (reference sample array) identified by a motion vector on a reference picture. In this case, in order to reduce the amount of motion information transmitted in inter-prediction mode, motion information can be predicted in units of blocks, subblocks, or samples based on the correlation of motion information between the surrounding blocks and the current block. The motion information may include motion vectors and reference picture indices. The motion information may further include inter-prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter-prediction, the surrounding blocks may include spatial neighboring blocks existing in the current picture and temporal neighboring blocks existing in the reference picture. The reference picture containing the reference block and the reference picture containing the temporal neighboring block may be the same or different from each other. The temporal neighboring block may be called a collocated reference block, colCU, etc. The reference picture containing the temporal neighboring block may be called a collocated picture (colPic). For example, the inter-prediction unit 180 can construct a motion information candidate list based on surrounding blocks and generate information indicating which candidate is used to derive the motion vector and / or reference picture index of the current block. Inter-prediction may be performed based on various prediction modes; for example, in skip mode and merge mode, the inter-prediction unit 180 can use the motion information of surrounding blocks as the motion information of the current block. In skip mode, unlike merge mode, the residual signal does not need to be transmitted. In motion vector prediction (MVP) mode, the motion vectors of surrounding blocks are used as motion vector predictors, and the motion vector of the current block can be signaled by encoding the motion vector difference and an indicator for the motion vector predictor. The motion vector difference represents the difference between the motion vector of the current block and the motion vector predictor.
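
The MVP signaling described above can be sketched as follows: the encoder picks a predictor from a candidate list, signals its index and the motion vector difference, and the decoder adds the difference back. The names `choose_predictor` and `decode_mv` are hypothetical, and selecting the candidate with the smallest absolute difference is an assumption made for the example; the disclosure does not specify how the predictor is chosen.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical) displacement


def choose_predictor(mv: MotionVector,
                     candidates: List[MotionVector]) -> Tuple[int, MotionVector]:
    """Encoder side: return (candidate index, motion vector difference)."""
    def cost(i: int) -> int:
        # Sum of absolute component differences as a stand-in for coding cost.
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])

    index = min(range(len(candidates)), key=cost)
    mvp = candidates[index]
    return index, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(index: int, mvd: MotionVector,
              candidates: List[MotionVector]) -> MotionVector:
    """Decoder side: rebuild the motion vector from the predictor and the difference."""
    mvp = candidates[index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Example: predictors taken from two surrounding blocks.
candidates = [(0, 0), (4, -2)]
index, mvd = choose_predictor((5, -3), candidates)
restored = decode_mv(index, mvd, candidates)
```

Both sides must build the same candidate list for the signaled index to be meaningful, which is why the list is derived from already-coded surrounding blocks.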

[0074] The prediction unit can generate a prediction signal based on various prediction methods and / or prediction techniques described later. For example, the prediction unit may apply intra-prediction or inter-prediction to predict the current block, or it may apply intra-prediction and inter-prediction simultaneously. A prediction method that applies intra-prediction and inter-prediction simultaneously to predict the current block may be called CIIP (combined inter and intra prediction). The prediction unit can also perform intra-block copy (IBC) to predict the current block. Intra-block copy may be used, for example, for coding content images / videos such as games, as in SCC (screen content coding). IBC is a method of predicting the current block using a reference block that has already been restored in the current picture at a predetermined distance from the current block. When IBC is applied, the position of the reference block in the current picture may be encoded as a vector (block vector) corresponding to the predetermined distance. IBC basically performs prediction within the current picture, but it may be performed similarly to inter-prediction in that it derives the reference block within the current picture. In other words, IBC can use at least one of the inter-prediction methods described in this disclosure.
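
A minimal sketch of the IBC idea in [0074]: the predicted block is a copy of an already restored region of the current picture, located by a block vector relative to the current block. The function name `ibc_predict` and its parameters are hypothetical. The sketch only checks that the reference region lies inside the picture; it does not verify that the region has already been restored, which an actual codec would have to guarantee.

```python
from typing import List, Tuple


def ibc_predict(recon: List[List[int]], x0: int, y0: int,
                width: int, height: int,
                block_vector: Tuple[int, int]) -> List[List[int]]:
    """Copy the reference block addressed by the block vector from the current picture."""
    bvx, bvy = block_vector
    ref_x, ref_y = x0 + bvx, y0 + bvy
    # Reject vectors that point outside the picture area.
    if (ref_x < 0 or ref_y < 0
            or ref_y + height > len(recon)
            or ref_x + width > len(recon[0])):
        raise ValueError("block vector points outside the picture")
    return [row[ref_x:ref_x + width] for row in recon[ref_y:ref_y + height]]


# Example: a 2x2 block at (2, 2) predicted from the region two samples up and left.
picture = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pred = ibc_predict(picture, x0=2, y0=2, width=2, height=2, block_vector=(-2, -2))
```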

[0075] The predicted signal generated by the prediction unit may be used to generate a restored signal or a residual signal. The subtraction unit 115 can generate a residual signal (residual block, residual sample array) by subtracting the predicted signal output from the prediction unit (predicted block, predicted sample array) from the input video signal (original block, original sample array). The generated residual signal may be transmitted to the transformation unit 120.
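
The subtraction in [0075] and the corresponding addition on the restoration side can be written out directly. The function names `residual_block` and `reconstruct` are hypothetical, and the sketch leaves out transformation and quantization, so the restored block here equals the original exactly; with quantization in the loop it generally would not.

```python
from typing import List

SampleBlock = List[List[int]]


def residual_block(original: SampleBlock, predicted: SampleBlock) -> SampleBlock:
    """Residual = original samples minus predicted samples, element by element."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, predicted)]


def reconstruct(predicted: SampleBlock, residual: SampleBlock) -> SampleBlock:
    """Restored block = predicted samples plus residual samples."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]


original = [[52, 55], [61, 59]]
predicted = [[50, 50], [60, 60]]
residual = residual_block(original, predicted)
restored = reconstruct(predicted, residual)
```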

[0076] The transformation unit 120 can generate transformation coefficients by applying a transformation method to the residual signal. For example, the transformation method may include at least one of the following: DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), KLT (Karhunen-Loève Transform), GBT (Graph-Based Transform), or CNT (Conditionally Non-linear Transform). Here, GBT refers to the transformation obtained from a graph when the relationship information between pixels is represented by this graph. CNT refers to the transformation obtained by generating a prediction signal using all previously reconstructed pixels and obtaining a transformation based on it. The transformation process may be applied to square pixel blocks of the same size, or to non-square blocks of variable size.
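
Of the transformations listed in [0076], the DCT is the simplest to show. The sketch below uses the textbook orthonormal DCT-II in floating point, applied first to the rows and then to the columns of a block. The function names `dct_1d` and `dct_2d` are hypothetical, and this is not the integer approximation that practical codecs use.

```python
import math
from typing import List


def dct_1d(samples: List[float]) -> List[float]:
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # `cols` holds transformed columns; transpose back to row-major layout.
    return [list(row) for row in zip(*cols)]


# Example: a flat 4x4 residual block concentrates all energy in the DC coefficient.
coeffs = dct_2d([[10.0] * 4 for _ in range(4)])
```

The example shows why the transformation helps compression: a smooth residual block is represented by very few non-zero coefficients.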

[0077] The quantization unit 130 can quantize the transformation coefficients and transmit them to the entropy encoding unit 190. The entropy encoding unit 190 can encode the quantized signal (information about the quantized transformation coefficients) and output it as a bitstream. The information about the quantized transformation coefficients may be called residual information. The quantization unit 130 can rearrange the block-shaped quantized transformation coefficients into a one-dimensional vector based on the coefficient scan order, and can also generate the information about the quantized transformation coefficients based on that one-dimensional vector.
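The rearrangement of a block of quantized coefficients into a one-dimensional vector can be illustrated with a minimal sketch. The anti-diagonal scan below is an assumption chosen for illustration only; the actual coefficient scan order is defined by the codec, and the function names are hypothetical. The inverse mapping corresponds to the rearrangement into a two-dimensional block that a decoder performs with the same scan order.

```python
def diagonal_scan_order(size):
    """Return (row, col) positions of a size x size block, one anti-diagonal at a time.

    Illustrative scan only; a real codec defines its own scan order.
    """
    order = []
    for diagonal in range(2 * size - 1):
        for row in range(size):
            col = diagonal - row
            if 0 <= col < size:
                order.append((row, col))
    return order


def scan_to_vector(block):
    """Encoder side: flatten a square block into a 1-D list following the scan order."""
    return [block[row][col] for row, col in diagonal_scan_order(len(block))]


def vector_to_block(vector, size):
    """Decoder side: place a 1-D list back into a square block using the same scan order."""
    block = [[0] * size for _ in range(size)]
    for value, (row, col) in zip(vector, diagonal_scan_order(size)):
        block[row][col] = value
    return block


quantized = [[9, 4, 1],
             [3, 2, 0],
             [1, 0, 0]]
vector = scan_to_vector(quantized)
restored = vector_to_block(vector, 3)
```

With a scan of this kind, the zero-valued high-frequency coefficients tend to cluster at the end of the vector, which is what makes the one-dimensional form convenient for entropy coding.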

[0078] The entropy encoding unit 190 can perform various encoding methods, such as exponential Golomb, CAVLC (context-adaptive variable length coding), and CABAC (context-adaptive binary arithmetic coding). In addition to the quantized transformation coefficients, the entropy encoding unit 190 can also encode information necessary for video / image restoration (e.g., the values of syntax elements) together or separately. The encoded information (e.g., encoded video / image information) may be transmitted or stored in the form of a bitstream in units of NAL (network abstraction layer) units. The video / image information may further include information about various parameter sets, such as an adaptation parameter set (APS), picture parameter set (PPS), sequence parameter set (SPS), or video parameter set (VPS). The video / image information may also further include general constraint information. The signaling information, transmitted information, and / or syntax elements referred to in this disclosure may be encoded by the encoding procedure described above and included in the bitstream.
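Of the encoding methods named above, exponential Golomb coding is the simplest to show in a few lines. The following is a minimal sketch of the order-0 unsigned variant operating on bit strings; the function names are hypothetical, and a real encoder would write bits into a byte buffer rather than a Python string.

```python
def exp_golomb_encode(value: int) -> str:
    """Encode a non-negative integer as an order-0 exponential Golomb bit string."""
    if value < 0:
        raise ValueError("unsigned exponential Golomb codes require value >= 0")
    binary = bin(value + 1)[2:]              # binary representation of value + 1
    return "0" * (len(binary) - 1) + binary  # prefix of zeros, then the binary digits


def exp_golomb_decode(bits: str) -> int:
    """Decode one order-0 exponential Golomb codeword found at the start of bits."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    codeword = bits[leading_zeros:2 * leading_zeros + 1]
    return int(codeword, 2) - 1


codewords = [exp_golomb_encode(v) for v in range(4)]
```

Small values receive short codewords (0 maps to a single bit), which suits syntax elements whose small values are the most frequent.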

[0079] The bitstream may be transmitted over a network or stored on a digital storage medium. Here, the network may include broadcasting networks and / or communication networks, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmitting unit (not shown) for transmitting the signal output from the entropy encoding unit 190 and / or a storage unit (not shown) for storing it may be provided as an internal / external element of the video encoding device 100, or the transmitting unit may be provided as a component of the entropy encoding unit 190.

[0080] The quantized transformation coefficients output from the quantization unit 130 may be used to generate a residual signal. For example, by applying inverse quantization and inverse transformation to the quantized transformation coefficients in the inverse quantization unit 140 and the inverse transformation unit 150, a residual signal (residual block or residual samples) can be reconstructed.

[0081] The adder 155 can generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the predicted signal output from the inter-prediction unit 180 or the intra-prediction unit 185. When there is no residual for the block to be processed, such as when skip mode is applied, the predicted block may be used as the reconstructed block. The adder 155 may be called the reconstruction unit or the reconstructed block generation unit. The generated reconstructed signal may be used for intra-prediction of the next block to be processed in the current picture, or, as described later, may be used for inter-prediction of the next picture after filtering.
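The subtract, quantize, inverse-quantize, and add loop described in paragraphs [0075] to [0081] can be sketched as follows. This is a simplified illustration under stated assumptions: the transformation step is omitted, a uniform scalar quantizer with a hypothetical step size stands in for the real quantizer, and all function names are invented for the example.

```python
def quantize_residual(original, predicted, step):
    """Subtract the prediction from the original and quantize the residual.

    The transform is omitted; a uniform quantizer with the given step is assumed.
    """
    return [round((o - p) / step) for o, p in zip(original, predicted)]


def reconstruct_block(predicted, levels, step):
    """Inverse-quantize the levels and add the result to the prediction."""
    return [p + level * step for p, level in zip(predicted, levels)]


original = [104, 111, 97, 120]
predicted = [100, 108, 100, 112]
levels = quantize_residual(original, predicted, step=4)
reconstructed = reconstruct_block(predicted, levels, step=4)

# Skip mode: no residual is present, so the predicted block is the reconstructed block.
skipped = reconstruct_block(predicted, [0, 0, 0, 0], step=4)
```

Because the encoder reconstructs from the quantized levels rather than from the original samples, it holds the same reference data that a decoder would obtain from the bitstream.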

[0082] The filtering unit 160 can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit 160 can apply various filtering methods to the restored picture to generate a modified restored picture, and store the modified restored picture in the memory 170, specifically in the DPB of the memory 170. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, and bilateral filter. The filtering unit 160 can generate various filtering information and transmit it to the entropy encoding unit 190, as will be described later in the description of each filtering method. The filtering information may be encoded by the entropy encoding unit 190 and output in the form of a bitstream.

[0083] The modified restored picture transmitted to memory 170 may be used as a reference picture in the inter-prediction unit 180. This allows the video encoding device 100 to avoid prediction mismatches between the video encoding device 100 and the video decoding device when inter-prediction is applied, and also improves encoding efficiency.

[0084] The DPB in memory 170 can store the corrected restored picture for use as a reference picture in the inter-prediction unit 180. Memory 170 can store motion information of blocks from which motion information in the current picture has been derived (or encoded) and / or motion information of blocks in the picture that have already been restored. The stored motion information may be transmitted to the inter-prediction unit 180 for use as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. Memory 170 can store restored samples of restored blocks in the current picture and transmit them to the intra-prediction unit 185.

[0085] Overview of the video decoding device

[0086] Figure 3 is a schematic diagram showing an image decoding device to which the embodiments of this disclosure can be applied.

[0087] As shown in Figure 3, the video decoding device 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transformation unit 230, an addition unit 235, a filtering unit 240, a memory 250, an inter-prediction unit 260, and an intra-prediction unit 265. The inter-prediction unit 260 and the intra-prediction unit 265 can be collectively referred to as the "prediction unit". The inverse quantization unit 220 and the inverse transformation unit 230 may be included in the residual processing unit.

[0088] All or at least some of the multiple components constituting the video decoding device 200 may be embodied as a single hardware component (e.g., a decoder or processor) depending on the embodiment. Furthermore, the memory 250 may include a DPB and may be embodied by a digital storage medium.

[0089] A video decoding device 200 that receives a bitstream containing video / image information can restore the image by performing a process corresponding to the process performed by the video encoding device 100 in Figure 2. For example, the video decoding device 200 can perform decoding using the processing unit applied in the video encoding device. Therefore, the decoding processing unit may be, for example, a coding unit. The coding unit may be a coding tree unit, or it may be obtained by dividing the largest coding unit. The restored video signal decoded and output by the video decoding device 200 may then be played back by a playback device (not shown).

[0090] The video decoding device 200 can receive the signal output from the video encoding device shown in Figure 2 in the form of a bitstream. The received signal may be decoded by the entropy decoding unit 210. For example, the entropy decoding unit 210 can parse the bitstream to derive information necessary for video restoration (or picture restoration) (e.g., video / image information). The video / image information may further include information about various parameter sets, such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). The video / image information may also further include general constraint information. The video decoding device may further utilize the parameter set information and / or the general constraint information to decode the video. The signaling information, received information, and / or syntax elements referred to in this disclosure may be obtained from the bitstream by decoding through the decoding procedure. For example, the entropy decoding unit 210 can decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and output the values of syntax elements necessary for image restoration and the quantized values of transformation coefficients related to the residual.
More specifically, the CABAC entropy decoding method receives bins corresponding to each syntax element in the bitstream, determines a context model using the syntax element information to be decoded and the decoding information of the surrounding blocks and the blocks to be decoded, or symbol / bin information decoded in a previous stage, predicts the probability of bin occurrence based on the determined context model, performs arithmetic decoding of the bins, and generates symbols corresponding to the values of each syntax element. In this case, after determining the context model, the CABAC entropy decoding method can update the context model for the next symbol / bin using the decoded symbol / bin information. Among the information decoded by the entropy decoding unit 210, information related to prediction is provided to the prediction unit (inter-prediction unit 260 and intra-prediction unit 265), and residual values that have been entropy decoded by the entropy decoding unit 210, i.e., quantized transformation coefficients and related parameter information, may be input to the inverse quantization unit 220. In addition, among the information decoded by the entropy decoding unit 210, information related to filtering may be provided to the filtering unit 240. On the other hand, a receiving unit (not shown) that receives signals output from the video encoding device may be further provided as an internal / external element of the video decoding device 200, or the receiving unit may be provided as a component of the entropy decoding unit 210.

[0091] On the other hand, the video decoding device according to this disclosure may be called a video / image / picture decoding device. The video decoding device may include an information decoder (video / image / picture information decoder) and / or a sample decoder (video / image / picture sample decoder). The information decoder may include an entropy decoding unit 210, and the sample decoder may include at least one of an inverse quantization unit 220, an inverse transformation unit 230, an addition unit 235, a filtering unit 240, a memory 250, an inter-prediction unit 260, and an intra-prediction unit 265.

[0092] The inverse quantization unit 220 can inverse quantize the quantized transformation coefficients and output the transformation coefficients. The inverse quantization unit 220 can rearrange the quantized transformation coefficients in a two-dimensional block form. In this case, the rearrangement may be performed based on the coefficient scan order performed by the video encoding device. The inverse quantization unit 220 can perform inverse quantization on the quantized transformation coefficients using quantization parameters (e.g., quantization step size information) and obtain the transformation coefficients.

[0093] The inverse transformation unit 230 can inverse transform the transformation coefficients to obtain residual signals (residual blocks, residual sample arrays).

[0094] The prediction unit can make predictions for the current block and generate a predicted block containing prediction samples for the current block. Based on the prediction information output from the entropy decoding unit 210, the prediction unit can determine whether intra-prediction or inter-prediction is applied to the current block and can determine a specific intra / inter-prediction mode (prediction method).

[0095] As mentioned in the description of the prediction unit of the video encoding device 100, the prediction unit can generate prediction signals based on various prediction methods (techniques) described later.

[0096] The intra-prediction unit 265 can predict the current block by referring to the samples in the current picture. The description of the intra-prediction unit 185 may also apply to the intra-prediction unit 265.

[0097] The inter-prediction unit 260 can derive a predicted block for the current block based on a reference block (reference sample array) identified by motion vectors on the reference picture. In this case, in order to reduce the amount of motion information transmitted in inter-prediction mode, motion information can be predicted in block, subblock, or sample units based on the correlation of motion information between the surrounding block and the current block. The motion information may include motion vectors and reference picture indices. The motion information may further include inter-prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter-prediction, the surrounding block may include spatially neighboring blocks present in the current picture and temporally neighboring blocks present in the reference picture. For example, the inter-prediction unit 260 can construct a motion information candidate list based on the surrounding blocks and derive the motion vector and / or reference picture index of the current block based on the received candidate selection information. Inter-prediction may be performed based on various prediction modes (methods), and the prediction information may include information indicating the mode (method) of inter-prediction for the current block.

[0098] The adder 235 can generate a restored signal (restored picture, restored block, restored sample array) by adding the acquired residual signal to the predicted signal (predicted block, predicted sample array) output from the prediction unit (including the inter-prediction unit 260 and / or intra-prediction unit 265). When there is no residual for the block to be processed, such as when skip mode is applied, the predicted block may be used as the restored block. The description of the adder 155 may also apply to the adder 235. The adder 235 may be called the restore unit or the restored block generation unit. The generated restored signal may be used for intra-prediction of the next block to be processed in the current picture, or, as described later, may be used for inter-prediction of the next picture after filtering.

[0099] The filtering unit 240 can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit 240 can apply various filtering methods to the restored picture to generate a modified restored picture, and the modified restored picture can be stored in the memory 250, specifically in the DPB of the memory 250. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, and bilateral filter.

[0100] The (modified) restored picture stored in the DPB of memory 250 may be used as a reference picture in the inter-prediction unit 260. Memory 250 can store motion information of blocks from which motion information in the current picture has been derived (or decoded) and / or motion information of blocks in the picture that have already been restored. The stored motion information can be transmitted to the inter-prediction unit 260 for use as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. Memory 250 can store restored samples of restored blocks in the current picture and transmit them to the intra-prediction unit 265.

[0101] In this specification, the embodiments described for the filtering unit 160, inter-prediction unit 180, and intra-prediction unit 185 of the video encoding device 100 may be applied identically or in a corresponding manner to the filtering unit 240, inter-prediction unit 260, and intra-prediction unit 265 of the video decoding device 200, respectively.

[0102] Neural network post-filter characteristics (NNPFC)

[0103] Tables 1 to 3 together represent the NNPFC syntax structure.

[0104] [Table 1]

[0105] [Table 2]

[0106] [Table 3]

[0107] The NNPFC syntax structures shown in Tables 1 to 3 may be signaled in the form of SEI (supplemental enhancement information) messages. SEI messages that signal the NNPFC syntax structures shown in Tables 1 to 3 can be called NNPFC SEI messages.

[0108] NNPFC SEI messages can identify neural networks available as post-processing filters. The use of identified post-processing filters for a particular picture can be indicated using neural-network post-filter activation (NNPFA) SEI messages. Here, "post-processing filter" and "post-filter" may have the same meaning.

[0109] To use such SEI messages, the following variables may need to be defined:

[0110] - The width and height of the cropped input picture, in units of luma samples, may be represented by CroppedWidth and CroppedHeight, respectively.

[0111] - The luma sample array of the input picture, CroppedYPic[idx], and the chroma sample arrays, CroppedCbPic[idx] and CroppedCrPic[idx], if present, may be used as input to the NNPF, and the index idx may be in the range of 0 to numInputPics-1.

[0112] - BitDepthY may indicate the bit depth of the luma sample array of the input picture.

[0113] - BitDepthC may indicate the bit depth of the chroma sample arrays (if any) of the input picture.

[0114] - ChromaFormatIdc can indicate a chroma format identifier.

[0115] - If the value of nnpfc_auxiliary_inp_idc is 1, the filtering strength control value StrengthControlVal must be a real number in the range of 0 to 1.

[0116] The input picture with index 0 may be the picture for which the NNPF defined by an NNPFC SEI message has been activated by an NNPFA SEI message. An input picture whose index i is within the range of 1 to numInputPics-1 may precede the input picture with index i-1 in output order.

[0117] If nnpfc_purpose & 0x08 is not equal to 0, and the input picture with index 0 is associated with a frame packing arrangement SEI message having fp_arrangement_type equal to 5, then all input pictures may be associated with a frame packing arrangement SEI message having fp_arrangement_type equal to 5 and the same value of fp_current_frame_is_frame0_flag.

[0118] There may be two or more NNPFC SEI messages for the same picture. If two or more NNPFC SEI messages with different nnpfc_id values exist or are activated for the same picture, the two or more NNPFC SEI messages may have the same or different nnpfc_purpose and nnpfc_mode_idc values.

[0119] nnpfc_purpose can indicate the purpose of the NNPF as shown in Table 4. The value of nnpfc_purpose may be restricted to the range of 0 to 63 in the bitstream. Values of nnpfc_purpose in the range of 64 to 65535 may be reserved for future use. The decoder must ignore NNPFC SEI messages with nnpfc_purpose in the range of 64 to 65535. If a reserved value of nnpfc_purpose is taken into use in the future, the syntax of this SEI message may be extended with syntax elements whose presence is conditioned on nnpfc_purpose being equal to that value. If ChromaFormatIdc is equal to 3, then nnpfc_purpose & 0x02 must be equal to 0. If ChromaFormatIdc or nnpfc_purpose & 0x02 is not equal to 0, then nnpfc_purpose & 0x20 must be equal to 0.
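The value restrictions on nnpfc_purpose stated above can be expressed as a small check. This is a sketch only: the function name and the three return labels are hypothetical, and the meaning of the individual purpose bits is left to Table 4.

```python
def check_nnpfc_purpose(nnpfc_purpose: int, chroma_format_idc: int) -> str:
    """Classify an nnpfc_purpose value as 'ok', 'ignore', or 'invalid'.

    'ignore'  : reserved value (64 to 65535); the decoder ignores the SEI message.
    'invalid' : violates one of the bit-combination constraints.
    """
    if not 0 <= nnpfc_purpose <= 65535:
        raise ValueError("nnpfc_purpose is outside the 16-bit value range")
    if nnpfc_purpose >= 64:
        return "ignore"
    # ChromaFormatIdc equal to 3 requires (nnpfc_purpose & 0x02) == 0.
    if chroma_format_idc == 3 and nnpfc_purpose & 0x02:
        return "invalid"
    # (nnpfc_purpose & 0x20) may be non-zero only when both ChromaFormatIdc
    # and (nnpfc_purpose & 0x02) are equal to 0.
    if (chroma_format_idc != 0 or nnpfc_purpose & 0x02) and nnpfc_purpose & 0x20:
        return "invalid"
    return "ok"


status = check_nnpfc_purpose(0x20, chroma_format_idc=0)
```

Separating "ignore" from "invalid" mirrors the text: reserved values are skipped by the decoder, whereas the bit constraints describe what a conforming bitstream may contain.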

[0120] [Table 4]

[0121] nnpfc_id may contain an identification number that can be used to identify the NNPF. The value of nnpfc_id must be in the range of 0 to 2^32 - 2. Values of nnpfc_id in the ranges of 256 to 511 and 2^31 to 2^32 - 2 may be reserved for future use. Decoders must ignore NNPFC SEI messages with an nnpfc_id in the range of 256 to 511 or 2^31 to 2^32 - 2.
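As an illustration of the nnpfc_id ranges, the sketch below takes the permitted range to be 0 to 2^32 - 2 and the reserved ranges to be 256 to 511 and 2^31 to 2^32 - 2. The function name and the return labels are hypothetical.

```python
MAX_NNPFC_ID = 2**32 - 2


def classify_nnpfc_id(nnpfc_id: int) -> str:
    """Return 'usable' or 'reserved' for an nnpfc_id value; reject out-of-range values."""
    if not 0 <= nnpfc_id <= MAX_NNPFC_ID:
        raise ValueError("nnpfc_id must be in the range of 0 to 2^32 - 2")
    if 256 <= nnpfc_id <= 511 or nnpfc_id >= 2**31:
        return "reserved"  # a decoder ignores NNPFC SEI messages with such an id
    return "usable"


kinds = [classify_nnpfc_id(v) for v in (0, 255, 256, 511, 512)]
```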

[0122] If an NNPFC SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, the following may apply:

[0123] - The SEI message may specify a base NNPF.

[0124] - The SEI message may be associated, in output order, with the currently decoded picture and all subsequent decoded pictures of the current layer until the end of the current CLVS.

[0125] An NNPFC SEI message may be a repetition of a previous NNPFC SEI message in the CLVS in the decoding order, and the subsequent semantics may be applied as if this SEI message were the only NNPFC SEI message in the CLVS that has the same content.

[0126] A value of 0 for nnpfc_mode_idc indicates that the SEI message contains a bitstream representing the base NNPF, or that it represents an update relative to the base NNPF having the same nnpfc_id value.

[0127] If an NNPFC SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, a value of 1 for nnpfc_mode_idc may indicate that the base NNPF associated with the nnpfc_id value is a neural network, and the neural network may be identified by a URI represented by nnpfc_uri using the format identified by the tag URI nnpfc_tag_uri.

[0128] If an NNPFC SEI message is neither the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, nor a repetition of that first NNPFC SEI message, then a value of 1 for nnpfc_mode_idc may indicate that an update relative to the base NNPF having the same nnpfc_id value is defined by a URI represented by nnpfc_uri using the format identified by the tag URI nnpfc_tag_uri.

[0129] The value of nnpfc_mode_idc may be restricted to the range of 0 to 1 in the bitstream. Values of nnpfc_mode_idc in the range of 2 to 255 may be reserved for future use and are not to be present in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_mode_idc in the range of 2 to 255. Values of nnpfc_mode_idc greater than 255 are not to be present in the bitstream and are not reserved for future use.

[0130] If this SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, then the NNPF PostProcessingFilter() may be set equal to the base NNPF.

[0131] If this SEI message is neither the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, nor a repetition of that first NNPFC SEI message, then the NNPF PostProcessingFilter() may be obtained by applying the update defined by this SEI message to the base NNPF.

[0132] Updates are not cumulative; rather, each update may be applied to the base NNPF which is the NNPF specified by the first NNPFC SEI message in the decoding order that currently has a specific nnpfc_id value within CLVS.
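The non-cumulative update rule can be illustrated with a toy model. In the sketch below, a filter is represented as a dictionary of named parameters and an update as a dictionary of replacement values; this representation, the class name, and the method names are assumptions made purely for illustration, and repetitions of the base message are not modeled.

```python
class NnpfRegistry:
    """Tracks, per nnpfc_id, the base NNPF and the filter currently in effect."""

    def __init__(self):
        self._base = {}    # nnpfc_id -> parameters of the base NNPF
        self._active = {}  # nnpfc_id -> parameters of PostProcessingFilter()

    def on_nnpfc_sei(self, nnpfc_id, payload):
        if nnpfc_id not in self._base:
            # First message with this id in the CLVS: it defines the base NNPF.
            self._base[nnpfc_id] = dict(payload)
            self._active[nnpfc_id] = dict(payload)
        else:
            # Later message: the update is applied to the base NNPF,
            # not to the result of any earlier update.
            self._active[nnpfc_id] = {**self._base[nnpfc_id], **payload}

    def post_processing_filter(self, nnpfc_id):
        return self._active[nnpfc_id]


registry = NnpfRegistry()
registry.on_nnpfc_sei(5, {"w1": 1, "w2": 2})  # base NNPF
registry.on_nnpfc_sei(5, {"w1": 10})          # first update
after_first = dict(registry.post_processing_filter(5))
registry.on_nnpfc_sei(5, {"w2": 20})          # second update starts again from the base
after_second = dict(registry.post_processing_filter(5))
```

After the second update, w1 returns to its base value because the first update is discarded rather than accumulated.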

[0133] The value of nnpfc_reserved_zero_bit_a may be restricted to being equal to 0 in the bitstream. The decoder may be required to ignore NNPFC SEI messages in which the value of nnpfc_reserved_zero_bit_a is not 0.

[0134] nnpfc_tag_uri may contain a tag URI, with syntax and semantics specified in IETF RFC 4151, that identifies the format of, and associated information about, the neural network identified by nnpfc_uri, which is used as the base NNPF or as an update relative to the base NNPF having the same nnpfc_id value. Using nnpfc_tag_uri, the format of the neural network data specified by nnpfc_uri can be uniquely identified without a central registration authority. nnpfc_tag_uri equal to "tag:iso.org,2023:15938-17" can indicate that the neural network data identified by nnpfc_uri complies with ISO / IEC 15938-17.

[0135] nnpfc_uri may contain a URI having syntax and semantics specified in IETF Internet Standard 66 that identifies a neural network used as a base NNPF or an update associated with a base NNPF that uses the same nnpfc_id value.

[0136] A value of 1 for nnpfc_property_present_flag can indicate the presence of syntax elements related to the filter's purpose, input formatting, output formatting, and complexity. A value of 0 for nnpfc_property_present_flag can indicate the absence of syntax elements related to the filter's purpose, input formatting, output formatting, and complexity. The value of nnpfc_property_present_flag may be restricted to being equal to 1 if the SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS. When the value of nnpfc_property_present_flag is equal to 0, the values of all syntax elements that are present only when nnpfc_property_present_flag is 1, and for which no inferred value has been specified, may be inferred to be equal to the corresponding syntax elements in the NNPFC SEI message containing the base NNPF for which this SEI message provides an update.

[0137] A value of 1 for nnpfc_base_flag indicates that the SEI message specifies a base NNPF. A value of 0 for nnpfc_base_flag indicates that the SEI message specifies an update relative to a base NNPF. If nnpfc_base_flag is not present, its value can be inferred to be 0.

[0138] The following restrictions may apply to the value of nnpfc_base_flag:

[0139] - If an NNPFC SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, the value of nnpfc_base_flag may be equal to 1.

[0140] - If an NNPFC SEI message nnpfcB is not the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, and its value of nnpfc_base_flag is equal to 1, then the NNPFC SEI message may be a repetition of the first NNPFC SEI message nnpfcA, in decoding order, that has the same nnpfc_id. That is, the payload content of nnpfcB may be the same as the payload content of nnpfcA.

[0141] The following may apply if an NNPFC SEI message is not the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, and is not a repetition of that first NNPFC SEI message.

[0142] - The SEI message may define an update relative to the preceding base NNPF, in decoding order, that has the same nnpfc_id value.

[0143] - The SEI message may pertain, in output order, to the current restored picture and all subsequent restored pictures of the current layer, until whichever of the following comes first: the end of the current CLVS, or the restored picture that follows the current restored picture in output order within the current CLVS and is associated with a subsequent NNPFC SEI message, in decoding order, having the same nnpfc_id value within the current CLVS.

[0144] The following restrictions may apply if an NNPFC SEI message nnpfcCurr is not the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, is not a repetition of that first NNPFC SEI message (i.e., the value of nnpfc_base_flag is 0), and has nnpfc_property_present_flag equal to 1.

[0145] - The value of nnpfc_purpose in the NNPFC SEI message must be equal to the value of nnpfc_purpose in the first NNPFC SEI message, in decoding order, that has the particular nnpfc_id value within the current CLVS.

[0146] - The values of the syntax elements that follow nnpfc_base_flag and precede nnpfc_complexity_info_present_flag, in decoding order, within the NNPFC SEI message must be equal to the values of the corresponding syntax elements in the first NNPFC SEI message, in decoding order, that has the particular nnpfc_id value within the current CLVS.

[0147] - Either nnpfc_complexity_info_present_flag must be equal to 0, or nnpfc_complexity_info_present_flag must be equal to 1 both in nnpfcCurr and in the first NNPFC SEI message, in decoding order, that has the particular nnpfc_id value within the current CLVS (referred to below as nnpfcBase), and the following may apply:

[0148] (1) nnpfc_parameter_type_idc in nnpfcCurr must be equal to nnpfc_parameter_type_idc in nnpfcBase.

[0149] (2) If nnpfc_log2_parameter_bit_length_minus3 exists in nnpfcCurr, then nnpfc_log2_parameter_bit_length_minus3 in nnpfcCurr must be less than or equal to nnpfc_log2_parameter_bit_length_minus3 in nnpfcBase.

[0150] (3) If nnpfc_num_parameters_idc in nnpfcBase is the same as 0, then nnpfc_num_parameters_idc in nnpfcCurr must also be the same as 0.

[0151] (4) Otherwise (if nnpfc_num_parameters_idc in nnpfcBase is greater than 0), nnpfc_num_parameters_idc in nnpfcCurr must be greater than 0 and less than or equal to nnpfc_num_parameters_idc in nnpfcBase.

[0152] (5) If nnpfc_num_kmac_operations_idc in nnpfcBase is the same as 0, then nnpfc_num_kmac_operations_idc in nnpfcCurr must also be the same as 0.

[0153] (6) Otherwise (if nnpfc_num_kmac_operations_idc in nnpfcBase is greater than 0), nnpfc_num_kmac_operations_idc in nnpfcCurr must be greater than 0 and less than or equal to nnpfc_num_kmac_operations_idc in nnpfcBase.

[0154] (7) If nnpfc_total_kilobyte_size in nnpfcBase is equal to 0, then nnpfc_total_kilobyte_size in nnpfcCurr must also be equal to 0.

[0155] (8) Otherwise (if nnpfc_total_kilobyte_size in nnpfcBase is greater than 0), nnpfc_total_kilobyte_size in nnpfcCurr must be greater than 0 and less than or equal to nnpfc_total_kilobyte_size in nnpfcBase.
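Restrictions (1) to (8) can be collected into one check, sketched below. The sketch reads items (4), (6), and (8) uniformly as "greater than 0 and less than or equal to the base value", which is how item (6) is worded. The dataclass, its field names (the syntax element names without the nnpfc_ prefix), and the function names are hypothetical, and the bit-length comparison is only performed when the element is present in both messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComplexityInfo:
    parameter_type_idc: int
    log2_parameter_bit_length_minus3: Optional[int]  # None when not present
    num_parameters_idc: int
    num_kmac_operations_idc: int
    total_kilobyte_size: int


def _bounded_by_base(curr: int, base: int) -> bool:
    """0 in the base forces 0 in the update; otherwise 0 < curr <= base."""
    if base == 0:
        return curr == 0
    return 0 < curr <= base


def update_respects_base(curr: ComplexityInfo, base: ComplexityInfo) -> bool:
    """Check items (1) to (8) for an update (curr) against its base NNPF (base)."""
    if curr.parameter_type_idc != base.parameter_type_idc:  # item (1)
        return False
    if (curr.log2_parameter_bit_length_minus3 is not None
            and base.log2_parameter_bit_length_minus3 is not None
            and curr.log2_parameter_bit_length_minus3
            > base.log2_parameter_bit_length_minus3):       # item (2)
        return False
    return (
        _bounded_by_base(curr.num_parameters_idc, base.num_parameters_idc)               # (3), (4)
        and _bounded_by_base(curr.num_kmac_operations_idc, base.num_kmac_operations_idc)  # (5), (6)
        and _bounded_by_base(curr.total_kilobyte_size, base.total_kilobyte_size)          # (7), (8)
    )


base_info = ComplexityInfo(1, 2, 5, 100, 2048)
update_ok = update_respects_base(ComplexityInfo(1, 1, 5, 50, 1024), base_info)
```

Read this way, the restrictions share one intent: an update may not declare a higher complexity than the base NNPF it modifies.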

[0156] nnpfc_out_sub_c_flag can indicate the values of the variables outSubWidthC and outSubHeightC if nnpfc_purpose & 0x02 is not equal to 0. A value of 1 for nnpfc_out_sub_c_flag indicates that the value of outSubWidthC is 1 and the value of outSubHeightC is 1. A value of 0 for nnpfc_out_sub_c_flag indicates that the value of outSubWidthC is 2 and the value of outSubHeightC is 1. If the value of ChromaFormatIdc is 2 and nnpfc_out_sub_c_flag exists, the value of nnpfc_out_sub_c_flag must be the same as 1.

[0157] nnpfc_out_colour_format_idc can indicate the NNPFC output color format and the resulting values of the variables outSubWidthC and outSubHeightC, provided that nnpfc_purpose & 0x20 is not equal to 0. A value of 1 for nnpfc_out_colour_format_idc indicates that the NNPFC output color format is 4:2:0, and both outSubWidthC and outSubHeightC are equal to 2. A value of 2 for nnpfc_out_colour_format_idc indicates that the NNPFC output color format is 4:2:2, with outSubWidthC being 2 and outSubHeightC being 1. A value of 3 for nnpfc_out_colour_format_idc indicates that the NNPFC output color format is 4:4:4, and both outSubWidthC and outSubHeightC are 1. The value of nnpfc_out_colour_format_idc may be restricted to not be equal to 0.

[0158] If both nnpfc_purpose & 0x02 and nnpfc_purpose & 0x20 are the same as 0, then outSubWidthC and outSubHeightC can be inferred to be the same as SubWidthC and SubHeightC, respectively.
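The three cases for outSubWidthC and outSubHeightC described in paragraphs [0156] to [0158] can be combined into a single derivation, sketched below. The function name and the use of None for an absent syntax element are assumptions; the value mapping follows the text, with nnpfc_out_colour_format_idc equal to 3 taken as 4:4:4 since both of its subsampling factors are 1.

```python
def derive_out_sub(nnpfc_purpose, sub_width_c, sub_height_c,
                   nnpfc_out_sub_c_flag=None, nnpfc_out_colour_format_idc=None):
    """Return (outSubWidthC, outSubHeightC)."""
    if nnpfc_purpose & 0x02:
        if nnpfc_out_sub_c_flag not in (0, 1):
            raise ValueError("nnpfc_out_sub_c_flag is required when purpose & 0x02 is set")
        return (1, 1) if nnpfc_out_sub_c_flag == 1 else (2, 1)
    if nnpfc_purpose & 0x20:
        by_format = {1: (2, 2),  # 4:2:0
                     2: (2, 1),  # 4:2:2
                     3: (1, 1)}  # 4:4:4
        if nnpfc_out_colour_format_idc not in by_format:
            raise ValueError("nnpfc_out_colour_format_idc must be 1, 2, or 3")
        return by_format[nnpfc_out_colour_format_idc]
    # Neither bit set: inherit the subsampling of the input picture.
    return (sub_width_c, sub_height_c)


inherited = derive_out_sub(0x01, sub_width_c=2, sub_height_c=2)
```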

[0159] nnpfc_pic_width_in_luma_samples and nnpfc_pic_height_in_luma_samples can indicate the width and height of the luma sample array of the picture, respectively, resulting from applying the NNPF identified by nnpfc_id to the cropped, decoded output picture. If nnpfc_pic_width_in_luma_samples and nnpfc_pic_height_in_luma_samples are not present, they may be inferred to be the same as CroppedWidth and CroppedHeight, respectively. The value of nnpfc_pic_width_in_luma_samples should be in the range of CroppedWidth to CroppedWidth*16-1. The value of nnpfc_pic_height_in_luma_samples should be in the range of CroppedHeight to CroppedHeight*16-1.
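The inference and range rules for the output picture size can be sketched as follows. The function name is hypothetical, and None stands for a syntax element that is not present.

```python
def output_luma_size(cropped_width, cropped_height,
                     nnpfc_pic_width_in_luma_samples=None,
                     nnpfc_pic_height_in_luma_samples=None):
    """Return the (width, height) of the NNPF output luma array."""
    width = (cropped_width if nnpfc_pic_width_in_luma_samples is None
             else nnpfc_pic_width_in_luma_samples)
    height = (cropped_height if nnpfc_pic_height_in_luma_samples is None
              else nnpfc_pic_height_in_luma_samples)
    if not cropped_width <= width <= cropped_width * 16 - 1:
        raise ValueError("width must be in CroppedWidth .. CroppedWidth*16-1")
    if not cropped_height <= height <= cropped_height * 16 - 1:
        raise ValueError("height must be in CroppedHeight .. CroppedHeight*16-1")
    return width, height


same_size = output_luma_size(1920, 1080)
upscaled = output_luma_size(1920, 1080, 3840, 2160)
```

The lower bound means that the output picture is never smaller than the cropped input picture.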

[0160] nnpfc_num_input_pics_minus1+1 can indicate the number of decoded output pictures used as input to NNPF. The value of nnpfc_num_input_pics_minus1 may be restricted to be within the range of 0 to 63.

[0161] nnpfc_interpolated_pics[i] can indicate the number of interpolated pictures generated by NNPF between the i-th picture and the (i+1)-th picture used as input to NNPF. The value of nnpfc_interpolated_pics[i] may be restricted to the range of 0 to 63. The value of nnpfc_interpolated_pics[i] may be restricted to being greater than 0 for at least one i in the range of 0 to nnpfc_num_input_pics_minus1-1.

[0162] A value of 1 for nnpfc_input_pic_output_flag[i] indicates that NNPF will generate a corresponding output picture for the i-th input picture. A value of 0 for nnpfc_input_pic_output_flag[i] indicates that NNPF will not generate a corresponding output picture for the i-th input picture.

[0163] The variables `numInputPics`, which indicates the number of pictures used as input to NNPF, and `numOutputPics`, which indicates the total number of pictures generated as a result of NNPF, may be derived as shown in Table 5.

[0164] [Table 5]
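Table 5 is not reproduced in this text. The following is a minimal sketch of one plausible derivation, assuming that numOutputPics counts every input picture whose nnpfc_input_pic_output_flag[i] is 1 plus the interpolated pictures signalled between consecutive input pictures. That assumption is read from paragraphs [0160] to [0162], not from the table itself, and the function name is hypothetical.

```python
def derive_pic_counts(nnpfc_num_input_pics_minus1,
                      nnpfc_input_pic_output_flag,
                      nnpfc_interpolated_pics):
    """Return (numInputPics, numOutputPics) under the stated assumption."""
    num_input_pics = nnpfc_num_input_pics_minus1 + 1
    num_output_pics = 0
    for i in range(num_input_pics):
        # An input picture contributes one output picture when flagged.
        if nnpfc_input_pic_output_flag[i]:
            num_output_pics += 1
        # Interpolated pictures exist only between picture i and picture i+1.
        if i < nnpfc_num_input_pics_minus1:
            num_output_pics += nnpfc_interpolated_pics[i]
    return num_input_pics, num_output_pics
```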

[0165] A value of 1 for nnpfc_component_last_flag indicates that the last dimension of the input tensor inputTensor for NNPF and of the output tensor outputTensor (the result of NNPF) is used for the current channel. A value of 0 for nnpfc_component_last_flag indicates that the third dimension of the input tensor inputTensor for NNPF and of the output tensor outputTensor (the result of NNPF) is used for the current channel.

[0166] The first dimension of the input and output tensors may be used as a batch index, as is the practice in some neural network frameworks. The formulas within the semantics of this SEI message use the batch size corresponding to a batch index equal to 0, but the batch size used as input for neural network inference may be determined by the post-processing implementation.

[0167] For example, when the value of nnpfc_inp_order_idc is equal to 3 and the value of nnpfc_auxiliary_inp_idc is equal to 1, the input tensor may have 7 channels, including 4 luma matrices, 2 chroma matrices, and 1 auxiliary input matrix. In this case, the DeriveInputTensors() process can derive each of the 7 channels of the input tensor one by one, and the channel being processed at a given point may be called the current channel during the process.

[0168] nnpfc_inp_format_idc can indicate how to convert the sample values of the cropped decoded output picture into NNPF input values. If nnpfc_inp_format_idc is 0, the input values for NNPF are real numbers, and the InpY() and InpC() functions may be specified as shown in Equation 1.

[0169] [Equation 1]

[0170] If the value of nnpfc_inp_format_idc is 1, then the input values for NNPF are unsigned integers, and the InpY() and InpC() functions may be derived as shown in Table 6.

[0171] [Table 6]

[0172] The variable inpTensorBitDepthY may be derived from the syntax element nnpfc_inp_tensor_luma_bitdepth_minus8 described below. The variable inpTensorBitDepthC may be derived from the syntax element nnpfc_inp_tensor_chroma_bitdepth_minus8 described below.

[0173] Values of nnpfc_inp_format_idc greater than 1 may be reserved for future use and may not be present in the bitstream. The decoder must ignore NNPFC SEI messages containing reserved values of nnpfc_inp_format_idc.

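[0174] nnpfc_inp_tensor_luma_bitdepth_minus8 + 8 can indicate the bit depth of luma sample values in the input integer tensor. The value of inpTensorBitDepthY may be derived as shown in Equation 2.

[0175]

$$\mathrm{inpTensorBitDepth}_Y = \mathrm{nnpfc\_inp\_tensor\_luma\_bitdepth\_minus8} + 8$$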

[0176] The value of nnpfc_inp_tensor_luma_bitdepth_minus8 may be restricted to the range of 0 to 24.

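[0177] nnpfc_inp_tensor_chroma_bitdepth_minus8 + 8 can indicate the bit depth of chroma sample values in the input integer tensor. The value of inpTensorBitDepthC may be derived as shown in Equation 3.

[0178]

$$\mathrm{inpTensorBitDepth}_C = \mathrm{nnpfc\_inp\_tensor\_chroma\_bitdepth\_minus8} + 8$$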

[0179] The value of nnpfc_inp_tensor_chroma_bitdepth_minus8 may be restricted to a range of 0 to 24.

[0180] nnpfc_inp_order_idc can specify how the sample arrays of the cropped, decoded output picture are ordered as one of the input pictures for NNPF.

[0181] The value of nnpfc_inp_order_idc must be in the range of 0 to 3 in the bitstream. Values of nnpfc_inp_order_idc between 4 and 255 do not exist in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_inp_order_idc in the range of 4 to 255. Values of nnpfc_inp_order_idc greater than 255 do not exist in the bitstream and are not reserved for future use.

[0182] If the value of ChromaFormatIdc is not 1, then the value of nnpfc_inp_order_idc must not be 3.

[0183] Table 7 contains explanations regarding the nnpfc_inp_order_idc value.

[0184] [Table 7]

[0185] A patch may be a rectangular array of samples from a picture component (e.g., a luma or chroma component).

[0186] A value of nnpfc_auxiliary_inp_idc greater than 0 indicates that auxiliary input data exists in the NNPF input tensor. A value of nnpfc_auxiliary_inp_idc of 0 indicates that auxiliary input data does not exist in the input tensor. A value of nnpfc_auxiliary_inp_idc of 1 indicates that auxiliary input data is derived by the methods disclosed in Tables 8 to 10.

[0187] The value of nnpfc_auxiliary_inp_idc must be in the range of 0 to 1 in the bitstream. Values of nnpfc_auxiliary_inp_idc between 2 and 255 do not exist in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_auxiliary_inp_idc in the range of 2 to 255. Values of nnpfc_auxiliary_inp_idc greater than 255 do not exist in the bitstream and are not reserved for future use.

[0188] If the value of nnpfc_auxiliary_inp_idc is equal to 1, the variable strengthControlScaledVal may be derived as shown in Equation 4.

[0189] [Equation 4]

[0190] The process DeriveInputTensors(), which derives the input tensor inputTensor for a given vertical sample coordinate cTop and horizontal sample coordinate cLeft specifying the upper-left sample position of the sample patch included in the input tensor, may be expressed as the combination of Tables 8 to 10.

[0191] [Table 8]

[0192] [Table 9]

[0193] [Table 10]

[0194] A value of 1 for nnpfc_separate_colour_description_present_flag indicates that a distinct combination of color primaries, transfer characteristics, and matrix coefficients for the picture resulting from NNPF is specified in the SEI message syntax structure. A value of 0 for nnpfc_separate_colour_description_present_flag indicates that the combination of color primaries, transfer characteristics, and matrix coefficients for the picture resulting from NNPF is identical to that indicated in the VUI parameters for the CLVS.

[0195] nnpfc_colour_primaries may have the same semantics as defined for the vui_colour_primaries syntax element, except as follows:

[0196] - nnpfc_colour_primaries can indicate the primary colors of a picture that appear as a result of applying the NNPF specified in the SEI message, rather than the primary colors used in CLVS.

[0197] - If nnpfc_colour_primaries is not present in the NNPFC SEI message, the value of nnpfc_colour_primaries may be inferred to be the same as the value of vui_colour_primaries.

[0198] nnpfc_transfer_characteristics may have the same semantics as defined for the vui_transfer_characteristics syntax element, except as follows:

[0199] - nnpfc_transfer_characteristics can indicate the transfer characteristics of the picture that results from applying the NNPF specified in the SEI message, rather than the transfer characteristics used in CLVS.

[0200] - If nnpfc_transfer_characteristics is not present in the NNPFC SEI message, the value of nnpfc_transfer_characteristics may be inferred to be the same as the value of vui_transfer_characteristics.

[0201] nnpfc_matrix_coeffs may have the same semantics as specified for the vui_matrix_coeffs syntax element, except as follows:

[0202] - nnpfc_matrix_coeffs can indicate the matrix coefficients of the picture that appear as a result of applying the NNPF specified in the SEI message, rather than the matrix coefficients used in CLVS.

[0203] - If nnpfc_matrix_coeffs is not present in the NNPFC SEI message, the value of nnpfc_matrix_coeffs can be inferred to be the same as the value of vui_matrix_coeffs.

[0204] - The values allowed for nnpfc_matrix_coeffs do not need to be restricted by the chroma format of the decoded video picture that is indicated by the ChromaFormatIdc value for the semantics of the VUI parameters.

[0205] - If the value of nnpfc_matrix_coeffs is the same as 0, the value of nnpfc_out_order_idc must not be the same as 1 or 3.

[0206] A value of 0 for nnpfc_out_format_idc can indicate that the sample values output by the NNPF are real numbers whose value range of 0 to 1 is linearly mapped to the unsigned integer value range of 0 to (1 << bitDepth)-1, for any bit depth bitDepth required for subsequent post-processing or display. A value of 1 for nnpfc_out_format_idc can indicate that the luma sample values output by the NNPF are unsigned integers in the range from 0 to (1 << (nnpfc_out_tensor_luma_bitdepth_minus8 + 8))-1, and that the chroma sample values output by the NNPF are unsigned integers in the range from 0 to (1 << (nnpfc_out_tensor_chroma_bitdepth_minus8 + 8))-1.
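The two output formats can be illustrated with a short sketch. The text only states that the mapping is linear, so the clipping to the range 0 to 1 and the round-to-nearest step below are assumptions made for this example, and the function names are hypothetical.

```python
def real_to_unsigned(value, bit_depth):
    """Map a real NNPF output in [0, 1] to an integer in [0, (1 << bit_depth) - 1].

    Clipping and round-to-nearest are assumptions of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    clipped = min(max(value, 0.0), 1.0)
    return int(clipped * max_val + 0.5)


def max_integer_output(bitdepth_minus8):
    """Largest sample value when nnpfc_out_format_idc is 1."""
    return (1 << (bitdepth_minus8 + 8)) - 1
```

With bitDepth equal to 10, a real output of 1.0 maps to 1023, which is also the largest integer output when nnpfc_out_tensor_luma_bitdepth_minus8 is 2.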

[0207] Values of nnpfc_out_format_idc greater than 1 may be reserved for future use and do not exist in the bitstream. The decoder must ignore NNPFC SEI messages containing reserved values of nnpfc_out_format_idc.

[0208] nnpfc_out_tensor_luma_bitdepth_minus8 + 8 can indicate the bit depth of the luma sample values in the output integer tensor. The value of nnpfc_out_tensor_luma_bitdepth_minus8 must exist in the range from 0 to 24.

[0209] nnpfc_out_tensor_chroma_bitdepth_minus8 + 8 can indicate the bit depth of the chroma sample values in the output integer tensor. The value of nnpfc_out_tensor_chroma_bitdepth_minus8 must exist in the range from 0 to 24.

[0210] If nnpfc_purpose & 0x10 is not equal to 0, then the value of nnpfc_out_format_idc must be equal to 1, and at least one of the following conditions must be true.

[0211] - nnpfc_out_tensor_luma_bitdepth_minus8+8 is greater than BitDepthY.

[0212] - nnpfc_out_tensor_chroma_bitdepth_minus8+8 is greater than BitDepthC.

[0213] nnpfc_out_order_idc can indicate the output order of samples output from NNPF. The value of nnpfc_out_order_idc must be in the range of 0 to 3 in the bitstream. Values of nnpfc_out_order_idc from 4 to 255 do not exist in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_out_order_idc in the range of 4 to 255. Values of nnpfc_out_order_idc greater than 255 do not exist in the bitstream and are not reserved for future use. If the value of nnpfc_purpose & 0x02 is 0, the value of nnpfc_out_order_idc must not be equal to 3.

[0214] Table 11 provides explanations for the values of nnpfc_out_order_idc.

[0215] [Table 11]

[0216] The StoreOutputTensors() process, which derives the sample values of the filtered output sample arrays FilteredYPic, FilteredCbPic, and FilteredCrPic from the output tensor outputTensor for a given vertical sample coordinate cTop and horizontal sample coordinate cLeft indicating the upper-left sample position of the patch of samples contained in the input tensor, may be expressed as the combination of Tables 12 and 13.

[0217] [Table 12]

[0218] [Table 13]

[0219] nnpfc_overlap can indicate the number of horizontal and vertical overlapping samples of adjacent input tensors in NNPF. The value of nnpfc_overlap must be within the range of 0 to 16383.

[0220] A value of 1 for nnpfc_constant_patch_size_flag indicates that NNPF accepts the exact patch size specified by nnpfc_patch_width_minus1 and nnpfc_patch_height_minus1 as input. A value of 0 for nnpfc_constant_patch_size_flag indicates that NNPF accepts any patch size with width inpPatchWidth and height inpPatchHeight as input, provided that the width of the extended patch (i.e., the patch plus the overlapping area), which is equal to inpPatchWidth+2*nnpfc_overlap, is a positive integer multiple of nnpfc_extended_patch_width_cd_delta_minus1+1+2*nnpfc_overlap, and the height of the extended patch, which is equal to inpPatchHeight+2*nnpfc_overlap, is a positive integer multiple of nnpfc_extended_patch_height_cd_delta_minus1+1+2*nnpfc_overlap.

[0221] nnpfc_patch_width_minus1+1 can indicate the number of horizontal samples required for the patch size input to NNPF when the value of nnpfc_constant_patch_size_flag is 1. The value of nnpfc_patch_width_minus1 must be in the range of 0 to Min(32766,CroppedWidth-1).

[0222] nnpfc_patch_height_minus1+1 can indicate the number of vertical samples required for the patch size input to NNPF when the value of nnpfc_constant_patch_size_flag is 1. The value of nnpfc_patch_height_minus1 must be in the range of 0 to Min(32766,CroppedHeight-1).

[0223] nnpfc_extended_patch_width_cd_delta_minus1+1+2*nnpfc_overlap can represent the common divisor of the allowed values for the extended patch width required when inputting into NNPF when the value of nnpfc_constant_patch_size_flag is 0. The value of nnpfc_extended_patch_width_cd_delta_minus1 must be in the range of 0 to Min(32766,CroppedWidth-1).

[0224] nnpfc_extended_patch_height_cd_delta_minus1+1+2*nnpfc_overlap can represent the common divisor of the allowed values for the extended patch height required when inputting into NNPF when the value of nnpfc_constant_patch_size_flag is 0. The value of nnpfc_extended_patch_height_cd_delta_minus1 must be in the range of 0 to Min(32766,CroppedHeight-1).

[0225] The variables inpPatchWidth and inpPatchHeight may be set to the patch size width and patch size height, respectively.

[0226] If the value of nnpfc_constant_patch_size_flag is 0, the following may be applied.

[0227] - The values of inpPatchWidth and inpPatchHeight may be provided by an external means or set by the post-processor itself.

[0228] - The value of inpPatchWidth + 2 * nnpfc_overlap must be a positive integer multiple of nnpfc_extended_patch_width_cd_delta_minus1 + 1 + 2 * nnpfc_overlap, and inpPatchWidth must be less than or equal to CroppedWidth. The value of inpPatchHeight + 2 * nnpfc_overlap must be a positive integer multiple of nnpfc_extended_patch_height_cd_delta_minus1 + 1 + 2 * nnpfc_overlap, and inpPatchHeight must be less than or equal to CroppedHeight.

[0229] Otherwise (if the value of nnpfc_constant_patch_size_flag is 1), the value of inpPatchWidth may be set equal to nnpfc_patch_width_minus1+1, and the value of inpPatchHeight may be set equal to nnpfc_patch_height_minus1+1.
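The patch size rules in paragraphs [0225] to [0229] can be sketched for a single dimension (width or height) as follows. The function name and its parameters are hypothetical; `chosen` stands for a value supplied by an external means or by the post-processor when the flag is 0.

```python
def resolve_patch_size(constant_patch_size_flag, nnpfc_overlap, cropped_size,
                       patch_size_minus1=None, cd_delta_minus1=None, chosen=None):
    """Return inpPatchWidth (or inpPatchHeight) for one dimension."""
    if constant_patch_size_flag == 1:
        # The exact patch size is signalled.
        return patch_size_minus1 + 1
    # Flag equal to 0: validate the externally chosen patch size.
    extended = chosen + 2 * nnpfc_overlap
    common_divisor = cd_delta_minus1 + 1 + 2 * nnpfc_overlap
    if chosen < 1 or chosen > cropped_size:
        raise ValueError("patch size must be between 1 and the cropped size")
    if extended % common_divisor != 0:
        raise ValueError("extended patch size is not a multiple of the common divisor")
    return chosen
```

For example, with nnpfc_overlap equal to 8 and nnpfc_extended_patch_width_cd_delta_minus1 equal to 15, the common divisor is 32, so inpPatchWidth equal to 48 (extended width 64) is accepted while 40 (extended width 56) is rejected.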

[0230] The variables outPatchWidth, outPatchHeight, horCScaling, verCScaling, outPatchCWidth, and outPatchCHeight may be derived as shown in Table 14.

[0231] [Table 14]

[0232] The requirement for bitstream conformance is that outPatchWidth*CroppedWidth must be the same as nnpfc_pic_width_in_luma_samples*inpPatchWidth, and outPatchHeight*CroppedHeight must be the same as nnpfc_pic_height_in_luma_samples*inpPatchHeight.
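The conformance requirement in paragraph [0232] is a pair of integer equalities, which a checker could test directly. This is a minimal sketch with a hypothetical function name; the derivation of outPatchWidth and outPatchHeight in Table 14 is not reproduced here, so they are taken as inputs.

```python
def patch_scaling_is_consistent(out_patch_width, out_patch_height,
                                inp_patch_width, inp_patch_height,
                                cropped_width, cropped_height,
                                nnpfc_pic_width_in_luma_samples,
                                nnpfc_pic_height_in_luma_samples):
    """True when patch-level and picture-level scaling ratios agree."""
    width_ok = (out_patch_width * cropped_width
                == nnpfc_pic_width_in_luma_samples * inp_patch_width)
    height_ok = (out_patch_height * cropped_height
                 == nnpfc_pic_height_in_luma_samples * inp_patch_height)
    return width_ok and height_ok
```

As an illustration, a filter that doubles the resolution from 1920x1080 to 3840x2160 with 64x64 input patches satisfies the requirement only if the output patches are 128x128.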

[0233] nnpfc_padding_type can indicate the padding process when referencing sample locations outside the boundaries of the cropped decoded output picture, as described in Table 15. The value of nnpfc_padding_type must be in the range of 0 to 15.

[0234] [Table 15]

[0235] nnpfc_luma_padding_val can indicate the luma value to be used for padding when the value of nnpfc_padding_type is 4.

[0236] nnpfc_cb_padding_val can indicate the Cb value to be used for padding when the value of nnpfc_padding_type is 4.

[0237] nnpfc_cr_padding_val can indicate the Cr value to be used for padding when the value of nnpfc_padding_type is 4.

[0238] The InpSampleVal(y,x,picHeight,picWidth,CroppedPic) function, whose inputs are the vertical sample position y, the horizontal sample position x, the picture height picHeight, the picture width picWidth, and the sample array CroppedPic, can return the derived SampleVal value as shown in Table 16.

[0239] For inputs to the InpSampleVal() function, vertical positions may be placed before horizontal positions for compatibility with the input tensor rules of some inference engines.

[0240] [Table 16]

[0241] The processes in Table 17 may be used to perform patch-wise filtering with the NNPF, PostProcessingFilter(), to generate filtered and/or interpolated pictures, which may include a Y sample array FilteredYPic, a Cb sample array FilteredCbPic, and a Cr sample array FilteredCrPic, as indicated by nnpfc_out_order_idc.

[0242] [Table 17]

[0243] The order of the pictures in the stored output tensor may be the output order, and the output order of the pictures generated by applying NNPF in output order may be interpreted as an output order that does not conflict with the output order of the input pictures.

[0244] A value of 1 for nnpfc_complexity_info_present_flag indicates that one or more syntax elements indicating the complexity of the NNPF associated with nnpfc_id are present. A value of 0 for nnpfc_complexity_info_present_flag indicates that no syntax elements indicating the complexity of the NNPF associated with nnpfc_id are present.

[0245] A value of 0 for nnpfc_parameter_type_idc can indicate that the neural network uses only integer parameters. A value of 1 for nnpfc_parameter_type_idc can indicate that the neural network can use floating-point or integer parameters. A value of 2 for nnpfc_parameter_type_idc can indicate that the neural network uses only binary parameters. A value of 3 for nnpfc_parameter_type_idc may be reserved for future use and is not present in the bitstream. The decoder must ignore NNPFC SEI messages where the value of nnpfc_parameter_type_idc is 3.

[0246] The values 0, 1, 2, and 3 for nnpfc_log2_parameter_bit_length_minus3 indicate that the neural network does not use parameters with bit lengths greater than 8, 16, 32, and 64, respectively. If nnpfc_parameter_type_idc is present and nnpfc_log2_parameter_bit_length_minus3 is not present, the neural network does not use parameters with bit lengths greater than 1.
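The relationship between nnpfc_log2_parameter_bit_length_minus3 and the maximum parameter bit length is a power of two, as the name suggests. A minimal sketch, with a hypothetical function name and `None` standing for an absent syntax element:

```python
def max_parameter_bit_length(nnpfc_log2_parameter_bit_length_minus3=None):
    """Upper bound on parameter bit length implied by the syntax element."""
    if nnpfc_log2_parameter_bit_length_minus3 is None:
        # Element absent while nnpfc_parameter_type_idc is present.
        return 1
    if not 0 <= nnpfc_log2_parameter_bit_length_minus3 <= 3:
        raise ValueError("value must be in the range 0 to 3")
    # 2 ** (value + 3): 0 -> 8, 1 -> 16, 2 -> 32, 3 -> 64
    return 1 << (nnpfc_log2_parameter_bit_length_minus3 + 3)
```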

[0247] nnpfc_num_parameters_idc can indicate the maximum number of neural network parameters for NNPF in units of 2048. A value of 0 for nnpfc_num_parameters_idc indicates that the maximum number of neural network parameters is unknown. The value of nnpfc_num_parameters_idc must be in the range of 0 to 52. Values of nnpfc_num_parameters_idc greater than 52 do not exist in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_num_parameters_idc greater than 52.

[0248] If the value of nnpfc_num_parameters_idc is greater than 0, the maxNumParameters variable may be derived as shown in Equation 5.

[0249] [Equation 5]

[0250] The number of neural network parameters of NNPF may be limited to a number less than or equal to maxNumParameters.
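Several of the semantics above share the same decoder rule: when a syntax element carries a value outside its allowed range, the decoder ignores the NNPFC SEI message. The sketch below collects the ranges stated in this description into one table. The dictionary-based message representation and the function name are assumptions made for illustration.

```python
# Inclusive ranges as stated in the semantics of this description.
ALLOWED_RANGES = {
    "nnpfc_inp_format_idc": (0, 1),
    "nnpfc_inp_order_idc": (0, 3),
    "nnpfc_auxiliary_inp_idc": (0, 1),
    "nnpfc_out_format_idc": (0, 1),
    "nnpfc_out_order_idc": (0, 3),
    "nnpfc_parameter_type_idc": (0, 2),
    "nnpfc_num_parameters_idc": (0, 52),
}


def should_ignore_nnpfc(fields):
    """Return True if any parsed field holds a reserved or disallowed value."""
    for name, value in fields.items():
        if name in ALLOWED_RANGES:
            low, high = ALLOWED_RANGES[name]
            if not low <= value <= high:
                return True
    return False
```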

[0251] A value of nnpfc_num_kmac_operations_idc greater than 0 can indicate that the maximum number of multiply-accumulate operations per sample of NNPF is less than or equal to nnpfc_num_kmac_operations_idc * 1000. A value of 0 for nnpfc_num_kmac_operations_idc can indicate that the maximum number of multiply-accumulate operations of the network is unknown. The value of nnpfc_num_kmac_operations_idc must be in the range of 0 to 2^32 - 2.

[0252] A value of nnpfc_total_kilobyte_size greater than 0 can indicate the total size (in kilobytes) required to store the uncompressed parameters of the neural network. The total size in bits can be a number greater than or equal to the sum of the bits used to store each parameter. nnpfc_total_kilobyte_size may be the result of rounding up the total size (in bits) divided by 8000. A value of 0 for nnpfc_total_kilobyte_size can indicate that the total size required to store the parameters of the neural network is unknown. The value of nnpfc_total_kilobyte_size must be in the range of 0 to 2^32 - 2.
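The rounding rule for nnpfc_total_kilobyte_size can be written with integer arithmetic. A minimal sketch with a hypothetical function name, where a kilobyte is 8000 bits as stated above:

```python
def total_kilobyte_size(total_bits):
    """Round the total parameter size in bits up to whole kilobytes (8000 bits)."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return (total_bits + 7999) // 8000
```

For example, one million 32-bit parameters occupy 32,000,000 bits, giving a value of 4000.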

[0253] nnpfc_reserved_zero_bit_b shall be identical to 0 in the bitstream. The decoder shall ignore NNPFC SEI messages for which nnpfc_reserved_zero_bit_b is not 0.

[0254] nnpfc_payload_byte[i] may contain the i-th byte of the bitstream. The byte sequence nnpfc_payload_byte[i] for all existing values of i shall be a complete bitstream compliant with ISO/IEC 15938-17.

[0255] Neural network post-filter activation (NNPFA)

[0256] The syntax structure for NNPFA is shown in Table 18.

[0257] [Table 18]

[0258] The NNPFA syntax structure in Table 18 may be signaled in the form of an SEI message. The SEI message that signals the NNPFA syntax structure in Table 18 may be called an NNPFA SEI message.

[0259] The NNPFA SEI message can activate or deactivate the possible use of the target neural network post-processing filter (NNPF) identified by nnpfa_target_id for post-processing filtering of a set of pictures. For a particular picture for which the NNPF is activated, the target NNPF may be the NNPF specified by the last NNPFC SEI message having nnpfc_id equal to nnpfa_target_id. Here, the last NNPFC SEI message is the one that precedes the first VCL NAL unit of the current picture in decoding order and that is not a repetition of the NNPFC SEI message containing the base NNPF.

[0260] Multiple NNPFA SEI messages may exist for the same picture, for example when the NNPFs are used for different purposes or filter different color components.

[0261] nnpfa_target_id can indicate the NNPF specified by one or more NNPFC SEI messages that pertain to the current picture and have nnpfc_id equal to nnpfa_target_id.

[0262] The value of nnpfa_target_id must be in the range of 0 to 2^32 - 2. Values of nnpfa_target_id in the ranges of 256 to 511 and 2^31 to 2^32 - 2 may be reserved for future use. Decoders must ignore NNPFA SEI messages with nnpfa_target_id in the range of 256 to 511 or 2^31 to 2^32 - 2.
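The identifier ranges for nnpfa_target_id (0 to 2^32 - 2 overall, with 256 to 511 and 2^31 to 2^32 - 2 reserved) can be checked with a short classifier. The function name and the returned labels are hypothetical.

```python
def classify_nnpfa_target_id(nnpfa_target_id):
    """Classify an nnpfa_target_id as 'valid', 'reserved' or 'out_of_range'."""
    if not 0 <= nnpfa_target_id <= 2**32 - 2:
        return "out_of_range"
    if 256 <= nnpfa_target_id <= 511 or 2**31 <= nnpfa_target_id <= 2**32 - 2:
        # Reserved for future use: the decoder ignores the NNPFA SEI message.
        return "reserved"
    return "valid"
```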

[0263] An NNPFA SEI message with a particular value of nnpfa_target_id must not be present in the current PU unless one or both of the following conditions are true:

[0264] - Within the current CLVS, an NNPFC SEI message with nnpfc_id equal to the particular value of nnpfa_target_id is present in a PU that precedes the current PU in decoding order.

[0265] - An NNPFC SEI message with nnpfc_id equal to the particular value of nnpfa_target_id is present in the current PU.

[0266] If a PU contains both an NNPFC SEI message with a particular value of nnpfc_id and an NNPFA SEI message with nnpfa_target_id equal to that particular value of nnpfc_id, then the NNPFC SEI message must precede the NNPFA SEI message in decoding order.

[0267] A value of 1 for nnpfa_cancel_flag can indicate that the persistence of the target NNPF, which was set by any previous NNPFA SEI message having the same nnpfa_target_id as the current SEI message, is cancelled. That is, the target NNPF will not be used any further unless it is activated by another NNPFA SEI message having the same nnpfa_target_id as the current SEI message and nnpfa_cancel_flag equal to 0. A value of 0 for nnpfa_cancel_flag can indicate that nnpfa_persistence_flag follows.

[0268] The nnpfa_persistence_flag can indicate the persistence of the target NNPF for the current layer. A value of nnpfa_persistence_flag of 0 indicates that the target NNPF may only be used for post-processing filtering on the current picture. A value of nnpfa_persistence_flag of 1 indicates that the target NNPF may be used for post-processing filtering on the current picture and all subsequent pictures in the current layer in output order until one or more of the following conditions are true:

[0269] - A new CLVS for the current layer is started.

[0270] - The bitstream ends.

[0271] - A picture in the current layer that is associated with an NNPFA SEI message having the same nnpfa_target_id as the current SEI message and nnpfa_cancel_flag equal to 1 is output after the current picture in output order.

[0272] The target NNPF is not applied to such a subsequent picture in the current layer that is associated with an NNPFA SEI message having the same nnpfa_target_id as the current SEI message and nnpfa_cancel_flag equal to 1.
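The cancel and persistence rules can be sketched as a walk over the pictures of one layer in output order. The picture representation (dictionaries with optional "new_clvs" and "nnpfa" entries) and the function name are assumptions made for illustration. The sketch also assumes that a message with nnpfa_persistence_flag equal to 0 does not end an earlier persistent activation, since only the listed conditions are described as ending persistence.

```python
def pictures_with_active_nnpf(pictures, target_id):
    """Return, per picture in output order, whether the target NNPF may be applied."""
    persistent = False
    result = []
    for pic in pictures:
        if pic.get("new_clvs"):
            # A new CLVS ends any earlier persistence.
            persistent = False
        applies = persistent
        msg = pic.get("nnpfa")
        if msg is not None and msg["target_id"] == target_id:
            if msg["cancel_flag"] == 1:
                # Cancellation: not applied to this picture or later ones.
                persistent = False
                applies = False
            elif msg["persistence_flag"] == 1:
                persistent = True
                applies = True
            else:
                # Activation for the current picture only.
                applies = True
        result.append(applies)
    return result
```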

[0273] nnpfcTargetPictures may be the set of pictures associated with the last NNPFC SEI message that has nnpfc_id equal to nnpfa_target_id and precedes the current NNPFA SEI message in decoding order. nnpfaTargetPictures may be the set of pictures for which the target NNPF is activated by the current NNPFA SEI message. Any picture included in nnpfaTargetPictures should also be included in nnpfcTargetPictures.

[0274] Post-filter hint

[0275] The syntax structure for the post-filter hint is shown in Table 19.

[0276] [Table 19]

[0277] The post-filter hint syntax structure in Table 19 may be signaled in the form of an SEI message. The SEI message that signals the post-filter hint syntax structure in Table 19 can be called a post-filter hint SEI message.

[0278] Post-filter hint SEI messages can provide post-filter coefficients or correlation information for post-filter design, potentially allowing the decoded and output picture set to be used in post-processing to obtain improved display quality.

[0279] A value of 1 for `filter_hint_cancel_flag` indicates that the SEI message cancels the persistence of any previous post-filter hint SEI message in output order that applies to the current layer. A value of 0 for `filter_hint_cancel_flag` indicates that post-filter hint information follows.

[0280] The `filter_hint_persistence_flag` can indicate the persistence of post-filter hint SEI messages for the current layer. A value of 0 for `filter_hint_persistence_flag` indicates that the post-filter hint applies only to the currently decoded picture. A value of 1 for `filter_hint_persistence_flag` indicates that the post-filter hint SEI message applies to the currently decoded picture and persists for all subsequent pictures in the current layer by output order until one or more of the following conditions are true:

[0281] - A new CLVS for the current layer is started.

[0282] - The bitstream ends.

[0283] - A picture in the current layer of an AU associated with a post-filter hint SEI message is output after the current picture in output order.

[0284] `filter_hint_size_y` can represent the vertical size of the filter coefficient array or correlation array. The value of `filter_hint_size_y` must be in the range of 1 to 15.

[0285] `filter_hint_size_x` can represent the horizontal size of the filter coefficient array or correlation array. The value of `filter_hint_size_x` must be in the range of 1 to 15.

[0286] `filter_hint_type` can indicate the type of filter hint transmitted, as shown in Table 20. The value of `filter_hint_type` must be in the range of 0 to 2. A `filter_hint_type` value equal to 3 does not exist in the bitstream. The decoder must ignore post-filter hint SEI messages where `filter_hint_type` is 3.

[0287] [Table 20]

[0288] A value of 1 for filter_hint_chroma_coeff_present_flag indicates that a filter coefficient exists for the chroma. A value of 0 for filter_hint_chroma_coeff_present_flag indicates that no filter coefficient exists for the chroma.

[0289] `filter_hint_value[cIdx][cy][cx]` can represent a filter coefficient, or an element of the cross-correlation matrix between the original signal and the decoded signal, with 16-bit precision. The value of `filter_hint_value[cIdx][cy][cx]` must be in the range of -2^31 + 1 to 2^31 - 1. cIdx may indicate the associated color component, cy may indicate the vertical counter, and cx may indicate the horizontal counter. Depending on the value of filter_hint_type, the following may be applied:

[0290] - If the value of filter_hint_type is 0, the coefficients of a 2D FIR (Finite Impulse Response) filter of size filter_hint_size_y * filter_hint_size_x may be transmitted.

[0291] - On the other hand, if the value of filter_hint_type is 1, the filter coefficients of two one-dimensional FIR filters may be transmitted. In this case, the value of filter_hint_size_y must be 2. An index cy of 0 can indicate the filter coefficient of a horizontal filter, and a cy of 1 can indicate the filter coefficient of a vertical filter. In the filtering process, the horizontal filter may be applied first, and the result may be filtered by the vertical filter.

[0292] - Otherwise (if the value of filter_hint_type is 2), the transmitted hint can represent the cross-correlation matrix between the original signal s and the decoded signal s'.
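The case where filter_hint_type is 1 can be illustrated with a sketch that applies the horizontal filter (cy equal to 0) first and then the vertical filter (cy equal to 1). The function names are hypothetical, the border handling (replicating edge samples) and the absence of coefficient scaling are assumptions of this example, and `hint` stands for the two-row slice filter_hint_value[cIdx] of one color component.

```python
def fir_1d(samples, taps):
    """Apply a centred 1D FIR filter, replicating edge samples at the borders."""
    half = len(taps) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            j = min(max(i + k - half, 0), last)  # clamp to the valid range
            acc += tap * samples[j]
        out.append(acc)
    return out


def apply_separable_hint(picture, hint):
    """Filter rows with hint[0] (horizontal), then columns with hint[1] (vertical)."""
    horizontal_taps, vertical_taps = hint[0], hint[1]
    rows = [fir_1d(row, horizontal_taps) for row in picture]
    cols = [fir_1d(list(col), vertical_taps) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```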

[0293] A normalized cross-correlation matrix for related color components identified by cIdx of size filter_hint_size_y*filter_hint_size_x may be defined as shown in Equation 6.

[0294] [Equation 6]

[0295] In Equation 6, s represents the sample array of the color component cIdx of the original picture, s’ represents the corresponding array of the decoded picture, h represents the vertical height of the associated color component, w represents the horizontal width of the associated color component, and bitDepth represents the bit depth of the color component. Also, OffsetY is the same as (filter_hint_size_y>>1), OffsetX is the same as (filter_hint_size_x>>1), the range of cy is 0 <= cy < filter_hint_size_y, and the range of cx is 0 <= cx < filter_hint_size_x.

[0296] The decoder can derive a Wiener post-filter from the cross-correlation matrix of the original signal and the decoded signal and the autocorrelation matrix of the decoded signal.

[0297] Problems with conventional technology

[0298] According to the current design for the input and output pictures in the NNPFC SEI message, the following problems may occur.

[0299] 1. Regarding the semantics of the output picture related information (for example, nnpfc_input_pic_output_flag[i])

[0300] When the purpose of the NNPF (neural-network post-filter) is picture rate upsampling, the NNPFC SEI message may signal nnpfc_interpolated_pics[i], which specifies the number of interpolated pictures generated between the i-th and (i+1)-th input pictures, and may also include nnpfc_input_pic_output_flag[i], which is output picture generation information specifying whether a corresponding output picture is generated for the i-th input picture. However, the current semantics of nnpfc_input_pic_output_flag[i] are unclear and may lead to more than one interpretation, and problems may arise in particular in relation to the bitstream after the filtering process. This is explained with reference to Figure 5, which is a diagram illustrating the NNPF output pictures.

[0301] (1) For example, considering Figure 5, there may be two NNPFC SEI messages whose NNPFC identifiers, i.e., nnpfc_id, are 0 and 1, and each may be activated. These two NNPFs are substantially the same filter but may differ in the signaling of the input or output pictures. Here, assuming that the (i-1)-th and (i-3)-th input pictures are associated with nnpfc_input_pic_output_flag[i] equal to 0, two interpretations are possible as follows.

[0302] 1) If an input picture is associated with nnpfc_input_pic_output_flag[i] equal to 0, this can mean that the picture has not been modified or filtered by the NNPF filtering process and is still part of the final bitstream (i.e., the filtered bitstream). As can be seen in Figure 5, under this interpretation the bitstream after the filtering process may consist of zero or more pictures modified by the filtering process and one or more new pictures (i.e., interpolated pictures) generated by the filtering process.

[0303] 2) If an input picture is associated with nnpfc_input_pic_output_flag[i] equal to 0, this can mean that the picture is not output by the NNPF filtering process and is not part of the final bitstream. In other words, the input picture is removed after the filtering process. In Figure 5, under this interpretation, the bitstream after the filtering process may consist of one or more pictures modified by the filtering process and one or more pictures generated by the filtering process (i.e., interpolated pictures).

[0304] Interpretation 1) above is considered to be relatively close to the intended meaning of the current semantics. However, it cannot be definitively stated that interpretation 1) always applies under the current text. Therefore, it is necessary to clarify the meaning of nnpfc_input_pic_output_flag[i], which is the output picture generation information.
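The difference between the two interpretations can be made concrete with a sketch that builds the picture sequence after filtering under each one. The function name, the string labels for pictures, and the `drop_unflagged` switch are hypothetical and serve only to contrast the two readings.

```python
def sequence_after_filtering(inputs, output_flags, interpolated, drop_unflagged):
    """Build the post-filter picture sequence under one of the two interpretations.

    drop_unflagged=False: interpretation 1), unflagged inputs are kept unmodified.
    drop_unflagged=True:  interpretation 2), unflagged inputs are removed.
    """
    out = []
    for i, pic in enumerate(inputs):
        if output_flags[i]:
            out.append("filtered(" + pic + ")")
        elif not drop_unflagged:
            out.append(pic)
        if i < len(inputs) - 1:
            # Interpolated pictures between input i and input i+1.
            for k in range(interpolated[i]):
                out.append("interp(" + pic + "," + inputs[i + 1] + ")#" + str(k))
    return out
```

With inputs A and B, flags [1, 0] and one interpolated picture, interpretation 1) yields filtered(A), one interpolated picture, and the unmodified B, whereas interpretation 2) yields only filtered(A) and the interpolated picture.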

[0305] 2. Regarding multiple output pictures regardless of the purpose of NNPF

[0306] In NNPFC SEI messages, the output picture can represent a modified or filtered version of an input picture and may be intended to replace that input picture in the final bitstream. In this case, if there is more than one input picture, there may be more than one filtered picture, regardless of the purpose of NNPF.

[0307] On the other hand, when more than one input picture is input to an NNPF and the purpose of the NNPF is not picture rate upsampling, scenarios with multiple filtered pictures may not be clearly supported by the current design. This restricts such scenarios without clear reason, even though they are quite possible, and therefore a more flexible alternative is needed.

[0308] 3. In relation to the output picture when the purpose of NNPF is not picture rate upsampling.

[0309] Assuming that the output picture in an NNPFC SEI message is a modified or filtered version of the input picture and is intended to replace a specific input picture in the final bitstream, there may be one or more filtered pictures, regardless of the purpose of the NNPF, when there are one or more input pictures.

[0310] However, if the purpose of the NNPF does not include picture rate upsampling, signaling for the output picture may not be necessary even if the number of output pictures is set to 1. With such a design, the handling of cases where the NNPF has one or more input pictures can be unclear. In this case, the output picture may correspond to the first input picture, i.e., the picture in the same access unit as the NNPFA SEI message that activates the NNPF. Without a clear explanation, however, it can be difficult to determine whether this is intended, and problems may arise.

[0311] Summary of Examples

[0312] This disclosure proposes various embodiments that can solve the problems of conventional designs, including those mentioned above. The embodiments described below may be used independently, in combination with each other, or in combination with other embodiments, and these may also be considered to be included in this disclosure.

[0313] 1. The following clarifications may be added regarding the output picture corresponding to an input picture:

[0314] (1) An input picture having a corresponding picture output from the filtering process may be replaced with an output picture in the bitstream after the filtering process.

[0315] (2) Input pictures for which there are no corresponding pictures output from the NNPF filtering process may exist in the bitstream after the filtering process.

[0316] 2. In relation to output picture signaling, the following improvements may be proposed:

[0317] (1) Regardless of whether the purpose of the NNPF includes picture rate upsampling, the syntax may be modified so that whether or not an input picture is output is always signaled.

[0318] (2) If the purpose of NNPF does not include picture rate upsampling, a constraint may be added that there must be at least one output picture corresponding to an input picture.

[0319] 3. Alternatively, if the purpose of the NNPF does not include picture rate upsampling, the semantics may be modified so that the value of nnpfc_input_pic_output_flag[0], which is output picture generation information, is inferred to be a first value (e.g., 1), and the value of nnpfc_input_pic_output_flag[i] (where i is in the range of 1 to nnpfc_num_input_pics_minus1) is inferred to be a second value (e.g., 0).

[0320] 4. As an alternative, the signaling of the output picture may include the following improvements:

[0321] (1) Regardless of whether the purpose of the NNPF includes picture rate upsampling, the syntax may be modified so that whether the input picture is output (i.e., whether it is filtered or modified) is signaled.

[0322] (2) If the purpose of NNPF does not include picture rate upsampling, the following condition may be added: the value of nnpfc_input_pic_output_flag[0] must be equal to the first value (for example, 1).

[0323] 5. As an alternative, the following improvements may be made to the signaling of the output picture:

[0324] (1) Regardless of whether the purpose of the NNPF includes picture rate upsampling, the syntax may be modified so that whether the input picture is output (i.e., whether it is filtered or modified) is signaled.

[0325] (2) If the purpose of NNPF does not include picture rate upsampling, an additional condition may be added: the value of nnpfc_input_pic_output_flag[0], which is output picture generation information, should be equal to the first value (for example, 1) regardless of the purpose of NNPF.

[0326] The following examples illustrate improvements to the input and output pictures in NNPF (neural-network post-filter) SEI messages for coded video bitstreams. The examples described below are based on standard video codecs (e.g., VVC (Versatile Video Coding)) and VSEI (Versatile Supplemental Enhancement Information Messages for Coded Video Bitstreams), but it is obvious that they may also be applicable to other video coding techniques, and this is also included in the scope of this disclosure.

[0327] On the other hand, the syntax names used when describing the embodiments below are arbitrarily designated for clarity of explanation, and therefore it is self-evident that the syntax names may be changed, and even if the syntax names are changed, they will still be included in this disclosure.

[0328] The embodiments of this application will be described in detail below with reference to the drawings.

[0329] Example 1

[0330] Example 1 provides a detailed explanation of the approach described in item 1 of the Summary of Examples above. The VSEI message syntax and semantics are described below.

[0331] As an example, the NNPFC (Neural-network post-filter characteristics) SEI message syntax and semantics may be modified as follows.

[0332] For example, an NNPFC SEI message may include information about an output picture, including information about whether or not an output picture corresponding to an input picture is generated, i.e., output picture generation information. The output picture generation information may be nnpfc_input_pic_output_flag[i], where i is an index identifying the picture. nnpfc_input_pic_output_flag[i] can indicate whether or not the NNPF (neural-network post-filter) generates a corresponding output picture for the i-th input picture. If the value of nnpfc_input_pic_output_flag[i] is a first value (e.g., 1), it can indicate that the NNPF generates a corresponding output picture, and if the value is a second value (e.g., 0), it can indicate that the NNPF does not generate a corresponding output picture.

[0333] An input picture that has a corresponding output picture may be replaced with that output picture after the filtering process, so that the output picture exists within the bitstream. An input picture that does not have a corresponding output picture may remain in the bitstream as it is after the filtering process.

[0334] Example 2

[0335] Example 2 provides a detailed explanation of the approach described in item 2 of the Summary of Examples above. The VSEI message syntax and semantics are described below.

[0336] As an example, the NNPFC (Neural-network post-filter characteristics) SEI message syntax and semantics may be modified as follows. The NNPFC SEI message syntax may be signaled as shown in the following table. According to this application, the signaling order of nnpfc_input_pic_output_flag, which is output picture generation information, may be changed.

[0337] [Table 21]

[0338] For example, the nnpfc_purpose syntax may correspond to information indicating the purpose of NNPFC. For example, the purpose of NNPFC may include picture rate upsampling. This is as described above, and a redundant explanation will be omitted.

[0339] For example, the nnpfc_id syntax may correspond to information indicating an NNPFC identifier. This is as explained above, and a redundant explanation will be omitted.

[0340] For example, the nnpfc_property_present_flag syntax may correspond to information about syntax elements associated with filter attributes (e.g., filtering purpose, input formatting, output formatting, and / or complexity). This is as described above, and a redundant explanation is omitted.

[0341] For example, the nnpfc_base_flag syntax can indicate information about basic NNPF, which is as described above, and a redundant explanation will be omitted.

[0342] For example, the nnpfc_num_input_pics_minus1 syntax may be information about the number of decoded output pictures used as input to the NNPF. Specifically, the value of nnpfc_num_input_pics_minus1 plus 1 may indicate the number of decoded output pictures used as input to the NNPF. The value of nnpfc_num_input_pics_minus1 may be in the range of 0 to 63, and if the value of nnpfc_purpose & 0x08 is not 0, the value of nnpfc_num_input_pics_minus1 must be greater than 0.
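The range rule and the dependency on nnpfc_purpose & 0x08 described above can be illustrated with a small check. This is only a sketch: the function name and the constant name are hypothetical, and the check covers only the two conditions stated in this paragraph.

```python
PICTURE_RATE_UPSAMPLING_BIT = 0x08  # bit tested via nnpfc_purpose & 0x08 in the text


def num_input_pics_is_valid(nnpfc_purpose: int,
                            nnpfc_num_input_pics_minus1: int) -> bool:
    """Return True if nnpfc_num_input_pics_minus1 satisfies both stated conditions."""
    # Condition 1: the value lies in the range 0 to 63, inclusive.
    if not 0 <= nnpfc_num_input_pics_minus1 <= 63:
        return False
    # Condition 2: when nnpfc_purpose & 0x08 is non-zero, the value must exceed 0,
    # i.e., there must be at least two input pictures.
    if (nnpfc_purpose & PICTURE_RATE_UPSAMPLING_BIT) != 0 and nnpfc_num_input_pics_minus1 == 0:
        return False
    return True


print(num_input_pics_is_valid(0x08, 0))  # upsampling with a single input picture
print(num_input_pics_is_valid(0x08, 1))  # upsampling with two input pictures
```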

[0343] For example, an NNPFC SEI message may include information about the output picture, including information about whether or not an output picture corresponding to an input picture is generated, i.e., output picture generation information. The output picture generation information may be nnpfc_input_pic_output_flag[i], where i is an index identifying the picture. nnpfc_input_pic_output_flag[i] can indicate whether or not the NNPF (neural-network post-filter) generates a corresponding output picture for the i-th input picture. If the value of nnpfc_input_pic_output_flag[i] is a first value (e.g., 1), it can indicate that the NNPF generates a corresponding output picture, and if the value is a second value (e.g., 0), it can indicate that the NNPF does not generate a corresponding output picture. The output picture generation information may be included in the SEI message and signaled regardless of the purpose of the NNPF. However, it may be signaled based on information about the number of input pictures (e.g., nnpfc_num_input_pics_minus1). In addition, the value of nnpfc_input_pic_output_flag[i] may be restricted to a specific value based on other information, for example the purpose of the NNPFC. Here, depending on the value of nnpfc_purpose, i.e., the information about the purpose of the NNPFC, as tested against a specific value (0x08), the value of nnpfc_input_pic_output_flag[i] must be a value predefined based on the picture index i. For example, if the purpose of the NNPFC does not include picture rate upsampling, i.e., if nnpfc_purpose & 0x08 is 0, then nnpfc_input_pic_output_flag[i] should be a specific value (e.g., 1) for a certain range of i. As another example, under the same condition, nnpfc_input_pic_output_flag[i] may be determined to be a specific value (e.g., 1) for at least one i within a certain range of i. Here, the index i used to identify a picture may be a value within the range of 0 to a value related to the number of input pictures (e.g., nnpfc_num_input_pics_minus1).
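The "at least one i" variant of the restriction in Example 2 can be sketched as a simple conformance check. The function name is hypothetical, and the sketch assumes the flags are given as a list indexed from 0 to nnpfc_num_input_pics_minus1; it does not model the other variant, in which the flag takes the specific value over a whole range of i.

```python
from typing import List


def example2_constraint_holds(nnpfc_purpose: int,
                              input_pic_output_flags: List[int]) -> bool:
    """Check that at least one flag is 1 when upsampling is not a purpose."""
    if (nnpfc_purpose & 0x08) != 0:
        # Picture rate upsampling is included: this paragraph states no restriction.
        return True
    # Picture rate upsampling is not included: at least one input picture
    # must have a corresponding output picture.
    return any(flag == 1 for flag in input_pic_output_flags)


print(example2_constraint_holds(0x00, [0, 1, 0]))  # one filtered picture: allowed
print(example2_constraint_holds(0x00, [0, 0, 0]))  # no filtered picture: not allowed
```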

[0344] For example, the nnpfc_out_sub_c_flag syntax may be associated with outSubWidthC and outSubHeightC under certain conditions, as explained above, and a redundant explanation will be omitted.

[0345] For example, the nnpfc_out_colour_format_idc syntax may be related to the color format of the NNPFC output, as explained above, and a redundant explanation will be omitted.

[0346] As an example, the nnpfc_pic_width_in_luma_samples syntax and the nnpfc_pic_height_in_luma_samples syntax can indicate the width and height of the luma sample array of the picture resulting from applying NNPF identified by nnpfc_id to the cropped, decoded output picture, respectively. This is as described above, and a redundant explanation will be omitted.

[0347] For example, the nnpfc_interpolated_pics[i] syntax may be information indicating the number of interpolated pictures generated by NNPF between the i-th picture and the (i+1)th picture used as input to NNPF. This is as described above, and a redundant explanation is omitted.

[0348] Meanwhile, the variable numInputPics, which indicates the number of pictures input to the NNPF, and the variable numOutputPics, which indicates the total number of pictures output from the NNPF, can be derived as follows.

[0349] [Table 22]
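Table 22 itself is not reproduced in this text. The following is therefore only an assumed sketch of how the two variables could be derived from the syntax elements described above: numInputPics follows directly from nnpfc_num_input_pics_minus1 plus 1, while the counting rule for numOutputPics (one picture per flag equal to 1, plus the interpolated pictures between consecutive input pictures) is an assumption and not a quotation of the table. The function and parameter names are hypothetical.

```python
from typing import List, Tuple


def derive_pic_counts(nnpfc_num_input_pics_minus1: int,
                      input_pic_output_flag: List[int],
                      interpolated_pics: List[int]) -> Tuple[int, int]:
    """Return (numInputPics, numOutputPics) under the assumed counting rule."""
    num_input_pics = nnpfc_num_input_pics_minus1 + 1
    num_output_pics = 0
    for i in range(num_input_pics):
        # Assumption: each input picture with flag 1 contributes one output picture.
        if input_pic_output_flag[i] == 1:
            num_output_pics += 1
        # Assumption: interpolated_pics[i] counts the pictures generated between
        # input picture i and input picture i + 1, so the last index has none.
        if i < num_input_pics - 1:
            num_output_pics += interpolated_pics[i]
    return num_input_pics, num_output_pics


# Two input pictures, both output, with two interpolated pictures between them.
print(derive_pic_counts(1, [1, 1], [2]))
```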

[0350] Example 3

[0351] Example 3 provides a detailed explanation of the approach described in item 3 of the Summary of Examples above. The VSEI message syntax and semantics are described below.

[0352] As an example, the NNPFC (Neural-network post-filter characteristics) SEI message syntax and semantics may be modified as follows.

[0353] For example, an NNPFC SEI message may include information about the output picture, including information about whether or not an output picture corresponding to an input picture is generated, i.e., output picture generation information. The output picture generation information may be nnpfc_input_pic_output_flag[i], where i is an index identifying the picture. nnpfc_input_pic_output_flag[i] can indicate whether or not the NNPF (neural-network post-filter) generates a corresponding output picture for the i-th input picture. If the value of nnpfc_input_pic_output_flag[i] is a first value (e.g., 1), it can indicate that the NNPF generates a corresponding output picture, and if the value is a second value (e.g., 0), it can indicate that the NNPF does not generate a corresponding output picture. The value of nnpfc_input_pic_output_flag[i] may also be derived based on other information, for example the purpose of the NNPFC. Here, depending on the value of nnpfc_purpose, i.e., the information about the purpose of the NNPFC, as tested against a specific value (0x08), the value of nnpfc_input_pic_output_flag[i] may be derived as a predefined value based on the picture index i. For example, the value of nnpfc_input_pic_output_flag[0] and the values of the other nnpfc_input_pic_output_flag[i] may be derived as different values. For example, if the purpose of the NNPFC does not include picture rate upsampling, i.e., if nnpfc_purpose & 0x08 is 0, the value of nnpfc_input_pic_output_flag[0] may be inferred to be a specific value (e.g., 1), and the values of the remaining nnpfc_input_pic_output_flag[i] may be inferred to be another value (e.g., 0). For these remaining flags, the index i identifying a picture may be a value within the range of 1 to a value corresponding to the number of input pictures (e.g., nnpfc_num_input_pics_minus1).
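The inference rule of Example 3 can be sketched as follows. The function name and the signaled_flags parameter are hypothetical; the sketch assumes that the flags are read from the message when picture rate upsampling is a purpose and inferred otherwise, using 1 and 0 as the example first and second values.

```python
from typing import List, Optional


def resolve_output_flags(nnpfc_purpose: int,
                         nnpfc_num_input_pics_minus1: int,
                         signaled_flags: Optional[List[int]] = None) -> List[int]:
    """Return nnpfc_input_pic_output_flag[i] for i = 0..nnpfc_num_input_pics_minus1."""
    if (nnpfc_purpose & 0x08) != 0:
        # Upsampling is a purpose: assume the flags were signaled and use them as-is.
        if signaled_flags is None:
            raise ValueError("flags must be signaled when upsampling is a purpose")
        return list(signaled_flags)
    # Upsampling is not a purpose: flag[0] is inferred to be 1 and
    # flag[i] for i in 1..nnpfc_num_input_pics_minus1 is inferred to be 0.
    return [1] + [0] * nnpfc_num_input_pics_minus1


print(resolve_output_flags(0x00, 3))          # inferred flags for four input pictures
print(resolve_output_flags(0x08, 1, [1, 1]))  # signaled flags are kept
```

Under this rule, only the first input picture is replaced by a filtered picture when upsampling is not a purpose.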

[0354] Example 4

[0355] Example 4 provides a detailed explanation of the approach described in item 4 of the Summary of Examples above. The VSEI message syntax and semantics are described below.

[0356] As an example, the NNPFC (Neural-network post-filter characteristics) SEI message syntax and semantics may be modified as follows. The NNPFC SEI message syntax may be signaled as shown in the following table. According to this application, the signaling order of nnpfc_input_pic_output_flag, which is output picture generation information, may be changed.

[0357] [Table 23]

[0358] As an example, the nnpfc_purpose syntax, nnpfc_id syntax, nnpfc_property_present_flag syntax, nnpfc_base_flag syntax, nnpfc_num_input_pics_minus1 syntax, nnpfc_out_sub_c_flag syntax, nnpfc_out_colour_format_idc syntax, nnpfc_pic_width_in_luma_samples syntax, nnpfc_pic_height_in_luma_samples syntax, and nnpfc_interpolated_pics[i] syntax are as described above, and their redundant explanations will be omitted.

[0359] For example, an NNPFC SEI message may include information about the output picture, including information about whether or not an output picture corresponding to an input picture is generated, i.e., output picture generation information. The output picture generation information may be nnpfc_input_pic_output_flag[i], where i is an index identifying the picture. nnpfc_input_pic_output_flag[i] can indicate whether or not the NNPF (neural-network post-filter) generates a corresponding output picture for the i-th input picture. If the value of nnpfc_input_pic_output_flag[i] is a first value (e.g., 1), it can indicate that the NNPF generates a corresponding output picture, and if the value is a second value (e.g., 0), it can indicate that the NNPF does not generate a corresponding output picture. The output picture generation information may be included in the SEI message and signaled regardless of the purpose of the NNPF. However, it may be signaled based on information about the number of input pictures (e.g., nnpfc_num_input_pics_minus1). In addition, the value of nnpfc_input_pic_output_flag[i] may be constrained to a specific value based on other information, for example the purpose of the NNPFC. Here, depending on the value of nnpfc_purpose, i.e., the information about the purpose of the NNPFC, as tested against a specific value (0x08), the value of nnpfc_input_pic_output_flag[i] should be a value predefined based on the picture index i. For example, if the purpose of the NNPFC does not include picture rate upsampling, i.e., if nnpfc_purpose & 0x08 is 0, then the value of nnpfc_input_pic_output_flag[0] should be a specific value (e.g., 1). In other words, it may be restricted to a specific value. As another example, under the same test of nnpfc_purpose against 0x08, the value of nnpfc_input_pic_output_flag[i] may be inferred to be a predefined value based on the picture index i. For example, if the purpose of the NNPFC does not include picture rate upsampling, i.e., if nnpfc_purpose & 0x08 is 0, then the value of nnpfc_input_pic_output_flag[0] may be inferred to be a specific value (e.g., 1). In other words, it may be derived as a specific value.
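The restriction variant of Example 4 concerns only the flag at index 0. A minimal sketch, with a hypothetical function name and with 1 used as the example specific value, could look like this:

```python
from typing import List


def example4_constraint_holds(nnpfc_purpose: int,
                              input_pic_output_flags: List[int]) -> bool:
    """Check that flag[0] is 1 when picture rate upsampling is not a purpose."""
    if (nnpfc_purpose & 0x08) != 0:
        # Upsampling is a purpose: this restriction does not apply.
        return True
    # Upsampling is not a purpose: the first input picture must have
    # a corresponding output picture.
    return len(input_pic_output_flags) > 0 and input_pic_output_flags[0] == 1


print(example4_constraint_holds(0x00, [1, 0]))  # flag[0] is 1
print(example4_constraint_holds(0x00, [0, 1]))  # flag[0] is 0, so the check fails
```

Note that a flag list such as [0, 1] contains a flag equal to 1 but still fails this check, because the restriction is tied to picture index 0 rather than to any index.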

[0360] On the other hand, the variables numInputPics, which indicates the number of pictures input to NNPF, and numOutputPics, which indicates the total number of pictures output from NNPF, may be derived as shown in Table 22 above.

[0361] According to the embodiments described in this disclosure, in addition to solving the problems of the prior art described above, it is possible to reduce decoder errors and improve coding quality and efficiency by adjusting the signaling order of information and clarifying the semantics of information.

[0362] Examples of video decoding methods

[0363] The following describes video encoding and decoding methods according to various embodiments of the present invention. The video decoding method in Figure 6 may be performed by the video decoding device 200, and the video encoding method in Figure 7 may be performed by the video encoding device 100. Furthermore, the video decoding and encoding methods in Figures 6 and 7 may be performed based on the embodiments described above (including embodiments 1 to 4).

[0364] Figure 6 is a diagram illustrating a video decoding method performed by a video decoding device according to one embodiment of the present disclosure.

[0365] First, as an example, post-filter-based corresponding output picture information for an input picture may be obtained from the NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message (S610). Post-filter-based corresponding output picture information may refer to NNPF-related information. NNPF-related information is as explained in the table above.

[0366] Subsequently, based on the acquired corresponding output picture information, a corresponding output picture for the input picture may be acquired (S620). If the corresponding output picture information indicates that there is no corresponding output picture for the input picture, the corresponding output picture need not be acquired; if it indicates that there is a corresponding output picture for the input picture, the corresponding output picture may be acquired. Here, the corresponding output picture information may include output picture generation information indicating whether or not a corresponding output picture has been generated by the post-filter for the input picture. The output picture generation information may include the nnpfc_input_pic_output_flag described above. The output picture generation information may be acquired based on specific conditions. As one example, the value of the output picture generation information may be restricted to a specific value based on specific conditions; as another example, its value may be inferred to be a specific value based on specific conditions. For example, the specific condition may be associated with the purpose of the post-filter, or with whether or not the purpose of the post-filter is picture rate upsampling. The specific condition may also be associated with the picture index of the input picture. Based on specific conditions, the value of the output picture generation information for at least one input picture within a specific range may be restricted to a specific value, for example 1. The specific range is related to the number of input pictures, and information related to the number of input pictures may be signaled. As an example, if the picture index of an input picture is a specific value (e.g., 0), the value of the output picture generation information for that input picture may be restricted to a specific value (e.g., 1). The specific condition may also be related to the number of input pictures. Further, based on the existence of a corresponding output picture for an input picture, the input picture may be replaced (in the picture stream) with the corresponding output picture. Finally, as an example, output picture generation information may be signaled regardless of whether the purpose of the post-filter includes picture rate upsampling.
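Steps S610 and S620 can be illustrated with the following sketch. It assumes an already parsed SEI message represented as a plain dictionary and a post-filter supplied as a callable; both representations, and the function name, are hypothetical and stand in for the actual bitstream parsing and neural-network inference.

```python
from typing import Callable, Dict, List


def apply_post_filter(parsed_sei: Dict[str, List[int]],
                      input_pics: List[str],
                      post_filter: Callable[[str], str]) -> List[str]:
    """Replace each input picture that has a corresponding output picture."""
    # S610: obtain the output picture generation information from the SEI message.
    flags = parsed_sei["nnpfc_input_pic_output_flag"]
    result = []
    for pic, flag in zip(input_pics, flags):
        if flag == 1:
            # S620: a corresponding output picture exists, so it is obtained
            # and replaces the input picture in the picture stream.
            result.append(post_filter(pic))
        else:
            # No corresponding output picture: nothing is obtained for this input.
            result.append(pic)
    return result


sei = {"nnpfc_input_pic_output_flag": [1, 0]}
print(apply_post_filter(sei, ["a", "b"], str.upper))
```

Here str.upper merely stands in for the post-filter so that the effect of the flag is visible in the output.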

[0367] Although not shown in the diagram, the picture may then be restored based on the corresponding output picture information.

[0368] On the other hand, since the video decoding method shown in Figure 6 is one embodiment of the present disclosure, it is obvious that certain steps may be changed, the order of the steps may be changed, or some steps may be added or deleted, and such modifications are also included in the present disclosure.

[0369] Examples of video encoding methods

[0370] Figure 7 is a diagram illustrating a video encoding method performed by a video encoding device according to one embodiment of the present disclosure.

[0371] First, post-filter-based corresponding output picture information for the input picture may be determined (S710). Post-filter-based corresponding output picture information may refer to NNPF-related information. NNPF-related information is as explained in the table above.

[0372] Subsequently, the corresponding output picture information may be signaled in an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message (S720). The corresponding output picture information can indicate either that there is no corresponding output picture for the input picture or that there is a corresponding output picture for the input picture. Here, the corresponding output picture information may include output picture generation information indicating whether or not a corresponding output picture is generated by the post-filter for the input picture. The output picture generation information may include the nnpfc_input_pic_output_flag mentioned above. The output picture generation information may be signaled based on specific conditions. As one example, the value of the output picture generation information may be restricted to a specific value based on specific conditions; as another example, its value may be inferred to be a specific value based on specific conditions. For example, the specific condition may be associated with the purpose of the post-filter, or with whether or not the purpose of the post-filter is picture rate upsampling. The specific condition may also be associated with the picture index of the input picture. Based on specific conditions, the value of the output picture generation information for at least one input picture within a specific range may be restricted to a specific value, for example 1. As an example, if the picture index of an input picture is a specific value (e.g., 0), the value of the output picture generation information for that input picture may be restricted to a specific value (e.g., 1). The specific condition may also be associated with the number of input pictures. Further, based on the existence of a corresponding output picture for an input picture, the input picture may be replaced (in the picture stream) with the corresponding output picture. Finally, as an example, output picture generation information may be signaled regardless of whether the purpose of the post-filter includes picture rate upsampling.
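On the encoder side, steps S710 and S720 can be sketched as building the fields to be signaled. The dictionary layout, the function name, and the filtered_indices parameter are hypothetical. The sketch also assumes, as one of the optional restrictions mentioned above, that the flag for picture index 0 must be 1 when picture rate upsampling is not a purpose; actual SEI bit-level serialization is outside its scope.

```python
from typing import Dict, Iterable, List, Union


def build_nnpfc_fields(nnpfc_purpose: int,
                       num_input_pics: int,
                       filtered_indices: Iterable[int]) -> Dict[str, Union[int, List[int]]]:
    """Determine the output picture generation flags and collect the fields to signal."""
    if num_input_pics < 1:
        raise ValueError("at least one input picture is required")
    filtered = set(filtered_indices)
    # S710: determine, per input picture, whether a corresponding output picture exists.
    flags = [1 if i in filtered else 0 for i in range(num_input_pics)]
    # Assumed restriction: flag[0] must be 1 when upsampling is not a purpose.
    if (nnpfc_purpose & 0x08) == 0 and flags[0] != 1:
        raise ValueError("flag for picture index 0 must be 1 without upsampling")
    # S720: the flags are included regardless of the purpose of the post-filter.
    return {
        "nnpfc_purpose": nnpfc_purpose,
        "nnpfc_num_input_pics_minus1": num_input_pics - 1,
        "nnpfc_input_pic_output_flag": flags,
    }


print(build_nnpfc_fields(0x00, 3, [0, 2]))
```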

[0373] Although not shown in the diagram, the picture may then be restored based on the corresponding output picture information.

[0374] Furthermore, as an example, a computer-readable medium recording a bitstream generated by a video encoding method may be provided, and a method for transmitting a bitstream generated by a video encoding method may also be provided.

[0375] On the other hand, since the video encoding method shown in Figure 7 is one embodiment of the present disclosure, it is obvious that certain steps may be changed, the order of the steps may be changed, or some steps may be added or deleted, and such modifications are also included in the present disclosure.

[0376] According to this invention, the meaning of information that may be included in VSEI can be clarified, thereby reducing decoder errors, enabling the representation of more accurate scenarios, and improving coding quality. Furthermore, according to this invention, coding efficiency can be improved by changing the signaling order of specific information.

[0377] Figure 8 illustrates a content streaming system to which the embodiments of this disclosure can be applied.

[0378] As shown in Figure 8, a content streaming system to which an embodiment of the present disclosure is applied may broadly include an encoding server, a streaming server, a web server, media storage, user equipment, and multimedia input devices.

[0379] The encoding server is responsible for compressing content input from multimedia input devices such as smartphones, cameras, and camcorders into digital data to generate a bitstream, and transmitting this bitstream to the streaming server. As another example, if a multimedia input device such as a smartphone, camera, or camcorder directly generates the bitstream, the encoding server may be omitted.

[0380] The bitstream may be generated by a video encoding method and / or video encoding apparatus to which an embodiment of the present disclosure is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

[0381] The streaming server transmits multimedia data to user devices based on user requests via a web server, and the web server can act as an intermediary to inform users of available services. When a user requests a desired service from the web server, the web server transmits it to the streaming server, and the streaming server can transmit multimedia data to the user. In this case, the content streaming system may include a separate control server, in which case the control server can play a role in controlling commands and responses between the devices within the content streaming system.

[0382] The streaming server can receive content from media storage and / or encoding servers. For example, when receiving content from the encoding server, the content can be received in real time. In this case, in order to provide a smooth streaming service, the streaming server can store the bitstream for a certain period of time.

[0383] Examples of user devices include mobile phones, smartphones, laptop computers, digital broadcasting terminals, PDAs (personal digital assistants), PMPs (portable multimedia players), navigation systems, slate PCs, tablet PCs, ultrabooks, wearable devices (such as smartwatches, smart glasses, and HMDs), digital TVs, desktop computers, and digital signage.

[0384] Each server within the aforementioned content streaming system may be operated as a distributed server, in which case the data received by each server may be processed in a distributed manner.

[0385] The scope of this disclosure includes software or machine-executable instructions (e.g., operating systems, applications, firmware, programs, etc.) that enable the operations of the various embodiments to be performed on a device or computer, and a non-transitory computer-readable medium on which such software or instructions are stored and from which they are executable on a device or computer.

[0386] [Industrial applicability] The embodiments described herein can be used for encoding / decoding video.

[0387] [Claims when filing an international application]

[Claim 1] A video decoding method performed by a video decoding device, the method comprising: obtaining, from an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, post-filter-based corresponding output picture information for an input picture; and obtaining a corresponding output picture for the input picture based on the corresponding output picture information, wherein the corresponding output picture information includes output picture generation information indicating whether or not a corresponding output picture is generated for the input picture by a post-filter.

[Claim 2] The video decoding method according to claim 1, wherein the value of the output picture generation information is restricted to a specific value based on a specific condition.

[Claim 3] The video decoding method according to claim 1, wherein the specific condition is associated with the purpose of the post-filter.

[Claim 4] The video decoding method according to claim 3, wherein the specific condition is further associated with whether or not the purpose of the post-filter is picture rate upsampling.

[Claim 5] The video decoding method according to claim 3, wherein the specific condition is further associated with the picture index of the input picture.

[Claim 6] The video decoding method according to claim 5, wherein, based on the specific condition, the value of the output picture generation information for at least one input picture within a specific range is limited to 1.

[Claim 7] The video decoding method according to claim 6, wherein the specific range is determined based on information regarding the number of input pictures signaled in the bitstream.

[Claim 8] The video decoding method according to claim 5, wherein, if the picture index of the input picture is 0, the value of the output picture generation information for the input picture is limited to 1.

[Claim 9] The video decoding method according to claim 1, wherein the specific condition is associated with the number of input pictures.

[Claim 10] The video decoding method according to claim 1, wherein the output picture generation information is signaled regardless of whether the purpose of the post-filter includes picture rate upsampling.

[Claim 11] A video encoding method performed by a video encoding device, the method comprising: determining post-filter-based corresponding output picture information for an input picture; and signaling the corresponding output picture information in an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, wherein the corresponding output picture information includes output picture generation information indicating whether or not a corresponding output picture is generated for the input picture by a post-filter.

[Claim 12] A computer-readable medium recording a bitstream generated by the video encoding method according to claim 11.

[Claim 13] A method for transmitting a bitstream generated by a video encoding method, the video encoding method comprising: determining post-filter-based corresponding output picture information for an input picture; and signaling the corresponding output picture information in an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, wherein the corresponding output picture information includes output picture generation information indicating whether or not a corresponding output picture is generated for the input picture by a post-filter.
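
The value restrictions recited in claims 6 to 8 above can be illustrated with a minimal Python sketch. The function names and the representation of the output picture generation information as a list of integers are assumptions made for illustration only; they are not syntax elements or procedures defined in this disclosure.

```python
from typing import Sequence


def first_picture_rule_ok(generation_flags: Sequence[int]) -> bool:
    """Claim 8 style check: the flag for picture index 0 must equal 1."""
    return len(generation_flags) > 0 and generation_flags[0] == 1


def at_least_one_rule_ok(generation_flags: Sequence[int],
                         num_input_pics: int) -> bool:
    """Claims 6 and 7 style check.

    The range is taken from the signaled number of input pictures
    (an assumption about how the "specific range" is derived), and at
    least one flag inside that range must equal 1.
    """
    in_range = generation_flags[:num_input_pics]
    return any(flag == 1 for flag in in_range)
```

The two checks are kept separate because claims 6 and 8 are alternative dependent claims; a conformance checker might apply either one depending on the embodiment.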

Claims

1. A video decoding apparatus comprising: a memory; and at least one processor connected to the memory, wherein the at least one processor is configured to: obtain an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, the NNPFC SEI message comprising one or more picture filtering flags corresponding to one or more input pictures, and purpose information indicating the purpose of a post-filter; determine, based on the picture filtering flag corresponding to an input picture among the one or more picture filtering flags, whether or not a corresponding output picture for the input picture is generated by the post-filter, wherein a value of the picture filtering flag equal to 1 indicates that the post-filter generates the corresponding output picture for the input picture, and a value of the picture filtering flag equal to 0 indicates that the post-filter does not generate the corresponding output picture for the input picture; and generate, based on the determination, the corresponding output picture for the input picture by the post-filter, wherein, based on a plurality of input pictures being present and the purpose information not being related to picture rate upsampling, at least one of the one or more picture filtering flags has a value equal to 1, and wherein the one or more picture filtering flags are included in the obtained NNPFC SEI message regardless of whether the purpose information of the post-filter is related to the picture rate upsampling.

2. The video decoding apparatus according to claim 1, wherein the number of the one or more input pictures and the number of the one or more picture filtering flags are determined based on numerical information included in the NNPFC SEI message.

3. The video decoding apparatus according to claim 1, wherein, based on the number of the one or more input pictures being equal to 1, the one or more picture filtering flags include a corresponding picture filtering flag having a value equal to 1.

4. A video encoding apparatus comprising: a memory; and at least one processor connected to the memory, wherein the at least one processor is configured to: generate an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, the NNPFC SEI message comprising one or more picture filtering flags corresponding to one or more input pictures, and purpose information indicating the purpose of a post-filter; and encode video information including the NNPFC SEI message, wherein the picture filtering flag is used to determine whether a corresponding output picture for an input picture is generated by the post-filter, wherein a value of the picture filtering flag equal to 1 indicates that the post-filter generates the corresponding output picture for the input picture, and a value of the picture filtering flag equal to 0 indicates that the post-filter does not generate the corresponding output picture for the input picture, wherein, based on a plurality of input pictures being present and the purpose information not being related to picture rate upsampling, at least one of the one or more picture filtering flags has a value equal to 1, and wherein the one or more picture filtering flags are included in the NNPFC SEI message regardless of whether the purpose information of the post-filter is related to the picture rate upsampling.

5. The video encoding apparatus according to claim 4, wherein the NNPFC SEI message includes numerical information for determining the number of the one or more input pictures and the number of the one or more picture filtering flags.

6. The video encoding apparatus according to claim 5, wherein, based on the number of the one or more input pictures being equal to 1, the one or more picture filtering flags include a corresponding picture filtering flag having a value equal to 1.

7. A transmission device for transmitting data relating to a video, the transmission device comprising: at least one processor configured to acquire a bitstream; and a transmitter configured to transmit the bitstream, wherein the bitstream is generated by a video encoding method, the video encoding method comprising: generating an NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message, the NNPFC SEI message comprising one or more picture filtering flags corresponding to one or more input pictures, and purpose information indicating the purpose of a post-filter; and encoding video information including the NNPFC SEI message, wherein the picture filtering flag is used to determine whether a corresponding output picture for an input picture is generated by the post-filter, wherein a value of the picture filtering flag equal to 1 indicates that the post-filter generates the corresponding output picture for the input picture, and a value of the picture filtering flag equal to 0 indicates that the post-filter does not generate the corresponding output picture for the input picture, wherein, based on a plurality of input pictures being present and the purpose information not being related to picture rate upsampling, at least one of the one or more picture filtering flags has a value equal to 1, and wherein the one or more picture filtering flags are included in the NNPFC SEI message regardless of whether the purpose information of the post-filter is related to the picture rate upsampling.
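
The handling of the picture filtering flags recited in the apparatus claims above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `NnpfcSei` class, its field names, and both function names are hypothetical, and the SEI message is assumed to have been parsed already.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NnpfcSei:
    """Hypothetical, already-parsed view of an NNPFC SEI message."""
    purpose_is_pic_rate_upsampling: bool  # derived from the purpose information
    num_input_pics: int                   # derived from the numerical information
    picture_filtering_flags: List[int]    # one flag per input picture, 0 or 1


def validate(sei: NnpfcSei) -> None:
    """Raise ValueError if the flags violate the constraints in the claims."""
    flags = sei.picture_filtering_flags
    # The flags are present regardless of the purpose of the post-filter.
    if sei.num_input_pics < 1 or len(flags) != sei.num_input_pics:
        raise ValueError("expected one picture filtering flag per input picture")
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("picture filtering flags must be 0 or 1")
    # Single input picture: its flag has a value equal to 1.
    if sei.num_input_pics == 1 and flags[0] != 1:
        raise ValueError("a single input picture requires its flag to be 1")
    # Several input pictures and a purpose unrelated to picture rate
    # upsampling: at least one flag has a value equal to 1.
    if (sei.num_input_pics > 1
            and not sei.purpose_is_pic_rate_upsampling
            and 1 not in flags):
        raise ValueError("at least one flag must be 1 when the purpose is "
                         "not picture rate upsampling")


def pictures_with_output(sei: NnpfcSei) -> List[int]:
    """Return the indices of input pictures for which the post-filter
    generates a corresponding output picture (flag equal to 1)."""
    validate(sei)
    return [i for i, flag in enumerate(sei.picture_filtering_flags) if flag == 1]
```

For example, with three input pictures, a purpose other than picture rate upsampling, and flags `[0, 1, 1]`, `pictures_with_output` returns `[1, 2]`, while the same message with flags `[0, 0, 0]` is rejected by `validate`.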