Cross-component sample adaptive offset
Cross-component sample adaptive offset (CCSAO) improves video coding efficiency by correcting luminance and chroma samples with offsets derived from collocated samples of another component, reducing bitrate while maintaining video quality.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
- Filing Date
- 2022-11-25
- Publication Date
- 2026-06-26
AI Technical Summary
Existing video encoding technologies face inefficiencies in encoding luminance and chroma components, leading to suboptimal compression and bitrate usage.
The disclosed methods and apparatus use cross-component sample adaptive offset (CCSAO) to enhance encoding efficiency by determining classifiers for luminance and chroma components, applying band offsets, and correcting samples based on collocated samples.
This improves encoding efficiency for luminance and chroma components, reducing bitrate requirements while maintaining video quality.
Abstract
Description
[Technical Field]
[0001] This disclosure relates generally to video encoding and compression, and more specifically to methods and apparatus for improving the efficiency of both luminance encoding and chroma encoding.
[Background Art]
[0002] Digital video is supported by a variety of electronic devices, including digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video game consoles, smartphones, video conferencing equipment, and video streaming devices. These electronic devices transmit and receive digital video data over communication networks, or otherwise communicate and / or store digital video data in storage devices. Because communication networks have limited bandwidth and storage devices have limited memory, video data may be compressed according to one or more video coding standards before transmission or storage. Video coding standards include, for example, Versatile Video Coding (VVC), Joint Exploration Test Model (JEM), High Efficiency Video Coding (HEVC / H.265), Advanced Video Coding (AVC / H.264), and Moving Picture Experts Group (MPEG) coding. AOMedia Video 1 (AV1) was developed as a successor to the preceding standard VP9. Audio Video Coding (AVS), which refers to a series of digital audio and digital video compression standards, is another series of video compression standards. Video coding generally utilizes prediction techniques (e.g., inter prediction or intra prediction) that take advantage of the redundancy inherent in video data. The goal of video coding is to compress video data into a format that uses a lower bitrate while avoiding or minimizing degradation of video quality.
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0003] This disclosure describes implementation examples of methods and apparatus for encoding and decoding video data, and more specifically for improving the coding efficiency of both luminance and chroma components by exploring cross-component relationships between the luminance and chroma components.
[Means for Solving the Problem]
[0004] A video decoding method is provided according to a first aspect of this application. The method may include a decoder receiving a picture frame containing one or more components, where the one or more components may include a first component and a second component. The method may further include determining a classifier for each sample of the second component according to collocated samples of the first component; in response to determining that the classifier is a band classifier or a joint classifier including a band classifier, obtaining an indicator indicating one or more bandNum segments of the one or more components; and determining the one or more bandNum segments according to the indicator. The method may further include the decoder determining a band offset according to the one or more bandNum segments and correcting each sample of the second component according to a sample offset including the band offset.
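The decoding steps of the first aspect can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the paragraph above: bandNum is treated as the number of equal-width bands into which the first-component value range is divided, the band index is computed as (sample * bandNum) >> bitDepth, one offset is available per band, and the corrected sample is clipped to the valid range. The function names band_index and ccsao_band_offset are hypothetical.

```python
def band_index(collocated, band_num, bit_depth):
    """Map a collocated first-component sample to a band class.

    Assumption: the value range [0, 2**bit_depth) is split into
    band_num equal-width bands.
    """
    return (collocated * band_num) >> bit_depth


def ccsao_band_offset(second, first_collocated, offsets, bit_depth=10):
    """Correct each second-component sample with a band offset.

    second           -- samples of the second component (e.g. chroma)
    first_collocated -- collocated samples of the first component (e.g. luma)
    offsets          -- one offset per band; len(offsets) plays the role of bandNum
    """
    band_num = len(offsets)
    max_val = (1 << bit_depth) - 1
    corrected = []
    for c, y in zip(second, first_collocated):
        cls = band_index(y, band_num, bit_depth)   # classifier from the collocated sample
        value = c + offsets[cls]                   # add the band offset
        corrected.append(min(max(value, 0), max_val))  # clip to the valid sample range
    return corrected


# Usage with 8-bit samples and four bands (all values are illustrative).
print(ccsao_band_offset([100, 1, 50, 254], [0, 64, 128, 255], [1, -2, 0, 3], bit_depth=8))
```

In this sketch the classification depends only on the first component, so the second component can be corrected without inspecting its own neighborhood.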
[0005] A second aspect of this application provides a video encoding method. The method may include an encoder determining a classifier for each sample of a second component according to collocated samples of a first component, wherein a picture frame may contain one or more components, including the first component and the second component. Furthermore, in response to determining that the classifier is a band classifier or a joint classifier including a band classifier, the encoder may predefine an indicator or signal the indicator in the bitstream, where the indicator indicates one or more bandNum segments of the one or more components.
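One way an encoder could arrive at the indicator is sketched below. This is a hypothetical illustration, not the procedure of this application: it assumes that the indicator is an index into a candidate list of band counts, that each band offset is the rounded mean difference between original and reconstructed samples in that band, and that the candidate with the lowest squared error is selected, with signaling cost ignored. The names derive_band_offsets and choose_band_num are invented for the example.

```python
def derive_band_offsets(original, reconstructed, collocated, band_num, bit_depth):
    """Per band, use the rounded mean of (original - reconstructed) as the offset."""
    sums = [0] * band_num
    counts = [0] * band_num
    for o, r, y in zip(original, reconstructed, collocated):
        band = (y * band_num) >> bit_depth  # assumed equal-width band classifier
        sums[band] += o - r
        counts[band] += 1
    return [round(s / n) if n else 0 for s, n in zip(sums, counts)]


def choose_band_num(original, reconstructed, collocated,
                    candidates=(2, 4, 8, 16), bit_depth=10):
    """Return (indicator, offsets) for the candidate with the lowest squared error.

    The indicator is the index into `candidates`; a real encoder would
    also weigh the bits needed to signal the offsets.
    """
    max_val = (1 << bit_depth) - 1
    best = None
    for idx, band_num in enumerate(candidates):
        offsets = derive_band_offsets(original, reconstructed, collocated,
                                      band_num, bit_depth)
        sse = 0
        for o, r, y in zip(original, reconstructed, collocated):
            band = (y * band_num) >> bit_depth
            corrected = min(max(r + offsets[band], 0), max_val)
            sse += (o - corrected) ** 2
        if best is None or sse < best[0]:
            best = (sse, idx, offsets)
    return best[1], best[2]


# Usage with 8-bit samples: the two-band candidate removes the error entirely.
indicator, offsets = choose_band_num([12, 12, 200, 200], [10, 10, 203, 203],
                                     [0, 10, 200, 250], candidates=(1, 2), bit_depth=8)
print(indicator, offsets)
```

A decoder that knows the same candidate list could recover the band count from the signaled index, which is why the sketch returns the index and not the band count itself.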
[0006] A third aspect of this application provides a video decoding apparatus. This apparatus may include one or more processors and a memory coupled to the one or more processors and configured to store instructions executable by the one or more processors. The one or more processors are configured to perform the method according to the first aspect when executing the instructions.
[0007] A fourth aspect of this application provides a video encoding apparatus. This apparatus may include one or more processors and a memory coupled to the one or more processors and configured to store instructions executable by the one or more processors. The one or more processors are configured to perform the method according to the second aspect when executing the instructions.
[0008] According to a fifth aspect of the present invention, a non-transitory computer-readable storage medium is provided that stores computer-executable instructions which, when executed by one or more computer processors, cause the one or more computer processors to receive a bitstream and perform the method according to the first aspect based on the bitstream.
[0009] According to a sixth aspect of the present invention, a non-transitory computer-readable storage medium is provided that stores computer-executable instructions which, when executed by one or more computer processors, cause the one or more computer processors to perform the method according to the second aspect and transmit a bitstream.
[0010] It should be understood that the above overview and the following detailed description are merely illustrative and do not limit this disclosure.
[0011] The accompanying drawings, which are incorporated herein and form part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
[Brief Description of the Drawings]
[0012] [Figure 1] This block diagram shows an exemplary system for encoding and decoding video blocks, based on several implementations of the present disclosure.
[Figure 2] A block diagram showing an exemplary video encoder based on several implementation examples of the present disclosure.
[Figure 3] A block diagram showing an exemplary video decoder based on several implementation examples of the present disclosure.
[Figure 4A] This block diagram shows how a frame is recursively partitioned into multiple video blocks of different sizes and shapes, according to some implementation examples of the present disclosure.
[Figure 4B] This block diagram shows how a frame is recursively partitioned into multiple video blocks of different sizes and shapes, according to some implementation examples of the present disclosure.
[Figure 4C] This block diagram shows how a frame is recursively partitioned into multiple video blocks of different sizes and shapes, according to some implementation examples of the present disclosure.
[Figure 4D] This block diagram shows how a frame is recursively partitioned into multiple video blocks of different sizes and shapes, according to some implementation examples of the present disclosure.
[Figure 4E] This block diagram shows how a frame is recursively partitioned into multiple video blocks of different sizes and shapes, according to some implementation examples of the present disclosure.
[Figure 4F] This figure shows the intra modes defined in VVC.
[Figure 4G] This block diagram shows multiple reference lines for intra prediction.
[Figure 5A] This block diagram shows four gradient patterns used in sample adaptive offset (SAO) in several implementation examples of this disclosure.
[Figure 5B] This block diagram shows a decoder in which the deblocking filter (DBF) is combined with the proposed SAO filtering stages SAOV and SAOH, based on several implementation examples of the present disclosure.
[Figure 6] This block diagram shows that, in some implementation examples of this disclosure, both the proposed bilateral filter (BIF) and SAO use samples from the deblocking stage as input.
[Figure 7] This block diagram shows the naming conventions for samples surrounding a central sample, based on several implementation examples of this disclosure.
[Figure 8A] A block diagram showing a 5x5 diamond-shaped ALF filter applied to the chroma component in some implementation examples of the present disclosure.
[Figure 8B] A block diagram showing a 7x7 diamond-shaped ALF filter applied to the luma component in some implementation examples of the present disclosure.
[Figure 9A] This figure shows the Laplacian calculation using partial samples, based on several implementation examples of this disclosure.
[Figure 9B] This figure shows the Laplacian calculation using partial samples, based on several implementation examples of this disclosure.
[Figure 9C] This figure shows the Laplacian calculation using partial samples, based on several implementation examples of this disclosure.
[Figure 9D] This figure shows the Laplacian calculation using partial samples, based on several implementation examples of this disclosure.
[Figure 10A] This block diagram shows a system-level view of the CC-ALF process in relation to the SAO, luma ALF, and chroma ALF processes, based on several implementation examples of this disclosure.
[Figure 10B] This figure shows that filtering in CC-ALF is achieved by applying a linear diamond-shaped filter to the luma channel in some implementation examples of this disclosure.
[Figure 11] This figure shows the modified block classification at a virtual boundary, based on several implementation examples of the present disclosure.
[Figure 12] This figure shows modified ALF filtering for luma components at virtual boundaries, based on several implementation examples of the present disclosure.
[Figure 13A] This figure shows CCSAO applied to a chroma sample using DBF Y as input, based on several implementation examples of this disclosure.
[Figure 13B] This figure shows several implementation examples of CCSAO applied to luma and chroma samples, using DBF Y / Cb / Cr as input.
[Figure 13C] This figure shows several implementation examples of CCSAO functioning independently.
[Figure 13D] This figure shows recursively applied CCSAO in several implementation examples of the present disclosure.
[Figure 13E] This figure shows the parallel application of SAO and BIF in several implementation examples of this disclosure.
[Figure 13F] This figure shows how SAO can be replaced and applied in parallel to BIF by some implementation examples of this disclosure.
[Figure 14] This figure shows that CCSAO can be applied in parallel with other coding tools, based on some implementation examples of this disclosure.
[Figure 15A] This figure shows that CCSAO is located after SAO in some implementation examples of this disclosure.
[Figure 15B] This figure shows CCSAO functioning independently without CCALF, based on several implementation examples of this disclosure.
[Figure 15C] This figure shows that CCSAO acts as a post-reconstruction filter in several implementation examples of this disclosure.
[Figure 16] This figure shows CCSAO applied in parallel with CCALF, based on several implementation examples of this disclosure.
[Figure 17] This figure shows that some implementation examples of this disclosure use an alternative luma sample location as an alternative classifier, instead of the C0 classification.
[Figure 18A] This figure shows different candidate shapes to which constraints may be applied, based on several implementation examples of the present disclosure.
[Figure 18B] This figure shows different candidate shapes to which constraints may be applied, based on several implementation examples of the present disclosure.
[Figure 18C] This figure shows different candidate shapes to which constraints may be applied, based on several implementation examples of the present disclosure.
[Figure 18D] This figure shows different candidate shapes to which constraints may be applied, based on several implementation examples of the present disclosure.
[Figure 18E] This figure shows different candidate shapes to which constraints may be applied, based on several implementation examples of the present disclosure.
[Figure 18F] This figure shows different candidate shapes to which constraints may be applied, based on several implementation examples of the present disclosure.
[Figure 18G] This figure shows different candidate shapes to which constraints may be applied, based on several implementation examples of the present disclosure.
[Figure 19] This figure shows that, in some implementation examples of this disclosure, other cross-component collocated and adjacent chroma samples can also be fed into CCSAO classification in addition to luma.
[Figure 20A] This figure shows that, in some implementation examples of this disclosure, collocated luma sample values can be replaced with phase-corrected values by weighting adjacent luma samples.
[Figure 20B] This figure shows that, in some implementation examples of this disclosure, collocated luma sample values can be replaced with phase-corrected values by weighting adjacent luma samples.
[Figure 21A] This figure shows that, in some implementation examples of this disclosure, collocated luma sample values can be replaced with phase-corrected values by weighting adjacent luma samples.
[Figure 21B] This figure shows that, in some implementation examples of this disclosure, collocated luma sample values can be replaced with phase-corrected values by weighting adjacent luma samples.
[Figure 22A] This figure shows an example of classifying c using edge strength, based on several implementations of the present disclosure.
[Figure 22B] This figure shows an example of classifying c using edge strength, based on several implementations of the present disclosure.
[Figure 23A] This figure shows that CCSAO is not applied to the current chroma sample if any of the collocated or adjacent chroma samples used for classification is outside the current picture, according to some implementation examples of this disclosure.
[Figure 23B] This figure shows that CCSAO is not applied to the current chroma sample if any of the collocated or adjacent chroma samples used for classification is outside the current picture, according to some implementation examples of this disclosure.
[Figure 24A] This figure shows how, in some implementation examples of this disclosure, if any of the collocated or adjacent luma samples used for classification is outside the current picture, the missing samples are created for classification by repeating samples or by mirror padding.
[Figure 24B] This figure shows how, in some implementation examples of this disclosure, if any of the collocated or adjacent luma samples used for classification is outside the current picture, the missing samples are created for classification by repeating samples or by mirror padding.
[Figure 25] This figure shows that, in some implementation examples of this disclosure, CCSAO with nine luma candidates can require two additional luma line buffers in AVS.
[Figure 26A] This figure shows that, in VVC, CCSAO with nine luma candidates can require one additional luma line buffer, based on some implementation examples of this disclosure.
[Figure 26B] This figure shows that in some implementation examples of this disclosure, when collocated and adjacent luma samples are used to classify the current luma sample, the selected chroma candidates span across the virtual boundary (VB) and may require an additional chroma line buffer.
[Figure 27A] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, if any candidate used for a chroma sample spans across VB (is outside the current chroma sample VB), CCSAO is disabled for the chroma sample.
[Figure 27B] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, if any candidate used for a chroma sample spans across VB (is outside the current chroma sample VB), CCSAO is disabled for the chroma sample.
[Figure 27C] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, if any candidate used for a chroma sample spans across VB (is outside the current chroma sample VB), CCSAO is disabled for the chroma sample.
[Figure 28A] This figure shows examples of virtual boundaries for C0 having nine luma position candidates, based on several implementations of the present disclosure.
[Figure 28B] This figure shows examples of virtual boundaries for C0 having nine luma position candidates, based on several implementations of the present disclosure.
[Figure 29A] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, if any candidate used for a chroma sample spans across VB (is outside the current chroma sample VB), CCSAO is enabled for the chroma sample using iterative padding.
[Figure 29B] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, if any candidate used for a chroma sample spans across VB (is outside the current chroma sample VB), CCSAO is enabled for the chroma sample using iterative padding.
[Figure 29C] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, if any candidate used for a chroma sample spans across VB (is outside the current chroma sample VB), CCSAO is enabled for the chroma sample using iterative padding.
[Figure 30A] This figure shows that in AVS and VVC, in some implementation examples of this disclosure, CCSAO is enabled for chroma samples using mirror padding if any of the chroma sample luma candidates span across VB (are outside the current chroma sample VB).
[Figure 30B] This figure shows that in AVS and VVC, in some implementation examples of this disclosure, CCSAO is enabled for chroma samples using mirror padding if any of the chroma sample luma candidates span across VB (are outside the current chroma sample VB).
[Figure 30C] This figure shows that in AVS and VVC, in some implementation examples of this disclosure, CCSAO is enabled for chroma samples using mirror padding if any of the chroma sample luma candidates span across VB (are outside the current chroma sample VB).
[Figure 31A] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, when one side is outside VB, CCSAO is enabled using bilateral symmetrical padding.
[Figure 31B] This figure shows that in some implementation examples of this disclosure, in AVS and VVC, when one side is outside VB, CCSAO is enabled using bilateral symmetrical padding.
[Figure 32A] This figure shows that iterative padding or mirror padding can be applied to luma samples outside the virtual boundary, based on some implementation examples of this disclosure.
[Figure 32B] This figure shows that iterative padding or mirror padding can be applied to luma samples outside the virtual boundary, based on some implementation examples of this disclosure.
[Figure 33A] This figure shows, for some implementation examples of the present disclosure, the limitations applied to reduce the line buffer required by CCSAO and simplify boundary processing condition checks.
[Figure 33B] This figure shows, for some implementation examples of the present disclosure, the limitations applied to reduce the line buffer required by CCSAO and simplify boundary processing condition checks.
[Figure 34] This figure shows that the CCSAO application area is not aligned with the CTB boundary in some implementation examples of this disclosure.
[Figure 35] This figure shows that the frame partition of the CCSAO application area can be fixed in some implementation examples of this disclosure.
[Figure 36] This figure shows that, in some implementation examples of this disclosure, the CCSAO application area partitions are dynamic and can be switched at the picture level.
[Figure 37] This figure shows that when multiple classifiers are used in a single frame, as demonstrated by some implementation examples of this disclosure, the method of applying the classifier set index can be switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock level.
[Figure 38] This figure shows that the CCSAO application area can be divided into BT / QT / TT levels from the frame / slice / CTB level, based on some implementation examples of this disclosure.
[Figure 39] This figure shows a CCSAO classifier that takes into account the coding information of the current component or a cross component, based on several implementation examples of this disclosure.
[Figure 40A] This block diagram shows how the SAO classification method presented in this disclosure acts as a post-prediction filter, based on several implementation examples of this disclosure.
[Figure 40B] This block diagram shows that, for the post-prediction SAO filter in some implementation examples of this disclosure, each component can use the current sample and adjacent samples for classification.
[Figure 40C] This block diagram shows that, for the post-prediction SAO filter in some implementation examples of this disclosure, each component can use the current sample and adjacent samples for classification.
[Figure 40D] This block diagram shows that, for the post-prediction SAO filter in some implementation examples of this disclosure, each component can use the current sample and adjacent samples for classification.
[Figure 41] This figure shows a computing environment coupled to a user interface, based on several implementation examples of the present disclosure.
[Figure 42] This flowchart illustrates a video decoding method according to several implementation examples of the disclosure.
[Figure 43] This block diagram shows video encoding according to several implementation examples of this disclosure.
[Modes for Carrying Out the Invention]
[0013] Next, specific implementation examples will be referenced in detail. These examples are shown in the attached drawings. In the following detailed description, many non-limiting specific details will be presented to aid in understanding the subject matter presented herein. However, it will be apparent to those skilled in the art that various alternatives are available without departing from the claims, and that the subject matter is implementable without these specific details. For example, it will be apparent to those skilled in the art that the subject matter presented herein can be implemented on many types of electronic devices having digital video capabilities.
[0014] The terms used in this disclosure are adopted solely for the purpose of describing specific embodiments and are not intended to limit this disclosure. The singular forms “a,” “an,” and “the” in this disclosure and the attached claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the term “and / or” used in this disclosure should be understood to refer to and include any one of the associated listed items, or any or all possible combinations thereof.
[0015] Throughout this specification, any reference to “one embodiment,” “embodiment,” “example,” “several embodiments,” “several examples,” or similar terms means that any particular feature, structure, or characteristic described is included in at least one embodiment or example. Features, structures, elements, or characteristics described in relation to one or more embodiments are applicable to other embodiments unless expressly otherwise specified.
[0016] Throughout this disclosure, terms such as “first,” “second,” and “third” are merely designations for the purpose of referring to relevant elements, such as apparatuses, components, compositions, steps, etc., and do not imply any spatial or temporal order unless otherwise specified. For example, “first apparatus” and “second apparatus” may refer to two separately formed apparatuses, or two parts, components, or operating states of the same apparatus, and are arbitrarily named.
[0017] The terms “module,” “submodule,” “circuit,” “subcircuit,” “circuit configuration,” “subcircuit configuration,” “unit,” and “subunit” may include memory (shared memory, dedicated memory, memory clusters) that stores code or instructions executable by one or more processors. A module may include one or more circuits that may or may not contain stored code or instructions. A module or circuit may include one or more components that are directly or indirectly connected to each other. These components may or may not be physically attached to each other, or located adjacent to each other.
[0018] As used herein, the terms “if” and “when” may be understood, depending on the context, to mean “upon” or “in response to.” Where these terms appear in a claim, they may not indicate that the relevant limitation or feature is conditional or optional. For example, a method may include the steps of i) if or when condition X is present, performing a function or action X'; and ii) if or when condition Y is present, performing a function or action Y'. The method may be implemented with the capability to perform both the function or action X' and the function or action Y'. Thus, functions X' and Y' may both be performed, at different points in time, over multiple executions of the method.
[0019] A unit or module may be implemented solely by software, solely by hardware, or by a combination of hardware and software. In a software-only implementation, for example, a unit or module may contain functionally related code blocks or software components that are directly or indirectly linked to each other in order to perform a specific function.
[0020] The first generation of AVS standards includes the Chinese national standards "Information Technology, Advanced Audio Video Coding, Part 2: Video" (known as AVS1) and "Information Technology, Advanced Audio Video Coding Part 16: Radio Television Video" (known as AVS+). Compared to the MPEG-2® standard, the first-generation standards can save approximately 50% bitrate at the same perceptual quality. The second generation of AVS standards includes the Chinese national standard "Information Technology, Efficient Multimedia Coding" (known as AVS2) series, which is mainly aimed at the transmission of extra HD television programs. The coding efficiency of AVS2 is twice that of AVS+. Meanwhile, the video part of the AVS2 standard was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as an international standard for applications. The AVS3 standard is a new-generation video coding standard for UHD video applications that aims to surpass the coding efficiency of the latest international standard, HEVC. The AVS3-P2 baseline, completed at the 68th AVS meeting in March 2019, offers approximately 30% bitrate savings compared to the HEVC standard. Currently, reference software called the High Performance Model (HPM) is maintained by the AVS group to demonstrate a reference implementation of the AVS3 standard. Similar to HEVC, the AVS3 standard is built on a block-based hybrid video coding framework.
[0021] Figure 1 is a block diagram illustrating an exemplary system 10 for parallel encoding and decoding of video blocks, according to some implementations of the present disclosure. As shown in Figure 1, the system 10 includes a source device 12 which generates and encodes video data that will later be decoded by a destination device 14. The source device 12 and destination device 14 can constitute any variety of electronic devices, including desktop or laptop computers, tablet computers, smartphones, set-top boxes, digital televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, and the like. In some implementations, the source device 12 and destination device 14 are equipped with wireless communication capabilities.
[0022] In some implementations, the destination device 14 can receive encoded video data to be decoded via link 16. Link 16 can comprise any type of communication medium or device capable of moving the encoded video data from the source device 12 to the destination device 14. In one example, link 16 may include a communication medium that allows the source device 12 to directly transmit the encoded video data to the destination device 14 in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device 14. The communication medium can include wireless or wired communication media, such as the radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network like the Internet. The communication medium may include routers, switches, base stations, or other equipment useful for facilitating communication from the source device 12 to the destination device 14.
[0023] In several other implementation examples, encoded video data can be transmitted from the output interface 22 to the storage device 32. The encoded video data in the storage device 32 can then be accessed by the destination device 14 via the input interface 28. The storage device 32 may include any of various distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, Digital Versatile Disc (DVD), Compact Disc Read-Only Memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In further examples, the storage device 32 may correspond to a file server or another intermediate storage device capable of holding the encoded video data generated by the source device 12. The destination device 14 can access the video data stored in the storage device 32 by streaming or downloading. The file server may be any type of computer capable of storing encoded video data and transmitting it to the destination device 14. Exemplary file servers include web servers (e.g., for a website), File Transfer Protocol (FTP) servers, Network Attached Storage (NAS) devices, or local disk drives. The destination device 14 can access the encoded video data stored on the file server via any standard data connection suitable for that purpose, including a wireless channel (e.g., a Wireless Fidelity (Wi-Fi)® connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both. Transmission of the encoded video data from the storage device 32 may be a streaming transmission, a download transmission, or a combination of both.
[0024] As shown in Figure 1, the source device 12 includes a video source 18, a video encoder 20, and an output interface 22. The video source 18 may include a video capture device such as a video camera, a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and / or a computer graphics system for generating computer graphics data as source video, or a combination of such sources. For example, if the video source 18 is a video camera in a security surveillance system, the source device 12 and destination device 14 may take the form of camera phones or video phones. However, the implementation examples described in this application are applicable to video coding in general and may be applied to wireless and / or wired applications.
[0025] Captured, pre-captured, or computer-generated video can be encoded by the video encoder 20. The encoded video data can be transmitted directly to the destination device 14 via the output interface 22 of the source device. The encoded video data may also (or alternatively) be stored on the storage device 32 for later access by the destination device 14 or other devices for decoding and / or playback. The output interface 22 may further include a modem and / or transmitter.
[0026] The destination device 14 includes an input interface 28, a video decoder 30, and a display device 34. The input interface 28 includes a receiver and / or modem and is capable of receiving encoded video data through link 16. The encoded video data communicated via link 16 or provided to the storage device 32 may include various syntactic elements generated by the video encoder 20 for use by the video decoder when decoding the video data. Such syntactic elements may be included in the encoded video data transmitted over the communication medium and stored on the storage medium or on a file server.
[0027] In some implementations, the destination device 14 may include a display device 34, which may be an integrated display device or an external display device configured to communicate with the destination device 14. The display device 34 displays the decoded video data to the user and may include any of the various display devices, such as a liquid crystal display (LCD), plasma display, organic light-emitting diode (OLED) display, or another type of display device.
[0028] The video encoder 20 and video decoder 30 can operate in accordance with proprietary or industry standards such as VVC, HEVC, MPEG-4 Part 10 (AVC), AVS, or extensions of such standards. It should be understood that this application is not limited to any specific video encoding / decoding standard and may be applicable to other video encoding / decoding standards. In general, the video encoder 20 of the source device 12 is intended to be configurable to encode video data in accordance with any of these current or future standards. Similarly, the video decoder 30 of the destination device 14 is intended to be configurable to decode video data in accordance with any of these current or future standards.
[0029] The video encoder 20 and video decoder 30 can each be implemented as one or more suitable encoder and / or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. If partially implemented in software, the electronic device can store instructions for the software in a suitable non-transitory computer-readable medium and execute those instructions in hardware using one or more processors to perform the video encoding / decoding operations described herein. Each of the video encoder 20 and video decoder 30 may be contained within one or more encoders or decoders, and any of these may be integrated as part of a combined encoder / decoder (CODEC) within each device.
[0030] Figure 2 is a block diagram showing an exemplary video encoder 20 according to several implementation examples described in this application. The video encoder 20 can perform intra-frame predictive coding and inter-frame predictive coding of video blocks within a video frame. Intra-frame predictive coding relies on spatial prediction to reduce or eliminate spatial redundancy of video data within a given video frame or picture. Inter-frame predictive coding relies on temporal prediction to reduce or eliminate temporal redundancy in video data within adjacent video frames or pictures in a video sequence. Note that the term "frame" may be used as a synonym for "image" or "picture" in the field of video coding.
[0031] As shown in Figure 2, the video encoder 20 includes a video data memory 40, a prediction processing unit 41, a decoded picture buffer (DPB) 64, an adder 50, a transformation processing unit 52, a quantization unit 54, and an entropy coding unit 56. The prediction processing unit 41 further includes a motion estimation unit 42, a motion compensation unit 44, a partitioning unit 45, an intra-prediction processing unit 46, and an intra block copy (BC) unit 48. In some implementation examples, the video encoder 20 also includes an inverse quantization unit 58, an inverse transformation processing unit 60, and an adder 62 for video block reconstruction. An in-loop filter 63, such as a deblocking filter, can be placed between the adder 62 and the DPB 64 to filter block boundaries and remove blocking artifacts from the reconstructed video. Another in-loop filter, such as a sample-adaptive offset (SAO) filter and / or an adaptive loop filter (ALF), can also be used in addition to the deblocking filter to filter the output of the adder 62. In some examples, the in-loop filter may be omitted, and the decoded video blocks may be provided directly to the DPB 64 by the adder 62. The video encoder 20 can take the form of a fixed or programmable hardware unit, or it can be distributed among one or more of the illustrated fixed or programmable hardware units.
[0032] The video data memory 40 is capable of storing video data to be encoded by the components of the video encoder 20. The video data in the video data memory 40 can be obtained from, for example, the video source 18 shown in Figure 1. The DPB 64 is a buffer that stores reference video data (e.g., reference frames or pictures) for use in encoding the video data by the video encoder 20 (e.g., in intra or inter predictive coding mode). The video data memory 40 and DPB 64 can be formed by any of the various memory devices. In various examples, the video data memory 40 may be on-chip with the other components of the video encoder 20 or off-chip with respect to those components.
[0033] As shown in Figure 2, after receiving video data, the partitioning unit 45 within the prediction processing unit 41 partitions the video data into video blocks. This partitioning may also involve dividing the video frame into slices, tiles (e.g., sets of video blocks), or other larger coding units (CUs) according to a predefined partitioning structure, such as a quad-tree (QT) structure, associated with the video data. A video frame is or can be considered as a two-dimensional array or matrix of samples having sample values. Samples in the array may also be called pixels or pels. The number of samples in the horizontal and vertical directions (or axes) of the array or picture defines the size and / or resolution of the video frame. A video frame can be divided into multiple video blocks, for example, using QT partitioning. A video block is also or can be considered as a two-dimensional array or matrix of samples having sample values, but with smaller dimensions than a video frame. The number of samples in the horizontal and vertical directions (or axes) of a video block defines the size of the video block. A video block can be further divided into one or more block partitions, or subblocks (which can also form blocks), by repeatedly using, for example, QT partitioning, binary-tree (BT) partitioning, or ternary-tree (TT) partitioning, or any combination thereof. Note that as used herein, the terms “block” or “video block” refer to a portion of a frame or picture, specifically a rectangular (square or non-square) portion. For example, according to HEVC and VVC, a block or video block is or can correspond to a coding tree unit (CTU), CU, prediction unit (PU), or transform unit (TU), and / or a corresponding block, such as a coding tree block (CTB), coding block (CB), prediction block (PB), or transform block (TB), and / or can correspond to a subblock.
[0034] Based on the error results (e.g., coding rate and distortion level), the prediction processing unit 41 can select one of several possible prediction coding modes for the current video block, such as one of several intra-predictive coding modes or one of several inter-predictive coding modes. The prediction processing unit 41 can provide the obtained intra- or inter-predictive coding block to the adder 50 to generate a residual block, and also to the adder 62 to reconstruct the coded block for later use as part of a reference frame. The prediction processing unit 41 also provides syntactic elements such as motion vectors, intra-mode indicators, and partition information, as well as other such syntactic information, to the entropy coding unit 56.
[0035] To select an appropriate intra-predictive coding mode for the current video block, the intra-predictive processing unit 46 within the prediction processing unit 41 can provide spatial predictions by performing intra-predictive coding of the current video block for one or more adjacent blocks in the same frame as the current block being coded. The motion estimation unit 42 and motion compensation unit 44 within the prediction processing unit 41 provide temporal predictions by performing inter-predictive coding of the current video block for one or more prediction blocks in one or more reference frames. The video encoder 20 can perform multiple coding passes to select an appropriate coding mode for each block of video data, for example.
[0036] In some implementations, the motion estimation unit 42 determines the inter-prediction mode for the current video frame by generating motion vectors that indicate the displacement of the video block in the current video frame relative to the predicted block in the reference video frame, according to a predetermined pattern in the sequence of video frames. The motion estimation performed by the motion estimation unit 42 is a process of generating motion vectors that estimate the motion of the video block. For example, the motion vectors may indicate the displacement of the video block in the current video frame or picture relative to the predicted block in the reference frame related to the current block being encoded in the current frame. The predetermined pattern allows the video frames in the sequence to be designated as P-frames or B-frames. The intra-BC unit 48 may determine vectors for intra-BC coding, such as block vectors, in a similar manner to how the motion estimation unit 42 determines motion vectors for inter-prediction, or it may determine block vectors using the motion estimation unit 42.
[0037] The prediction block for a video block is a block or reference block of a reference frame that is considered to closely match the video block to be encoded in terms of pixel difference, or may correspond to such a block. Here, the pixel difference can be determined by the Sum of Absolute Difference (SAD), Sum of Square Difference (SSD), or other difference metrics. In some implementations, the video encoder 20 may calculate values for fractional pixel positions of the reference frame stored in the DPB64. For example, the video encoder 20 can interpolate values for quarter-pixel, eighth-pixel, or other fractional pixel positions of the reference frame. Thus, the motion estimation unit 42 can perform motion searches for full and fractional pixel positions and output motion vectors with fractional pixel precision.
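The two difference metrics named above can be illustrated with a minimal sketch. The function names, the 2×2 blocks, and the candidate list are hypothetical and chosen only for illustration; an actual encoder would search over displaced positions in a reference frame rather than a fixed list.

```python
def sad(block_a, block_b):
    """Sum of Absolute Difference between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of Square Difference between two equally sized 2-D blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_match(current, candidates, metric=sad):
    """Return the index of the candidate block with the lowest metric value."""
    costs = [metric(current, cand) for cand in candidates]
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical 2x2 current block and three candidate prediction blocks.
current = [[10, 12], [14, 16]]
candidates = [
    [[0, 0], [0, 0]],
    [[10, 13], [14, 15]],
    [[20, 22], [24, 26]],
]
best = best_match(current, candidates)  # index of the closest candidate
```

In this sketch the second candidate differs from the current block by one unit in two samples, so it is selected under either metric.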
[0038] The motion estimation unit 42 calculates a motion vector for the video block in the inter-prediction coded frame by comparing the position of the video block with the position of the predicted block in a reference frame selected from a first reference frame list (list 0) or a second reference frame list (list 1), each of which identifies one or more reference frames stored in the DPB 64. The motion estimation unit 42 transmits the calculated motion vector to the motion compensation unit 44, and then to the entropy coding unit 56.
[0039] Motion compensation performed by the motion compensation unit 44 may include fetching or generating prediction blocks based on motion vectors determined by the motion estimation unit 42. Upon receiving motion vectors for the current video block, the motion compensation unit 44 may identify the location of the prediction block indicated by the motion vectors in one of the reference frame lists, retrieve the prediction block from the DPB 64, and transfer it to the adder 50. The adder 50 then subtracts the pixel values of the prediction block provided by the motion compensation unit 44 from the pixel values of the current video block to be encoded to form a residual video block of pixel difference values. The pixel difference values forming the residual video block may include luma difference components, chroma difference components, or both. The motion compensation unit 44 may also generate relevant syntactic elements of the video block of the video frame for use by the video decoder 30 to decode the video block of the video frame. Syntactic elements may include, for example, syntactic elements that define motion vectors used to identify prediction blocks, optional flags indicating prediction modes, or other arbitrary syntactic information as described herein. Note that while the motion estimation unit 42 and the motion compensation unit 44 may be highly integrated, they are shown separately for conceptual purposes.
[0040] In some implementations, the intraBC unit 48 can generate vectors and fetch predicted blocks in a similar manner to those described above in relation to the motion estimation unit 42 and the motion compensation unit 44, but the predicted blocks are in the same frame as the current block being encoded, and the vectors are called block vectors, as opposed to motion vectors. Specifically, the intraBC unit 48 can determine which intraprediction mode to use for encoding the current block. In some examples, the intraBC unit 48 can encode the current block using various intraprediction modes, for example, between separate encoding passes, and test their performance by rate-distortion analysis. The intraBC unit 48 can then select an appropriate intraprediction mode for use from among the various tested intraprediction modes and generate an intramode indicator accordingly. For example, the intraBC unit 48 can calculate rate-distortion values using rate-distortion analysis for various tested intraprediction modes and select the intraprediction mode with the best rate-distortion characteristics among the tested modes as the appropriate intraprediction mode for use. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block encoded to generate the encoded block, as well as the bit rate (i.e., number of bits) used to generate the encoded block. The intraBC unit 48 can calculate ratios from distortion and rate for various encoded blocks to determine which intraprediction mode exhibits the best rate-distortion value for that block.
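A minimal sketch of rate-distortion mode selection follows. The paragraph above speaks of ratios computed from distortion and rate; this sketch instead uses the commonly used Lagrangian cost J = D + λ·R as a stand-in, which is an assumption and not necessarily the calculation intended here. The mode names, distortion values, bit counts, and λ values are all hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def select_mode(trials, lam):
    """Pick the mode with the lowest cost.

    trials maps a mode name to a (distortion, rate_bits) pair measured
    in a separate encoding pass for that mode.
    """
    return min(trials, key=lambda mode: rd_cost(*trials[mode], lam))


# Hypothetical results of three test encodings of the same block.
trials = {
    "DC": (120.0, 10),
    "planar": (100.0, 18),
    "angular_26": (60.0, 40),
}
chosen = select_mode(trials, lam=2.0)
```

A larger λ penalizes bits more heavily and favors cheap modes, while a smaller λ favors low distortion; the sketch makes that trade-off explicit through the single `lam` parameter.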
[0041] In other examples, the intra-BC unit 48 may use all or part of the motion estimation unit 42 and the motion compensation unit 44 to perform such functions for intra-BC prediction as described herein. In any case, with respect to intra block copy, the predicted block may be a block that is considered to closely match the block to be encoded in terms of pixel differences, which can be determined by SAD, SSD, or other difference metrics, and the identification of the predicted block may include the calculation of values for fractional pixel positions.
[0042] Regardless of whether the predicted block originates from the same frame through intra-prediction or from another frame through inter-prediction, the video encoder 20 can form a residual video block by subtracting the pixel values of the predicted block from the pixel values of the current video block being encoded to form a pixel difference value. The pixel difference value forming the residual video block may include differences in both luma and chroma components.
[0043] As described above, the intra-prediction processing unit 46 can intra-predict the current video block as an alternative to inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44, or to intra block copy prediction performed by the intra-BC unit 48. Specifically, the intra-prediction processing unit 46 can determine the intra-prediction mode to use for encoding the current block. To this end, the intra-prediction processing unit 46 can encode the current block using various intra-prediction modes, for example, during separate encoding passes, and the intra-prediction processing unit 46 (or a mode selection unit in some examples) can select and use an appropriate intra-prediction mode from the tested intra-prediction modes. The intra-prediction processing unit 46 can provide the entropy coding unit 56 with information indicating the selected intra-prediction mode for that block. The entropy coding unit 56 can encode the information indicating the selected intra-prediction mode into the bitstream.
[0044] After the prediction processing unit 41 determines the prediction block for the current video block via inter-prediction or intra-prediction, the adder 50 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be contained in one or more TUs and provided to the transformation processing unit 52. The transformation processing unit 52 transforms the residual video data into residual transformation coefficients using a transformation such as a discrete cosine transform (DCT) or a conceptually similar transformation.
[0045] The transformation processing unit 52 can transmit the obtained transformation coefficients to the quantization unit 54. The quantization unit 54 quantizes the transformation coefficients to further reduce the bit rate. The quantization process can also reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting the quantization parameter. In some examples, the quantization unit 54 can then perform a scan of the matrix containing the quantized transformation coefficients. Alternatively, the entropy coding unit 56 may perform the scan.
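As an illustration of the transformation step, the following is a minimal sketch of a separable, orthonormal two-dimensional DCT-II applied to a residual block. It uses floating-point arithmetic for clarity; practical codecs typically use integer approximations, so this is an assumption-laden illustration rather than the transformation actually used by the transformation processing unit 52. The function names are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


# A flat 4x4 residual block compacts into a single DC coefficient.
flat = [[8] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
```

For the flat block, all of the energy lands in `coeffs[0][0]` and the remaining coefficients are zero up to rounding error, which is the energy-compaction property that makes the subsequent quantization effective.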
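The relationship between the quantization parameter and the degree of quantization can be sketched as follows. The step-size rule used here, in which the step doubles for every increase of 6 in the quantization parameter, is borrowed from HEVC-style designs and is an assumption; this paragraph does not specify a particular mapping. Plain uniform rounding is used, and the function names and coefficient values are hypothetical.

```python
def q_step(qp):
    """Assumed HEVC-style step size: doubles for every increase of 6 in QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Uniformly quantize a 2-D list of transform coefficients to integer levels."""
    step = q_step(qp)
    return [[int(round(c / step)) for c in row] for row in coeffs]


# Hypothetical transform coefficients of a 2x2 block.
coeffs = [[101.0, -37.0], [9.0, 3.0]]
levels_qp22 = quantize(coeffs, qp=22)  # step size 8
levels_qp28 = quantize(coeffs, qp=28)  # step size 16
```

Raising the quantization parameter from 22 to 28 doubles the step size, so more small coefficients collapse to zero and fewer bits are needed, at the cost of larger reconstruction error.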
[0046] Following quantization, the entropy coding unit 56 entropy-codes the quantized transformation coefficients into a video bitstream using, for example, Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), Syntax-based context-adaptive Binary Arithmetic Coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy coding method or technique. The coded bitstream is then transmitted to the video decoder 30 as shown in Figure 1, or can be stored in the storage device 32 as shown in Figure 1 for later transmission to or retrieval by the video decoder 30. The entropy coding unit 56 can also entropy-code the motion vector and other syntactic elements for the current video frame being coded.
[0047] The inverse quantization unit 58 and the inverse transformation processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual video blocks in the pixel domain in order to generate reference blocks for predicting other video blocks. As described above, the motion compensation unit 44 can generate motion-compensated prediction blocks from one or more reference blocks of frames stored in the DPB 64. The motion compensation unit 44 can also apply one or more interpolation filters to the prediction blocks to calculate sub-integer pixel values for use in motion estimation.
[0048] The adder 62 adds the reconstructed residual block to the motion-compensated prediction block generated by the motion compensation unit 44 to generate a reference block for storage in the DPB 64. The reference block can then be used by the intra-BC unit 48, the motion estimation unit 42, and the motion compensation unit 44 as a prediction block for inter-predicting another video block in a subsequent video frame.
[0049] Figure 3 is a block diagram showing an exemplary video decoder 30 according to several implementation examples of the present application. The video decoder 30 includes a video data memory 79, an entropy decoding unit 80, a prediction processing unit 81, an inverse quantization unit 86, an inverse transformation processing unit 88, an adder 90, and a DPB 92. The prediction processing unit 81 further includes a motion compensation unit 82, an intra-prediction unit 84, and an intra-BC unit 85. The video decoder 30 is capable of performing a decoding process that is largely the reverse of the encoding process described above with respect to the video encoder 20 in relation to Figure 2. For example, the motion compensation unit 82 can generate prediction data based on motion vectors received from the entropy decoding unit 80, and the intra-prediction unit 84 can generate prediction data based on an intra-prediction mode indicator received from the entropy decoding unit 80.
[0050] In some examples, a unit of the video decoder 30 may be given the task of performing an implementation of the present application. Also, in some examples, the implementation of the present disclosure may be distributed among one or more units of the video decoder 30. For example, the intra-BC unit 85 may perform an implementation of the present application alone or in combination with other units of the video decoder 30, such as the motion compensation unit 82, the intra-prediction unit 84, and the entropy decoding unit 80. In some examples, the video decoder 30 may not include the intra-BC unit 85, and the functions of the intra-BC unit 85 may be performed by other components of the prediction processing unit 81, such as the motion compensation unit 82.
[0051] The video data memory 79 can store video data, such as an encoded video bitstream, which is decoded by other components of the video decoder 30. The video data stored in the video data memory 79 can be obtained, for example, from a storage device 32, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium (e.g., a flash drive or hard disk). The video data memory 79 may include a coded picture buffer (CPB) that stores the encoded video data from the encoded video bitstream. The DPB 92 of the video decoder 30 stores reference video data for use by the video decoder 30 when decoding the video data (e.g., in intra or inter-predictive coding mode). The video data memory 79 and DPB 92 may be formed by any of various memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM®), or other types of memory devices. In Figure 3, for illustrative purposes, the video data memory 79 and DPB 92 are depicted as two separate components of the video decoder 30. However, it will be apparent to those skilled in the art that the video data memory 79 and DPB 92 may be provided by the same memory device or by separate memory devices. In some examples, the video data memory 79 may be on-chip with the other components of the video decoder 30 or off-chip with respect to those components.
[0052] During the decoding process, the video decoder 30 receives an encoded video bitstream representing the video blocks and associated syntactic elements of the encoded video frame. The video decoder 30 can receive syntactic elements at the video frame level and / or video block level. The entropy decoding unit 80 of the video decoder 30 entropy-decodes the bitstream to generate quantized coefficients, motion vectors or intra-predictive mode indicators, and other syntactic elements. The entropy decoding unit 80 then transfers the motion vectors or intra-predictive mode indicators and other syntactic elements to the prediction processing unit 81.
[0053] When a video frame is encoded as an intra-predictive coding (I) frame, or encoded for an intra-coded prediction block in another type of frame, the intra-predictive processing unit 84 of the prediction processing unit 81 can generate prediction data for the video block of the current video frame based on the signaled intra-predictive mode and reference data from previously decoded blocks of the current frame.
[0054] When a video frame is encoded as an inter-prediction coded (i.e., B or P) frame, the motion compensation unit 82 of the prediction processing unit 81 generates one or more prediction blocks for the video blocks of the current video frame based on the motion vectors and other syntactic elements received from the entropy decoding unit 80. Each prediction block can be generated from a reference frame in one of the reference frame lists. The video decoder 30 can construct the reference frame lists, List0 and List1, using a default construction technique based on the reference frames stored in the DPB 92.
[0055] In some examples, when a video block is encoded according to the intra-BC mode described herein, the intra-BC unit 85 of the prediction processing unit 81 generates a prediction block for the current video block based on the block vector and other syntactic elements received from the entropy decoding unit 80. The prediction block may be within a reconstructed region of the same picture as the current video block, as defined by the video encoder 20.
[0056] The motion compensation unit 82 and / or intra-BC unit 85 determine prediction information for the video blocks of the current video frame by analyzing motion vectors and other syntactic elements, and then use the prediction information to generate prediction blocks for the current video blocks to be decoded. For example, the motion compensation unit 82 uses some of the received syntactic elements to determine the prediction mode used to encode the video blocks of the video frame (e.g., intra-prediction or inter-prediction), the inter-prediction frame type (e.g., B or P), construction information for one or more of the frame's reference frame lists, motion vectors for each inter-prediction coded video block of the frame, the inter-prediction state for each inter-prediction coded video block of the frame, and other information for decoding the video blocks in the current video frame.
[0057] Similarly, the intraBC unit 85 can use some of the received syntactic elements, such as flags, to determine that the current video block was predicted using intraBC mode, construction information indicating which video blocks in the frame are in the reconstruction area and should be stored in the DPB 92, block vectors for each intraBC predicted video block in the frame, intraBC prediction states for each intraBC predicted video block in the frame, and other information for decoding the video blocks in the current video frame.
[0058] The motion compensation unit 82 can also perform interpolation using interpolation filters, similar to those used by the video encoder 20 during video block encoding, and can calculate interpolated values for the sub-integer pixels of the reference block. In this case, the motion compensation unit 82 can determine the interpolation filters used by the video encoder 20 from the received syntactic elements and use those interpolation filters to generate a predicted block.
[0059] The inverse quantization unit 86 inversely quantizes the quantized transformation coefficients, which are provided in the bitstream and have been entropy-decoded by the entropy decoding unit 80, using the same quantization parameter calculated by the video encoder 20 for each video block in the video frame to determine the degree of quantization. The inverse transformation processing unit 88 applies an inverse transformation, such as an inverse DCT, inverse integer transformation, or a conceptually similar inverse transformation process, to the transformation coefficients in order to reconstruct the residual blocks in the pixel domain.
[0060] After the motion compensation unit 82 or intra-BC unit 85 generates a predicted block for the current video block based on vectors and other syntactic elements, the adder 90 reconstructs the decoded video block for the current video block by adding the residual block from the inverse transformation processing unit 88 to the corresponding predicted block generated by the motion compensation unit 82 or intra-BC unit 85. An in-loop filter 91, such as a deblocking filter, SAO filter, and / or ALF, may be placed between the adder 90 and the DPB 92 to further process the decoded video block. The in-loop filter 91 may be applied to the reconstructed CU before the reconstructed CU is stored in the reference picture storage device. In some examples, the in-loop filter 91 may be omitted, and the decoded video block may be provided directly to the DPB 92 by the adder 90. The decoded video blocks within a given frame are then stored in the DPB 92. The DPB 92 stores the reference frames used for subsequent motion compensation of the next video blocks. The DPB 92, or a memory device separate from the DPB 92, may also store the decoded video for later display on a display device such as the display device 34 in Figure 1.
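To make the idea of interpolated values at fractional pixel positions concrete, the following is a minimal sketch using bilinear interpolation at quarter-pixel precision. Bilinear weighting is a deliberately simple stand-in and an assumption; the interpolation filters actually signaled between the video encoder 20 and the video decoder 30 are not specified in this paragraph and are typically longer filters. The function name and the sample values are hypothetical, and coordinates are assumed to be non-negative and inside the reference block.

```python
def interpolate(ref, x_q, y_q):
    """Bilinear sample at quarter-pixel position (x_q / 4, y_q / 4).

    ref is a 2-D list of integer samples. Neighbors beyond the right or
    bottom edge are clamped to the edge sample.
    """
    x0, fx = divmod(x_q, 4)  # integer column and quarter-pixel fraction
    y0, fy = divmod(y_q, 4)  # integer row and quarter-pixel fraction
    x1 = min(x0 + 1, len(ref[0]) - 1)
    y1 = min(y0 + 1, len(ref) - 1)
    top = ref[y0][x0] * (4 - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (4 - fx) + ref[y1][x1] * fx
    # Weights sum to 16; adding 8 before the shift rounds to nearest.
    return (top * (4 - fy) + bottom * fy + 8) >> 4


# Hypothetical 2x2 reference block.
ref = [[0, 16], [32, 48]]
half_pel = interpolate(ref, 2, 0)    # halfway between 0 and 16
center = interpolate(ref, 2, 2)      # center of the four samples
```

Integer positions return the stored sample unchanged, so the same routine serves both full-pixel and fractional-pixel motion vectors.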
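The reconstruction performed by the adder 90 can be sketched as a sample-wise addition. Clipping the result to the valid sample range for a given bit depth is an assumption added for illustration; the paragraph above describes only the addition. The function name, the default bit depth, and the sample values are hypothetical.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add a residual block to a predicted block, sample by sample.

    The sum is clipped to [0, 2**bit_depth - 1] (assumed behavior).
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
            for pred_row, resid_row in zip(pred, resid)]


# Hypothetical 2x2 predicted block and decoded residual block.
pred = [[250, 5], [100, 128]]
resid = [[10, -9], [3, -3]]
recon = reconstruct(pred, resid)
```

The first two samples show the effect of clipping at the upper and lower bounds; the remaining samples are plain sums.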
[0061] In a typical video encoding process, a video sequence usually contains an ordered set of frames or pictures. Each frame can contain three sample arrays, denoted as SL, SCb, and SCr. SL is a two-dimensional array of luma samples. SCb is a two-dimensional array of Cb chroma samples. SCr is a two-dimensional array of Cr chroma samples. In other examples, a frame may be monochrome and therefore contain only one two-dimensional array of luma samples.
[0062] Like HEVC, the AVS3 standard is built on a block-based hybrid video coding framework. The input video signal is processed block by block (called coding units (CUs)). Unlike HEVC, which partitions blocks based solely on quadtrees, in AVS3, a single coding tree unit (CTU) is partitioned into CUs based on quadtree / binary / extended quadtree structures to adapt to varying local characteristics. Furthermore, the concept of multiple partition unit types in HEVC is eliminated; that is, there is no separation of CUs, prediction units (PUs), and transformation units (TUs) in AVS3. Instead, each CU is always used as the base unit for both prediction and transformation without further partitioning. In the AVS3 tree partition structure, one CTU is first partitioned based on a quadtree structure. Then, each quadtree leaf node can be further partitioned based on binary and extended quadtree structures.
[0063] As shown in Figure 4A, the video encoder 20 (or more specifically, the partitioning unit 45) generates an encoded representation of a frame by first partitioning the frame into a set of CTUs. A video frame can contain an integer number of CTUs that are sequentially ordered from left to right and top to bottom in raster scan order. Each CTU is the largest logical coding unit, and the width and height of the CTU are signaled by the video encoder 20 in a sequence parameter set so that all CTUs in the video sequence have the same size, which is one of 128×128, 64×64, 32×32, and 16×16. However, it should be noted that this application is not necessarily limited to a specific size. As shown in Figure 4B, each CTU may comprise one CTB of luma samples, two corresponding coding tree blocks of chroma samples, and syntactic elements used to encode the samples of the coding tree blocks. The syntactic elements describe the characteristics of different types of units in a coded block of pixels and how the video sequence can be reconstructed by the video decoder 30, including inter-prediction or intra-prediction, intra-prediction mode, motion vectors, and other parameters. For a monochrome picture or a picture having three separate color planes, the CTU may comprise a single coding tree block and syntactic elements used to encode the samples of this coding tree block. The coding tree block may be an N×N block of samples.
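The raster-scan arrangement of CTUs can be sketched as follows. Rounding up so that a frame whose dimensions are not multiples of the CTU size is still fully covered is an assumption added for illustration, as are the function names and the 1920×1080 frame size.

```python
def ctu_grid(width, height, ctu_size):
    """Number of CTU columns and rows needed to cover a frame.

    Uses ceiling division so partial CTUs at the right and bottom
    edges are counted (assumed behavior).
    """
    cols = -(-width // ctu_size)
    rows = -(-height // ctu_size)
    return cols, rows


def ctu_origin(index, width, ctu_size):
    """Top-left sample position of the CTU at a raster-scan index."""
    cols = -(-width // ctu_size)
    return (index % cols) * ctu_size, (index // cols) * ctu_size


# Hypothetical 1920x1080 frame with 128x128 CTUs.
cols, rows = ctu_grid(1920, 1080, 128)
origin = ctu_origin(16, 1920, 128)  # second CTU of the second CTU row
```

Because the order is left to right and then top to bottom, the raster index alone is enough to recover the position of each CTU.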
[0064] To achieve better performance, the video encoder 20 can recursively perform tree partitioning on the encoding tree block of the CTU, such as binary, ternary, quadtree partitioning, or a combination thereof, to divide the CTU into smaller CUs. As shown in Figure 4C, the 64x64 CTU400 is first divided into four smaller CUs, each with a block size of 32x32. Of the four smaller CUs, CU410 and CU420 are each divided into four CUs with a block size of 16x16. The two 16x16 CUs, CU430 and CU440, are further divided into four CUs with a block size of 8x8. Figure 4D shows a quadtree data structure representing the final result of the partitioning process of the CTU400 shown in Figure 4C, where each leaf node of the quadtree corresponds to one CU, each with a size ranging from 32x32 to 8x8. Similar to the CTU shown in Figure 4B, each CU may comprise a CB of chroma samples, two corresponding encoded blocks of chroma samples in frames of the same size, and syntactic elements used to encode the samples in that encoded block. For a monochrome picture or a picture with three separate color planes, a CU may comprise a single encoded block and syntactic structures used to encode the samples in this encoded block. Note that the quadtree partitions shown in Figures 4C and 4D are for illustrative purposes only, and a single CTU may be partitioned into multiple CUs to accommodate local characteristics that vary based on quadtree / ternary / binary partitions. In multiple types of tree structures, a single CTU may be partitioned by a quadtree structure, and each quadtree leaf CU may be further partitioned by binary and ternary structures. As shown in Figure 4E, there are five possible partition types for an encoded block with width W and height H: 4 partitions, 2 horizontal partitions, 2 vertical partitions, 3 horizontal partitions, and 3 vertical partitions. 
AVS3 offers five possible partition types: 4-partition, 2-part horizontal, 2-part vertical, horizontally extended quadtree partition, and vertically extended quadtree partition.
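The recursive quadtree split of Figures 4C and 4D can be sketched as follows. This is only a minimal illustration: the `should_split` callback is a hypothetical stand-in for the encoder's actual split decision, and the function names are not taken from this disclosure.

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Return the leaf CUs of a square block as (x, y, size) tuples."""
    # Stop when the block is at the minimum size or the decision says "no split".
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    # Visit the four quadrants in raster order (top-left, top-right, ...).
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(x + dx, y + dy, half, min_size, should_split)
    return leaves


# Hypothetical decision: split the 64x64 CTU, then only its top-left 32x32 CU.
def demo_split(x, y, size):
    return size == 64 or (size == 32 and (x, y) == (0, 0))


leaves = quadtree_partition(0, 0, 64, 8, demo_split)
```

Each leaf corresponds to one CU, and the leaf areas always add up to the area of the CTU.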
[0065] In some implementations, the video encoder 20 can further partition a coding block of the CU into one or more M×N PBs. A PB is a rectangular (square or non-square) block of samples to which the same inter-prediction or intra-prediction is applied. A PU of the CU may comprise a PB of luma samples, two corresponding PBs of chroma samples, and syntactic elements used to predict these PBs. For a monochrome picture, or a picture with three separate color planes, the PU may comprise a single PB and a syntactic structure used to predict this PB. For the luma, Cb, and Cr PBs of each PU of the CU, the video encoder 20 can generate luma, Cb, and Cr prediction blocks.
[0066] The video encoder 20 can generate prediction blocks for the PU using intra-prediction or inter-prediction. When the video encoder 20 generates prediction blocks for the PU using intra-prediction, the video encoder 20 can generate prediction blocks for the PU based on the decoded samples of the frames associated with this PU. When the video encoder 20 generates prediction blocks for the PU using inter-prediction, the video encoder 20 can generate prediction blocks for the PU based on the decoded samples of one or more frames other than the frames associated with this PU.
[0067] After the video encoder 20 generates predicted Luma, Cb, and Cr blocks for one or more PUs of the CU, the video encoder 20 can generate a Luma residual block for the CU by subtracting the predicted Luma block of the CU from its original Luma coded block, where each sample in the Luma residual block of the CU represents the difference between a Luma sample in one of the predicted Luma blocks of the CU and the corresponding sample in the original Luma coded block of the CU. Similarly, the video encoder 20 can generate Cb residual blocks and Cr residual blocks for the CU, where each sample in the Cb residual block of the CU represents the difference between a Cb sample in one of the predicted Cb blocks of the CU and the corresponding sample in the original Cb coded block of the CU, and each sample in the Cr residual block of the CU represents the difference between a Cr sample in one of the predicted Cr blocks of the CU and the corresponding sample in the original Cr coded block of the CU.
[0068] Furthermore, as shown in Figure 4C, the video encoder 20 can use quadtree partitions to decompose the luma, Cb, and Cr residual blocks of the CU into one or more luma, Cb, and Cr transformation blocks, respectively. A transformation block is a rectangular (square or non-square) block of samples to which the same transformation is applied. The TU of the CU may comprise a transformation block for luma samples, two corresponding transformation blocks for chroma samples, and syntactic elements used to transform the transformation block samples. Thus, each TU of the CU can be associated with a luma transformation block, a Cb transformation block, and a Cr transformation block. In some examples, a luma transformation block associated with a TU may be a subblock of the luma residual block of the CU. A Cb transformation block may be a subblock of the Cb residual block of the CU. A Cr transformation block may be a subblock of the Cr residual block of the CU. For a monochrome picture, or a picture with three separate color planes, the TU may comprise a single transformation block and syntactic structures used to transform the samples of the transformation block.
[0069] The video encoder 20 can generate a Luma coefficient block for a TU by applying one or more transformations to the Luma transformation block of the TU. The coefficient block can be a two-dimensional array of transformation coefficients. The transformation coefficients may be scalar quantities. The video encoder 20 can generate a Cb coefficient block for a TU by applying one or more transformations to the Cb transformation block of the TU. The video encoder 20 can generate a Cr coefficient block for a TU by applying one or more transformations to the Cr transformation block of the TU.
[0070] After generating a coefficient block (e.g., a Luma coefficient block, a Cb coefficient block, or a Cr coefficient block), the video encoder 20 can quantize the coefficient block. Quantization generally refers to a process in which the transformation coefficients are mapped to a coarser set of values to reduce, as far as possible, the amount of data used to represent them, thereby achieving further compression. After the video encoder 20 has quantized the coefficient block, the video encoder 20 can entropy encode the syntactic elements representing the quantized transformation coefficients. For example, the video encoder 20 can perform CABAC on the syntactic elements representing the quantized transformation coefficients. Finally, the video encoder 20 can output a bitstream containing a sequence of bits that forms a representation of the encoded frames and associated data. This bitstream is stored in the storage device 32 or transmitted to the destination device 14.
[0071] After receiving the bitstream generated by the video encoder 20, the video decoder 30 can parse the bitstream and obtain syntactic elements from it. The video decoder 30 can reconstruct frames of video data based at least partially on the syntactic elements obtained from the bitstream. The process of reconstructing video data is generally the reverse of the encoding process performed by the video encoder 20. For example, the video decoder 30 can reconstruct residual blocks associated with the TU of the current CU by performing an inverse transform on the coefficient blocks associated with the TU of the current CU. The video decoder 30 also reconstructs the encoded blocks of the current CU by adding samples of the prediction blocks for the PU of the current CU to the corresponding samples of the transform blocks for the TU of the current CU. After reconstructing the encoded blocks for each CU of the frame, the video decoder 30 can reconstruct this frame.
[0072] As mentioned above, video coding primarily uses two modes to achieve video compression: intra-frame prediction (or intra prediction) and inter-frame prediction (or inter prediction). Note that IBC can be considered either intra-frame prediction or a third mode. Of the two modes, inter-frame prediction contributes more significantly to coding efficiency than intra-frame prediction because it utilizes motion vectors from a reference video block to predict the current video block.
[0073] However, as video data capture technology improves and video block sizes become more refined to preserve details in video data, the amount of data required to represent the motion vector of the current frame also increases significantly. One way to overcome this challenge is to utilize the fact that adjacent CU groups not only have similar video data for prediction purposes in both the spatial and temporal domains, but the motion vectors between these adjacent CUs are also similar. Therefore, motion information of spatially adjacent CUs, and / or co-located CUs, can be used as an approximation of the motion information (e.g., motion vector) of the current CU by searching for their spatial and temporal correlations. This is also referred to as a "motion vector predictor (MVP)" for the current CU.
[0074] Regarding Figure 2, instead of encoding the actual motion vector for the current CU, determined by the motion estimation unit 42 described above, into the video bitstream, the motion vector predictor for the current CU is subtracted from the actual motion vector for the current CU to generate a Motion Vector Difference (MVD) for the current CU. This eliminates the need to encode the motion vector for each CU in a frame, determined by the motion estimation unit 42, into the video bitstream, significantly reducing the amount of data used to represent motion information within the video bitstream.
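The relationship between the motion vector, its predictor, and the MVD can be sketched as follows. This is a minimal illustration, assuming motion vectors are represented as integer (x, y) tuples; the function names are hypothetical.

```python
def motion_vector_difference(mv, mvp):
    """Encoder side: MVD = actual motion vector minus its predictor."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def reconstruct_motion_vector(mvd, mvp):
    """Decoder side: motion vector = predictor plus the decoded MVD."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


mvd = motion_vector_difference((18, -7), (16, -8))
mv = reconstruct_motion_vector(mvd, (16, -8))
```

When the predictor is close to the actual motion vector, the MVD components are small, which is what makes them cheaper to code than the motion vector itself.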
[0075] Similar to the process of selecting a prediction block in a reference frame during inter-frame prediction of a coded block, both the video encoder 20 and the video decoder 30 must adopt a set of rules to construct a list of motion vector candidates (also known as a "merge list") for the current CU using potential motion vector candidates for spatially adjacent CUs and / or CUs at the same temporal position as the current CU, and then select one member from the motion vector candidate list as the motion vector predictor for the current CU. This eliminates the need to transmit the motion vector candidate list itself from the video encoder 20 to the video decoder 30, and the index of the selected motion vector predictor in the motion vector candidate list is sufficient for both the video encoder 20 and the video decoder 30 to use the same motion vector predictor in the motion vector candidate list for encoding and decoding the current CU.
[0076] Generally, the basic intra-prediction methods applied to VVC remain largely the same as those used in HEVC, with the exception of several extended, added, and / or improved prediction tools, such as wide-angle intra-mode extended intra-prediction, multiple reference line (MRL) intra-prediction, position-dependent intra-prediction combination (PDPC), intra-sub-partition (ISP) prediction, cross-component linear model (CCLM) prediction, and matrix-weighted intra-prediction (MIP).
[0077] Similar to HEVC, VVC predicts the samples of the current CU using a set of reference samples adjacent to the current CU (i.e., above or to the left of the current CU). However, to capture the finer edge orientations present in natural video (especially high-resolution video content such as 4K), the number of angular intra-modes is expanded from 33 in HEVC to 93 in VVC. Figure 4F is a block diagram showing the intra-modes defined in VVC. As shown in Figure 4F, of the 93 angular intra-modes, modes 2 to 66 are conventional angular intra-modes, while modes -1 to -14 and modes 67 to 80 are wide-angle intra-modes. In addition to angular intra-modes, the planar mode (mode 0 in Figure 1) and direct current (DC) mode (mode 1 in Figure 1) from HEVC are also applied to VVC.
[0078] As shown in Figure 4E, since VVC applies a quadtree / binary / ternary tree partition structure, VVC intraprediction includes rectangular video blocks in addition to square video blocks. Because the width and height of a given video block are unequal, various sets of angle intramodes can be selected from 93 angle intramodes for different block shapes. More specifically, for both square and rectangular video blocks, in addition to the planar mode and DC mode, 65 of the 93 angle intramodes are supported for each block shape. If the rectangular block shape of a video block satisfies certain conditions, the index of the wide-angle intramode of the video block can be adaptively determined by the video decoder 30 according to the index of the conventional angle intramode received from the video encoder 20, using the mapping relationship shown in Table 1-0 below. That is, for non-square blocks, the wide-angle intramode is signaled by the video encoder 20 using the index of the conventional angle intramode. This is mapped to the wide-angle intra-mode index by the video decoder 30 after analysis, and the total number of intra-modes (i.e., 67) (i.e., planar mode, DC mode, and 65 of the 93 angular intra-modes) remains unchanged, and the intra-predictive mode coding scheme remains unchanged. As a result, efficient signaling of intra-predictive modes is achieved while providing a consistent design across different block sizes.
[0079] Table 1-0 shows the mapping relationship between the index for the conventional angular intra-mode and the index for the wide-angle intra-mode for intra-predictions of different block shapes in VVC. Here, W represents the width of the video block and H represents the height of the video block. [Table 1]
[0080] Similar to HEVC's intra-prediction, all VVC intra-modes (i.e., planar, DC, and angular intra-modes) utilize the set of reference samples above and to the left of the current video block for intra-prediction. However, unlike HEVC, where only the nearest row / column of the reference sample (i.e., the 0th line 201 in Figure 4G) is used, VVC introduces MRL intra-prediction. This means that in addition to the nearest row / column of the reference sample, two additional rows / columns of the reference sample (i.e., the 1st line 203 and the 3rd line 205 in Figure 4G) are available for intra-prediction. The index of the selected row / column of the reference sample is signaled from the video encoder 20 to the video decoder 30. If a non-nearest row / column of the reference sample (i.e., the 1st line 203 and the 3rd line 205 in Figure 4G) is selected, the planar mode is excluded from the set of intra-modes that can be used for predicting the current video block. MRL intra-prediction is disabled for the first column / row of the video block within the current CTU, preventing the use of extended reference samples outside the current CTU.
[0081] Sample Adaptive Offset (SAO)
[0082] Sample Adaptive Offset (SAO) is a process that modifies decoded samples by conditionally adding an offset value to each sample based on the values in a lookup table transmitted by the encoder after the application of a deblocking filter. SAO filtering is performed on a region-by-region basis, based on the filtering type selected for each CTB by the syntactic element sao-type-idx. A value of 0 for sao-type-idx indicates that no SAO filter is applied to the CTB, while values of 1 and 2 signal the use of the band offset and edge offset filtering types, respectively. In the band offset mode, specified by sao-type-idx being 1, the selected offset value depends directly on the sample amplitude. In this mode, the entire amplitude range of the sample is uniformly divided into 32 segments called bands, and sample values belonging to four of these bands (consecutive within the 32 bands) are modified by adding a transmitted value, indicated as the band offset. This value can be positive or negative. The primary reason for using four consecutive bands is that in smoothing regions where banding artifacts can appear, the sample amplitude of the CTB tends to be concentrated in only a few of these bands. Furthermore, the design choice to use four offsets is unified with the edge offset operating mode, which also uses four offset values. In the edge offset mode, specified by sao-type-idx being 2, the syntactic element sao-eo-class, which has values from 0 to 3, signals whether horizontal, vertical, or two diagonal gradient directions are used for edge offset classification in the CTB.
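The band offset mode can be sketched as follows. This is a minimal illustration under stated assumptions: `band_position` (the index of the first of the four consecutive bands) is a hypothetical parameter name, and clipping the result to the valid sample range is assumed rather than stated in the paragraph above.

```python
def sao_band_offset(samples, bit_depth, band_position, offsets):
    """Add one of four offsets to samples that fall in four consecutive bands."""
    shift = bit_depth - 5              # 32 uniform bands: the top 5 bits give the band
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - band_position   # position inside the 4-band window
        if 0 <= k < 4:
            s = min(max_val, max(0, s + offsets[k]))  # assumed range clipping
        out.append(s)
    return out


# 8-bit example: each band is 8 values wide, so bands 2..5 cover 16..47.
filtered = sao_band_offset([0, 16, 20, 47, 48, 255], 8, 2, [1, -2, 3, 4])
```

Samples outside the four selected bands pass through unchanged.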
[0083] Figure 5A is a block diagram showing four gradient patterns used in SAO in several implementation examples of this disclosure. The four gradient patterns 502, 504, 506, and 508 are for their respective sao-eo-class in edge offset mode. The sample labeled "p" indicates the central sample to be considered. The two samples labeled "n0" and "n1" specify two adjacent samples along the gradient patterns of (a) horizontal (sao-eo-class=0), (b) vertical (sao-eo-class=1), (c) 135° diagonal (sao-eo-class=2), and (d) 45° (sao-eo-class=3). Each sample in CTB is classified into one of five EdgeIdx categories by comparing the sample value p located at a certain position with the values n0 and n1 of two adjacent samples located at adjacent positions, as shown in Figure 5A. This classification is performed for each sample based on the decoded sample value, and therefore no additional signaling is required for EdgeIdx classification. Depending on the EdgeIdx category at the sample location, an offset value from the transmitted lookup table is added to the sample value for EdgeIdx categories 1 through 4. The offset value is always positive for categories 1 and 2, and always negative for categories 3 and 4. Therefore, the filter generally has a smoothing effect in edge offset mode. Table 1-1 below shows the sample EdgeIdx categories for SAO edge classes. [Table 2]
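The EdgeIdx classification can be sketched as follows. Because the category table is not reproduced in this text, the conditions below are an assumption that follows the edge offset categories of HEVC (local minimum, two corner cases, local maximum); the function name is hypothetical.

```python
def edge_idx(p, n0, n1):
    """Classify sample p against its two neighbours along the chosen direction."""
    if p < n0 and p < n1:
        return 1                      # local minimum
    if (p < n0 and p == n1) or (p == n0 and p < n1):
        return 2                      # corner below one neighbour
    if (p > n0 and p == n1) or (p == n0 and p > n1):
        return 3                      # corner above one neighbour
    if p > n0 and p > n1:
        return 4                      # local maximum
    return 0                          # monotonic or flat: no offset applied


categories = [edge_idx(5, 7, 9), edge_idx(5, 5, 9), edge_idx(9, 9, 5),
              edge_idx(9, 5, 7), edge_idx(5, 3, 9)]
```

With positive offsets for categories 1 and 2 and negative offsets for categories 3 and 4, local minima are raised and local maxima are lowered, which is the smoothing effect described above.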
[0084] For SAO types 1 and 2, a total of four amplitude offset values are transmitted to the decoder for each CTB. For type 1, the sign is also encoded. The offset values and associated syntactic elements such as sao-type-idx and sao-eo-class are determined by the encoder, typically using criteria that optimize rate-distortion performance. SAO parameters can be indicated to be inherited from the left or upper CTB using merge flags to make signaling more efficient. In summary, SAO is a nonlinear filtering operation that allows for further refinement of the reconstructed signal, enhancing the signal representation both in smooth areas and around edges.
[0085] Pre-Sample Adaptive Offset (Pre-SAO)
[0086] In some examples, pre-sample adaptive offset (Pre-SAO) is implemented. The coding performance that Pre-SAO achieves at low complexity makes it promising for the development of future video coding standards. In some examples, Pre-SAO is applied only to luma component samples, using luma samples for classification. Pre-SAO works by applying two SAO-like filtering operations called SAOV and SAOH, which are applied along with a deblocking filter (DBF) before applying the existing (legacy) SAO. The first SAO-like filter, SAOV, works to apply SAO to the input picture Y2 after a deblocking filter for vertical edges (DBFV) has been applied:
$$Y3(i) = \mathrm{Clip1}\big(Y2(i) + d1 \cdot (f(i) > T\ ?\ 1 : 0) - d2 \cdot (f(i) < -T\ ?\ 1 : 0)\big)$$
[0087] Here, T is a predetermined positive constant, and d1 and d2 are offset coefficients associated with two classes based on the sample difference between Y1(i) and Y2(i), given by f(i) = Y1(i) - Y2(i).
[0088] The first class, for d1, consists of all sample positions i such that f(i) > T, and the second class, for d2, consists of all sample positions such that f(i) < -T. The offset coefficients d1 and d2 are calculated by the encoder so as to minimize the mean squared error between the SAOV output picture Y3 and the original picture X, in the same way as in conventional SAO processing. After SAOV is applied, the second SAO-like filter, SAOH, applies SAO to Y4, where Y4 is the output picture of the deblocking filter for horizontal edges (DBFH), as shown in Figure 5B, and the classification is based on the sample difference between Y3(i) and Y4(i). SAOH follows the same procedure as SAOV, except that it uses Y3(i)-Y4(i) instead of Y1(i)-Y2(i) for classification. The two offset coefficients, the predetermined threshold T, and an enable flag for each of SAOH and SAOV are signaled at the slice level. SAOH and SAOV are applied independently to the luma and the two chroma components.
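The SAOV step can be sketched as follows. This is a minimal illustration under stated assumptions: d1 is added for the first class, d2 is subtracted for the second class, and the result is clipped to the sample range. The sign convention, the clipping, and the function name are assumptions rather than details confirmed by the paragraphs above.

```python
def saov(y1, y2, d1, d2, t, bit_depth=10):
    """Apply the two-class Pre-SAO offset to the deblocked samples y2."""
    max_val = (1 << bit_depth) - 1
    out = []
    for before, after in zip(y1, y2):
        f = before - after             # f(i) = Y1(i) - Y2(i)
        value = after
        if f > t:                      # first class
            value += d1
        elif f < -t:                   # second class (assumed sign convention)
            value -= d2
        out.append(min(max_val, max(0, value)))
    return out


y3 = saov([100, 100, 100], [90, 100, 110], d1=2, d2=3, t=5)
```

SAOH would reuse the same routine with Y3 and Y4 in place of Y1 and Y2.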
[0089] In some cases, SAOH and SAOV operate only on picture samples affected by their respective deblocking (DBFV or DBFH). Therefore, unlike existing SAO processes, only a subset of all samples within a given spatial domain (picture, or CTU in the case of legacy SAO) is processed by Pre-SAO, resulting in a lower average increase in the decoder's computational load per picture sample (preliminary estimates suggest 2-3 comparisons and 2 additions per sample in the worst-case scenario). Pre-SAO requires only the samples used in the deblocking filter and does not need to store additional samples in the decoder.
[0090] Bilateral filter (BIF)
[0091] In some embodiments, a bilateral filter (BIF) is implemented to explore compression efficiencies beyond VVC. The BIF is performed in the sample adaptive offset (SAO) loop-filter stage. Both the bilateral filter (BIF) and the SAO use samples from deblocking as input. Each filter generates an offset for each sample, and these offsets are added to the input sample and then clipped before proceeding to ALF.
[0092] Specifically, the output sample $I_{OUT}$ is obtained as $I_{OUT} = \mathrm{clip3}(I_C + \Delta I_{BIF} + \Delta I_{SAO})$, where $I_C$ is the input sample from deblocking, $\Delta I_{BIF}$ is the offset from the bilateral filter, and $\Delta I_{SAO}$ is the offset from SAO.
[0093] In some embodiments, the encoder can enable or disable filtering at the CTU level and at the slice level. The encoder makes this decision by evaluating the rate-distortion optimization (RDO) cost.
[0094] Table 1-2, which shows the Picture Parameter Set RBSP syntax, introduces the following syntactic elements in the PPS: [Table 3]
[0095] If pps_bilateral_filter_enabled_flag is 0, bilateral loop filtering is disabled for slices referencing the PPS. If pps_bilateral_filter_enabled_flag is 1, bilateral loop filtering is enabled for slices referencing the PPS.
[0096] `bilateral_filter_strength` specifies the strength value of the bilateral loop filter used in the bilateral transformation block filtering process. The value of `bilateral_filter_strength` must be in the range of 0 to 2, including both ends.
[0097] `bilateral_filter_qp_offset` specifies the offset used to derive the bilateral filter lookup table LUT(x) for slices referencing PPS. The value of `bilateral_filter_qp_offset` must be in the range of -12 to +12, including both ends.
[0098] The following syntactic elements are introduced in Table 1-3, which shows the slice header syntax, and in Table 1-4, which shows the coded tree unit syntax. [Table 4]
[0099] The semantics are as follows: If `slice_bilateral_filter_all_ctb_enabled_flag` is equal to 1, it indicates that bilateral filtering is enabled and applies to all CTBs in the current slice. If `slice_bilateral_filter_all_ctb_enabled_flag` is not present, it is presumed to be equal to 0.
[0100] If slice_bilateral_filter_enabled_flag is equal to 1, it indicates that bilateral filtering is enabled and can be applied to the current slice's CTB. If slice_bilateral_filter_enabled_flag does not exist, it is presumed to be equal to slice_bilateral_filter_all_ctb_enabled_flag.
[0101] When bilateral_filter_ctb_flag[xCtb >> CtbLog2SizeY][yCtb >> CtbLog2SizeY] is equal to 1, it specifies that a bilateral filter is applied to the luma coding tree block of the coding tree unit at the luma position (xCtb, yCtb). When bilateral_filter_ctb_flag[xCtb >> CtbLog2SizeY][yCtb >> CtbLog2SizeY] is equal to 0, it specifies that a bilateral filter is not applied to the luma coding tree block of the coding tree unit at the luma position (xCtb, yCtb). If bilateral_filter_ctb_flag does not exist, it is presumed to be equal to (slice_bilateral_filter_all_ctb_enabled_flag & slice_bilateral_filter_enabled_flag).
[0102] In some examples, for the CTUs to be filtered, the filtering process proceeds as follows. At picture boundaries where samples are unavailable, the bilateral filter uses extension (sample repetition) to fill in the unavailable samples. For virtual boundaries, the behavior is the same as for SAO, that is, no filtering occurs. When crossing a horizontal CTU boundary, the bilateral filter can access the same samples that SAO accesses. FIG. 7 is a block diagram showing the naming convention for the samples surrounding the central sample according to some implementations of the present disclosure. As an example, when the central sample $I_C$ is on the top line of the CTU, $I_{NW}$, $I_A$, and $I_{NE}$ are read from the CTU above, in the same way as in SAO, but $I_{AA}$ is padded, so no extra line buffer is needed. The samples surrounding the central sample $I_C$ are denoted according to FIG. 7, where A, B, L, and R stand for above, below, left, and right, and NW, NE, SW, and SE stand for north-west and so on. Similarly, AA stands for above-above, BB for below-below, and so on. This diamond-shaped support differs from another method that uses a square filter support and does not use $I_{AA}$, $I_{BB}$, $I_{LL}$, or $I_{RR}$.
[0103] Each surrounding sample $I_A$, $I_R$, and so on contributes a corresponding modifier value $\mu_{\Delta I_A}$, $\mu_{\Delta I_R}$, and so on. These are calculated as follows. Starting with the contribution from the sample to the right, $I_R$, the difference is calculated as $\Delta I_R = (|I_R - I_C| + 4) \gg 3$, where $|\cdot|$ denotes the absolute value. For data that is not 10 bits, $\Delta I_R = (|I_R - I_C| + 2^{n-6}) \gg (n-7)$ is used instead, where, for example, n = 8 for 8-bit data. The resulting value is then clipped so that it is less than 16: $sI_R = \min(15, \Delta I_R)$.
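For 10-bit data, the difference and clipping steps above can be sketched as follows. This is a minimal illustration of the two stated formulas only; the function name is hypothetical, and the lookup of the modifier value from the clipped difference is not shown.

```python
def clipped_difference(i_r, i_c):
    """10-bit case: return sI_R = min(15, (|I_R - I_C| + 4) >> 3)."""
    delta = (abs(i_r - i_c) + 4) >> 3   # rounded division of the difference by 8
    return min(15, delta)               # keep the result below 16


values = [clipped_difference(512, 512), clipped_difference(515, 512),
          clipped_difference(516, 512), clipped_difference(1023, 0)]
```

The clipped value can only take 16 distinct values, which is what keeps the lookup table small.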
[0104] The modifier value is then calculated as follows:
[Equation]
[0105] These values can be stored using 6 bits per entry, resulting in 26 * 16 * 6 / 8 = 312 bytes, or 300 bytes if the first row, which is all zeros, is excluded. The modifier values $\mu_{\Delta I_L}$, $\mu_{\Delta I_A}$, and $\mu_{\Delta I_B}$ are calculated from $I_L$, $I_A$, and $I_B$ in the same way. For the diagonal samples $I_{NW}$, $I_{NE}$, $I_{SE}$, and $I_{SW}$ and for the samples two steps away, $I_{AA}$, $I_{BB}$, $I_{RR}$, and $I_{LL}$, the calculation also follows equations 2 and 3, but uses a value shifted by 1. For example, using the diagonal sample $I_{SE}$, the following is obtained:
[Equation]
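[0106] The modifier values are summed up: $m_{sum} = \mu_{\Delta I_A} + \mu_{\Delta I_B} + \mu_{\Delta I_L} + \mu_{\Delta I_R} + \mu_{\Delta I_{NW}} + \mu_{\Delta I_{NE}} + \mu_{\Delta I_{SW}} + \mu_{\Delta I_{SE}} + \mu_{\Delta I_{AA}} + \mu_{\Delta I_{BB}} + \mu_{\Delta I_{LL}} + \mu_{\Delta I_{RR}}$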
[0107] In some examples, $\mu_{\Delta I_L}$ of the current sample is equal to $-\mu_{\Delta I_R}$ of the previous sample (the sample to its left). Likewise, $\mu_{\Delta I_A}$ is equal to $-\mu_{\Delta I_B}$ of the sample above, and similar symmetries hold for the diagonal modifier values and the modifier values two steps away. This means that, in a hardware implementation, it is sufficient to calculate the six values $\mu_{\Delta I_R}$, $\mu_{\Delta I_B}$, $\mu_{\Delta I_{SW}}$, $\mu_{\Delta I_{SE}}$, $\mu_{\Delta I_{RR}}$, and $\mu_{\Delta I_{BB}}$, and the remaining six values can be obtained from previously calculated values.
[0108] The value $m_{sum}$ is then multiplied by c = 1, 2, or 3. This can be done using a single adder and logical AND gates as follows: $c_v = k_1 \,\&\, (m_{sum} \ll 1) + k_2 \,\&\, m_{sum}$, where & denotes logical AND, $k_1$ is the most significant bit of the multiplier c, and $k_2$ is its least significant bit. The multiplier is obtained from the minimum block dimension D = min(width, height), as shown in Table 1-5. [Table 5]
[0109] Finally, the bilateral filter offset $\Delta I_{BIF}$ is calculated. For full-strength filtering, $\Delta I_{BIF} = (c_v + 16) \gg 5$ is used, whereas for half-strength filtering, $\Delta I_{BIF} = (c_v + 32) \gg 6$ is used.
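The multiplication by c and the final rounding shift can be sketched as follows. This is a minimal illustration, assuming that each bit of c gates a whole operand (which is how the AND gates would act in hardware); the function names are hypothetical, and the derivation of c from the block size is not shown because the table is not reproduced here.

```python
def scale_by_c(m_sum, c):
    """Multiply m_sum by c in {1, 2, 3} using one shift, two gates and one add."""
    k1 = (c >> 1) & 1                  # most significant bit of c
    k2 = c & 1                         # least significant bit of c
    return (m_sum << 1 if k1 else 0) + (m_sum if k2 else 0)


def bif_offset(c_v, half_strength=False):
    """Rounded right shift giving the bilateral filter offset."""
    if half_strength:
        return (c_v + 32) >> 6
    return (c_v + 16) >> 5


c_v = scale_by_c(16, 3)                # 48
full = bif_offset(c_v)
half = bif_offset(c_v, half_strength=True)
```

Python's `>>` is an arithmetic shift, so negative sums are rounded consistently with the positive case.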
[0110] The general form for n-bit data is:
[Equation]
[0111] Adaptive Loop Filter (ALF)
[0112] In VVC, an adaptive loop filter (ALF) with block-based filter adaptation is applied. For the luma component, one of 25 filters is selected for each 4x4 block based on the direction and activity of the local gradients.
[0113] Two diamond filter shapes (as shown in Figures 8A and 8B) are used. A 7x7 diamond shape is applied to the luma component, and a 5x5 diamond shape is applied to the chroma component.
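[0114] For the Luma component, each 4x4 block is classified into one of 25 classes. The classification index C is derived from its directionality D and a quantized value of its activity $\hat{A}$ as follows:
$$C = 5D + \hat{A}$$
[0115] To calculate D and $\hat{A}$, the gradients in the horizontal, vertical, and two diagonal directions are first calculated using a one-dimensional Laplacian:
$$g_v = \sum_{k=i-2}^{i+3}\sum_{l=j-2}^{j+3} V_{k,l}, \quad V_{k,l} = |2R(k,l) - R(k,l-1) - R(k,l+1)|$$
$$g_h = \sum_{k=i-2}^{i+3}\sum_{l=j-2}^{j+3} H_{k,l}, \quad H_{k,l} = |2R(k,l) - R(k-1,l) - R(k+1,l)|$$
$$g_{d1} = \sum_{k=i-2}^{i+3}\sum_{l=j-2}^{j+3} D1_{k,l}, \quad D1_{k,l} = |2R(k,l) - R(k-1,l-1) - R(k+1,l+1)|$$
$$g_{d2} = \sum_{k=i-2}^{i+3}\sum_{l=j-2}^{j+3} D2_{k,l}, \quad D2_{k,l} = |2R(k,l) - R(k-1,l+1) - R(k+1,l-1)|$$
Here, the indices i and j refer to the coordinates of the upper-left sample of the 4x4 block, and R(i,j) denotes the reconstructed sample at coordinate (i,j).
[0116] To reduce the complexity of block classification, a subsampled one-dimensional Laplacian calculation is applied. As shown in Figures 9A to 9D, the same subsampled positions are used for gradient calculations in all directions.
[0117] Next, the maximum and minimum values of the horizontal and vertical gradients are set as follows:
$$g^{max}_{h,v} = \max(g_h, g_v), \quad g^{min}_{h,v} = \min(g_h, g_v)$$
The maximum and minimum values of the gradients in the two diagonal directions are set as follows:
$$g^{max}_{d1,d2} = \max(g_{d1}, g_{d2}), \quad g^{min}_{d1,d2} = \min(g_{d1}, g_{d2})$$
[0118] To determine the value of directionality D, these values are compared with each other and with two thresholds t1 and t2:
Step 1. If both $g^{max}_{h,v} \le t_1 \cdot g^{min}_{h,v}$ and $g^{max}_{d1,d2} \le t_1 \cdot g^{min}_{d1,d2}$ are true, D is set to 0.
Step 2. If $g^{max}_{h,v} / g^{min}_{h,v} > g^{max}_{d1,d2} / g^{min}_{d1,d2}$, continue from Step 3; otherwise, continue from Step 4.
Step 3. If $g^{max}_{h,v} > t_2 \cdot g^{min}_{h,v}$, D is set to 2; otherwise, D is set to 1.
Step 4. If $g^{max}_{d1,d2} > t_2 \cdot g^{min}_{d1,d2}$, D is set to 4; otherwise, D is set to 3.
Activity value A is calculated as follows:
$$A = \sum_{k=i-2}^{i+3}\sum_{l=j-2}^{j+3} \left(V_{k,l} + H_{k,l}\right)$$
A is further quantized to the range of 0 to 4, inclusive, and the quantized value is denoted as $\hat{A}$.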
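The directionality decision and the class index can be sketched as follows. This is a minimal illustration that assumes the four-step comparison used for ALF block classification in VVC; the thresholds are passed in as parameters because their values are not given here, the ratio comparison is cross-multiplied to avoid dividing by zero, and the function names are hypothetical.

```python
def alf_directionality(g_h, g_v, g_d1, g_d2, t1, t2):
    """Return D in 0..4 from the four gradient sums of a 4x4 block."""
    hv_max, hv_min = max(g_h, g_v), min(g_h, g_v)
    d_max, d_min = max(g_d1, g_d2), min(g_d1, g_d2)
    # Step 1: no dominant direction at all.
    if hv_max <= t1 * hv_min and d_max <= t1 * d_min:
        return 0
    # Step 2: compare hv_max/hv_min with d_max/d_min without dividing.
    if hv_max * d_min > d_max * hv_min:
        return 2 if hv_max > t2 * hv_min else 1   # Step 3
    return 4 if d_max > t2 * d_min else 3         # Step 4


def alf_class_index(direction, quantized_activity):
    """C = 5D + A_hat, giving one of 25 classes."""
    return 5 * direction + quantized_activity


# Example threshold values chosen only for this illustration.
d_values = [alf_directionality(*g, t1=2, t2=4.5) for g in
            [(10, 10, 10, 10), (30, 10, 20, 15), (100, 10, 20, 15),
             (10, 10, 30, 10), (10, 10, 100, 10)]]
```

With D in 0..4 and the quantized activity in 0..4, the class index covers exactly the 25 classes mentioned above.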
[0119] The classification method does not apply to the chroma component of the picture.
[0120] Geometric transformation of filter coefficients and clipping values
[0121] Before filtering each 4x4 Luma block, geometric transformations such as rotation, diagonal flipping, and vertical flipping are applied to the filter coefficients f(k,l) and the corresponding filter clipping values c(k,l), depending on the gradient values calculated for that block. This is equivalent to applying these transformations to the samples in the filter support region. The idea is to make the different blocks to which ALF is applied more similar by aligning their directionality.
[0122] Three geometric transformations, namely diagonal, vertical flip, and rotation, are introduced:
Diagonal: $f_D(k,l) = f(l,k)$, $c_D(k,l) = c(l,k)$
Vertical flip: $f_V(k,l) = f(k, K-l-1)$, $c_V(k,l) = c(k, K-l-1)$
Rotation: $f_R(k,l) = f(K-l-1, k)$, $c_R(k,l) = c(K-l-1, k)$
[0123] Here, K is the filter size, 0≦k,l≦K-1 are the coefficient coordinates, (0,0) is the upper left corner, and (K-1,K-1) is the lower right corner. The transformation is applied to the filter coefficients f(k,l) and clipping value c(k,l) according to the gradient values calculated for that block. The relationship between the transformation and the four gradients in the four directions is summarized in Table 1-6 below. This shows the mapping of the calculated gradients and transformations for a single block. [Table 6]
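The three transformations can be sketched on a K×K array indexed as `f[k][l]`. This is a minimal illustration, assuming the index mappings f_D(k,l) = f(l,k), f_V(k,l) = f(k, K-l-1), and f_R(k,l) = f(K-l-1, k) used for ALF in VVC; the function names are hypothetical, and the same functions would apply to the clipping values c(k,l).

```python
def diagonal(f):
    """f_D(k, l) = f(l, k)."""
    K = len(f)
    return [[f[l][k] for l in range(K)] for k in range(K)]


def vertical_flip(f):
    """f_V(k, l) = f(k, K - l - 1)."""
    K = len(f)
    return [[f[k][K - l - 1] for l in range(K)] for k in range(K)]


def rotation(f):
    """f_R(k, l) = f(K - l - 1, k)."""
    K = len(f)
    return [[f[K - l - 1][k] for l in range(K)] for k in range(K)]


coeffs = [[1, 2],
          [3, 4]]
transformed = (diagonal(coeffs), vertical_flip(coeffs), rotation(coeffs))
```

Applying the diagonal or the vertical flip twice returns the original array, which is a quick sanity check on the index mappings.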
[0124] Filtering process
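[0125] On the decoder side, if ALF is enabled for a CTB, each sample R(i,j) in the CU is filtered, and the sample value R'(i,j) is obtained as shown below:
$$R'(i,j) = R(i,j) + \Big(\Big(\sum_{k \neq 0}\sum_{l \neq 0} f(k,l) \cdot K\big(R(i+k, j+l) - R(i,j),\ c(k,l)\big) + 64\Big) \gg 7\Big)$$
Here, f(k,l) denotes the decoded filter coefficients, $K(x,y) = \min(y, \max(-y, x))$ is the clipping function, and c(k,l) denotes the decoded clipping parameters.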
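The clipped filtering of one sample can be sketched as follows. This is a minimal illustration, assuming the VVC form in which each neighbour difference is clipped to [-c, c] before weighting and the sum is rounded with a 7-bit shift; the `taps` dictionary layout and the function names are hypothetical, and boundary handling is omitted.

```python
def clip_k(x, y):
    """Clipping function K(x, y) = min(y, max(-y, x))."""
    return min(y, max(-y, x))


def alf_filter_sample(rec, i, j, taps):
    """Filter rec[i][j]; taps maps (k, l) offsets to (coefficient, clip value)."""
    center = rec[i][j]
    acc = 0
    for (k, l), (coeff, clip) in taps.items():
        # Each neighbour contributes its clipped difference to the centre sample.
        acc += coeff * clip_k(rec[i + k][j + l] - center, clip)
    return center + ((acc + 64) >> 7)   # coefficients carry 7 fractional bits


rec = [[100, 100, 100],
       [100, 100, 164],
       [100, 100, 100]]
clipped = alf_filter_sample(rec, 1, 1, {(0, 1): (64, 32), (0, -1): (64, 32)})
unclipped = alf_filter_sample(rec, 1, 1, {(0, 1): (64, 1024), (0, -1): (64, 1024)})
```

The example shows how a small clipping value limits the influence of a neighbour that differs strongly from the centre sample.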
[0126] Cross-component adaptive loop filter (CC-ALF)
[0127] CC-ALF improves each chromatic component by using luma sample values to apply an adaptive linear filter to the luma channel, and then using the output of this filtering operation for chromatic improvement. Figure 10A provides a system-level diagram of the CC-ALF process relating to the SAO, luma ALF, and chromatic ALF processes.
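[0128] Filtering in CC-ALF is achieved by applying a linear, diamond-shaped filter (Figure 10B) to the luma channel. One filter is used for each chroma channel, and this operation is expressed as follows:
$$\Delta I_i(x, y) = \sum_{(x_0, y_0) \in S_i} I_0(x_C + x_0,\ y_C + y_0)\, c_i(x_0, y_0)$$
Here, (x, y) is the location of the chroma component i being refined, $(x_C, y_C)$ is the luma location derived from (x, y), $S_i$ is the filter support area in the luma component, and $c_i(x_0, y_0)$ are the filter coefficients.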
[0129] As shown in FIG. 10B, the luma filter support is an area collocated with the current chroma sample after considering the spatial scaling factor between the luma plane and the chroma plane.
[0130] In the VVC reference software, the CC-ALF filter coefficients are calculated by minimizing the mean squared error of each chroma channel with respect to the original chroma content. To achieve this, the VTM algorithm uses a coefficient derivation process similar to that used for chroma ALF. Specifically, a correlation matrix is obtained, and the coefficients are calculated using a Cholesky decomposition solver to minimize the mean squared error metric. In filter design, up to eight CC-ALF filters can be designed and transmitted per picture. The resulting filters are then indicated in CTU units for each of the two chroma channels.
[0131] Further features of CC-ALF are as follows:
· The design uses a 3×4 diamond shape with eight taps.
· Seven filter coefficients are transmitted in the APS.
· Each of the transmitted coefficients has a 6-bit dynamic range and is limited to a power-of-two value.
· The eighth filter coefficient is obtained at the decoder such that the sum of the filter coefficients is zero.
· The APS is referenceable in the slice header.
· CC-ALF filter selection is controlled at the CTU level for each chroma component.
· Boundary padding for horizontal virtual boundaries uses the same memory access pattern as luma ALF.
[0132] As an additional feature, the reference encoder can be configured to allow some basic subjective adjustments via a configuration file. When enabled, the VTM attenuates the application of CC-ALF in regions that are encoded with a high QP and are close to medium gray or contain a large amount of luma high frequencies. Algorithmically, this is achieved by disabling the application of CC-ALF in CTUs where any of the following conditions are true:
· The value obtained by subtracting 1 from the slice QP value is less than or equal to the base QP value.
· The number of chroma samples with a local contrast greater than (1<<(bitDepth-2))-1 exceeds the CTU height, where the local contrast is the difference between the maximum and minimum luma sample values within the filter support region.
· More than a quarter of the chroma samples are between (1<<(bitDepth-1))-16 and (1<<(bitDepth-1))+16.
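The derivation of the eighth coefficient can be sketched as follows. This is a minimal illustration: the derived coefficient is simply appended last, which is an assumption about ordering made for the example, and the function names are hypothetical.

```python
def complete_ccalf_coefficients(signaled):
    """Append the eighth coefficient so that all eight sum to zero."""
    if len(signaled) != 7:
        raise ValueError("CC-ALF signals exactly seven coefficients")
    return list(signaled) + [-sum(signaled)]


def ccalf_correction(luma_support, coeffs):
    """Weighted sum of the luma samples in the filter support."""
    return sum(s * c for s, c in zip(luma_support, coeffs))


coeffs = complete_ccalf_coefficients([1, -2, 4, 0, 8, -1, 2])
flat_correction = ccalf_correction([512] * 8, coeffs)
```

Because the coefficients sum to zero, a flat luma region produces no chroma correction, so the filter responds only to luma detail.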
[0133] The aim of this feature is to provide some assurance that CC-ALF does not amplify artifacts introduced earlier in the decoding path (largely because the VTM does not currently explicitly optimize for chroma subjective quality). Alternative encoder implementations are expected either not to use this feature or to incorporate alternative strategies suited to their encoding characteristics.
[0134] Filter Parameter Signaling
[0135] The ALF filter parameters are signaled in an Adaptive Parameter Set (APS). In one APS, up to 25 sets of luma filter coefficients and clipping value indices, and up to 8 sets of chroma filter coefficients and clipping value indices can be signaled. To reduce the bit overhead, filter coefficients of different classifications for the luma component can be merged. In the slice header, the APS index used for the current slice is signaled.
[0136] The clipping value index decoded from the APS allows the clipping value to be determined using a table of clipping values for both the luma and chroma components. These clipping values depend on the internal bit depth. More precisely, the clipping value is obtained by the following formula:
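[Equation not reproduced in this text: the clipping value as a function of the internal bit depth and the clipping value index.]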
[0137] The slice header can signal up to seven APS indices to specify the Luma filter set used for the current slice. The filtering process is further controlled at the CTB level. A flag indicating whether the ALF is applied to the Luma CTB is always signaled. The Luma CTB can select a filter set from 16 fixed filter sets and filter sets from the APS. A filter set index is signaled to the Luma CTB to indicate which filter set is applied. The 16 fixed filter sets are predefined and hardcoded into both the encoder and decoder.
[0138] For chroma components, the APS index, which indicates the chroma filter set currently used in the slice, is signaled in the slice header. At the CTB level, if the APS has two or more chroma filter sets, the filter index is signaled for each chroma CTB.
[0139] The filter coefficients are quantized with a norm of 128. To limit the multiplication complexity, a bitstream conformance requirement is applied so that the coefficient values for non-center positions lie in the range of -2^7 to 2^7 - 1, inclusive. The coefficient for the center position is not signaled in the bitstream and is treated as equal to 128.
[0140] Virtual boundary filtering process for line buffer reduction
[0141] In VVC, modified block classification and filtering are employed for samples near the horizontal CTU boundary to reduce the amount of ALF line buffer required. For this purpose, as shown in Figure 11, the virtual boundary is defined as a line shifted by "N" samples from the horizontal CTU boundary, where N is equal to 4 for the luma component and 2 for the chroma component.
[0142] As shown in Figure 11, the modified block classification is applied to the Luma component. In the one-dimensional Laplacian gradient calculation for 4x4 blocks above the virtual boundary, only the samples above the virtual boundary are used. Similarly, in the one-dimensional Laplacian gradient calculation for 4x4 blocks below the virtual boundary, only the samples below the virtual boundary are used. The quantization of the activity value A is scaled accordingly to account for the reduced number of samples used in the one-dimensional Laplacian gradient calculation.
[0143] For the filtering process, symmetric padding operations at the virtual boundary are used for both the luma and chroma components. As shown in Figure 12, if the sample being filtered is located below the virtual boundary, the adjacent samples located above the virtual boundary are padded. At the same time, the corresponding samples on the opposite side are also padded symmetrically.
[0144] Unlike the symmetric padding method used for horizontal CTU boundaries, when cross-boundary filtering is disabled, simple padding is applied to slice, tile, and subpicture boundaries. Simple padding is also applied to picture boundaries. Padded samples are used for both classification and filtering. To compensate for extreme padding when filtering samples directly above or below a virtual boundary, the filtering intensity in these cases is reduced for both luma and chroma by increasing the right shift of the formula for obtaining the sample value R'(i,j) by 3.
[0145] In the existing SAO designs of the HEVC, VVC, AVS2, and AVS3 standards, the sample offset values for luma Y, chroma Cb, and chroma Cr are determined independently. That is, for example, the offset of the current chroma sample is determined only by the value of the current chroma sample and the values of adjacent chroma samples; luma samples at the same position and adjacent luma samples are not considered. However, luma samples retain more detailed information of the original picture than chroma samples and can be helpful in determining the offset of the current chroma sample. Furthermore, since chroma samples usually lose high-frequency detail after color conversion from RGB to YCbCr, or after quantization and deblocking filtering, introducing luma samples that retain high-frequency detail into the chroma offset determination is helpful for chroma sample reconstruction. Therefore, further gains can be expected by exploiting cross-component correlation, for example, by using cross-component sample adaptive offset (CCSAO) methods and systems. In some embodiments, the correlation here includes not only cross-component sample values but also picture / coding information such as the prediction / residual coding mode, transform type, and quantization / deblocking / SAO / ALF parameters from the cross-components.
[0146] In another example, in SAO the luma sample offset is determined solely by luma samples. However, luma samples with the same band offset (BO) classification can be further classified by their collocated and adjacent chroma samples, which can lead to a more effective classification. SAO classification can be seen as a shortcut for compensating the sample differences between the original picture and the reconstructed picture. Therefore, an effective classification is desirable.
[0147] Cross-component sample adaptive offset (CCSAO)
[0148] Existing SAO designs in the HEVC, VVC, AVS2, and AVS3 standards are used as the basic SAO method in the following description. Those skilled in the field of video coding will recognize that the proposed cross-component method described in this disclosure is also applicable to other loop filter designs and other coding tools with similar design principles. For example, in the AVS3 standard, SAO is replaced by a coding tool called Enhanced Sample Adaptive Offset (ESAO), and the proposed CCSAO can be applied in parallel with ESAO. Another example of a tool with which CCSAO can be applied in parallel is the Constrained Directional Enhancement Filter (CDEF) in the AV1 standard.
[0149] Figures 13A to 13F are diagrams of the proposed method. In Figure 13A, additional offsets for chroma Cb and Cr after SAO Cb and SAO Cr are determined using the luma samples after the luma deblocking filter (DBF Y). For example, the current chroma sample (1302) is first classified using the luma sample (1304) at the same position and the adjacent luma sample (1306), and the CCSAO offset of the corresponding class is added to the current chroma sample. In Figure 13B, CCSAO is applied to the luma samples and chroma samples, and DBF Y / Cb / Cr is used as the input. In Figure 13C, CCSAO can operate independently. In Figure 13D, CCSAO can be recursively applied (twice or N times) with the same offset or different offsets in the same codec stage, or can be repeatedly applied in different stages. In Figure 13E, CCSAO is applied in parallel with SAO and BIF. In Figure 13F, CCSAO replaces SAO and is applied in parallel with BIF.
[0150] Therefore, the current and adjacent luma samples, together with the collocated and adjacent chroma samples (Cb and Cr), can be used for the classification of the current luma sample. Furthermore, for the classification of the current chroma sample (Cb or Cr), the collocated and adjacent luma samples, the collocated and adjacent cross-chroma samples, and the current and adjacent chroma samples can be used.
[0151] Figure 14 shows that CCSAO can be applied in parallel with other coding tools, such as ESAO in the AVS standard and CDEF in the AV1 standard. Figure 15A shows that the position of CCSAO can be after SAO, i.e., the position of CCALF in the VVC standard. In Figure 15B, CCSAO can operate independently without CCALF. In Figure 15C, CCSAO can function as a post-reconstruction filter; that is, it uses the reconstructed samples as input to classification and compensates luma / chroma samples before they are used for neighboring intra prediction. Figure 16 shows that CCSAO can also be applied in parallel with CCALF. In Figure 16, the positions of CCALF and CCSAO can be swapped. Note that in Figures 13A to 16, or in other paragraphs of this disclosure, the SAO Y / Cb / Cr block can be replaced with ESAO Y / Cb / Cr (in AVS3) or CDEF (in AV1). Note that Y / Cb / Cr may be denoted as Y / U / V in the video coding field.
[0152] In some cases, when the video is in RGB format, the proposed CCSAO can also be applied by simply mapping the YUV notation in the following paragraph to GBR.
[0153] Please note that the figures in this disclosure are combinable with all examples referred to herein.
[0154] Classification
[0155] Figures 13A-13F and 19 show the inputs for CCSAO classification. Figures 13A-13F and 19 also show that all collocated and adjacent luma / chroma samples can be supplied for CCSAO classification. Note that the classifiers newly proposed in this disclosure can also be useful for the original SAO classification; they can therefore serve not only for cross-component classification (e.g., classification of chroma using luma, or vice versa) but also for single-component classification (e.g., classification of luma using luma, or classification of chroma using chroma).
[0156] One classifier example (C0) uses the value of the luma or chroma sample at the same position (Y0 in Figure 13A; Y4 / U4 / V4 in Figures 13B-13C) for classification. If band_num is the number of bands obtained by equally dividing the dynamic range of the luma or chroma, and bit_depth is the bit depth of the sequence, then an example of the class index for the current chroma sample is: Class(C0) = (Y0 * band_num) >> bit_depth
[0157] Table 2-1 below lists some examples of band_num and bit_depth: three classification examples in which the number of bands differs for each example. The classification can take rounding into account: Class(C0) = ((Y0 * band_num) + (1 << bit_depth)) >> bit_depth [Table 7]
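By way of non-limiting illustration, the basic (non-rounding) C0 band classification can be sketched as follows. The function name class_c0 and the range check are assumptions introduced only for this sketch; they do not appear in the disclosure.

```python
def class_c0(y0: int, band_num: int, bit_depth: int) -> int:
    """Band index of a collocated sample: (Y0 * band_num) >> bit_depth."""
    # Range check added for the sketch only (assumption, not in the disclosure).
    if not 0 <= y0 < (1 << bit_depth):
        raise ValueError("sample value outside the range implied by bit_depth")
    return (y0 * band_num) >> bit_depth
```

For a 10-bit sequence with band_num = 16, each band covers 64 consecutive sample values, so the class index always falls within [0, band_num - 1].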
[0158] In some cases, the classifier uses a different luma (or chroma) sample position for C0 classification. For example, as shown in Figure 17, adjacent Y7 is used for C0 classification instead of Y0. Different classifiers can be switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample levels. For example, in Figure 17, as shown in Table 2-2 below, Y0 is used for POC0, but Y7 is used for POC1. [Table 8]
[0159] Figures 18A to 18G show several examples of luma candidates with different shapes. As shown in Figures 18B to 18D, a constraint can be applied to the shape that the total number of candidates must be a power of 2. As shown in Figures 18A, 18C to 18E, a constraint can be applied to the shape that the number of luma candidates must be horizontally and vertically symmetric with respect to the chroma sample. The power of 2 constraint and the symmetry constraint can also be applied to the chroma candidates. In Figures 13B to 13C, the U / V section shows an example of a symmetry constraint.
[0160] In some examples, different color formats can have different classifier "constraints." For example, 420 uses luma / chroma candidate selection (selecting one candidate from a 3x3 shape) as shown in Figures 13B-13C, while 444 uses Figure 18F for luma and chroma candidate selection, and 422 uses Figure 18G for luma (two chroma samples share four luma candidates) and also uses Figure 18F for chroma candidates.
[0161] The C0 position and C0band_num can be combined and switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample levels. Different combinations may result in different classifiers, as shown in Table 2-3 below. [Table 9]
[0162] In some examples, the collocated luma sample value (Y0) is replaced by a value (Yp) obtained by weighting the collocated and adjacent luma samples. Figures 20A and 20B show two examples. Different Yp values may result in different classifiers. Different Yp values can be applied to different chroma formats. For example, Yp in Figure 20A is used for 420, Yp in Figure 20B is used for 422, and Y0 is used for 444.
[0163] In some examples, another classifier example (C1) uses a comparison score in [-8, 8] between the luma sample at the same position (Y0) and its eight adjacent luma samples, resulting in a total of 17 classes:
Initial Class(C1) = 0; loop over the 8 adjacent luma samples (Yi, i = 1 to 8):
if Y0 > Yi, Class += 1
else if Y0 < Yi, Class -= 1
[0164] In some examples, the C1 example is equivalent to the following function with the threshold th equal to 0:
ClassIdx = Index2ClassTable(f(C, P1) + f(C, P2) + … + f(C, P8))
f(x, y) = 1, if x - y > th; f(x, y) = 0, if x - y = th; f(x, y) = -1, if x - y < th
[0165] In some cases, similar to the C4 classifier, one or more thresholds can be predefined (e.g., stored in a LUT) or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level to help classify (quantize) the difference.
[0166] In some examples, the variation (C1') counts only the comparison score in [0, 8], which yields nine classes. (C1, C1') is a classifier group, and a PH / SH level flag can be signaled to switch between C1 and C1':
Initial Class(C1') = 0; loop over the 8 adjacent luma samples (Yi, i = 1 to 8):
if Y0 > Yi, Class += 1
[0167] In some examples, the variant (C1s) uses N adjacent samples selected from among M adjacent samples to count the comparison score. An M-bit bitmask can be signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock level to indicate which adjacent samples are selected for counting the comparison score. Using Figure 13B as an example of a luma classifier: eight adjacent luma samples are candidates, and an 8-bit bitmask (01111110) is signaled at the PH level, indicating that the six samples Y1 to Y6 are selected; the comparison score is then within [-6, 6], which yields 13 offsets. The selective classifier C1s gives the encoder more choices for trading off offset signaling overhead against classification granularity.
[0168] In some examples, similar to C1s, the variation (C1's) counts only the comparison score in [0, +N]; for the bitmask 01111110 example above, the comparison score is within [0, 6], resulting in 7 offsets.
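As a non-limiting sketch, the C1 comparison score and the function f with threshold th can be written as below. The mapping from score to class index (adding the number of neighbors so that the index starts at 0) is an assumed stand-in for Index2ClassTable, whose actual contents are not given in this text; the names f and class_c1 are likewise illustrative.

```python
from typing import Sequence


def f(x: int, y: int, th: int = 0) -> int:
    """Three-way comparison of (x - y) against the threshold th."""
    d = x - y
    if d > th:
        return 1
    if d < th:
        return -1
    return 0


def class_c1(y0: int, neighbors: Sequence[int], th: int = 0) -> int:
    """Class index from the comparison score of y0 against its neighbors."""
    score = sum(f(y0, yi, th) for yi in neighbors)
    # Assumed Index2ClassTable: shift the score from [-N, N] to [0, 2N].
    return score + len(neighbors)
```

With eight neighbors and th = 0, the score lies in [-8, 8], so the sketch returns one of 17 class indices.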
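The selective variants can be illustrated with the following sketch, offered under stated assumptions: the bitmask is passed as a string whose leftmost character corresponds to the first listed neighbor (the actual bit-to-sample mapping is not specified in this text), and the function name class_c1s and its signed flag are hypothetical.

```python
from typing import Sequence, Tuple


def class_c1s(y0: int, neighbors: Sequence[int], bitmask: str,
              signed: bool = True) -> Tuple[int, int]:
    """Return (class index, number of classes) for a bitmask-selected score.

    signed=True counts the score in [-N, N]; signed=False counts only
    y0 > yi, giving a score in [0, N].
    """
    if len(bitmask) != len(neighbors):
        raise ValueError("bitmask length must equal the number of candidates")
    # Assumed bit order: leftmost mask character maps to neighbors[0].
    selected = [yi for yi, bit in zip(neighbors, bitmask) if bit == "1"]
    n = len(selected)
    if signed:
        score = sum((y0 > yi) - (y0 < yi) for yi in selected)
        return score + n, 2 * n + 1
    return sum(y0 > yi for yi in selected), n + 1
```

With the bitmask 01111110, six of the eight candidates are selected, so the signed form has 13 classes and the unsigned form has 7, matching the offset counts stated above.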
[0169] A general-purpose classifier can be obtained by combining different classifiers. For example, different classifiers are applied to different pictures (different POC values), as shown in Table 2-4 below. [Table 10]
[0170] In some examples, another classifier example (C2) uses the difference (Yn) between the luma sample at the same position and an adjacent luma sample. Figures 21A and 21B show examples of Yn; when the bit depth is 10, its dynamic range is [-1024, 1023]. If C2 band_num is the number of bands obtained by equally dividing the Yn dynamic range, then Class(C2) = ((Yn + (1 << bit_depth)) * band_num) >> (bit_depth + 1). A general-purpose classifier can be obtained by combining C0 and C2, for example, as shown in Table 2-5 below. [Table 11]
[0171] In some examples, another classifier example (C3) uses a bitmask for classification, as shown in Table 2-6 (bitmask positions are underlined). A 10-bit bitmask is signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample levels to indicate the classifier. For example, the bitmask 11 1100 0000 means that, for a given 10-bit luma sample value, only the four most significant bits (MSBs) are used for classification, resulting in a total of 16 classes. As another example, the bitmask 10 0100 0001 means that only 3 bits are used for classification, resulting in a total of 8 classes. The bitmask length (N) is fixed or switchable at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample levels. For example, for a 10-bit sequence, a 4-bit bitmask 1110 signaled in the PH of a picture means that the three MSBs b9, b8, and b7 are used for classification. As another example, with a 4-bit bitmask 0011 applied to the LSBs, bits b0 and b1 are used for classification. Bitmask classifiers can be applied to luma classifiers or chroma classifiers. Whether the N-bit bitmask applies to the MSBs or the LSBs may be fixed or switchable at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level.
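As a non-limiting sketch, the C2 band classification of a signed difference can be written as follows. The sketch assumes that the offset (1 << bit_depth) is added to Yn before scaling by band_num, which is the reading under which the result falls in [0, band_num - 1]; the function name class_c2 is illustrative only.

```python
def class_c2(yn: int, band_num: int, bit_depth: int) -> int:
    """Band index of a difference Yn in [-(1 << bit_depth), (1 << bit_depth) - 1]."""
    lo, hi = -(1 << bit_depth), (1 << bit_depth) - 1
    if not lo <= yn <= hi:
        raise ValueError("difference outside the assumed dynamic range")
    # Assumed grouping: shift Yn to a non-negative range, then scale.
    return ((yn + (1 << bit_depth)) * band_num) >> (bit_depth + 1)
```

For a bit depth of 10 and band_num = 4, differences in [-1024, -513] map to class 0 and differences in [512, 1023] map to class 3.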
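The bitmask classification can be illustrated by the following sketch, which packs the sample bits selected by the mask into a compact class index. The packing order (most significant selected bit first) and the helper names class_c3, num_classes_c3, and expand_short_mask are assumptions made for this illustration.

```python
def class_c3(sample: int, bitmask: int) -> int:
    """Pack the sample bits selected by bitmask into a class index."""
    cls = 0
    for pos in range(bitmask.bit_length() - 1, -1, -1):
        if (bitmask >> pos) & 1:
            cls = (cls << 1) | ((sample >> pos) & 1)
    return cls


def num_classes_c3(bitmask: int) -> int:
    """Number of classes is 2 to the power of the number of 1s in the mask."""
    return 1 << bin(bitmask).count("1")


def expand_short_mask(mask: int, n: int, bit_depth: int, msb: bool = True) -> int:
    """Place an N-bit mask on the MSB side or the LSB side of the sample."""
    return mask << (bit_depth - n) if msb else mask
```

For example, the mask 11 1100 0000 yields 16 classes, and a 4-bit mask 1110 placed on the MSB side of a 10-bit sample selects b9, b8, and b7.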
[0172] In some examples, the Luma position and C3 bitmask can be combined and switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample levels. Different combinations can result in different classifiers.
[0173] In some cases, the number of corresponding offsets can be limited by applying a bitmask restriction on the "max number of 1s". For example, limiting the bitmask "max number of 1s" to 4 in the SPS results in a maximum of 16 offsets for the sequence. Bitmasks for different POCs may be different, but the "max number of 1s" must not exceed 4 (the total number of classes must not exceed 16). The "max number of 1s" value can be signaled and switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample levels. [Table 12]
[0174] As shown in Figure 19, other cross-component chroma samples can also be supplied to the CCSAO classification. The classifier for cross-component chroma samples may be the same as the luma cross-component classifier, or it may have its own classifier as described in this disclosure. The two classifiers can be combined to form a combined classifier for classifying the current chroma sample. For example, a combined classifier combining a cross-component luma sample and a chroma sample yields a total of 16 classes, as shown in Table 2-7 below. Table 2-7 shows an example of a classifier using a combined classifier combining a cross-component luma sample and a chroma sample (bitmask positions are underlined). [Table 13]
[0175] All of the above-mentioned classifiers (C0, C1, C1', C2, C3) can be combined (joined). For example, see Table 2-8 below. Table 2-8 shows combinations of different classifiers. [Table 14]
[0176] In some examples, another classifier example (C4) uses, for classification, the difference between the CCSAO input and the sample value to be compensated. For example, when CCSAO is applied at the ALF stage, the difference between the sample values of the current component before and after ALF is used for classification. Similar to the C1 classifier, one or more thresholds can be predefined (e.g., held in a LUT) or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level to help classify (quantize) the difference. The C4 classifier can be combined with C0 Y / U / V bandNum to form a joint classifier (e.g., the POC1 example shown in Table 2-9). Table 2-9 shows classifier examples that use the difference between the CCSAO input value and the sample value to be compensated for classification. [Table 15]
[0177] In some embodiments, the example classifier (C5) uses "encoded information" to aid in subblock classification because different encoding modes may introduce different distortion statistics into the reconstructed image. A CCSAO sample is classified by the sample's previous encoded information, and combinations of encoded information can form a classifier, for example, as shown in Table 2-10 below. Figure 39 shows another example of different stages of the encoded information for C5. Table 2-10 shows that a CCSAO sample is classified by the sample's previous encoded information, and combinations of encoded information can form a classifier. [Table 16]
[0178] In some examples, the classifier example (C6) uses YUV color conversion values for classification. For example, to classify the current Y component, one each (1 / 1 / 1) of the collocated or adjacent Y / U / V samples is selected and color-converted to RGB, and the R value is quantized using the C3 bandNum to form the current Y component classifier.
[0179] In some examples, the example classifier (C7) can be considered a generalized version of C0 / C3 and C6. To determine the C0 / C3 bandNum classification of the current component, the same-position / current and adjacent samples of all three color components are used. For example, as shown in Figure 13B, the current U sample, the same-position and adjacent Y / V samples, and the current and adjacent U samples are used. This can be formulated as follows:
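[Equation not reproduced in this text: the intermediate sample S as a sum of the selected collocated / current and adjacent samples of the three color components, weighted by coefficients c_ij.]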
[0180] In some embodiments, an example of a special subset of C7 is that only Y / U / V samples located at the same position (1 / 1 / 1) or adjacent to each other can be used to derive an intermediate sample S. This can be considered a special case of C6 (color conversion using three components). S can then be further fed into a C0 / C3 bandnum classifier. classIdx=bandS=(S*bandNumS)>>BitDepth;
[0181] In some embodiments, C7, like the C0 / C3 bandNum classifier, can also be combined with other classifiers to generate a combined classifier. In some examples, C7 may not be the same as in later examples that jointly use identical and adjacent Y / U / V samples for classification (three-component combined bandNum classification of each Y / U / V component).
[0182] In some embodiments, a constraint that the sum of the c_ij equals 1 may be applied to reduce the signaling overhead of c_ij and to limit the value of S to within the bit-depth range. For example, c00 = (1 - sum of the other c_ij) is forced. Which c_ij (c00 in this example) is forced (derived from the other coefficients) can be predefined or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level.
[0183] In some embodiments, another classifier example (C8) uses the cross-component / current-component spatial activity information as a classifier. Similar to the above block activity classifier, the sample activity of one sample located at (k, l) can be obtained as follows. [Equation image not reproduced in this text]
[0184] [Equation image not reproduced in this text] Here, the notation (BD - 6) or B is a predefined normalization term related to the bit depth.
[0185] In some embodiments, A is then further mapped to the range of [0, 4]:
[Equation not reproduced in this text]
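As a non-limiting sketch of the weighted-sum classifier with the sum-to-one constraint, the example below uses integer weights in units of 1 / 2^shift so that the forced coefficient c00 is derived from the others. The fixed-point representation, the rounding, the clipping step, and all function and parameter names are assumptions of this sketch; the disclosure does not specify how the coefficients are represented.

```python
from typing import Sequence


def intermediate_sample(samples: Sequence[int], other_weights: Sequence[int],
                        shift: int = 6) -> int:
    """Weighted sum S of the samples; weights are in units of 1 / 2**shift.

    samples[0] receives the derived weight c00 = one - sum(other_weights),
    so that all weights sum to one (i.e. to 1 << shift).
    """
    one = 1 << shift
    weights = [one - sum(other_weights)] + list(other_weights)
    if len(weights) != len(samples):
        raise ValueError("need exactly one weight per sample")
    acc = sum(w * r for w, r in zip(weights, samples))
    return (acc + (one >> 1)) >> shift  # round to nearest (assumption)


def class_c7(samples: Sequence[int], other_weights: Sequence[int],
             band_num_s: int, bit_depth: int, shift: int = 6) -> int:
    """classIdx = bandS = (S * bandNumS) >> BitDepth."""
    s = intermediate_sample(samples, other_weights, shift)
    # Clip in case negative weights push S outside the sample range.
    s = max(0, min(s, (1 << bit_depth) - 1))
    return (s * band_num_s) >> bit_depth
```

Because the weights sum to one, identical input samples yield S equal to that common value, which keeps S within the bit-depth range for non-negative weights.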
[0186] In some embodiments, another classifier example (C9) can use the cross-component / current-component spatial gradient information as a classifier. Similar to the above block gradient classifier, the sample gradient class of one sample located at (k, l) can be obtained as follows. (1) Calculate the gradients in N directions (Laplacian or forward / backward). (2) Calculate the maximum and minimum values of the gradients for M grouped directions (M <= N). (3) Calculate the direction D by comparing the N values against each other using m thresholds (t1 to tm). (4) Apply a geometric transformation according to the magnitude of the relative gradient (optional).
[0187] For example, this is similar to the ALF block classifier, but applied at the sample level for sample classification as follows. (1) Calculate the gradients in four directions (Laplacian). (2) Calculate the maximum and minimum values of the gradients in the two grouped directions (H / V and D / A). (3) Calculate the direction D by comparing the N values against each other using two thresholds (t1 and t2). (4) Apply geometric transformations according to the relative gradient magnitudes shown in Table 1-6.
[0188] In some cases, it is possible to generate a combined classifier by combining C8 and C9.
[0189] In some embodiments, another classifier example (C10) can use cross-component / current-component edge information for classifying the current component. By extending the original SAO classifier, C10 can more effectively extract cross-component / current-component edge information as follows: (1) Select one direction and calculate two edge strengths, where one direction is formed by the current sample and two adjacent samples, and one edge strength is calculated by subtracting one adjacent sample from the current sample. (2) Quantize each edge strength into M segments using M-1 thresholds Ti. (3) Classify the current component sample using M*M classes.
[0190] Figures 22A-22B show examples of using cross / current component edge information for current component classification according to several implementations of this disclosure. The current sample is represented by c, and two adjacent samples of the current / cross component are represented by a and b. In this example: (1) One diagonal direction is selected from four candidate directions, and the differences (c - a) and (c - b) are the two edge strengths, each in the range of -1023 to 1023 (for example, for a 10-bit sequence). (2) Each edge strength is quantized into four segments by the common thresholds [-T, 0, T]. (3) The current component sample is classified using 16 classes.
[0191] As shown in Figures 22A and 22B, one diagonal direction is selected, and the differences (c - a) and (c - b) are each quantized into four segments with the thresholds [-T, 0, T], forming 4 x 4 = 16 edge segments. The positions of (a, b) can be indicated by signaling two syntax elements, edgeDir and edgeStep.
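The edge classification steps above can be sketched as follows, by way of non-limiting illustration. The numbering of edgeDir, the placement of a and b on opposite sides of c, and the handling of values equal to a threshold (assigned to the upper segment) are all assumptions of this sketch; picture-boundary handling is omitted and the caller is expected to keep the indices inside the picture.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical edgeDir numbering: (dy, dx) step from c towards b; a lies opposite.
EDGE_DIRS = {0: (0, 1), 1: (1, 0), 2: (1, 1), 3: (1, -1)}


def quantize_edge(strength: int, thresholds: Sequence[int]) -> int:
    """Segment index of an edge strength; M-1 sorted thresholds give M segments."""
    return bisect_right(thresholds, strength)


def class_c10(img: Sequence[Sequence[int]], y: int, x: int,
              edge_dir: int, edge_step: int, t: int) -> int:
    """Class index in [0, 15] from edge strengths (c - a) and (c - b)."""
    dy, dx = EDGE_DIRS[edge_dir]
    c = img[y][x]
    a = img[y - dy * edge_step][x - dx * edge_step]
    b = img[y + dy * edge_step][x + dx * edge_step]
    thresholds = [-t, 0, t]
    m = len(thresholds) + 1  # M = 4 segments per edge strength
    return quantize_edge(c - a, thresholds) * m + quantize_edge(c - b, thresholds)
```

Using the symmetric pattern [-T, 0, T] means only the single value T needs to be conveyed in this sketch, and the two segment indices combine into M * M = 16 classes.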
[0192] In some examples, the direction patterns can be 0 degrees, 45 degrees, 90 degrees, and 135 degrees (45 degrees between directions), can be extended to 22.5 degrees between directions, can be a predefined set of directions, or can be signaled at the SPS / APS / PPS / PH / SH / region(set) / CTU / CU / subblock / sample level.
[0193] In some examples, the edge strength can be defined as (b - a). This simplifies the calculation but may compromise accuracy.
[0194] In some examples, M-1 thresholds can be predefined or signaled at the SPS / APS / PPS / PH / SH / region(set) / CTU / CU / subblock / sample levels.
[0195] In some examples, the M-1 thresholds may be different sets for the different edge strength calculations, e.g., different sets for (c - a) and (c - b). When different sets are used, the total number of classes may be different. For example, if [-T, 0, T] is used for the calculation of (c - a) and [-T, T] is used for the calculation of (c - b), the total number of classes is 4*3.
[0196] In some cases, M-1 thresholds can reduce signaling overhead by using a "symmetric" property. For example, one could use a predefined pattern [-T,0,T] and avoid using [T0,T1,T2] which requires signaling of three thresholds. Another example is [-T,T].
[0197] In some examples, the threshold can only contain powers of 2, which not only effectively captures the edge intensity distribution but also reduces the complexity of the comparison (only N bits of the MSB need to be compared).
[0198] In some examples, the positions of a and b can be indicated by signaling two syntaxes: (1) edgeDir, which indicates the selected direction, and (2) edgeStep, which indicates the sample distance used to calculate the edge strength, as shown in Figures 22A and 22B.
[0199] In some examples, edgeDir / edgeStep may be predefined or signaled at the SPS / APS / PPS / PH / SH / region(set) / CTU / CU / subblock / sample levels.
[0200] In some cases, edgeDir / edgeStep may be coded in other ways, such as fixed-length code (FLC), truncated unary (TU) code, k-th order Exponential-Golomb code (EGk), signed EG0 (SVLC), or unsigned EG0 (UVLC).
[0201] In some cases, C10 can be combined with bandNumY / U / V or other classifiers to form a combined classifier. For example, combining 16 edge intensities with up to 4 bandNumY bands yields 64 classes.
[0202] In some embodiments, examples of other classifiers that use only the current component information for classifying the current component can be used as cross-component classifiers. For example, as shown in Figure 5A and Table 1-1, EdgeIdx is determined using chroma sample information and eo-class to classify the current chroma sample. Other “non-cross-component” classifiers that can also be used as cross-component classifiers include edge direction, pixel intensity, pixel variation, pixel variance, pixel Laplacian sum, Sobel operator, compass operator, high-pass filtering value, low-pass filtering value, and others.
[0203] In some embodiments, multiple classifiers are used in the same POC. The current frame is divided into multiple regions, and a single classifier is used within each region. For example, POC0 uses three different classifiers, and which classifier (0, 1, or 2) is used is signaled at the CTU level, as shown in Table 2-11 below. This indicates that different general classifiers are applied to different regions of the same picture. [Table 17]
[0204] In some embodiments, the maximum number of classifiers (also called alternative offset sets) is fixed or can be signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. In one example, the fixed (predefined) maximum number of classifiers is 4. In this case, four different classifiers are used in POC0, and which classifier (0, 1, 2, or 3) is used is signaled at the CTU level. Truncated unary (TU) codes can be used to indicate the classifier used for each luma or chroma CTB. For example, as shown in Table 2-12 below, if the TU code is 0, CCSAO is not applied. If the TU code is 10, set 0 is applied; if the TU code is 110, set 1 is applied; if the TU code is 1110, set 2 is applied; and if the TU code is 1111, set 3 is applied. Fixed-length codes, Golomb-Rice codes, and Exponential-Golomb codes can also be used to indicate the classifier (offset set index) for the CTB. POC1 uses three different classifiers. [Table 18]
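As a non-limiting illustration of the truncated unary codes listed above, the following sketch encodes and decodes an offset set indicator with a maximum value of 4, where the value 0 means CCSAO is not applied and the value k means set k - 1 is applied. The bit strings are represented as Python strings for readability; the function names are illustrative, and an actual codec would typically entropy-code these bins.

```python
from typing import Tuple


def tu_encode(value: int, max_value: int) -> str:
    """Truncated unary: `value` ones, then a terminating zero unless value == max_value."""
    if not 0 <= value <= max_value:
        raise ValueError("value out of range")
    return "1" * value + ("0" if value < max_value else "")


def tu_decode(bits: str, max_value: int) -> Tuple[int, int]:
    """Return (value, number of bits consumed) from the start of `bits`."""
    value = 0
    while value < max_value and bits[value] == "1":
        value += 1
    consumed = value + (1 if value < max_value else 0)
    return value, consumed
```

With max_value = 4, the codewords are 0, 10, 110, 1110, and 1111, so the last codeword needs no terminating zero.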
[0205] Examples of CTB offset set indices for Cb and Cr are shown for the 1280x720 sequence POC0 (where the CTU size is 128x128, resulting in 10x6 CTUs in the frame). POC0 Cb uses four offset sets, while Cr uses one offset set. For example, as shown in Table 2-13 below, if the offset set index is 0, CCSAO is not applied. If the offset set index is 1, set 0 is applied. If the offset set index is 2, set 1 is applied. If the offset set index is 3, set 2 is applied. If the offset set index is 4, set 3 is applied. The type refers to the position of the selected luma sample (Yi) at the same location. Different offset sets may have different types, band_num, and corresponding offsets. Table 2-13 shows examples of CTB offset set indices for Cb and Cr for a 1280×720 sequence POC0 (where the CTU size is 128×128, resulting in 10×6 CTUs in the frame). [Table 19]
[0206] Table 2-14 below shows examples of applying the joint classification of same-position / current and adjacent Y / U / V samples in some embodiments (three-component combined bandNum classification of each Y / U / V component). Table 2-14 shows examples of using same-position / current samples and adjacent Y / U / V samples jointly for classification. In POC0, the {24,1} offset set is applied to {Y,U,V} respectively. Each offset set can be adaptively switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. Different offset sets can have different classifiers. For example, to classify the current Y4 luma sample as candidate positions (candPos) shown in Figures 13B and 13C, Y set0 selects {current Y4, same-position U4, same-position V4} as candidates, each having different bandNum {Y,U,V}={16,1,2}. Using {candY,candU,candV} as sample values for the selected {Y,U,V} candidates, with a total of 32 classes, the classification index is derived as follows: bandY = (candY * bandNumY) >> BitDepth; bandU = (candU * bandNumU) >> BitDepth; bandV = (candV * bandNumV) >> BitDepth; classIdx = bandY * bandNumU * bandNumV + bandU * bandNumV + bandV.
[0207] In some embodiments, the derivation of classIdx for a combined classifier can be expressed in "or-shift" form to simplify the derivation process. For example, for max bandNum={16,4,4}: classIdx = (bandY << 4) | (bandU << 2) | bandV.
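As an illustrative check, the sketch below compares the multiplicative derivation with the "or-shift" form for bandNum {16,4,4}. The function names are hypothetical; the equivalence relies on bandNumU and bandNumV both being 4 = 1 << 2.

```python
def class_index_multiply(band_y, band_u, band_v, band_num_u=4, band_num_v=4):
    """Multiplicative form of the combined class index."""
    return band_y * band_num_u * band_num_v + band_u * band_num_v + band_v


def class_index_or_shift(band_y, band_u, band_v):
    """Or-shift form; valid here because bandNumU = bandNumV = 4."""
    return (band_y << 4) | (band_u << 2) | band_v


# Exhaustively compare both forms over every band combination.
all_match = all(
    class_index_multiply(y, u, v) == class_index_or_shift(y, u, v)
    for y in range(16)
    for u in range(4)
    for v in range(4)
)
print(all_match)
```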
[0208] Another example is component V set1 classification in POC1. In this example, we use bandNum={4,1,2} and candPos={adjacent Y8,adjacent U3,adjacent V0} to generate eight classes. [Table 20]
[0209] In some embodiments, examples of using collocated and adjacent Y / U / V samples jointly for classifying the current Y / U / V sample are listed, for example, in Table 2-15 below (three-component combined edgeNum (C1) and bandNum classification for each Y / U / V component). edgeCandPos is the center position used in the C1 classifier, edgebitMask is the activation indicator for the C1 adjacent samples, and edgeNum is the corresponding number of C1 classes. In this example, C1 is applied only to the Y classifier (thus edgeNum is equal to edgeNumY), and edgeCandPos is always Y4 (the current / collocated sample position). However, C1 can also be applied to Y / U / V classifiers with edgeCandPos at an adjacent sample position.
[0210] If diff represents the comparison score of Y C1, the derivation of classIdx is as follows: bandY = (candY * bandNumY) >> BitDepth; bandU = (candU * bandNumU) >> BitDepth; bandV = (candV * bandNumV) >> BitDepth; edgeIdx = diff + (edgeNum >> 1); bandIdx = bandY * bandNumU * bandNumV + bandU * bandNumV +bandV; classIdx = bandIdx * edgeNum + edgeIdx; [Table 21] [Table 22] [Table 23]
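The combined edge and band derivation above can be sketched as follows. This is a hedged illustration: how diff is computed by the C1 classifier is not reproduced here, so diff is taken as an input, and the function name and sample values are hypothetical.

```python
def combined_class_index(cands, band_nums, diff, edge_num, bit_depth):
    """classIdx = bandIdx * edgeNum + edgeIdx, following the formulas above."""
    cand_y, cand_u, cand_v = cands
    band_num_y, band_num_u, band_num_v = band_nums
    band_y = (cand_y * band_num_y) >> bit_depth
    band_u = (cand_u * band_num_u) >> bit_depth
    band_v = (cand_v * band_num_v) >> bit_depth
    # Shift the signed comparison score into a non-negative edge index.
    edge_idx = diff + (edge_num >> 1)
    band_idx = band_y * band_num_u * band_num_v + band_u * band_num_v + band_v
    return band_idx * edge_num + edge_idx


# Hypothetical 10-bit inputs: bandNum {4,1,2}, edgeNum 5, diff -1.
print(combined_class_index((512, 300, 700), (4, 1, 2), -1, 5, 10))  # 26
```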
[0211] In some embodiments, as described above, a combined classifier can be formed by combining multiple C0 classifiers (different positions or weight combinations, bandNum) for a single component. This combined classifier can be further combined with other components, for example, to classify one U sample using two Y samples (candY / candX and bandNumY / bandNumX), one U sample (candU and bandNumU), and one V sample (candV and bandNumV) (the same concept applies to Y / V). The derivation of the class index is as follows: bandY = (candY * bandNumY) >> BitDepth; bandX = (candX * bandNumX) >> BitDepth; bandU = (candU * bandNumU) >> BitDepth; bandV = (candV * bandNumV) >> BitDepth; classIdx = bandY * bandNumX * bandNumU * bandNumV + bandX * bandNumU * bandNumV + bandU * bandNumV + bandV;
[0212] In some embodiments, when multiple C0 classifiers are used for a single component, several decoder normative constraints or encoder conformance constraints may apply. These constraints include (1) the selected C0 candidates must be distinct from each other (e.g., candX != candY), and / or (2) a newly added bandNum must be no larger than the other bandNums (e.g., bandNumX <= bandNumY). Applying such intuitive constraints within a single component (Y) eliminates redundant cases, saving bit cost and complexity.
[0213] In some embodiments, the maximum band_num (bandNumY, bandNumU, or bandNumV) can be fixed or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. For example, max band_num can be fixed to 16 in the decoder, and 4 bits can be signaled for each frame to indicate the C0 band_num of the frame. Several other examples of maximum band_num are shown in Table 2-16 below. [Table 24]
[0214] In some embodiments, the maximum number of classes or offsets (combinations using multiple classifiers together, e.g., C1s edgeNum*C1 bandNumY*bandNumU*bandNumV) for each set (or all added sets) can be fixed or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. For example, the maximum value may be fixed at class_num=256*4 for all added sets, and encoder conformance checks or decoder normative checks may be used to verify the constraint.
[0215] In some embodiments, restrictions can be applied to the C0 classification, for example, restricting band_num(bandNumY, bandNumU, or bandNumV) to only powers of 2. Instead of explicitly signaling band_num, the syntax band_num_shift is signaled. The decoder can use a shift operation to avoid multiplication. Different band_num_shift may be used for different components. Class(C0)=(Y0>>band_num_shift)>>bit_depth
[0216] Another example of calculation considers rounding to reduce errors. Class(C0)=((Y0+(1<<(band_num_shift-1)))>>band_num_shift)>>bit_depth
[0217] For example, if band_num_max(Y, U, or V) is 16, the possible band_num_shift candidates are 0, 1, 2, 3, and 4, corresponding to band_num=1, 2, 4, 8, and 16, respectively, as shown in Table 2-17. [Table 25]
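As one reading of the power-of-two restriction, and assuming band_num = 1 << band_num_shift as in the example mapping above (shifts 0 to 4 for band_num 1 to 16), the multiplication in band = (cand * band_num) >> bit_depth can be carried out with a shift. The sketch below only checks that equivalence under this assumption; it does not transcribe the Class(C0) expressions, and the function names are hypothetical.

```python
def band_by_multiply(sample, band_num, bit_depth):
    """Band index using an explicit multiplication."""
    return (sample * band_num) >> bit_depth


def band_by_shift(sample, band_num_shift, bit_depth):
    """Band index using a shift; assumes band_num == 1 << band_num_shift."""
    return (sample << band_num_shift) >> bit_depth


bit_depth = 10
# Compare both forms for every 10-bit sample and every candidate shift.
shift_matches_multiply = all(
    band_by_shift(s, shift, bit_depth) == band_by_multiply(s, 1 << shift, bit_depth)
    for shift in range(5)
    for s in range(1 << bit_depth)
)
print(shift_matches_multiply)
```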
[0218] Offset signaling
[0219] In some embodiments, different classifiers are applied to Cb and Cr. The Cb and Cr offsets for all classes can be signaled separately. For example, different signaling offsets are applied to each chroma component, as shown in Table 2-18 below. [Table 26]
[0220] In some embodiments, the maximum offset value is fixed or signaled at the Sequence Parameter Set (SPS) / Adaptation Parameter Set (APS) / Picture Parameter Set (PPS) / Picture Header (PH) / Slice Header (SH) / region / CTU / CU / subblock / sample level. For example, the offset is limited to the range [-15, 15]. Different components may have different maximum offset values.
[0221] In some embodiments, offset signaling can be performed using differential pulse-code modulation (DPCM). For example, an offset of {3,3,2,1,-1} can be signaled as {3,0,-1,-1,-2}.
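A minimal sketch of the DPCM example above is given below. It assumes the first offset is predicted from 0, which is consistent with the example values; the function names are hypothetical.

```python
def dpcm_encode(offsets):
    """Replace each offset by its difference from the previous offset."""
    deltas = []
    previous = 0  # assumption: the first offset is predicted from 0
    for value in offsets:
        deltas.append(value - previous)
        previous = value
    return deltas


def dpcm_decode(deltas):
    """Rebuild the offsets by accumulating the signaled differences."""
    offsets = []
    previous = 0
    for delta in deltas:
        previous += delta
        offsets.append(previous)
    return offsets


print(dpcm_encode([3, 3, 2, 1, -1]))   # [3, 0, -1, -1, -2]
print(dpcm_decode([3, 0, -1, -1, -2]))  # [3, 3, 2, 1, -1]
```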
[0222] In one embodiment, the offsets may be stored in an APS or a memory buffer for reuse by the next picture / slice. An index can be signaled to indicate which previously stored frame offsets are used for the current picture.
[0223] In some embodiments, the Cb and Cr classifiers are the same. The Cb and Cr offsets for all classes can be signaled together, for example, as shown in Table 2-19 below. [Table 27]
[0224] In some embodiments, the Cb and Cr classifiers may be identical. The Cb and Cr offsets for all classes can be signaled jointly with a sign flag difference, for example, as shown in Table 2-20 below. According to Table 2-20, if the Cb offset is (3,3,2,-1), the derived Cr offset is (-3,-3,-2,1). [Table 28]
[0225] In some embodiments, a sign flag can be signaled for each class, for example, as shown in Table 2-21 below. According to Table 2-21, when the Cb offset is (3,3,2,-1), the derived Cr offset will be (-3,3,2,1) according to the respective sign flag. [Table 29]
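The sign-flag derivation of the Cr offsets can be sketched as follows. The flag polarity (1 meaning that the Cr offset has the opposite sign of the Cb offset) is an assumption made for this illustration, as is the function name.

```python
def derive_cr_offsets(cb_offsets, sign_flags):
    """Derive Cr offsets from Cb offsets and one sign flag per class.

    Assumption: a flag of 1 flips the sign, a flag of 0 keeps it.
    """
    return [-cb if flag else cb for cb, flag in zip(cb_offsets, sign_flags)]


cb = [3, 3, 2, -1]
# One flag shared by all classes (here set to 1 for every class).
print(derive_cr_offsets(cb, [1, 1, 1, 1]))  # [-3, -3, -2, 1]
# One flag per class.
print(derive_cr_offsets(cb, [1, 0, 0, 1]))  # [-3, 3, 2, 1]
```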
[0226] In some embodiments, the classifiers for Cb and Cr may be the same. The Cb and Cr offsets for all classes can be jointly signaled with a weight difference, for example, as shown in Table 2-22 below. The weight (w) can be selected from a limited table, e.g., ±1 / 4, ±1 / 2, 0, ±1, ±2, ±4…, where |w| contains only powers of 2. According to Table 2-22, if the Cb offset is (3,3,2,-1), the Cr offset derived with the weight (i.e., w = -2) is (-6,-6,-4,2). [Table 30]
[0227] In some embodiments, the weights can be signaled for each class, for example, as shown in Table 2-23 below. According to Table 2-23, when the Cb offset is (3,3,2,-1), the Cr offset derived with the respective per-class weights is (-6,12,0,-1). [Table 31]
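The weight-based derivation can be sketched as below for integer weights. Fractional weights such as ±1 / 4 would need a rounding rule that is not given in the text, so they are left out of this illustration; the function name is hypothetical.

```python
def derive_cr_offsets_weighted(cb_offsets, weights):
    """Derive Cr offsets as weight * Cb offset, one weight per class.

    Only integer weights are handled; a single shared weight is passed
    as a list that repeats the same value for every class.
    """
    return [w * cb for cb, w in zip(cb_offsets, weights)]


cb = [3, 3, 2, -1]
# One weight shared by all classes (w = -2).
print(derive_cr_offsets_weighted(cb, [-2] * len(cb)))  # [-6, -6, -4, 2]
# One weight per class.
print(derive_cr_offsets_weighted(cb, [-2, 4, 0, 1]))   # [-6, 12, 0, -1]
```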
[0228] In some embodiments, when multiple classifiers are used for the same POC, different offset sets are signaled individually or jointly.
[0229] In some embodiments, previously decoded offsets can be saved for use in future frames. To reduce the overhead of offset signaling, an index can be signaled to indicate which previously decoded offset set is used for the current frame. For example, the POC0 offsets can be reused by POC2 by signaling offset set idx=0, as shown in Table 2-23 below. [Table 32]
[0230] In some embodiments, the reuse offset set idx for Cb and Cr may differ, for example, as shown in Table 2-24 below. [Table 33]
[0231] In some embodiments, signaling of offsets can use additional syntax including start and length to reduce signaling overhead. For example, when band_num=256, only offsets between band_idx=37 and 44 are signaled. In the example in Table 2-25 below, both the start and length syntax are encoded with a fixed 8-bit length that matches the band_num bit. [Table 34]
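The start / length idea can be sketched as follows. This illustration assumes that offsets outside the signaled window are zero, which the text does not state explicitly; the function names and offset values are hypothetical.

```python
def pack_band_offsets(offsets):
    """Return (start, length, payload) covering all nonzero offsets."""
    nonzero = [i for i, value in enumerate(offsets) if value != 0]
    if not nonzero:
        return 0, 0, []
    start = nonzero[0]
    length = nonzero[-1] - start + 1
    return start, length, offsets[start:start + length]


def unpack_band_offsets(start, length, payload, band_num):
    """Rebuild the full offset table; unsignaled bands are assumed zero."""
    offsets = [0] * band_num
    offsets[start:start + length] = payload
    return offsets


# Hypothetical table with band_num = 256 and offsets only in bands 37..44.
table = [0] * 256
table[37:45] = [1, 2, 3, 4, -1, -2, -3, -4]
start, length, payload = pack_band_offsets(table)
print(start, length)  # 37 8
```

Only 8 of the 256 offsets are carried in this example, plus the start and length fields.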
[0232] In some embodiments, when CCSAO is applied to all three YUV components, collocated and adjacent YUV samples can be jointly used for classification, and all the above-described offset signaling methods for Cb / Cr can be extended to Y / Cb / Cr. In some embodiments, offset sets for different components can be stored and used individually (each component has its own stored sets) or jointly (the components share / reuse the same stored sets). An example with individual sets is shown in Table 2-26 below. [Table 35]
[0233] In some embodiments, if the sequence bit depth is greater than 10 (or a specific bit depth), the offset may be quantized before signaling. On the decoder side, the decoded offset is dequantized before applying it, as shown in Table 2-27 below. For example, in the case of a 12-bit sequence, the decoded offset is shifted left by 2 (dequantized). [Table 36]
[0234] In some embodiments, the offset can be calculated as CcSaoOffsetVal=(1-2*ccsao_offset_sign_flag)*(ccsao_offset_abs<<(BitDepth-Min(10,BitDepth))).
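The offset derivation above can be sketched directly. The function name is hypothetical; the argument names follow the syntax elements in the formula.

```python
def ccsao_offset_val(ccsao_offset_abs, ccsao_offset_sign_flag, bit_depth):
    """CcSaoOffsetVal per the formula above.

    The magnitude is shifted left by (BitDepth - Min(10, BitDepth)),
    i.e. dequantized only when the bit depth exceeds 10.
    """
    shift = bit_depth - min(10, bit_depth)
    return (1 - 2 * ccsao_offset_sign_flag) * (ccsao_offset_abs << shift)


print(ccsao_offset_val(3, 0, 8))   # 3   (no dequantization)
print(ccsao_offset_val(3, 0, 10))  # 3   (no dequantization)
print(ccsao_offset_val(3, 1, 12))  # -12 (shifted left by 2, negative sign)
```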
[0235] In some embodiments, offset quantization is selectable (programmable) in the encoder. Whether to enable offset quantization (on / off control) and the quantization step size can be predefined or signaled at the SPS / APS / PPS / PH / SH / region (set) / CTU / CU / subblock / sample level. For example, the quantization step size can be predefined according to bit depth and resolution and switched at the PH. The on / off control flag and the quantization step size can be stored in the APS for reuse by future frames. The range of step sizes supported by the sequence can be predefined or signaled at the SPS / APS / PPS / PH / SH / region (set) / CTU / CU / subblock / sample level. The offset quantization mechanism lets the encoder trade offset precision, and thus picture quality improvement, against bit cost.
[0236] In some embodiments, the offset binarization method may depend on the quantization step size. The offset binarization method can be predefined or signaled at the SPS / APS / PPS / PH / SH / region (set) / CTU / CU / subblock / sample levels. Different components may have different {on / off control, quantization step size, offset binarization method} or share the same ones. For example, U / V may use the same one, while Y may use a different one. Different bit depths of sequences may result in different predefined quantization step sizes / offset binarization methods. For example, different EGk orders may be used for different quantization step sizes.
[0237] For example, for the sequence {8b,10b,12b,14b,16b}, a step size of {0,0,2,4,6} is defined beforehand, and the step size / binarization method is switched at different levels.
[0238] For example, in the case of an 8b sequence:
• One SPS flag enables offset quantization, with predefined step size = 0 (offset = 0, ±1, ±2…)
• One PH syntax adaptively changes the step size to 2 (offset = 0, ±4, ±8…)
• One region (set) level syntax adaptively changes the step size of each set (Set0=0, Set1=1…)
• One region (set) level syntax switches between predefined binarization methods: Set0: TU, Set1: EG1, Set2: FLC…
[0239] For example, in the case of a 10b sequence:
• One SPS flag enables offset quantization, with predefined step size = 1 (offset = 0, ±2, ±4…)
• A predefined mapping from quantization step size to binarization: 0->EG0, 1->EG1, 2->EG2
• One APS syntax stores previously used quantization step sizes (q) / corresponding EGk orders. A new picture can add a new APS index.
Index0: Set0: q=0, Set1: q=2, Set2: q=1, Set3: q=0
Index1: Set0: q=1, Set1: q=0, Set2: q=0, Set3: q=2
...
Each region (set) within a picture can reuse a quantization step size (q) / corresponding EGk order from the stored APS.
[0240] For example, for a sequence of {<480p, 720p, 1080p, 4K, >=8K}, a step size of {0, 0, 2, 4, 6} is defined in advance, and the step size / binarization method is switched at different levels.
[0241] In some embodiments, the concept of filter strength is further introduced. For example, the classifier offsets can be further weighted before being applied to the samples. The weight (w) can be selected from a table of power-of-2 values, e.g., ±1 / 4, ±1 / 2, 0, ±1, ±2, ±4, etc., where |w| contains only powers of 2. The weight index can be signaled at the SPS / APS / PPS / PH / SH / region (set) / CTU / CU / subblock / sample level. Quantized offset signaling can be regarded as a subset of this weighting application. When recursive CCSAO is applied as shown in Figure 13D, a similar weight index mechanism can be applied between the first and second stages.
[0242] In some embodiments, the offsets of different classifiers can be weighted, combined, and applied to the same sample. A weight index mechanism similar to the one described above can be signaled. For example, offset_final = w * offset_1 + (1-w) * offset_2, or offset_final = w1 * offset_1 + w2 * offset_2 + …
[0243] Adaptation Parameter Set (APS)
[0244] In some embodiments, instead of directly signaling the CCSAO parameters in the PH / SH, previously used parameters / offsets can be stored in an Adaptation Parameter Set (APS) or a memory buffer for reuse by the next picture / slice. An index can be signaled in the PH / SH to indicate which previously stored frame offsets are used for the current picture / slice. A new APS ID can be created to maintain the CCSAO history offsets. The table below shows an example using the candPos of Figure 13E and bandNum{Y,U,V}={16,4,4}. In some examples, candPos, bandNum, and the offsets may be signaled with a fixed-length code (FLC) or with other methods such as truncated unary (TU) code, order-k exponential Golomb code (EGk), signed EG0 (SVLC), or unsigned EG0 (UVLC). In this case, sao_cc_y_class_num (or cb, cr) is equal to sao_cc_y_band_num_y * sao_cc_y_band_num_u * sao_cc_y_band_num_v (or cb, cr). ph_sao_cc_y_aps_id is the index of the parameters used in this picture / slice. The cb and cr components can follow the same signaling logic. [Table 37]
[0245] aps_adaptation_parameter_set_id provides an identifier for the APS for reference by other syntax elements. If aps_params_type is equal to CCSAO_APS, the value of aps_adaptation_parameter_set_id must be in the range of 0 to 7, inclusive.
[0246] ph_sao_cc_y_aps_id specifies the aps_adaptation_parameter_set_id of the CCSAO APS referenced by the Y color component of the slices in the current picture. If ph_sao_cc_y_aps_id is present, the following applies: the value of sao_cc_y_set_signal_flag of the APS NAL unit having aps_params_type equal to CCSAO_APS and aps_adaptation_parameter_set_id equal to ph_sao_cc_y_aps_id must be equal to 1, and the TemporalId of the APS Network Abstraction Layer (NAL) unit having aps_params_type equal to CCSAO_APS and aps_adaptation_parameter_set_id equal to ph_sao_cc_y_aps_id must be less than or equal to the TemporalId of the current picture.
[0247] In some embodiments, an APS update mechanism is described herein. The maximum number of APS offset sets can be predefined or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. Different components may have different limits on the maximum number. If the APS offset sets are full, a newly added offset set can replace one of the existing stored offset sets using a first-in, first-out (FIFO), last-in, first-out (LIFO), or least-recently-used (LRU) mechanism, or an index value can be received that indicates which APS offset set is to be replaced. In some examples, if the selected classifier consists of candPos / edge info / coding info, etc., all the classifier information can be taken as part of the APS offset set and stored in it along with the offset values. In some cases, the update mechanism described above can be predefined or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level.
[0248] In some embodiments, a constraint called "pruning" can be applied. For example, newly received classifier information and offsets must not be the same as any of the stored APS offset sets (either for the same component or spanning different components).
[0249] In some examples, when the C0 candPos / bandNum classifier is used, the maximum number of APS offset sets is 4 for each of Y / U / V, with FIFO update used for Y / V and idx-indicated update used for U. Table 2-29 shows the updates of the CCSAO offset sets using FIFO. [Table 38]
[0250] In some embodiments, the pruning criteria can be relaxed to provide a more flexible way for encoder trade-offs. For example, N offsets can be different when applying the pruning operation (e.g., N=4), or in another example, a difference (denoted as "thr") in the value of each offset can be allowed when applying the pruning operation (e.g., ±2).
[0251] In some embodiments, the two criteria may be applied simultaneously or individually. Whether each criterion is applied can be predefined or switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level.
[0252] In some embodiments, N / thr can be predefined or switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level.
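One possible reading of the pruning criteria is sketched below. It assumes that an offset "differs" when its absolute difference exceeds thr, and that a new set counts as a duplicate of a stored set when at most N offsets differ; both the reading and the function name are assumptions made for illustration.

```python
def is_pruned(new_offsets, stored_offsets, n=0, thr=0):
    """Return True if new_offsets counts as a duplicate of stored_offsets.

    Assumed reading: n=0 and thr=0 give the strict (exact match) pruning;
    larger values relax the comparison.
    """
    if len(new_offsets) != len(stored_offsets):
        return False
    differing = sum(
        1 for a, b in zip(new_offsets, stored_offsets) if abs(a - b) > thr
    )
    return differing <= n


stored = [3, 3, 2, -1]
candidate = [3, 4, 2, -1]
print(is_pruned(candidate, stored))         # False: one offset differs
print(is_pruned(candidate, stored, thr=2))  # True: difference within thr
print(is_pruned(candidate, stored, n=1))    # True: at most one offset differs
```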
[0253] In some embodiments, FIFO updates can be (1) cyclically updated from the previously remaining set idx, as in the example above (starting again from set 0 once all have been updated), or (2) updated from set 0 each time. In some examples, updates can be performed at the PH level (as in the example) or at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level when a new offset set is received.
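The two FIFO variants can be sketched as follows. The class, its method names, and the notion of an explicit begin_update call at each update point (for example once per picture header) are hypothetical and serve only to contrast option (1) with option (2).

```python
class FifoOffsetSets:
    """FIFO replacement of stored offset sets (illustrative only)."""

    def __init__(self, max_sets, cyclic=True):
        self.max_sets = max_sets
        self.cyclic = cyclic
        self.sets = []
        self.next_idx = 0  # slot that the next replacement overwrites

    def begin_update(self):
        """Call at each update point, e.g. once per picture header."""
        if not self.cyclic:
            self.next_idx = 0  # option (2): restart from set 0 each time

    def add(self, offset_set):
        """Store a new offset set and return the set idx it occupies."""
        if len(self.sets) < self.max_sets:
            self.sets.append(offset_set)
            return len(self.sets) - 1
        idx = self.next_idx
        self.sets[idx] = offset_set
        self.next_idx = (idx + 1) % self.max_sets
        return idx


def replaced_indices(cyclic):
    """Fill four sets, then add one new set at each of two update points."""
    store = FifoOffsetSets(max_sets=4, cyclic=cyclic)
    for name in ("A", "B", "C", "D"):
        store.add(name)
    replaced = []
    for name in ("E", "F"):
        store.begin_update()
        replaced.append(store.add(name))
    return replaced


print(replaced_indices(cyclic=True))   # [0, 1]
print(replaced_indices(cyclic=False))  # [0, 0]
```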
[0254] For LRU updates, the decoder maintains a count table. This counts the "total number of offset sets used" and can be updated at the SPS / APS / Group of Pictures (GOP) structure / PPS / PH / SH / region / CTU / CU / subblock / sample levels. Newly received offset sets replace offset sets that have not been recently used within the APS. If two stored offset sets have the same count, FIFO / LIFO can be used. For example, see component Y in Table 2-30 below. [Table 39]
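A count-table replacement of this kind can be sketched as below. The details are assumptions made for illustration: the count of a newly stored set starts at 0, the set with the smallest count is replaced, and ties are broken in FIFO order (oldest first). The class and method names are hypothetical.

```python
class CountedOffsetSets:
    """Replacement driven by a per-set use count (illustrative only)."""

    def __init__(self, max_sets):
        self.max_sets = max_sets
        self.sets = []        # stored offset sets
        self.use_count = []   # count table, one entry per stored set
        self.stored_at = []   # insertion order, used for the FIFO tie-break
        self.clock = 0

    def use(self, idx):
        """Record that the set at idx is used and return it."""
        self.use_count[idx] += 1
        return self.sets[idx]

    def add(self, offset_set):
        """Store a new set; when full, replace the least-used set."""
        self.clock += 1
        if len(self.sets) < self.max_sets:
            self.sets.append(offset_set)
            self.use_count.append(0)
            self.stored_at.append(self.clock)
            return len(self.sets) - 1
        victim = min(
            range(self.max_sets),
            key=lambda i: (self.use_count[i], self.stored_at[i]),
        )
        self.sets[victim] = offset_set
        self.use_count[victim] = 0
        self.stored_at[victim] = self.clock
        return victim


store = CountedOffsetSets(max_sets=3)
for name in ("A", "B", "C"):
    store.add(name)
store.use(0)
store.use(0)
store.use(2)
print(store.add("D"))  # 1: set B was never used
```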
[0255] In some embodiments, different components may have different update mechanisms.
[0256] In some embodiments, different components (e.g., U / V) can share the same classifier (the same candPos / edge info / coding info / offsets, optionally with a weight applied as a modifier).
[0257] In some embodiments, since offset sets used in different pictures / slices may differ only slightly in their offset values, a “patch” implementation may be used in the offset replacement mechanism. In some embodiments, the “patch” implementation is differential pulse-code modulation (DPCM). For example, when signaling a new offset set (OffsetNew), the offset values can be built on top of an existing APS-stored offset set (OffsetOld). The encoder signals only the delta values to update the old offset set (DPCM: OffsetNew = OffsetOld + delta). In the following examples, shown in Table 2-31, choices other than FIFO update (LRU, LIFO, or signaling an index indicating which set to update) can also be used. The Y / U / V components may share the same update mechanism or use different ones. In the examples in Table 2-31, the classifier candPos / bandNum does not change, but overwriting the set classifier can be indicated by signaling an additional flag (flag=0: update only the set offsets, flag=1: update both the set classifier and the set offsets). [Table 40]
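The patch update OffsetNew = OffsetOld + delta, together with the classifier override flag, can be sketched as follows. The dictionary layout of a stored set and the function name are hypothetical.

```python
def patch_offset_set(stored, delta, flag=0, new_classifier=None):
    """Apply a DPCM patch to a stored set and return the updated set.

    flag=0 updates only the offsets; flag=1 also replaces the classifier.
    """
    if len(delta) != len(stored["offsets"]):
        raise ValueError("delta must have one value per stored offset")
    updated = {
        "classifier": stored["classifier"],
        "offsets": [old + d for old, d in zip(stored["offsets"], delta)],
    }
    if flag == 1:
        updated["classifier"] = new_classifier
    return updated


# Hypothetical stored set: classifier given as (candPos, bandNum).
old_set = {"classifier": ("Y4", 16), "offsets": [3, 3, 2, -1]}
new_set = patch_offset_set(old_set, [0, -1, 0, 1])
print(new_set["offsets"])  # [3, 2, 2, 0]
```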
[0258] In some embodiments, the DPCM delta offset values may be signaled with an FLC / TU / EGk (order=0,1,...) code. One flag may be signaled for each offset set to indicate whether DPCM signaling is enabled. The DPCM delta offset value, or the newly added offset value (signaled directly without DPCM when APS DPCM is not enabled, i.e., APS DPCM=0) (ccsao_offset_abs), may be dequantized / mapped before being applied to the target offset (CcSaoOffsetVal). The offset quantization step can be predefined or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. For example, one method is to signal the offset directly with quantization step = 2: CcSaoOffsetVal = (1 - 2 * ccsao_offset_sign_flag) * (ccsao_offset_abs << 1).
[0259] Another method is to use DPCM signaling of the offset with quantization step = 2, in which the dequantized delta is added to the stored offset: CcSaoOffsetVal = CcSaoOffsetVal + (1 - 2 * ccsao_offset_sign_flag) * (ccsao_offset_abs << 1).
[0260] In some embodiments, a constraint may be applied to reduce the overhead of direct offset signaling; for example, the updated offset value must have the same sign as the old offset value. With such an inferred offset sign, the sign flag need not be transmitted again for the updated offset (ccsao_offset_sign_flag is inferred to be the same as that of the old offset).
[0261] Sample processing in several embodiments is described below. Let R(x,y) be the input luma or chroma sample value before CCSAO, and R'(x,y) be the output luma or chroma sample value after CCSAO: offset = ccsao_offset[class_index of R(x,y)]; R'(x,y) = Clip3(0, (1 << bit_depth) - 1, R(x,y) + offset).
[0262] Sample processing
[0263] According to the above formula, each luma or chroma sample value R(x,y) is classified using the classifier indicated by the current picture and / or the current offset set idx. The corresponding offset of the derived class index is added to each luma or chroma sample value R(x,y). The clip function Clip3 is applied to (R(x,y)+offset) to bring the luma or chroma output sample value R’(x,y) within the bit-depth dynamic range, for example, within the range from 0 to (1<<bit_depth)-1.
[0264] For each luma sample or chroma sample, first, classify using the classifier indicated by the current picture / current offset set idx, second, add the corresponding offset of the derived class index, and third, clip to the bit-depth dynamic range.
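The three steps above (classify, add the offset, clip) can be sketched as follows. The classification itself is assumed to be done elsewhere, so the class index of each sample is passed in; the function names and sample values are hypothetical.

```python
def clip3(low, high, value):
    """Clamp value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def apply_ccsao(samples, class_indices, ccsao_offset, bit_depth):
    """R' = Clip3(0, (1 << bit_depth) - 1, R + ccsao_offset[class_index])."""
    max_value = (1 << bit_depth) - 1
    return [
        clip3(0, max_value, r + ccsao_offset[c])
        for r, c in zip(samples, class_indices)
    ]


# Hypothetical 10-bit samples, their class indices, and a 3-class offset table.
print(apply_ccsao([0, 512, 1022], [0, 1, 2], [-3, 2, 5], bit_depth=10))
# [0, 514, 1023]
```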
[0265] FIG. 6 is a block diagram showing that in some implementations of the present disclosure, both the proposed bilateral filter (BIF) and SAO use samples from the deblocking stage as input.
[0266] In some embodiments, when CCSAO operates with other loop filters, the clip operation may be as follows.
(1) Clipping after addition. The following formulas show examples where (a) CCSAO operates with SAO and BIF, or (b) CCSAO replaces SAO and still operates with BIF.
(a) I_OUT = clip1(I_C + ΔI_SAO + ΔI_BIF + ΔI_CCSAO)
(b) I_OUT = clip1(I_C + ΔI_CCSAO + ΔI_BIF)
(2) Clipping before addition, when operating with BIF. In some embodiments, the order of clipping is switchable.
(a) I_OUT = clip1(I_C + ΔI_SAO); I'_OUT = clip1(I_OUT + ΔI_BIF); I''_OUT = clip1(I'_OUT + ΔI_CCSAO)
(b) I_OUT = clip1(I_C + ΔI_BIF); I'_OUT = clip1(I_OUT + ΔI_CCSAO)
(3) Clipping after partial addition.
(a) I_OUT = clip1(I_C + ΔI_SAO + ΔI_BIF); I'_OUT = clip1(I_OUT + ΔI_CCSAO)
[0267] In some embodiments, different clipping combinations result in different trade-offs between correction accuracy and hardware temporary buffer size (register or SRAM bit width).
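The effect of the clipping order can be illustrated with a small numeric sketch. The function names and the input values are hypothetical; clip1 is assumed to clamp to the bit-depth range [0, (1 << bit_depth) - 1].

```python
def clip1(value, bit_depth):
    """Clamp value to the bit-depth dynamic range."""
    return max(0, min((1 << bit_depth) - 1, value))


def clip_after_addition(i_c, d_sao, d_bif, d_ccsao, bit_depth):
    """Add all three offsets, then clip once."""
    return clip1(i_c + d_sao + d_bif + d_ccsao, bit_depth)


def clip_before_addition(i_c, d_sao, d_bif, d_ccsao, bit_depth):
    """Clip after each individual offset is added."""
    out = clip1(i_c + d_sao, bit_depth)
    out = clip1(out + d_bif, bit_depth)
    return clip1(out + d_ccsao, bit_depth)


def clip_after_partial_addition(i_c, d_sao, d_bif, d_ccsao, bit_depth):
    """Clip after SAO + BIF, then again after CCSAO."""
    out = clip1(i_c + d_sao + d_bif, bit_depth)
    return clip1(out + d_ccsao, bit_depth)


# Hypothetical 10-bit sample close to the upper end of the range.
args = (1020, 5, 3, -6, 10)
print(clip_after_addition(*args))          # 1022
print(clip_before_addition(*args))         # 1017
print(clip_after_partial_addition(*args))  # 1017
```

In this example the single-clip variant keeps an intermediate sum (1028) that exceeds the 10-bit range, while the variants that clip earlier stay within range at every step but produce a different result.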
[0268] Figure 6 illustrates SAO / BIF offset clipping. More specifically, Figure 6 shows the current BIF design when interacting with SAO: the offsets from SAO and BIF are added to the input sample, followed by one bit-depth clipping. However, if CCSAO is also added at the SAO stage, two clipping designs can be selected: (1) adding one more bit-depth clipping for CCSAO, or (2) a harmonized design that performs a joint clipping after adding the SAO / BIF / CCSAO offsets to the input sample. In some embodiments, since BIF is applied only to luma samples, the above clipping designs differ only for luma samples.
[0269] Boundary processing
[0270] In some embodiments, boundary processing is described below. If any of the collocated and adjacent luma (chroma) samples used for classification is located outside the current picture, CCSAO is not applied to the current chroma (luma) sample. Figures 23A and 23B are block diagrams illustrating this case, according to some implementation examples of this disclosure. For example, in Figure 23A, when a classifier is used, CCSAO is not applied to the chroma components in the leftmost column of the current picture. For example, when C1' is used, as shown in Figure 23B, CCSAO is not applied to the chroma components in the leftmost column and the top row of the current picture.
[0271] Figures 24A and 24B show how CCSAO is applied to the current luma or chroma sample when any of the collocated and adjacent luma or chroma samples used for classification is outside the current picture, according to some implementations of the present disclosure. In some embodiments, as a variation, if any of the collocated and adjacent luma or chroma samples used for classification is outside the current picture, the missing samples are filled by repetition as shown in Figure 24A, or by mirror padding as shown in Figure 24B, to create the samples for classification, and CCSAO can then be applied to the current luma or chroma sample. In some embodiments, the disable / repetitive / mirror picture boundary processing methods disclosed herein can also be applied at a subpicture / slice / tile / patch / CTU / 360 virtual boundary if any of the collocated and adjacent luma (chroma) samples used for classification is outside the current subpicture / slice / tile / patch / CTU / 360 virtual boundary.
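The two padding options can be sketched as coordinate remapping. The mirror variant here reflects about the boundary sample (so position -1 maps to position 1), which is an assumption since the figures are not reproduced; the function names and sample values are hypothetical.

```python
def repetitive_index(pos, size):
    """Clamp an out-of-picture coordinate to the nearest valid one."""
    return min(max(pos, 0), size - 1)


def mirror_index(pos, size):
    """Reflect an out-of-picture coordinate about the boundary sample.

    Assumption: -1 maps to 1 and size maps to size - 2.
    """
    if pos < 0:
        return -pos
    if pos >= size:
        return 2 * (size - 1) - pos
    return pos


def fetch(picture, x, y, mode):
    """Read picture[y][x], padding coordinates that fall outside."""
    height, width = len(picture), len(picture[0])
    index = repetitive_index if mode == "repetitive" else mirror_index
    return picture[index(y, height)][index(x, width)]


# Hypothetical 3x3 luma picture; the neighbor at x = -1 is missing.
luma = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
print(fetch(luma, -1, 1, "repetitive"))  # 20
print(fetch(luma, -1, 1, "mirror"))      # 21
```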
[0272] For example, a picture is divided into one or more tile rows and one or more tile columns. A tile is a sequence of CTUs that cover a rectangular area of the picture.
[0273] A slice consists of an integer number of complete tiles or an integer number of consecutive complete CTU rows within a tile of a picture.
[0274] A subpicture contains one or more slices that collectively cover the rectangular area of the picture.
[0275] In some embodiments, 360-degree video is captured on a sphere and is essentially without "boundaries." Therefore, reference samples outside the boundaries of a reference picture within the projection region can always be obtained from adjacent samples within the spherical region. In projection formats composed of multiple faces, discontinuities appear between two or more adjacent faces in a frame-packed picture, regardless of how dense the frame packing arrangement is. VVC introduces vertical and / or horizontal virtual boundaries that disable in-loop filtering, and the position of these boundaries is signaled in either the SPS or the picture header. The use of 360-degree virtual boundaries is more flexible than using two tiles, one for each set of consecutive faces, because the size of the faces does not need to be a multiple of the CTU size. In some embodiments, the maximum number of vertical 360-degree virtual boundaries is 3, and the maximum number of horizontal 360-degree virtual boundaries is also 3. In some embodiments, the distance between two virtual boundaries is greater than or equal to the CTU size, and the granularity of the virtual boundaries is 8 luma samples, e.g., an 8x8 sample grid.
[0276] Figures 28A and 28B are block diagrams illustrating that, in some implementations of this application, CCSAO is not applied to the current chroma sample when the corresponding selected collocated or adjacent luma sample used for classification lies outside the virtual space defined by the virtual boundary. In some embodiments, the virtual boundary (VB) is a virtual line that separates the space within a picture frame. In some embodiments, when a virtual boundary (VB) is applied in the current frame, CCSAO is not applied to chroma samples whose selected corresponding luma position lies outside the virtual space defined by the virtual boundary. Figures 28A and 28B show examples with a virtual boundary for a C0 classifier with nine luma position candidates. For each CTU, CCSAO is not applied to chroma samples whose corresponding selected luma position lies outside the virtual space enclosed by the virtual boundary. For example, in Figure 28A, if the selected Y7 luma sample position is on the other side of the horizontal virtual boundary 2806, which is located 4 pixel lines from the bottom of the frame, CCSAO is not applied to chroma sample 2802. For example, in Figure 28B, if the selected Y5 luma sample position is on the other side of the vertical virtual boundary 2808, which is located y pixel lines from the right side of the frame, CCSAO is not applied to chroma sample 2804.
[0277] Figures 32A and 32B show that, according to several implementations of this disclosure, repetitive padding or mirror padding can be applied to luma samples outside the virtual boundary. Figure 32A shows an example of repetitive padding. If the selected classifier candidate is Y7, which is located below VB3202, the luma sample value of Y4 is used for classification (copied to the position of Y7) instead of the original Y7 luma sample value. Figure 32B shows an example of mirror padding. If the selected classifier candidate is Y7, which is located below VB3204, the Y1 luma sample value, which is symmetric to the Y7 value with respect to the Y0 luma sample, is used for classification instead of the original Y7 luma sample value. The padding schemes allow CCSAO to be applied to more chroma samples, so greater coding gain can be achieved.
[0278] In some embodiments, restrictions can be applied to reduce the line buffer required by CCSAO and to simplify the boundary processing condition checks. Figure 26A shows that, according to some implementations of this disclosure, if all nine collocated and adjacent luma samples are used for classification, an additional luma line buffer may be required, i.e., the whole line of luma samples at line -5 above the current VB1602. Figures 18A–18G show examples where only six luma candidates are used for classification, which reduces the line buffer and eliminates the need for the additional boundary checks shown in Figures 23A–23B and 24A–24B.
[0279] In some embodiments, using luma samples for CCSAO classification can increase the luma line buffer and therefore increase the hardware implementation cost of the decoder. Figure 25 shows that in some embodiments of this disclosure, nine luma candidate CCSAOs intersecting VB1702 can increase the luma line buffer by two additional values. For luma and chroma samples above the virtual boundary (VB) 1702, the DBF / SAO / ALF is processed in the current CTU row. For luma and chroma samples below VB1702, the DBF / SAO / ALF is processed in the next CTU row. In the hardware design of the AVS decoder, pre-DBF samples from luma lines -4 to -1, pre-SAO samples from line -5, and pre-DBF samples from chroma lines -3 to -1, and pre-SAO samples from line -4 are stored as line buffers for DBF / SAO / ALF processing in the next CTU row. When processing the next CTU row, luma and chroma samples that are not in the line buffers are unavailable. However, for example, at the position of chroma line -3(b), the chroma sample is processed in the next CTU line, but CCSAO requires pre-SAO luma sample lines -7, -6, and -5 for classification. Pre-SAO luma sample lines -7 and -6 cannot be used because they are not in the line buffer. Also, adding pre-SAO luma sample lines -7 and -6 to the line buffer increases the hardware implementation cost of the decoder. In some examples, the luma VB (line -4) and chroma VB (line -3) may be different (not aligned).
[0280] Similar to Figure 25, Figure 26A is an explanatory diagram showing that, in VVC, according to some embodiments of this disclosure, CCSAO with nine luma candidates crossing VB1802 can require one additional luma line buffer. The VB may differ between standards. In VVC, the luma VB is line -4 and the chroma VB is line -2, so CCSAO with nine luma candidates requires one additional luma line buffer.
[0281] In some embodiments, in the first solution, if any of the luma candidates of the chroma sample cross VB (are outside the VB of the current chroma sample), the CCSAO is disabled for the chroma sample. Figures 27A–27C show that in AVS and VVC, according to some embodiments of the present disclosure, if any of the luma candidates of the chroma sample cross VB2702 (are outside the VB of the current chroma sample), the CCSAO is disabled for the chroma sample. Figures 28A–28B also show some examples of this implementation.
[0282] In some embodiments, in the second solution, repetitive padding from a luma line close to and on the other side of the VB, e.g., luma line -4, is used for "cross VB" luma candidates in the CCSAO. In some embodiments, repetitive padding from the nearest luma neighbor below the VB is implemented for "cross VB" chroma candidates. Figures 29A-29C show that, according to some implementations of this disclosure, in AVS and VVC, if any of the luma candidates of the chroma sample crosses the VB (is outside the VB of the current chroma sample), CCSAO is enabled for the chroma sample using repetitive padding. Figure 28A also shows some examples of this implementation.
[0283] In some embodiments, a third solution involves using mirror padding from below the luma VB for "cross VB" luma candidates in the CCSAO. Figures 30A–30C show how, according to some implementations of this disclosure, CCSAO is enabled in the AVS and VVC when any of the luma candidates of the chroma sample crosses VB3002 (is outside the current chroma sample VB). Figures 28B and 24B also show some examples of this implementation. In some embodiments, a fourth solution involves using "double-sided symmetric padding" for the application of CCSAO. Figures 31A–31B show how CCSAO is enabled using double-sided symmetric padding for some examples of different CCSAO shapes (e.g., nine luma candidates (Figure 31A) and eight luma candidates (Figure 31B)) according to some implementations of this disclosure. For a luma sample set having a central luma sample at the same position as the chroma sample, if one side of the luma sample set is outside VB3102, double-sided symmetrical padding is applied to both sides of the luma sample set. For example, in Figure 31A, since luma samples Y0, Y1, and Y2 are outside VB3102, both Y0, Y1, Y2 and Y6, Y7, Y8 are padded using Y3, Y4, and Y5. For example, in Figure 31B, since luma sample Y0 is outside VB3102, Y0 is padded using Y2 and Y7 is padded using Y5.
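Solutions 1, 2, and 4 can be compared with a minimal sketch for the nine-candidate (3x3) shape. It assumes that only the top candidate row (row offset -1 relative to the collocated luma row) can lie across the VB, and that repetitive padding reads the nearest row on the available side. The names VbPolicy, CandidateRows, and mapCandidateRows are hypothetical. Mirror padding (solution 3) is left out because the text does not state the reflection axis.

```cpp
#include <array>

// Hypothetical policy names for solutions 1, 2, and 4 described above.
enum class VbPolicy { Disable, RepetitivePadding, DoubleSidedSymmetricPadding };

struct CandidateRows {
    bool ccsaoEnabled;        // false when CCSAO is switched off for this chroma sample
    std::array<int, 3> rows;  // row offsets actually read for candidate rows -1, 0, +1
};

// Assumption: only the top candidate row (offset -1) can cross the VB.
CandidateRows mapCandidateRows(bool topRowCrossesVb, VbPolicy policy) {
    if (!topRowCrossesVb) {
        return {true, {-1, 0, 1}};  // all candidates available, no remapping
    }
    switch (policy) {
    case VbPolicy::Disable:
        return {false, {0, 0, 0}};  // solution 1: skip CCSAO for this sample
    case VbPolicy::RepetitivePadding:
        return {true, {0, 0, 1}};   // solution 2: top row repeats the nearest available row
    case VbPolicy::DoubleSidedSymmetricPadding:
        return {true, {0, 0, 0}};   // solution 4: top and bottom rows both read the center row
    }
    return {false, {0, 0, 0}};
}
```

With double-sided symmetric padding the result matches the Figure 31A description: Y0, Y1, Y2 and Y6, Y7, Y8 are all taken from the row holding Y3, Y4, Y5.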
[0284] Figure 26B shows that when chroma samples at the same location or adjacent to each other are used to classify the current chroma sample, the selected chroma candidates may span VB, potentially requiring an additional chroma line buffer according to some embodiments of this disclosure. The problem can be addressed by applying solutions 1-4 similar to those described above.
[0285] Solution 1 is to disable CCSAO for the chroma sample if any of the chroma candidates span across VB.
[0286] Solution 2 is to use repetitive padding from the nearest chroma neighbor below the VB for "cross VB" chroma candidates.
[0287] Solution 3 is to use mirror padding from below the chroma VB for "cross VB" chroma candidates.
[0288] Solution 4 is to use "double-sided symmetric padding". For a candidate set centered on the CCSAO collocated chroma sample, if one side of the candidate set is outside the VB, double-sided symmetric padding is applied to both sides.
[0289] The padding method allows for the application of CCSAO to more luma or chroma samples, and therefore greater coding gain can be achieved.
[0290] In some embodiments, at the bottom picture (or slice, tile, brick) boundary CTU row, samples below the VB are processed in the current CTU row, and therefore the special handling described above (Solutions 1, 2, 3, 4) does not apply to this CTU row. For example, a 1920×1080 frame is divided into 128×128 CTUs. The frame contains 15×9 (rounded up) CTUs, so the bottommost CTU row is the 9th CTU row. The decoding process is performed CTU row by CTU row and, within each CTU row, CTU by CTU. Deblocking needs to be applied along the horizontal CTU boundary between the current CTU row and the next CTU row. Within a single CTU, the DBF samples of the bottommost 4 / 2 luma / chroma lines (in the case of VVC) are processed in the next CTU row and are not available for the CCSAO of the current CTU row, so a CTB VB is applied to each CTU row. In the bottommost CTU row of the picture frame, however, there is no next CTU row, so the DBF samples of the bottommost 4 / 2 luma / chroma lines are processed in the current CTU row and are available there.
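The CTU grid arithmetic in this example can be checked with a short sketch. The helper names (CtuGrid, ctuGrid, vbHandlingApplies) are hypothetical and only illustrate the rounding and the bottom-row exception. For a 1920×1080 frame with 128×128 CTUs, rounding up gives 15 CTU columns and 9 CTU rows.

```cpp
// Hypothetical helper types; not taken from any reference decoder.
struct CtuGrid {
    int cols;  // number of CTUs per row
    int rows;  // number of CTU rows
};

// Ceiling division of the frame size by the CTU size.
CtuGrid ctuGrid(int width, int height, int ctuSize) {
    return {(width + ctuSize - 1) / ctuSize, (height + ctuSize - 1) / ctuSize};
}

// The VB special handling (solutions 1 to 4) applies to every CTU row
// except the bottommost one, which has no next CTU row.
// ctuRow is zero-based.
bool vbHandlingApplies(int ctuRow, const CtuGrid& grid) {
    return ctuRow != grid.rows - 1;
}
```

This sketch covers only the picture boundary; slice, tile, or brick boundaries would need their own bottom-row test.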
[0291] In some embodiments, the VBs shown in Figures 13 to 22 can be replaced with the boundaries of a subpicture / slice / tile / patch / CTU / 360-degree virtual boundary. In some embodiments, the positions of the chroma sample and luma sample in Figures 13 to 22 can be swapped. In some embodiments, the positions of the chroma sample and luma sample in Figure 6, Figures 23A to 32B can be replaced with the positions of the first chroma sample and the second chroma sample. In some embodiments, the ALF VB inside the CTU may generally be horizontal. In some embodiments, the boundaries of a subpicture / slice / tile / patch / CTU / 360-degree virtual boundary may be horizontal or vertical.
[0292] In some embodiments, restrictions can be applied to reduce the line buffer required for CCSAO and to simplify boundary processing condition checks. Figure 26A shows that if all nine collocated neighboring luma samples are used for classification, an additional luma line buffer (the whole line of luma samples of line -5) may be required. Figures 33A and 33B illustrate restrictions on using a limited number of luma candidates for classification according to some embodiments of the present disclosure. Figure 33A shows a restriction to only six luma candidates for classification. Figure 33B shows a restriction to only four luma candidates for classification.
[0293] Application area
[0294] In some embodiments, application regions are implemented. The CCSAO application region unit can be CTB-based; that is, the on / off control and the CCSAO parameters (offsets, luma candidate positions, band_num, bitmask, etc. used for classification, and the offset set index) are the same within a single CTB.
[0295] In some embodiments, the application region does not need to be aligned to the CTB boundary. For example, the application region is not aligned to the chroma CTB boundary but is shifted. The syntax (on / off control, CCSAO parameters) is still signaled for each CTB, but the actual application region is not aligned to the CTB boundary. Figure 34 shows that, according to some implementations of this disclosure, the CCSAO application region is not aligned to the CTB / CTU boundary 3406. For example, the application region is not aligned to the chroma CTB / CTU boundary 3406 but is shifted up and to the left by (4, 4) samples, to VB3408. This unaligned design benefits the deblocking process because the same deblocking parameters are used for each 8x8 deblocking process region.
[0296] In some embodiments, as shown in Table 2-32, the CCSAO application area unit (mask size) may be variable (larger or smaller than the CTB size). The mask size may differ for different components. The mask size can be switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. For example, in PH, a series of mask on / off flags and offset set indices are signaled to indicate each CCSAO region information. [Table 41]
[0297] In some embodiments, the frame partitioning into CCSAO application regions can be fixed. For example, the frame is partitioned into N regions. Figure 35 shows that, according to some implementations of this disclosure, the CCSAO application region frame partitioning can be fixed with the CCSAO parameters.
[0298] In some embodiments, each region may have its own region on / off control flag and CCSAO parameters. Furthermore, if the region size is larger than the CTB size, it may have both a CTB on / off control flag and a region on / off control flag. Figures 35(a) and 35(b) show several examples of partitioning a frame into N regions. Figure 35(a) shows a vertical partition into four regions. Figure 35(b) shows a square partition into four regions. In some embodiments, if the region on / off control flag is off, the CTB on / off flag can be further signaled, similar to the image-level CTB all-on control flag (ph_cc_sao_cb_ctb_control_flag / ph_cc_sao_cr_ctb_control_flag). Otherwise, CCSAO is applied to all CTBs within this region without further signaling of the CTB flag.
[0299] In some embodiments, different CCSAO application regions can share the same region's on / off control and CCSAO parameters. For example, in Figure 35(c), regions 0-2 share the same parameters, and regions 3-15 share the same parameters. Figure 35(c) also shows that region on / off control flags and CCSAO parameters can be signaled in Hilbert scan order.
[0300] In some embodiments, a CCSAO application area unit can be a quadtree / binary tree / ternary tree partition from the picture / slice / CTB level. Similar to CTB partitioning, a series of partition flags is signaled to indicate the CCSAO application area partition. Figure 36 shows that, according to some implementations of the present disclosure, a CCSAO application area can be a binary tree (BT) / quadtree (QT) / ternary tree (TT) partition from the frame / slice / CTB level.
[0301] Figure 37 is a block diagram showing multiple classifiers used and switched at different levels within a picture frame, according to some implementations of the present disclosure. In some embodiments, when multiple classifiers are used in a single frame, the method of applying the classifier set index can be switched at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample levels. For example, as shown in Table 2-33 below, four classifier sets are used in a frame and switched at PH. Figures 37(a) and 37(c) show the default fixed-region classifier. Figure 37(b) shows that the classifier set index is signaled at the mask / CTB level, where 0 means CCSAO off for this CTB and 1-4 means set index. [Table 42]
[0302] In some embodiments, for the default region case, a region-level flag may be signaled to indicate whether the CTBs in the region use the default set index (region-level flag equal to 1) or use another classifier set in this frame (region-level flag equal to 0). For example, in a square partition into four regions, the classifier sets shown in Table 2-34 below are used. [Table 43]
[0303] Figure 38 is a block diagram illustrating that, according to some implementations of this disclosure, the CCSAO application area partitioning can be dynamic and can be switched at the picture level. For example, Figure 38(a) shows that three CCSAO offset sets are used in this POC (set_num=3) and the picture frame is partitioned vertically into three areas. Figure 38(b) shows that four CCSAO offset sets (set_num=4) are used in this POC and the picture frame is partitioned horizontally into four areas. Figure 38(c) shows that three CCSAO offset sets (set_num=3) are used in this POC and the picture frame is raster-partitioned into three areas. Each area may have its own area all-on flag to save per-CTB on / off control bits. The number of areas depends on the signaled picture set_num.
[0304] The CCSAO application area may be a specific area within a block, determined by coding information (sample position, sample encoding mode, loop filter parameters, etc.). For example, (1) the CCSAO application area is applicable only if the samples are skip-mode encoded, or (2) the CCSAO application area contains only N samples along the CTU boundary, or (3) the CCSAO application area contains only samples on an 8x8 grid within the frame, or (4) the CCSAO application area contains only DBF-filtered samples, or (5) the CCSAO application area contains only the top M rows and left N columns within the CU, or (6) the CCSAO application area contains only intra-encoded samples, or (7) the CCSAO application area contains only samples within a cbf=0 block, or (8) the CCSAO application area is only on blocks with a block QP in the range [N, M], where (N, M) can be predefined or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock / sample level. Cross-component coding information may also be considered: (9) the CCSAO application area is on chroma samples whose collocated luma sample is in a cbf=0 block.
[0305] In some embodiments, whether to introduce the coding-information-based restriction on the application area can be predefined, or a single control flag can be signaled at the SPS / APS / PPS / PH / SH / area (per alternative set) / CTU / CU / subblock / sample level to indicate whether specific coding information is included in or excluded from the CCSAO application. According to the predefined conditions or the control flags, the decoder skips CCSAO processing for those areas. The CCSAO application decision may be made at the CU / TU / PU level or at the sample level. For example, as shown in Table 2-35, Y, U, and V can use different predefined / flag-controlled conditions that are switched at the area (set) level. [Table 44]
[0306] Another example is reusing all or part of the (predefined) bilateral filter enabling constraint:
bool isInter = (currCU.predMode == MODE_INTER) ? true : false;
if (ccSaoParams.ctuOn[ctuRsAddr]
    && ((TU::getCbf(currTU, COMPONENT_Y) || isInter == false) && (currTU.cu->qp > 17))
    && (128 > std::max(currTU.lumaSize().width, currTU.lumaSize().height))
    && ((isInter == false) || (32 > std::min(currTU.lumaSize().width, currTU.lumaSize().height))))
[0307] In some embodiments, excluding certain areas may be beneficial for CCSAO statistical collection. For areas that truly require correction, the offset derivation may be more accurate or appropriate. For example, a block with cbf=0 typically means the block is fully predicted and requires no further correction. Excluding such blocks may be beneficial for offset derivation in other areas.
[0308] Different classifiers can be used in different application regions. For example, in a CTU, samples encoded in skip mode use C1, samples on the 8x8 grid use C2, and skip-mode samples on the 8x8 grid use C3. As another example, in a CTU, samples encoded in skip mode use C1, samples at the center of a CU use C2, and samples encoded in skip mode at the center of a CU use C3. Figure 39 shows that, according to some implementations of this disclosure, the CCSAO classifier can take into account the coding information of the current component or of a cross component. For example, different encoding modes / parameters / sample locations can form different classifiers. Different coding information can also be combined to form a combined classifier. Different regions can use different classifiers. Figure 29 shows another example of an application region.
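The first example (skip mode and 8x8 grid) can be written as a small selection function. This is only a sketch: the enum and function names are hypothetical, the caller is assumed to know whether a sample is skip-mode coded and whether it lies on the 8x8 grid, and the None case for samples matching neither condition is an assumption, since the text does not say what such samples use.

```cpp
// Hypothetical classifier labels; C1, C2, C3 follow the example in the text.
enum class Classifier { None, C1, C2, C3 };

// Picks a classifier from two pieces of coding information.
// Assumption: samples that match neither condition get no classifier.
Classifier selectClassifier(bool isSkipMode, bool onGrid8x8) {
    if (isSkipMode && onGrid8x8) return Classifier::C3;  // combined condition
    if (isSkipMode) return Classifier::C1;
    if (onGrid8x8) return Classifier::C2;
    return Classifier::None;
}
```

The combined condition is tested first so that C3 takes precedence over C1 and C2.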
[0309] In some embodiments, a predefined or flag-controlled "encoded information exclusion area" mechanism can be used in DBF / Pre-SAO / SAO / BIF / CCSAO / ALF / CCALF / NN loop filter (NNLF) or other loop filters.
[0310] Syntax
[0311] Table 2-36 below shows the CCSAO syntax implemented in several embodiments. In some examples, the binarization of each syntax element can be changed. In AVS3, the term patch is analogous to slice, and the patch header is analogous to the slice header. FLC denotes a fixed-length code. TU denotes a truncated unary code. EGk denotes an exponential-Golomb code of order k, where k is fixed. SVLC denotes a signed EG0. UVLC denotes an unsigned EG0. [Table 45]
[0312] If a higher-level flag is off, lower-level flags can be inferred from the off state of that flag and do not need to be signaled. For example, if ph_cc_sao_cb_flag is false in this picture, then ph_cc_sao_cb_band_num_minus1, ph_cc_sao_cb_luma_type, cc_sao_cb_offset_sign_flag, cc_sao_cb_offset_abs, ctb_cc_sao_cb_flag, cc_sao_cb_merge_left_flag, and cc_sao_cb_merge_up_flag do not exist and are inferred to be false.
[0313] In some embodiments, SPS ccsao_enabled_flag is conditional on the SPS SAO enabled flag, as shown in Table 2-37 below. [Table 46]
[0314] In some embodiments, ph_cc_sao_cb_ctb_control_flag and ph_cc_sao_cr_ctb_control_flag indicate whether to enable granularity of Cb / Cr CTB on / off control. If ph_cc_sao_cb_ctb_control_flag and ph_cc_sao_cr_ctb_control_flag are enabled, ctb_cc_sao_cb_flag and ctb_cc_sao_cr_flag may be further signaled. Otherwise, whether CCSAO is applied to the current picture depends on ph_cc_sao_cb_flag and ph_cc_sao_cr_flag, and ctb_cc_sao_cb_flag and ctb_cc_sao_cr_flag are not further signaled at the CTB level.
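The dependency between the picture-level and CTB-level Cb flags can be sketched as follows. The struct and function are hypothetical, the member names simply mirror the syntax elements above, and the sketch covers only the Cb on / off decision (Cr would be analogous).

```cpp
// Hypothetical container for the picture-header flags named in the text.
struct PictureCcsaoFlags {
    bool phCcSaoCbFlag;            // ph_cc_sao_cb_flag
    bool phCcSaoCbCtbControlFlag;  // ph_cc_sao_cb_ctb_control_flag
};

// Returns whether CCSAO is applied to the Cb CTB.
// ctbCcSaoCbFlag is only meaningful when CTB-level control is enabled;
// otherwise it is not signaled and the picture-level flag decides.
bool ccsaoAppliedToCbCtb(const PictureCcsaoFlags& ph, bool ctbCcSaoCbFlag) {
    if (!ph.phCcSaoCbFlag) return false;                     // off for the whole picture
    if (ph.phCcSaoCbCtbControlFlag) return ctbCcSaoCbFlag;   // per-CTB decision
    return true;                                             // on for every CTB
}
```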
[0315] In some embodiments, for ph_cc_sao_cb_type and ph_cc_sao_cr_type, flags may be further signaled to distinguish whether the central luma position at the same location (Y0 position in Figures 18A to 18G) is used for classification to the chroma sample, in order to reduce bit overhead. Similarly, if cc_sao_cb_type and cc_sao_cr_type are signaled at the CTB level, flags may be further signaled by the same mechanism. For example, if there are 9 candidate C0 luma positions, as shown in Table 2-38 below, cc_sao_cb_type0_flag is further signaled to distinguish whether the central luma position at the same location is used. If the central luma position at the same location is not used, cc_sao_cb_type_idc is used to indicate which of the remaining 8 adjacent luma positions is used. [Table 47]
[0316] Table 2-39 below shows an example of how a single (set_num=1) or multiple (set_num>1) classifier is used within a frame in AVS. The syntactic notation can be mapped to the notation used above. [Table 48]
[0317] When combined with Figure 35 or 37, where each region has its own set, an example syntax may include a region on / off control flag (picture_ccsao_lcu_control_flag[compIdx][setIdx]), as shown in Table 2-40 below. [Table 49]
[0318] In some embodiments, pps_ccsao_info_in_ph_flag and gci_no_ccsao_constraint_flag can be added to the high-level syntax.
[0319] In some embodiments, pps_ccsao_info_in_ph_flag being equal to 1 indicates that CCSAO filter information may be present in the PH syntax structure and not in slice headers referencing PPS that do not contain a PH syntax structure. pps_ccsao_info_in_ph_flag being equal to 0 indicates that CCSAO filter information is not present in the PH syntax structure but may be present in slice headers referencing PPS. If pps_ccsao_info_in_ph_flag is absent, its value is presumed to be equal to 0.
[0320] In some embodiments, gci_no_ccsao_constraint_flag being equal to 1 specifies that sps_ccsao_enabled_flag for all pictures in OlsInScope is equal to 0. gci_no_ccsao_constraint_flag being equal to 0 imposes no such constraint. In some embodiments, a video bitstream consists of one or more output layer sets (OLSs) according to rules. In the examples herein, OlsInScope refers to the one or more OLSs in scope. In some examples, the profile_tier_level() syntax structure provides the level information and, optionally, the profile, tier, sub-profile, and general constraint information to which OlsInScope conforms. If the profile_tier_level() syntax structure is included in a VPS, OlsInScope is the one or more OLSs specified by the VPS. If the profile_tier_level() syntax structure is included in an SPS, OlsInScope is the OLS that includes only the lowest layer among the layers referencing the SPS, and this lowest layer is an independent layer.
[0321] In some embodiments, the separate signaling of band_num_y_minus1, band_num_u_minus1, and band_num_v_minus1 can result in syntactic redundancy. For example, as shown in Table 2-41, U / V are not segmented (redundant), so U1 / V1 are the same as Y1. [Table 50]
[0322] In some embodiments, if the classifier is a band classifier or a combined classifier encompassing band classifiers, the indicator is predefined or signaled by an encoder. In some examples, the band classifier can be determined by utilizing sample values based on one or more samples from the same position and / or adjacent samples of the Y component, as well as the current and adjacent samples of the U / V component for each sample of the U / V component, dividing the range of sample values into multiple bands, and selecting a band from the multiple bands.
[0323] In some examples, if a classifier or composite classifier (e.g., C0+C10) consists of one or more component bandNum segments, the indicator may be a bandNum indicator (bandIdc, bandNum mapping table) predefined or signaled at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock level to represent one or more component bandNum segments. Different components may have different bandNum indicators. Different components may share the same bandNum indicator. For example, U / V share the same bandNum mapping table.
[0324] For example, if the U / V bandIdc contains bandNum segments for two or more components, the bandNum indicator can be predefined for different components. Table 2-42 shows an example of a bandNum mapping table. As shown in Table 2-42 below, if the current sample is the U component, the bandNum indicator signaled as 2 may indicate the use of the Y3 segment, i.e., the bandNum segment of the Y component. In another example, if the current sample is the V component, the bandNum indicator signaled as 6 indicates the use of the U2 and V2 segments, and the bandNum indicator signaled as 7 indicates the use of the Y2, U2, and V2 segments. [Table 51]
[0325] In some embodiments, C (current chroma) and CT (transposed chroma, using other chroma components) can be used to represent bandIdc. Table 2-43 shows an example of a bandNum mapping table. For example, as shown in Table 2-43, when C represents the current chroma component, CT represents another chroma component. For example, when C represents the U component, CT represents the V component, and when C represents the V component, CT represents the U component. [Table 52]
[0326] In some embodiments, the bandNum mapping table may be predefined or adjusted / modified by an encoder at the SPS / APS / PPS / PH / SH / region / CTU / CU / subblock level according to different granularity requirements.
[0327] Extension to intra and inter post-predictive SAO filters
[0328] Extensions to intra and inter post-predictive SAO filters in some embodiments are described further below. In some embodiments, the SAO classification methods disclosed herein (including cross-component sample / coding information classification) can function as a post-predictive filter, where the prediction may be intra prediction, inter prediction, or another prediction tool such as intra-block copy. Figure 40A is a block diagram showing the SAO classification method disclosed herein acting as a post-predictive filter according to some implementation examples of the disclosure.
[0329] In some embodiments, a corresponding classifier is selected for each of the Y, U, and V components. For each component's predicted sample, classification is first performed, and a corresponding offset is added. For example, each component can use the current sample and adjacent samples for classification. As shown in Table 2-44 below, Y uses the current Y sample and adjacent Y samples for classification, and U / V uses the current U / V sample for classification. Figures 40B to 40D are block diagrams showing that, with respect to post-predictive SAO filters, each component can use the current sample and adjacent samples for classification in some implementation examples of this disclosure. [Table 53]
[0330] In some embodiments, improved prediction samples (Ypred', Upred', Vpred') are updated by adding the corresponding class offset and then used for intra, inter, or other predictions.
Ypred' = clip3(0, (1 << bit_depth)-1, Ypred + h_Y[i])
Upred' = clip3(0, (1 << bit_depth)-1, Upred + h_U[i])
Vpred' = clip3(0, (1 << bit_depth)-1, Vpred + h_V[i])
[0331] In some embodiments, in addition to the current chroma components, a cross component (Y) can be used for further offset classification of the chroma's U and V components. For example, as shown in Table 2-45 below, an additional cross component offset (h'_U, h'_V) can be added to the offset of the current components (h_U, h_V). [Table 54]
[0332] In some embodiments, improved prediction samples (Upred'', Vpred'') are updated by adding a corresponding class offset and then used for intra, inter, or other predictions.
Upred'' = clip3(0, (1 << bit_depth)-1, Upred' + h'_U[i])
Vpred'' = clip3(0, (1 << bit_depth)-1, Vpred' + h'_V[i])
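The two update stages for a chroma prediction sample can be sketched as below. The function names are hypothetical. The offsets h_U[i] and h'_U[i] are assumed to have been looked up already for the sample's class i, and clip3 follows the usual (low, high, value) argument order used in the formulas above.

```cpp
#include <algorithm>

// Clamps v to the inclusive range [lo, hi].
int clip3(int lo, int hi, int v) {
    return std::min(std::max(v, lo), hi);
}

// One update stage: pred' = clip3(0, (1 << bit_depth) - 1, pred + offset).
int applyClassOffset(int pred, int offset, int bitDepth) {
    return clip3(0, (1 << bitDepth) - 1, pred + offset);
}

// Two stages for a chroma sample: the current-component offset first,
// then the additional cross-component offset, clipping after each stage.
int refineChromaPrediction(int pred, int offsetCurrent, int offsetCross, int bitDepth) {
    const int stage1 = applyClassOffset(pred, offsetCurrent, bitDepth);
    return applyClassOffset(stage1, offsetCross, bitDepth);
}
```

Because the value is clipped after each stage, the result can differ from adding both offsets at once: for a 10-bit sample of 1020 with offsets +5 and -3, the first stage saturates at 1023 and the second yields 1020.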
[0333] In some embodiments, intra and inter predictions can use different SAO filter offsets.
[0334] Extension to post-reconstruction filters
[0335] Figure 15C is a block diagram showing how the SAO classification method disclosed in this disclosure acts as a post-reconstruction filter in several implementation examples of this disclosure.
[0336] In some embodiments, the SAO / CCSAO classification methods disclosed herein (including cross-component sample / coding information classification) can function as filters applied to reconstructed samples of tree units (TUs). As shown in Figure 15C, CCSAO can function as a post-reconstruction filter. That is, the reconstructed sample (after prediction / residual sample addition, before deblocking) is used as input to the classification, compensating luma / chroma samples before they enter adjacent intra / inter prediction. The CCSAO post-reconstruction filter can reduce distortion of the current TU samples and provide better prediction for adjacent intra / inter blocks. Better prediction can be expected to lead to better compression efficiency.
[0337] Encoding algorithm
[0338] In some embodiments, a hierarchical rate-distortion (RD) optimization algorithm is designed to efficiently determine the optimal CCSAO parameters for a single picture, including: 1) a progressive scheme for searching for the best single classifier; 2) a training process for refining the offset values of a single classifier; and 3) a robust algorithm for effectively assigning suitable classifiers to different local regions. A typical CCSAO classifier is as follows:
\[
\begin{aligned}
band_Y &= (Y_{col} \cdot N_Y) \gg BD \\
band_U &= (U_{col} \cdot N_U) \gg BD \\
band_V &= (V_{col} \cdot N_V) \gg BD \\
i &= band_Y \cdot (N_U \cdot N_V) + band_U \cdot N_V + band_V \\
C'_{rec} &= \mathrm{Clip1}(C_{rec} + \sigma_{CCSAO}[i])
\end{aligned}
\tag{1}
\]
Here, $\{Y_{col}, U_{col}, V_{col}\}$ are the three collocated samples used to classify the current sample; $\{N_Y, N_U, N_V\}$ are the numbers of bands applied to the Y, U, and V components, respectively; $BD$ is the coding bit depth; $C_{rec}$ and $C'_{rec}$ are the reconstructed samples before and after CCSAO is applied; $\sigma_{CCSAO}[i]$ is the CCSAO offset value applied to the i-th category; $\mathrm{Clip1}()$ is a clipping function that clips its input to the bit-depth range, i.e., $[0, 2^{BD}-1]$; and $\gg$ denotes a right-shift operation. In the proposed CCSAO, the collocated luma sample can be selected from nine candidate positions, whereas the collocated chroma sample positions are fixed.
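The band classifier described here maps three collocated sample values to a category index and then adds that category's offset. A minimal sketch, assuming integer samples already within the bit-depth range and an offset table supplied by the caller; the names BandNums, ccsaoClassIndex, and applyCcsao are hypothetical.

```cpp
#include <algorithm>

// Numbers of bands for the Y, U, and V components (N_Y, N_U, N_V).
struct BandNums {
    int nY;
    int nU;
    int nV;
};

// Category index i = band_Y * (N_U * N_V) + band_U * N_V + band_V,
// where band_X = (X_col * N_X) >> BD.
int ccsaoClassIndex(int yCol, int uCol, int vCol, const BandNums& n, int bitDepth) {
    const int bandY = (yCol * n.nY) >> bitDepth;
    const int bandU = (uCol * n.nU) >> bitDepth;
    const int bandV = (vCol * n.nV) >> bitDepth;
    return bandY * (n.nU * n.nV) + bandU * n.nV + bandV;
}

// C'_rec = Clip1(C_rec + offset), clipping to [0, 2^BD - 1].
int applyCcsao(int cRec, int offset, int bitDepth) {
    const int maxVal = (1 << bitDepth) - 1;
    return std::min(std::max(cRec + offset, 0), maxVal);
}
```

For 10-bit samples with Y_col = 600, U_col = 512, V_col = 100 and (N_Y, N_U, N_V) = (4, 2, 2), the bands are 2, 1, and 0, giving category 2·4 + 1·2 + 0 = 10.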
[0339] Progressive Search Scheme
[0340] In some embodiments, to find the best single classifier consisting of N categories ($N = N_Y \cdot N_U \cdot N_V$), a multi-stage early termination method is applied. If the RD cost does not improve for classifiers with fewer categories, classifiers with more categories are skipped. Multiple breakpoints for early termination on N are set according to the configuration, for example, AI: every 4 categories ($N_Y \cdot N_U \cdot N_V$ < 4, 8, 12, …); RA / LB: every 16 categories ($N_Y \cdot N_U \cdot N_V$ < 16, 32, 48, 64, …).
[0341] Furthermore, a classifier is skipped if $N_Y$ is less than $N_U$ or $N_V$, or if the total number of categories N exceeds a threshold. The progressive scheme not only regulates the overall bit cost but also significantly reduces encoding time. The process is repeated for each of the nine $Y_{col}$ positions to determine the best single classifier.
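One way to read the multi-stage early termination is sketched below. It is an illustration under stated assumptions, not the encoder's actual search: candidates are assumed to be sorted by ascending category count, their RD costs are assumed to be precomputed (a real encoder would evaluate them on the fly), and a stage is taken to be the range of category counts between two consecutive breakpoints. All names are hypothetical.

```cpp
#include <limits>
#include <vector>

// One candidate classifier with a precomputed RD cost (hypothetical type).
struct Candidate {
    int nY, nU, nV;  // band counts for Y, U, V
    double rdCost;   // RD cost of using this classifier for the picture
};

// Returns the index of the selected candidate, or -1 if none was evaluated.
// Assumes `cands` is sorted by ascending category count N = nY * nU * nV.
// `step` is the breakpoint spacing (e.g. 4 for AI, 16 for RA / LB).
int progressiveSearch(const std::vector<Candidate>& cands, int step, int maxCategories) {
    double bestCost = std::numeric_limits<double>::infinity();
    int bestIdx = -1;
    int breakpoint = step;  // the current stage covers N < breakpoint
    bool evaluatedInStage = false;
    bool improvedInStage = false;

    for (int i = 0; i < static_cast<int>(cands.size()); ++i) {
        const Candidate& c = cands[i];
        const int n = c.nY * c.nU * c.nV;
        // Skip rules from the text: nY below nU or nV, or too many categories.
        if (c.nY < c.nU || c.nY < c.nV || n > maxCategories) continue;

        if (n >= breakpoint) {
            // A stage just ended: stop if it was tried without any improvement.
            if (evaluatedInStage && !improvedInStage) break;
            breakpoint = (n / step + 1) * step;
            evaluatedInStage = false;
            improvedInStage = false;
        }
        evaluatedInStage = true;
        if (c.rdCost < bestCost) {
            bestCost = c.rdCost;
            bestIdx = i;
            improvedInStage = true;
        }
    }
    return bestIdx;
}
```

Early termination trades optimality for speed: a cheaper classifier beyond the stopping point is never examined.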
[0342] Offset value improvement (Refinement)
[0343] In some embodiments, for a given classifier, the reconstructed samples in the picture are first classified according to equation (1). An initial offset for each category is derived using the SAO fast distortion estimation. The RD cost of smaller offset values is then estimated repeatedly until the value reaches 0. Next, CTBs that show no RD cost improvement are disabled, and the remaining CTBs are retrained to obtain refined offset values. The CTB on / off procedure is repeated until the RD cost of the picture no longer improves or a threshold count is reached.
[Equation]
[0344] In some embodiments, for a given category, k, s(k), x(k) are the sample position, the original sample, and the sample before CCSAO, E is the sum of the differences between s(k) and x(k), N is the number of samples, ΔD is the difference distortion estimated by applying the offset h, ΔJ is the RD cost, λ is the Lagrange multiplier, and R is the bit cost.
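The equation itself is not reproduced in this text, but the definitions above determine the distortion term: adding an offset h to N samples changes the squared error by ΔD = N·h² − 2·h·E, which follows from expanding Σ(s(k) − x(k) − h)² − Σ(s(k) − x(k))². The sketch below assumes the RD cost has the usual form ΔJ = ΔD + λ·R and uses a made-up rate model (|h| + 1 bits) purely for illustration. It starts from the rounded mean error E / N and tries smaller magnitudes until the offset reaches 0. All function names are hypothetical.

```cpp
#include <cmath>
#include <cstdlib>

// Distortion change from adding offset h to N samples whose summed
// error (original minus pre-CCSAO sample) is E: N*h^2 - 2*h*E.
double deltaDistortion(long long N, long long E, int h) {
    return static_cast<double>(N) * h * h - 2.0 * h * static_cast<double>(E);
}

// Hypothetical rate model (an assumption, not from the source): |h| + 1 bits.
double offsetBits(int h) {
    return std::abs(h) + 1.0;
}

// Returns the offset with the lowest assumed RD cost deltaD + lambda * R,
// searching from the initial estimate round(E / N) down to 0.
int refineOffset(long long N, long long E, double lambda) {
    if (N == 0) return 0;
    int h = static_cast<int>(std::lround(static_cast<double>(E) / static_cast<double>(N)));
    int bestH = 0;
    double bestJ = lambda * offsetBits(0);  // deltaD is 0 when h is 0
    while (h != 0) {
        const double j = deltaDistortion(N, E, h) + lambda * offsetBits(h);
        if (j < bestJ) {
            bestJ = j;
            bestH = h;
        }
        h += (h > 0) ? -1 : 1;  // step the magnitude toward 0
    }
    return bestH;
}
```

With N = 10 and E = 25, the initial offset is 3. At λ = 0 it is kept, while at λ = 10 the rate term makes offset 2 cheaper overall.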
[0345] In some embodiments, the original samples may be true original samples (raw image samples without preprocessing) or motion-compensated temporal filter (MCTF, a classic encoding algorithm that preprocesses the original samples before encoding) original samples. λ is either the same as that of SAO / ALF or weighted by a factor (depending on the configuration / resolution).
[0346] In some embodiments, the encoder optimizes CCSAO by trading off the total RD cost across all categories.
[0347] In some embodiments, the statistical data E and N for each category are stored for each CTB for the further determination of multiple region classifiers.
[0348] Robust Multiple Classifier Assignment
[0349] In some embodiments, to determine whether a second classifier benefits the overall picture quality, CTBs with CCSAO enabled are sorted in ascending order of distortion (or of RD cost, including bit cost).
[0350] In some embodiments, the half of the CTBs with smaller distortion (or a predefined / dependent ratio, e.g., (setNum-1) / setNum-1) keep the same classifier, while the other half of the CTBs are trained with a new second classifier. Meanwhile, during the offset refinement that turns CTBs on and off, each CTB can select its best classifier, so that a good classifier can propagate to more CTBs. In the spirit of shuffling and diffusion, this strategy gives both randomness and robustness to the parameter decision. If the current number of classifiers yields no further RD cost improvement, larger numbers of classifiers are skipped.
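The initial split between the first and second classifier can be sketched as a sort followed by a cut at the midpoint. The function name is hypothetical, the fixed one-half ratio is the simple case from the text, and the input is assumed to hold one distortion value per CCSAO-enabled CTB.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns, per CTB, 0 if it keeps the first classifier or 1 if it is
// handed to the new second classifier for training.
std::vector<int> splitCtbsByDistortion(const std::vector<double>& distortion) {
    // Order CTB indices by ascending distortion (stable for ties).
    std::vector<std::size_t> order(distortion.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return distortion[a] < distortion[b]; });

    // The lower-distortion half keeps classifier 0; the rest get classifier 1.
    const std::size_t keep = order.size() / 2;
    std::vector<int> classifier(distortion.size(), 0);
    for (std::size_t rank = keep; rank < order.size(); ++rank) {
        classifier[order[rank]] = 1;
    }
    return classifier;
}
```

This covers only the starting assignment; the later step in which each CTB picks its best classifier during offset refinement is not modeled.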
[0351] Figure 41 shows a computing environment 4110 coupled with a user interface 4150. The computing environment 4110 may be part of a data processing server. The computing environment 4110 includes a processor 4120, memory 4130, and an input / output (I / O) interface 4140.
[0352] The processor 4120 typically controls the overall operation of the computing environment 4110, including operations related to display, data acquisition, data communication, and image processing. The processor 4120 may include one or more processors to execute instructions that perform all or some of the steps in the method described above. Furthermore, the processor 4120 may include one or more modules that facilitate interaction between the processor 4120 and other components. The processor may be a central processing unit (CPU), a microprocessor, a single-chip machine, a graphics processing unit (GPU), etc.
[0353] Memory 4130 is configured to store various types of data to support the operation of the computing environment 4110. Memory 4130 may contain certain software 4132. Examples of such data include instructions for any application or method operated in the computing environment 4110, video datasets, image data, and so on. Memory 4130 can be implemented using any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
[0354] The I / O interface 4140 provides an interface between the processor 4120 and peripheral interface modules such as a keyboard, click wheel, and buttons. The buttons include, but are not limited to, a home button, a scan start button, and a scan stop button. The I / O interface 4140 can be coupled to an encoder and decoder.
[0355] Figure 42 is a flowchart illustrating a video decoding method according to an example of the present disclosure.
[0356] In step 4201, the processor 4120 can, on the video decoder side, receive a picture frame containing one or more components, which may include a first component and a second component.
[0357] In some examples, the first component may include one of the luma component, the first chroma component, or the second chroma component, and the second component may include one of the luma component, the first chroma component, or the second chroma component. The luma component may be the Y component, the first chroma component the U component, and the second chroma component the V component. The first and second chroma components are interchangeable.
[0358] In step 4202, the processor 4120 can determine the classifier for each sample of the second component according to the collocated sample of the first component.
[0359] In some examples, the collocated sample of the first component can be obtained by linear weighting from at least one of the following: multiple collocated and adjacent luma samples, or multiple linearly weighted samples for the three components consisting of a luma component, a first chroma component, and a second chroma component.
[0360] In some examples, the collocated sample of the first component can be obtained by linearly weighting multiple collocated and adjacent luma samples, as shown in Figures 20A-20B and 21A-21B. For example, the collocated luma sample value (Y0) can be replaced with a value (Yp) obtained by weighting the collocated and adjacent luma samples, as shown in Figures 20A-20B.
[0361] In some examples, the collocated sample of the first component can be obtained by linearly weighting multiple linearly weighted samples for the three components, including the luma component, the first chroma component, and the second chroma component. For example, in classifier C7, the C0 / C3 bandNum classification of the current component is derived using the collocated / current and adjacent samples of all three color components. For example, to classify the current U sample, the collocated and adjacent Y / V samples and the current and adjacent U samples are used, as shown in Figure 13B.
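The replacement of the collocated luma value Y0 by a weighted value Yp can be sketched as follows. This is a minimal sketch: the 3x3 neighborhood, the example weights, the rounding shift, and the clamping of positions at picture borders are all assumptions chosen for illustration and are not taken from Figures 20A-20B.

```python
def weighted_collocated_luma(luma, x, y, weights, shift):
    """Return Yp, a weighted average of the luma sample at (x, y) and its
    eight neighbors.

    luma    : 2-D list of integer luma samples, indexed luma[row][col]
    weights : 3x3 integer weights whose sum equals 1 << shift (assumed)
    shift   : normalization shift, must be >= 1
    Positions outside the picture are clamped to the nearest border sample.
    """
    height, width = len(luma), len(luma[0])
    acc = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), height - 1)
            xx = min(max(x + dx, 0), width - 1)
            acc += weights[dy + 1][dx + 1] * luma[yy][xx]
    # Add half the divisor before shifting so the result is rounded.
    return (acc + (1 << (shift - 1))) >> shift
```

With the hypothetical cross-shaped weights [[0, 1, 0], [1, 4, 1], [0, 1, 0]] and shift 3, the center sample contributes half of Yp and its four direct neighbors share the other half.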
[0362] In step 4203, in response to the determination that the classifier is a band classifier or a combined classifier that includes a band classifier, the processor 4120 can obtain an indicator showing one or more bandNum segments of one or more components.
[0363] In some cases, the indicator can be predefined or signaled by the encoder at one or more levels. For example, the levels may include at least one of the following: sequence parameter set (SPS) level, adaptation parameter set (APS) level, picture parameter set (PPS) level, picture header (PH) level, slice header (SH) level, region level, coding tree unit (CTU) level, subblock level, or sample level. If the indicator is predefined, the encoder does not need to design or signal it, which reduces complexity.
[0364] In some examples, as shown in Table 2-42, the luma component, the first chroma component, and the second chroma component can use different indicators to represent their respective bandNum segments.
[0365] In some examples, at least two of the luma component, first chroma component, or second chroma component use the same indicator to show their respective bandNum segments.
[0366] In some examples, chroma transpose components may be used in the bandNum mapping table, where the chroma transpose component is the transpose of either the first or second chroma component. For example, as shown in Table 2-43, C represents the current chroma component, which may be either the first or second chroma component, and CT represents the chroma transpose component, which is the transpose of C, i.e., the other chroma component. For example, when C represents the U component, CT represents the V component, and when C represents the V component, CT represents the U component.
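The C / CT convention can be sketched as a small lookup. The function name and the use of the strings "Y", "C", and "CT" as table labels are assumptions made for illustration only.

```python
def resolve_components(current_chroma, labels):
    """Translate symbolic labels ('Y', 'C', 'CT') into actual components.

    current_chroma is 'U' or 'V'. 'C' resolves to the current chroma
    component and 'CT' (chroma transpose) to the other chroma component.
    """
    if current_chroma not in ("U", "V"):
        raise ValueError("current_chroma must be 'U' or 'V'")
    other = "V" if current_chroma == "U" else "U"
    mapping = {"Y": "Y", "C": current_chroma, "CT": other}
    return [mapping[label] for label in labels]
```

Expressing a table in terms of C and CT lets a single table row serve both chroma components: the same row reads as (Y, U, V) when U is being classified and as (Y, V, U) when V is being classified.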
[0367] In some examples, the indicator may show one or more bandNum segments of one or more components, including at least one of the following: a luma component, a first chroma component, a second chroma component, or a chroma transpose component related to the first or second chroma component, as shown in Table 2-42.
[0368] In step 4204, the processor 4120 can determine one or more bandNum segments according to the indicator, and then determine the band offset according to one or more bandNum segments.
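Step 4204 can be illustrated with a small band classifier. The claims state that bandNum segments are obtained by dividing the range of sample values; the uniform division by shift used below, and the way the per-component band indices are combined into one class index, are assumptions for this sketch, as are all function names.

```python
def band_index(sample, band_num, bit_depth):
    """Index of the band containing `sample` when the range
    [0, 2**bit_depth) is divided uniformly into band_num bands."""
    return (sample * band_num) >> bit_depth


def class_index(samples, band_nums, bit_depth):
    """Combine per-component band indices into a single class index.

    samples and band_nums are parallel sequences, e.g. (Y, U, V) sample
    values and their bandNum segments. The result lies in
    [0, product of band_nums).
    """
    idx = 0
    for sample, band_num in zip(samples, band_nums):
        idx = idx * band_num + band_index(sample, band_num, bit_depth)
    return idx


def band_offset_for(samples, band_nums, bit_depth, offsets):
    """Look up the band offset for the class the samples fall into."""
    return offsets[class_index(samples, band_nums, bit_depth)]
```

With 10-bit samples (512, 100, 900) and bandNum segments (4, 2, 2), the per-component band indices are 2, 0, and 1, which combine into class 9 of 16.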
[0369] In some examples, one or more bandNum segments can be determined according to indicators determined from a mapping table.
[0370] In some examples, the mapping table may be predefined or may be adjusted / modified by the encoder. For example, the mapping table may be a bandNum mapping table as shown in Tables 2-42 and 2-43.
[0371] In some cases, if the mapping table is predefined, it is designed offline and cannot be modified by the encoder. Both the encoder and the decoder hold the mapping table, so there is no need to signal it in the bitstream sent to the decoder. Since neither the encoder nor the decoder needs to modify the mapping table in this case, neither needs to support multiple table variants, which can reduce complexity.
[0372] For example, the mapping table can be predefined offline, as shown in Table 2-42 or Table 2-43. Both the encoder and decoder know the mapping table without signaling. Pre-defining it eliminates the need for design or signaling in the encoder.
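A predefined mapping table shared by encoder and decoder can be sketched as a constant lookup. The entries below are hypothetical placeholders and do NOT reproduce Tables 2-42 or 2-43; the names BAND_NUM_TABLE and band_nums_from_indicator are likewise assumptions.

```python
# Hypothetical placeholder table: indicator -> (bandNum Y, bandNum C, bandNum CT).
# These values are for illustration only and are not taken from the disclosure.
BAND_NUM_TABLE = {
    0: (16, 1, 1),
    1: (8, 2, 1),
    2: (4, 2, 2),
}


def band_nums_from_indicator(indicator, table=BAND_NUM_TABLE):
    """Return the bandNum segments selected by `indicator`.

    Because the table is compiled into both encoder and decoder, only the
    indicator has to be known; the table itself is never transmitted.
    """
    try:
        return table[indicator]
    except KeyError:
        raise ValueError(f"unknown bandNum indicator: {indicator}") from None
```

When the table may instead be adjusted by the encoder, the same function can be called with a `table` argument that was rebuilt from signaled data at the corresponding level.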
[0373] In some other examples, where the mapping table can be modified or adjusted, the encoder can modify the mapping table at different levels.
[0374] In step 4205, the processor 4120 can correct each sample of the second component based on the sample offset, which includes the band offset.
[0375] In some examples, the sample offset may include band offset and edge offset, as mentioned earlier.
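The correction in step 4205 can be sketched as follows. Summing the band offset and the edge offset, and clipping the result to the valid range for the bit depth, are assumptions of this sketch rather than details stated in this passage; the function name is hypothetical.

```python
def apply_sample_offset(sample, band_offset, edge_offset=0, bit_depth=10):
    """Add the sample offset to a reconstructed sample and clip the result
    to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    corrected = sample + band_offset + edge_offset
    return min(max(corrected, 0), max_val)
```

Clipping keeps a corrected sample inside the representable range, so a positive offset applied near the maximum value (or a negative one near zero) cannot overflow.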
[0376] Figure 43 is a flowchart illustrating a video encoding method according to an example of the present disclosure.
[0377] In step 4301, on the video encoder side, the processor 4120 can determine a classifier for each sample of the second component of a picture frame according to the collocated sample of the first component, where the picture frame may contain one or more components including the first and second components.
[0378] In some examples, the first component may include one of the luma component, the first chroma component, or the second chroma component, and the second component may include one of the luma component, the first chroma component, or the second chroma component. The luma component may be the Y component, the first chroma component the U component, and the second chroma component the V component. The first and second chroma components are interchangeable.
[0379] In some examples, the collocated sample of the first component can be obtained by linear weighting from at least one of the following: multiple collocated and adjacent luma samples, or multiple linearly weighted samples for the three components consisting of a luma component, a first chroma component, and a second chroma component.
[0380] In some examples, the collocated sample of the first component can be obtained by linearly weighting multiple collocated and adjacent luma samples, as shown in Figures 20A-20B and 21A-21B. For example, the collocated luma sample value (Y0) can be replaced with a value (Yp) obtained by weighting the collocated and adjacent luma samples, as shown in Figures 20A-20B.
[0381] In some examples, the collocated sample of the first component can be obtained by linearly weighting multiple linearly weighted samples for the three components consisting of the luma component, the first chroma component, and the second chroma component. For example, classifier C7 uses the collocated / current and adjacent samples of all three color components to derive the C0 / C3 bandNum classification of the current component. For example, to classify the current U sample, the collocated and adjacent Y / V samples and the current and adjacent U samples are used, as shown in Figure 13B.
[0382] In step 4302, in response to the determination that the classifier is a band classifier or a combined classifier that includes a band classifier, the processor 4120 may predefine an indicator or signal it in the bitstream, the indicator showing one or more bandNum segments of one or more components.
[0383] In some examples, the indicator may be predefined or signaled by the encoder at one or more levels. For example, the levels may include at least one of the following: sequence parameter set (SPS) level, adaptation parameter set (APS) level, picture parameter set (PPS) level, picture header (PH) level, slice header (SH) level, region level, coding tree unit (CTU) level, subblock level, or sample level.
[0384] In some examples, one or more bandNum segments can be determined according to indicators determined from a mapping table. In some examples, the processor 4120 can pre-define or adjust / modify the mapping table. For example, the mapping table may be a bandNum mapping table as shown in Tables 2-42 and 2-43.
[0385] In some cases, if the mapping table is predefined, both the encoder and decoder can have the mapping table, and it is not necessary to signal the mapping table in the bitstream sent to the decoder. In such situations, neither the encoder nor the decoder needs to modify the mapping table.
[0386] In some other examples, where the mapping table can be modified or adjusted, the encoder can modify the mapping table at different levels.
[0387] In some examples, the luma component, the first chroma component, and the second chroma component can use different indicators to represent their respective bandNum segments, as shown in Table 2-42.
[0388] In some examples, at least two of the luma component, the first chroma component, or the second chroma component use the same indicator to show their respective bandNum segments.
[0389] In some examples, chroma transpose components may be used in the mapping table, and a chroma transpose component is the transpose of either the first or second chroma component. For example, as shown in Table 2-43, C represents the current chroma component, which may be either the first or second chroma component, and CT represents the chroma transpose component, which is the transpose of C, i.e., the other chroma component. For example, when C represents the U component, CT represents the V component, and when C represents the V component, CT represents the U component.
[0390] In some examples, the indicator may show one or more bandNum segments of one or more components, including at least one of the following: a luma component, a first chroma component, a second chroma component, or a chroma transpose component related to the first or second chroma component, as shown in Table 2-42.
[0391] In some examples, a video decoder, after receiving a bitstream encoded by an encoder, can determine one or more bandNum segments according to an indicator, and then determine the band offset according to one or more bandNum segments.
[0392] In some examples, one or more bandNum segments can be determined according to indicators determined from a mapping table.
[0393] In some examples, the mapping table may be predefined or may be adjusted / modified by the encoder. For example, the mapping table may be a bandNum mapping table as shown in Tables 2-42 and 2-43.
[0394] In some examples, the decoder can correct each sample of the second component based on a sample offset that includes a band offset.
[0395] In some embodiments, a non-transitory computer-readable storage medium is also provided, for example memory 4130, which contains a plurality of programs executable by the processor 4120 in the computing environment 4110 to perform the methods described above. In one example, the plurality of programs may be executed by the processor 4120 in the computing environment 4110 to receive a bitstream or data stream containing encoded video information (e.g., video blocks representing encoded video frames and / or one or more associated syntax elements), for example from the video encoder 20 in Figure 2, and may also be executed by the processor 4120 in the computing environment 4110 to perform the decoding method described above according to the received bitstream or data stream. In another example, the plurality of programs can be executed by the processor 4120 in the computing environment 4110 to perform the encoding method described above to encode video information (e.g., video blocks representing video frames and / or one or more syntax elements) into a bitstream or data stream, and can also be executed by the processor 4120 in the computing environment 4110 to transmit the bitstream or data stream (e.g., to the video decoder 30 in Figure 3). Alternatively, a non-transitory computer-readable storage medium may store a bitstream or data stream containing encoded video information (e.g., video information containing one or more syntax elements) generated by an encoder (e.g., the video encoder 20 in Figure 2) using the encoding method described above, for use by a decoder (e.g., the video decoder 30 in Figure 3) when decoding video data. Examples of non-transitory computer-readable storage media include ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disks (registered trademark), and optical data storage devices.
[0396] In one embodiment, a computer device is also provided comprising one or more processors (e.g., processor 4120) and a non-transitory computer-readable storage medium or memory 4130 in which a plurality of programs executable by the one or more processors are stored. Here, the one or more processors are configured to perform the method described above when executing the plurality of programs.
[0397] In one embodiment, a computer program product is also provided that includes a plurality of programs, contained for example in memory 4130, that are executable by the processor 4120 within the computing environment 4110 to perform the method described above. For example, the computer program product may include a non-transitory computer-readable storage medium.
[0398] In one embodiment, the computing environment 4110 can be implemented by one or more ASICs, DSPs, digital signal processing devices (DSPDs), programmable logic devices (PLDs), FPGAs, GPUs, controllers, microcontrollers, microprocessors, or other electronic components to perform the above method.
[0399] Further embodiments also include various subsets of the above embodiments that are combined with various other embodiments or otherwise reconfigured.
[0400] In one or more examples, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted through, a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. The computer-readable medium may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates the transfer of a computer program from one location to another, for example according to a communication protocol. Thus, the computer-readable medium can generally correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as signals or carrier waves. The data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures for implementing the examples described in this application. A computer program product may include a computer-readable medium.
[0401] The descriptions in this disclosure are presented for illustrative purposes only and are not intended to be exhaustive or limiting. Many modifications, variations, and alternative implementations will be apparent to those skilled in the art who benefit from the teachings presented in the above description and the accompanying drawings.
[0402] Unless otherwise specified, the order of the steps in the methods of this disclosure is illustrative only, and the steps of the methods of this disclosure are not limited to the order specifically described above and may be modified according to actual conditions. Furthermore, at least one of the steps of the methods of this disclosure may be adjusted, combined, or omitted according to practical requirements.
[0403] The examples provided illustrate the principles of the Disclosure and are selected and described to enable those skilled in the art to understand the Disclosure in various embodiments and to make optimal use of the fundamental principles and various embodiments with various modifications to suit specific intended applications. Therefore, it should be understood that the scope of the Disclosure is not limited to the specific embodiments disclosed, and modifications and other implementations are also intended to be included within the scope of the Disclosure.
Claims
1. A video decoding method, comprising: receiving, by a decoder, a picture frame comprising one or more components including a first component and a second component; determining, by the decoder, a classifier for each sample of the second component according to a collocated sample of the first component; in response to determining that the classifier is a band classifier or a combined classifier including the band classifier, obtaining, by the decoder, an indicator showing one or more bandNum segments of the one or more components; determining the one or more bandNum segments according to the indicator, and determining a band offset according to the one or more bandNum segments; modifying, by the decoder, each sample of the second component according to a sample offset including the band offset; and obtaining the one or more bandNum segments according to the indicator determined from a mapping table, wherein the one or more bandNum segments are obtained by dividing the range of sample values of the one or more components, the first component includes one of a luma component, a first chroma component, or a second chroma component, the second component includes one of the luma component, the first chroma component, or the second chroma component, a chroma transpose component is used in the mapping table, the chroma transpose component being the transpose component of the first chroma component or the second chroma component, in response to a determination that the first chroma component is the current component, the chroma transpose component is the second chroma component, and in response to a determination that the second chroma component is the current component, the chroma transpose component is the first chroma component.
2. The method according to claim 1, wherein the indicator is predefined or signaled by an encoder at at least one level, and the at least one level includes at least one of the following: sequence parameter set (SPS) level, adaptation parameter set (APS) level, picture parameter set (PPS) level, picture header (PH) level, slice header (SH) level, region level, coding tree unit (CTU) level, subblock level, or sample level.
3. The method according to claim 1, wherein the mapping table is predefined or adjusted at at least one level by an encoder.
4. The method according to claim 1, wherein the first component includes one of the luma component, the first chroma component, or the second chroma component, the second component includes one of the luma component, the first chroma component, or the second chroma component, and either the luma component, the first chroma component, and the second chroma component use different indicators to represent their respective bandNum segments, or at least two of the luma component, the first chroma component, or the second chroma component use the same indicator to represent their respective bandNum segments.
5. The method according to claim 1, wherein the indicator shows one or more bandNum segments of one or more components, which include at least one of the luma component, the first chroma component, the second chroma component, or the chroma transpose component related to the first chroma component or the second chroma component.
6. A video decoding method, comprising: receiving, by a decoder, a picture frame comprising one or more components including a first component and a second component; determining, by the decoder, a classifier for each sample of the second component according to a collocated sample of the first component; in response to determining that the classifier is a band classifier or a combined classifier including the band classifier, obtaining, by the decoder, an indicator showing one or more bandNum segments of the one or more components; determining the one or more bandNum segments according to the indicator, and determining a band offset according to the one or more bandNum segments; and modifying, by the decoder, each sample of the second component according to a sample offset including the band offset, wherein the collocated sample of the first component is obtained by linear weighting from at least one of the following: multiple collocated and adjacent luma samples, or multiple linearly weighted samples for three components consisting of a luma component, a first chroma component, and a second chroma component.
7. A video encoding method, comprising: determining, by an encoder, a classifier for each sample of a second component according to a collocated sample of a first component, wherein a picture frame comprises one or more components including the first component and the second component; in response to determining that the classifier is a band classifier or a combined classifier including the band classifier, predefining, by the encoder, an indicator or signaling the indicator in a bitstream, the indicator showing one or more bandNum segments of the one or more components; and determining the one or more bandNum segments according to the indicator determined from a mapping table, wherein the one or more bandNum segments are obtained by dividing the range of sample values of the one or more components, the first component includes one of a luma component, a first chroma component, or a second chroma component, the second component includes one of the luma component, the first chroma component, or the second chroma component, a chroma transpose component is used in the mapping table, the chroma transpose component being the transpose component of the first chroma component or the second chroma component, in response to a determination that the first chroma component is the current component, the chroma transpose component is the second chroma component, and in response to a determination that the second chroma component is the current component, the chroma transpose component is the first chroma component.
8. The method according to claim 7, wherein the indicator is predefined or signaled by an encoder at at least one level, and the at least one level includes at least one of the following: sequence parameter set (SPS) level, adaptation parameter set (APS) level, picture parameter set (PPS) level, picture header (PH) level, slice header (SH) level, region level, coding tree unit (CTU) level, subblock level, or sample level.
9. The method according to claim 7, further comprising predefining the mapping table or adjusting the mapping table at at least one level using the encoder.
10. The method according to claim 7, wherein the first component includes one of the luma component, the first chroma component, or the second chroma component, the second component includes one of the luma component, the first chroma component, or the second chroma component, and either the luma component, the first chroma component, and the second chroma component use different indicators to represent their respective bandNum segments, or at least two of the luma component, the first chroma component, or the second chroma component use the same indicator to represent their respective bandNum segments.
11. The method according to claim 7, wherein the indicator shows one or more bandNum segments of one or more components, which include at least one of the luma component, the first chroma component, the second chroma component, or the chroma transpose component related to the first chroma component or the second chroma component.
12. A video encoding method, comprising: determining, by an encoder, a classifier for each sample of a second component according to a collocated sample of a first component, wherein a picture frame comprises one or more components including the first component and the second component; and in response to determining that the classifier is a band classifier or a combined classifier including the band classifier, predefining, by the encoder, an indicator or signaling the indicator in a bitstream, the indicator showing one or more bandNum segments of the one or more components, wherein the collocated sample of the first component is obtained by linear weighting from at least one of the following: multiple collocated and adjacent luma samples, or multiple linearly weighted samples for three components consisting of a luma component, a first chroma component, and a second chroma component.
13. A video encoding device, comprising: one or more processors; and a memory coupled to the one or more processors and configured to store instructions executable by the one or more processors, wherein the one or more processors are configured to perform the method according to any one of claims 1 to 12 when executing the instructions.
14. A method for generating and storing a bitstream, comprising: generating the bitstream by the method according to any one of claims 7 to 12; and storing the generated bitstream in a memory device.
15. A method for generating and transmitting a bitstream, comprising: generating the bitstream by the method according to any one of claims 7 to 12; and transmitting the generated bitstream to a destination device.