Adaptive transform selection using primary and secondary characteristics in video coding

Adaptive transform selection in video coding improves efficiency and quality by utilizing intra-prediction type and secondary characteristics to optimize transform selection, addressing limitations in conventional methods.

WO2026135912A1 · PCT designated stage · Publication Date: 2026-06-25 · QUALCOMM INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
QUALCOMM INC
Filing Date
2025-11-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Conventional video coding approaches face challenges in achieving efficient compression while maintaining visual quality due to limitations in transform selection methods, particularly in block-based prediction and transform coding techniques.

Method used

Adaptive transform selection based on primary and secondary characteristics of video data, including intra-prediction type, to optimize transform selection across different block sizes and prediction types.

Benefits of technology

Enhances compression efficiency and visual quality by dynamically selecting transforms based on data-driven characteristics, reducing bitstream overhead and computational complexity.

✦ Generated by Eureka AI based on patent content.


Abstract

A method of decoding video data includes determining an intra-prediction type for a current block of video data, deriving a primary characteristic for the current block based at least on the intra-prediction type, and deriving a secondary characteristic based at least in part on the primary characteristic. An inverse transform is selected based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic. The selected inverse transform is applied to a transform block representing residual data to generate residual data, and the current block is reconstructed based on the residual data and a prediction block. The described technique enables adaptive inverse transform selection for improved intra-prediction-based decoding efficiency.

Description

Qualcomm Ref. No. 2501625WO

ADAPTIVE TRANSFORM SELECTION USING PRIMARY AND SECONDARY CHARACTERISTICS IN VIDEO CODING

CLAIM OF PRIORITY

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/735,830, filed 18 December 2024, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

[0002] Aspects of the disclosure relate generally to image and video signal processing, including block-based prediction and transform coding.

BACKGROUND

[0003] Digital video processing technologies are widely employed across computing, communication, and entertainment systems, including televisions, streaming platforms, mobile devices, and video conferencing systems. These systems utilize video compression techniques to efficiently represent visual information for storage, transmission, and playback.

[0004] Conventional video coding approaches, such as those standardized by the Moving Picture Experts Group (MPEG) and the International Telecommunication Union (ITU-T), rely on block-based prediction and transform coding. In these approaches, a video frame or a portion of a frame is partitioned into blocks that may be predicted from adjacent blocks within the same frame or from reference frames in a sequence. Intra-prediction reduces spatial redundancy by estimating pixel values within a block using reconstructed pixels of neighboring blocks, while inter-prediction reduces temporal redundancy using data from other frames.

[0005] Transform coding techniques convert residual signals produced by prediction into a frequency or transform domain representation to enable quantization and entropy coding. Modern standards, including ITU-T H.265/HEVC, ITU-T H.266/VVC, and related coding frameworks, employ multiple transform designs, finer block partitioning, and context-adaptive entropy models to achieve improved compression efficiency while maintaining visual quality.

SUMMARY

[0006] Techniques are described for coding and decoding video data through adaptive transform and inverse-transform selection based on characteristics of a current block of video data. In a decoding example, a decoder determines an intra-prediction type for the current block, derives a primary characteristic for the block based at least on the intra-prediction type, and derives a secondary characteristic based at least in part on the primary characteristic. The decoder selects, based on the intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to apply to a transform block representing residual data. The decoder applies the selected inverse transform to the transform block to generate residual data and reconstructs the current block based on the residual data and a prediction block.
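
The decoding flow in paragraph [0006] can be sketched roughly as follows. This is only an illustration under stated assumptions: the summary does not define how the characteristics are computed or which inverse transforms are candidates, so `derive_primary`, `derive_secondary`, `SELECTION_TABLE`, and the two stand-in inverse transforms are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

Block = List[List[int]]


def identity_inverse(coeffs: Block) -> Block:
    # Stand-in inverse transform (assumption): returns a copy of the input.
    return [row[:] for row in coeffs]


def flip_rows_inverse(coeffs: Block) -> Block:
    # Second stand-in (assumption): reverses each row; it is its own inverse.
    return [row[::-1] for row in coeffs]


def derive_primary(intra_type: str) -> int:
    # Hypothetical rule: map the intra-prediction type to a small class index.
    return {"DC": 0, "PLANAR": 0, "ANGULAR_HOR": 1, "ANGULAR_VER": 2}[intra_type]


def derive_secondary(primary: int, width: int, height: int) -> int:
    # Hypothetical rule: refine the primary characteristic with the block shape.
    return primary + (1 if width != height else 0)


# Hypothetical selection table keyed by (intra type, primary, secondary).
SELECTION_TABLE: Dict[Tuple[str, int, int], Callable[[Block], Block]] = {
    ("ANGULAR_HOR", 1, 2): flip_rows_inverse,
}


def decode_block(intra_type: str, transform_block: Block, prediction: Block) -> Block:
    height, width = len(transform_block), len(transform_block[0])
    primary = derive_primary(intra_type)
    secondary = derive_secondary(primary, width, height)
    # Fall back to the identity stand-in when the table has no entry.
    inverse = SELECTION_TABLE.get((intra_type, primary, secondary), identity_inverse)
    residual = inverse(transform_block)
    # Reconstruction: prediction plus residual, sample by sample.
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]
```

In this sketch every input to the selection is derived from information the decoder already holds, so no transform index is read from the bitstream.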

[0007] In an encoding example, an encoder determines an intra-prediction type for a current block, derives a primary and secondary characteristic in the same manner, selects a transform based on the determined characteristics, and applies the transform to residual data to generate a transform block representing transformed coefficients. The encoder may output data indicative of the transformed coefficients for inclusion in a coded bitstream.

[0008] These techniques may support adaptive and data-driven transform selection across different block sizes, prediction types, and transform domains. The described operations can be implemented in a video encoder, a video decoder, or other processing circuitry configured for block-based video coding.

[0009] According to one example, a method of decoding video data includes determining an intra-prediction type for a current block of video data. In one example, the method includes deriving a primary characteristic for the current block based at least on the intra-prediction type. According to certain examples, the method includes deriving a secondary characteristic for the current block based at least in part on the primary characteristic. In at least one example, the method includes selecting, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data. According to such examples, the method includes applying the inverse transform to the transform block to generate the residual data. In a particular example, the method includes reconstructing the current block based on residual data and a prediction block.

[0010] According to another example, a decoder apparatus for decoding video data includes a memory configured to store video data and processing circuitry in communication with the memory. In one example, the decoder apparatus includes processing circuitry configured to determine an intra-prediction type for a current block of video data. According to certain examples, the decoder apparatus includes processing circuitry configured to derive a primary characteristic for the current block based at least on the intra-prediction type. In at least one example, the decoder apparatus includes processing circuitry configured to derive a secondary characteristic for the current block based at least in part on the primary characteristic. According to such examples, the decoder apparatus includes processing circuitry configured to select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data. In a particular example, the decoder apparatus includes processing circuitry configured to apply the inverse transform to the transform block to generate residual data. According to yet another example, the decoder apparatus includes processing circuitry configured to reconstruct the current block based on residual data and a prediction block.

[0011] According to yet another example, an encoder apparatus for coding video data includes a memory configured to store video data and processing circuitry in communication with the memory. In one example, the encoder apparatus includes processing circuitry configured to determine an intra-prediction type for a current block of video data. According to certain examples, the encoder apparatus includes processing circuitry configured to derive a primary characteristic for the current block based at least on the intra-prediction type. In at least one example, the encoder apparatus includes processing circuitry configured to derive a secondary characteristic for the current block based at least in part on the primary characteristic. According to such examples, the encoder apparatus includes processing circuitry configured to select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, a transform to be applied to residual data of the current block. In a particular example, the encoder apparatus includes processing circuitry configured to apply the transform to the residual data to generate a transform block representing transformed coefficients for coding. According to yet another example, the encoder apparatus includes processing circuitry configured to output data indicative of the transformed coefficients.

[0012] In a particular example, there is a device which includes means for determining an intra-prediction type for a current block of video data. According to one example, the device includes means for deriving a primary characteristic for the current block based at least on the intra-prediction type. In one example, the device includes means for deriving a secondary characteristic for the current block based at least in part on the primary characteristic. According to certain examples, the device includes means for selecting, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data. In at least one example, the device includes means for applying the inverse transform to the transform block to generate the residual data. According to such examples, the device includes means for reconstructing the current block based on residual data and a prediction block.

[0013] The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF DRAWINGS

[0014] FIG. 1 is a block diagram illustrating an example video encoding and decoding system, in accordance with aspects of the disclosure.

[0015] FIG. 2 is a block diagram illustrating an example video encoder, in accordance with aspects of the disclosure.

[0016] FIG. 3 is a block diagram illustrating an example video decoder, in accordance with aspects of the disclosure.

[0017] FIG. 4 is a flowchart illustrating an example method for encoding a current block, in accordance with aspects of the disclosure.

[0018] FIG. 5 is a flowchart illustrating an example method for decoding a current block of video data, in accordance with aspects of the disclosure.

[0019] FIG. 6 is a flow diagram illustrating an example method for decoding video data, in accordance with aspects of this disclosure.

[0020] FIG. 7 is a flow diagram illustrating an example method for encoding video data, in accordance with aspects of this disclosure.

DETAILED DESCRIPTION

[0021] Techniques are described for coding (encoding or decoding) video data through adaptive transform selection based on characteristics of a current video block. In video encoding, a video encoder determines a prediction block for a current block and determines residual data based on a difference between the prediction block and the current block. The video encoder applies a transform to transform the residual data, in the sample domain, into a transform block, in the frequency or transform domain. The example techniques described in this disclosure relate to a manner in which the video encoder may select the transform that is applied.

[0022] The video encoder may entropy-encode coefficients of the transform block and signal information indicative of the encoded coefficients in a bitstream. A video decoder may receive the bitstream and determine a prediction block using the same techniques as the video encoder. The video decoder may receive the information indicative of the encoded coefficients and decode the encoded coefficients to generate the transform block. The video decoder applies the selected inverse transform to the transform block to generate residual data in the sample domain. The example techniques described in this disclosure relate to a manner in which the video decoder may select the inverse transform that is applied. The video decoder may add the residual data to the prediction block to reconstruct the current block.
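
The residual, transform, and reconstruction steps of paragraphs [0021] and [0022] can be illustrated with a small round trip. The sketch below assumes an orthonormal separable DCT-II as the transform purely for illustration (the disclosure does not name the candidate transforms here), and it omits quantization and entropy coding, so the round trip is lossless. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    # Orthonormal DCT-II basis; row k holds the k-th basis vector.
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def forward_transform(residual: Matrix) -> Matrix:
    # Separable 2-D transform: C_h * R * C_w^T.
    c_h, c_w = dct_matrix(len(residual)), dct_matrix(len(residual[0]))
    return matmul(matmul(c_h, residual), transpose(c_w))


def inverse_transform(coeffs: Matrix) -> Matrix:
    # Inverse of the orthonormal transform: C_h^T * Y * C_w.
    c_h, c_w = dct_matrix(len(coeffs)), dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(c_h), coeffs), c_w)


def encode_block(current: List[List[int]], prediction: List[List[int]]) -> Matrix:
    # Encoder side: residual = current - prediction, then transform.
    residual = [[c - p for c, p in zip(c_row, p_row)]
                for c_row, p_row in zip(current, prediction)]
    return forward_transform(residual)


def reconstruct_block(coeffs: Matrix, prediction: List[List[int]]) -> List[List[int]]:
    # Decoder side: inverse transform, then add the residual to the prediction.
    residual = inverse_transform(coeffs)
    return [[p + round(r) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]
```

A real codec would quantize the coefficients between `encode_block` and `reconstruct_block`, which is where loss is introduced; the sketch leaves that step out to keep the transform pair itself visible.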

[0023] For brevity, this disclosure describes techniques for adaptive transform selection. It should be understood that, from the perspective of the video encoder, adaptive transform selection refers to selecting a transform. From the perspective of the video decoder, adaptive transform selection refers to selecting the inverse transform.

[0024] The transform (e.g., a transform for the video encoder and an inverse transform for the video decoder) may be selected based on characteristics such as the intra-prediction type, a primary characteristic, and a secondary characteristic. In some examples, the video coder (e.g., video encoder or video decoder) may determine a measure of a difference between the primary characteristic and the secondary characteristic to select the transform.
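
One way to picture the difference-based selection in paragraph [0024] is shown below. The sketch assumes, hypothetically, that both characteristics are integer indices, that the measure is their absolute difference, and that a fixed threshold separates two candidate transforms. The threshold value, the candidate names, and the special handling of non-angular types are illustrative assumptions, not details taken from the disclosure.

```python
def select_transform(intra_type: str, primary: int, secondary: int,
                     threshold: int = 4) -> str:
    """Pick a transform name from the three selection inputs (hypothetical rule)."""
    # Assumption: non-directional prediction types always use the first candidate.
    if intra_type in ("DC", "PLANAR"):
        return "TRANSFORM_A"
    # Assumption: the difference measure is a plain absolute difference.
    difference = abs(primary - secondary)
    return "TRANSFORM_A" if difference <= threshold else "TRANSFORM_B"
```

If the encoder and the decoder both evaluate the same function on the same inputs, they arrive at the same choice, with the decoder then applying the inverse of the selected transform.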

[0025] Examples of the primary characteristic and the secondary characteristic are described in more detail below. Examples of the intra-prediction type include angular modes, DC, planar, etc. where samples neighboring the current block in the same picture as the current block are used to generate the prediction block.
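
To make the intra-prediction types in paragraph [0025] concrete, the following simplified sketch builds a prediction block from neighboring samples for DC prediction and for the two axis-aligned angular directions. It is an illustration only: standardized codecs add reference-sample filtering, boundary smoothing, and many more angular modes, and the function names and argument layout here are assumptions.

```python
from typing import List


def predict_dc(above: List[int], left: List[int], width: int, height: int) -> List[List[int]]:
    # DC: fill the block with the rounded mean of the neighboring samples.
    total = sum(above[:width]) + sum(left[:height])
    count = width + height
    dc = (total + count // 2) // count
    return [[dc] * width for _ in range(height)]


def predict_horizontal(left: List[int], width: int, height: int) -> List[List[int]]:
    # Horizontal angular mode: each row repeats its left neighbor.
    return [[left[y]] * width for y in range(height)]


def predict_vertical(above: List[int], width: int, height: int) -> List[List[int]]:
    # Vertical angular mode: each column repeats its above neighbor.
    return [above[:width] for _ in range(height)]
```

Here `above` holds the reconstructed row directly over the block and `left` holds the reconstructed column directly to its left, both taken from the same picture as the current block.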

[0026] These techniques may support adaptive and data-driven transform selection across different block sizes, prediction types, and transform domains. The described operations can be implemented in a video encoder, a video decoder, or other processing circuitry configured for block-based video coding.

[0027] Video coding operations described herein may be performed in connection with a variety of electronic devices, such as televisions, streaming devices, mobile terminals, cameras, and computing systems. These devices may conform to or extend existing coding standards, including MPEG-2, MPEG-4, H.264/AVC, H.265/HEVC, H.266/VVC, or AOMedia Video 1 (AV1).

[0028] In block-based coding architectures, a picture or a portion of a picture may be divided into blocks, such as coding tree units (CTUs) or coding units (CUs). Spatial prediction (intra-prediction) may estimate samples of a current block from reconstructed samples of neighboring blocks within the same picture, while temporal prediction (inter-prediction) may use reference samples from other pictures. These prediction and transform operations may be executed by dedicated hardware, programmable circuitry, or software instructions.
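
The division of a picture into blocks described in paragraph [0028] can be sketched as a uniform grid. This is a simplification under stated assumptions: actual CTU and CU partitioning in standardized codecs is recursive and content-dependent, and the function name and the tuple layout `(x, y, width, height)` are hypothetical.

```python
from typing import List, Tuple


def partition_picture(width: int, height: int, block_size: int) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) for each block in raster order."""
    blocks = []
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            # Blocks on the right and bottom edges are clipped to the picture.
            blocks.append((x, y, min(block_size, width - x), min(block_size, height - y)))
    return blocks
```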

[0029] FIG. 1 is a block diagram illustrating an example video encoding and decoding system 100, in accordance with aspects of the disclosure. The techniques described herein are generally directed to coding (e.g., encoding and/or decoding) video data. In general, video data includes any data used in the processing, transmission, or storage of a video. Video data may therefore encompass raw, unencoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata, such as syntax or signaling information generated during encoding and decoding operations.

[0030] As shown in FIG. 1, system 100 includes source device 102 and destination device 116. Source device 102 provides encoded video data to destination device 116, which decodes and displays the video data. Source device 102 provides the video data to destination device 116 via computer-readable medium 110. The medium 110 may represent a communication medium, a storage medium, or a combination thereof. Source device 102 and destination device 116 may each include or represent any of a wide range of devices, including desktop computers, notebook or laptop computers, tablet computers, smartphones, set-top boxes, televisions, video cameras, digital media players, video game consoles, video streaming devices, broadcast receivers, or other multimedia devices. In some cases, source device 102 and destination device 116 may be equipped for wireless communication and may therefore also be referred to as wireless communication devices.

[0031] In the example of FIG. 1, source device 102 includes video source 104, memory 106, video encoder 200, processing circuitry 202A, and output interface 108. Destination device 116 includes input interface 122, video decoder 300, processing circuitry 202B, memory 120, and display device 118. In accordance with aspects of this disclosure, video encoder 200 of source device 102 and video decoder 300 of destination device 116 are configured to apply techniques for implicit multiple transform selection. Thus, source device 102 represents an example of a video encoding device, while destination device 116 represents an example of a video decoding device. In other examples, source device 102 and destination device 116 may include additional components or alternative arrangements. For example, source device 102 may obtain video data from an external capture device or a connected peripheral camera, while destination device 116 may drive an external monitor or projection system rather than include an integrated display.

[0032] System 100 as shown in FIG. 1 is merely one example implementation. Any digital video encoding and/or decoding system may perform the described techniques for implicit multiple transform selection. Source device 102 and destination device 116 are illustrative examples of coding devices in which source device 102 generates coded video data for transmission to destination device 116. A “coding” device refers to a device that performs encoding and/or decoding of data. Thus, video encoder 200 and video decoder 300 represent coding components, specifically an encoder and a decoder, respectively. In some implementations, source device 102 and destination device 116 may operate symmetrically such that each device includes both encoding and decoding functionality, enabling two-way video transmission or conferencing between the devices. System 100 can therefore support one-way or two-way communication for applications such as video streaming, playback, broadcasting, and interactive video telephony.

[0033] Video source 104 represents a source of video data, such as raw unencoded picture data, and provides a sequential series of pictures, sometimes referred to as frames, to video encoder 200. Video source 104 may include an image sensor or camera, a video archive containing previously captured video, or a live video feed interface connected to a video content provider. In other examples, video source 104 may generate computer graphics-based imagery or composite video content that includes both live and computer-generated imagery. Video encoder 200 encodes this input data. For instance, processing circuitry 202A may perform block partitioning, prediction, transform, quantization, entropy coding, and other standard or proprietary video coding operations. Video encoder 200 may also reorder pictures from the received display order into a coding order to improve compression efficiency. The resulting bitstream of encoded video data may be output from source device 102 via output interface 108 for transmission, storage, or both.

[0034] Output interface 108 transmits the encoded video data from source device 102 to destination device 116 via computer-readable medium 110. The medium 110 may represent a communication medium, a storage medium, or a combination of both. For example, in real-time applications, computer-readable medium 110 may correspond to a wired or wireless communication channel through which output interface 108 modulates a signal carrying the encoded video data and input interface 122 demodulates the received signal. In other implementations, computer-readable medium 110 may correspond to a tangible storage or distribution medium, such as storage device 112, onto which source device 102 records the encoded video data for later retrieval. In yet other implementations, computer-readable medium 110 may include a networked intermediate device such as file server 114, where source device 102 uploads the encoded video data and destination device 116 subsequently accesses the stored data by streaming or download. Thus, computer-readable medium 110 generically encompasses any physical or logical medium capable of transporting encoded video data between source device 102 and destination device 116, whether directly or through one or more intermediate storage or network components.

[0035] Memory 106 of source device 102 and memory 120 of destination device 116 represent general-purpose memory components, which may include random access memory (RAM), read-only memory (ROM), flash memory, or other types of volatile or non-volatile storage. In some implementations, memory 106 and memory 120 store raw, encoded, and/or decoded video data. Additionally or alternatively, these memories may store software or firmware instructions executable by processing circuitry 202A and processing circuitry 202B, respectively. Although memory 106 and memory 120 are shown separately from video encoder 200 and video decoder 300 in FIG. 1, each of video encoder 200 and video decoder 300 may include local or embedded memory resources used for similar purposes, such as buffering intermediate video data or storing lookup tables. Portions of memory 106 and memory 120 may be allocated as frame buffers, circular buffers, or line stores for holding uncompressed, reconstructed, or encoded video samples.

[0036] Processing circuitry 202A of video encoder 200 and processing circuitry 202B of video decoder 300 each represent one or more processors, such as a central processing unit (CPU), digital signal processor (DSP), graphics processing unit (GPU), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or any combination thereof. Processing circuitry 202A and processing circuitry 202B may operate under control of instructions stored in the respective memories 106 and 120 to perform the coding techniques described in this disclosure. For instance, processing circuitry 202A may determine intra-prediction types, derive primary and secondary characteristics for blocks of video data, select transforms based on those characteristics, and apply the selected transforms to residual data to generate transform coefficients. Processing circuitry 202B may perform inverse transforms and prediction operations corresponding to the encoding process. The processing circuitry may also perform motion compensation, quantization, inverse quantization, filtering, rate control, and entropy decoding, depending on whether the device is operating in an encoding or decoding mode.

[0037] Computer-readable medium 110 may represent any medium capable of transporting encoded video data from source device 102 to destination device 116. In some embodiments, computer-readable medium 110 represents a communication medium that enables real-time transmission of video data, such as through a wired or wireless communication network. Output interface 108 may modulate a signal carrying the encoded video data, while input interface 122 demodulates the received signal. The communication medium may include a radio-frequency (RF) spectrum, optical fiber, or physical cabling. The communication may occur over a packet-based network such as a local area network (LAN), wide area network (WAN), or the Internet, potentially involving routers, switches, base stations, or other network components.

[0038] In other embodiments, computer-readable medium 110 may represent a tangible storage or distribution medium, such as optical discs, flash drives, or other non-volatile storage media. Source device 102 may output encoded video data via output interface 108 to storage device 112, and destination device 116 may subsequently access that data via input interface 122. Storage device 112 may include hard disk drives, Blu-ray discs, digital versatile discs (DVDs), compact discs (CD-ROMs), flash memory, or other digital storage media.

[0039] In some implementations, source device 102 may provide encoded video data to file server 114 or another network-accessible intermediate storage device. Destination device 116 may retrieve stored encoded data from file server 114 through streaming or download. File server 114 may represent a web server, content delivery network (CDN) node, or other networked data host that transmits encoded video data to destination device 116 upon request. File server 114 may support standard network transport or media streaming protocols such as File Transfer Protocol (FTP), File Delivery over Unidirectional Transport (FLUTE), Hypertext Transfer Protocol (HTTP), Dynamic Adaptive Streaming over HTTP (DASH), HTTP Live Streaming (HLS), Real-Time Streaming Protocol (RTSP), or other similar protocols.

[0040] Destination device 116 may access encoded video data through various types of network connections. These may include wireless connections such as Wi-Fi, cellular, or satellite channels, wired connections such as digital subscriber line (DSL) or cable, or hybrid configurations. Input interface 122 may support multiple communication protocols to retrieve or receive encoded media data from file server 114 or other content sources.

[0041] Output interface 108 and input interface 122 may include wired or wireless transceivers, modems, or networking components such as Ethernet controllers or radio transceivers. In cases where these interfaces include wireless components, they may support cellular standards such as fourth-generation (4G), Long-Term Evolution (LTE), LTE-Advanced, fifth-generation (5G), or later systems. They may alternatively or additionally conform to wireless standards such as IEEE 802.11 (Wi-Fi), IEEE 802.15 (ZigBee or similar), or Bluetooth specifications. Source device 102 and destination device 116 may each incorporate a system-on-a-chip (SoC) device that integrates video encoder 200, video decoder 300, processing circuitry 202A, processing circuitry 202B, memory, and interface components within a single substrate.

[0042] The techniques described herein may be applied to a wide range of video coding applications, including over-the-air television broadcast, cable transmission, satellite distribution, internet streaming, adaptive HTTP-based streaming, and physical media encoding or decoding. System 100 may support these use cases by encoding or decoding video data compliant with industry standards such as MPEG-2, MPEG-4, H.264/AVC, H.265/HEVC, H.266/VVC, and AOMedia Video 1 (AV1), or with proprietary or experimental codecs.

[0043] Input interface 122 of destination device 116 receives an encoded video bitstream from computer-readable medium 110, whether through real-time communication, access to storage device 112, or retrieval from file server 114. The encoded bitstream may include syntax information generated by video encoder 200 and used by video decoder 300, such as values describing block prediction modes, transform indices, quantization parameters, or other coding metadata. Processing circuitry 202B of video decoder 300 may parse this information and perform inverse transforms, prediction reconstruction, deblocking, and sample adaptive filtering to regenerate decoded video frames. Memory 120 may store the decoded frames for display. Display device 118 presents the reconstructed video data to a viewer and may be implemented as a liquid crystal display (LCD), plasma display, organic light-emitting diode (OLED) display, micro-LED panel, projection system, or any comparable visual output component.

[0044] In summary, FIG. 1 illustrates an example video coding system 100 that includes source device 102 and destination device 116 interconnected through computer-readable medium 110. Video encoder 200 with processing circuitry 202A encodes video data from video source 104, and video decoder 300 with processing circuitry 202B decodes the resulting bitstream for display on display device 118. The arrangement demonstrates how the described techniques for adaptive and implicit transform selection can be implemented within a variety of hardware or software configurations supporting block-based video coding.

[0045] Although not shown in FIG. 1, in some examples, video encoder 200 (and associated processing circuitry 202A) and video decoder 300 (and associated processing circuitry 202B) may each be integrated with an audio encoder and/or audio decoder (e.g., audio codec), and may include appropriate MUX-DEMUX units, or other hardware and/or software, to handle multiplexed streams including both audio and video in a common data stream. Example audio codecs may include AAC, AC-3, AC-4, ALAC, ALS, AMBE, AMR, AMR-WB (G.722.2), AMR-WB+, aptX (various versions), ATRAC, BroadVoice (BV16, BV32), CELT, Enhanced AC-3 (E-AC-3), EVS, FLAC, G.711, G.722, G.722.1, G.722.2 (AMR-WB), G.723.1, G.726, G.728, G.729, G.729.1, GSM-FR, HE-AAC, iLBC, iSAC, LA Lyra, Monkey's Audio, MP1, MP2 (MPEG-1, 2 Audio Layer II), MP3, Musepack, Nellymoser Asao, OptimFROG, Opus, Sac, Satin, SBC, SILK, Siren 7, Speex, SVOPC, True Audio (TTA), TwinVQ, USAC, Vorbis (Ogg), WavPack, and Windows Media Audio.

[0046] Video encoder 200 and video decoder 300 each may be implemented as any of a variety of suitable encoder and/or decoder circuitry that includes a processing system, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 200 and video decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including video encoder 200 and/or video decoder 300 may implement video encoder 200 and/or video decoder 300 in processing circuitry such as an integrated circuit and/or a microprocessor. Such a device may be a wireless communication device, such as a cellular telephone, or any other type of device described herein.

[0047] The techniques described herein, including operations such as determining, deriving, and selecting, are performed by the processing circuitry of video encoder 200 and video decoder 300 as part of the transform and prediction control logic. These operations are implemented using dedicated hardware circuitry, programmable logic, or processor-executed instructions stored in memory. The described methods therefore provide a concrete technological improvement to the functioning of video coding systems by adaptively selecting transforms without explicit signaling, reducing bitstream overhead and computational complexity during encoding and decoding. The described processes are thus tied to specific hardware configurations and improve the performance and efficiency of such machines.

[0048] Video encoder 200 and video decoder 300 may operate according to a video coding standard, such as ITU-T H.265, also referred to as High Efficiency Video Coding (HEVC), or extensions thereto, such as multi-view and / or scalable video coding extensions. Alternatively, video encoder 200 and video decoder 300 may operate according to other proprietary or industry standards, such as ITU-T H.266, also referred to as Versatile Video Coding (VVC). In other examples, video encoder 200 and video decoder 300 may operate according to a proprietary video codec or format, such as AOMedia Video 1 (AV1), extensions of AV1, and / or successor versions of AV1 (e.g., AV2). In other examples, video encoder 200 and video decoder 300 may operate according to other proprietary formats or industry standards. The techniques of this disclosure, however, are not limited to any particular coding standard or format and may be employed by processing circuitry 202A and 202B to perform implicit multiple transform selection during decoding and encoding, as described in further detail with respect to FIG. 6 and FIG. 7, respectively.

[0049] In general, video encoder 200 and video decoder 300 may perform block-based coding of pictures. The term “block” generally refers to a structure including data to be processed (e.g., encoded, decoded, or otherwise used in the encoding and / or decoding process). For example, a block may include a two-dimensional matrix of samples of luminance and / or chrominance data. In general, video encoder 200 and video decoder 300 may code video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for samples of a picture, video encoder 200 and video decoder 300 may code luminance and chrominance components, where the chrominance components may include both red hue and blue hue chrominance components. In some examples, video encoder 200 converts received RGB formatted data to a YUV representation prior to encoding, and video decoder 300 converts the YUV representation to the RGB format. Alternatively, pre- and post-processing units (not shown) may perform these conversions.
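
As a non-limiting illustration of the color conversion described above, the following Python sketch converts one 8-bit RGB sample into a Y, Cb, Cr triple. The BT.601 full-range weights and the function name rgb_to_ycbcr are assumptions made for illustration only; the disclosure does not specify a particular conversion matrix.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample to (Y, Cb, Cr).

    Assumption: BT.601 full-range weights; other matrices are equally possible.
    """
    kr, kb = 0.299, 0.114
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b            # luminance
    cb = 128 + 0.5 * (b - y) / (1.0 - kb)   # blue-difference chrominance
    cr = 128 + 0.5 * (r - y) / (1.0 - kr)   # red-difference chrominance

    def clip(v):
        return max(0, min(255, int(round(v))))

    return clip(y), clip(cb), clip(cr)
```

Under these assumptions, a neutral gray sample maps to mid-range chrominance, which reflects the separation of brightness from color information.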

[0050] This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data of the picture. Similarly, this disclosure may refer to coding of blocks of a picture to include the process of encoding or decoding data for the blocks, e.g., prediction and / or residual coding. An encoded video bitstream generally includes a series of values for syntax elements representative of coding decisions (e.g., coding modes) and partitioning of pictures into blocks. Thus, references to coding a picture or a block should generally be understood as coding values for syntax elements forming the picture or block.

[0051] HEVC defines various blocks, including coding units (CUs), prediction units (PUs), and transform units (TUs). According to HEVC, a video coder (such as video encoder 200) partitions a coding tree unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, nonoverlapping squares, and each node of the quadtree has either zero or four child nodes. Nodes without child nodes may be referred to as “leaf nodes,” and CUs of such leaf nodes may include one or more PUs and / or one or more TUs. The video coder may further partition PUs and TUs. For example, in HEVC, a residual quadtree (RQT) represents partitioning of TUs. In HEVC, PUs represent inter-prediction data, while TUs represent residual data. CUs that are intra-predicted include intra-prediction information, such as an intra-mode indication.
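
As a non-limiting illustration of quadtree partitioning, the following Python sketch recursively splits a square region into four equal squares and returns the leaf nodes. The function name quadtree_leaves, the should_split callback, and the min_size parameter are hypothetical; in practice the split decision is made by the encoder and conveyed through syntax elements.

```python
def quadtree_leaves(x, y, size, should_split, min_size=8):
    """Return leaf blocks (x, y, size) of a quadtree rooted at a square region.

    should_split(x, y, size) is a hypothetical stand-in for the split decision.
    Each node has either zero or four children.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_leaves(x + dx, y + dy, half, should_split, min_size)
        return leaves
    return [(x, y, size)]
```

For example, splitting only the 64x64 root yields four 32x32 leaves, and the leaves always tile the root region without overlap.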

[0052] As another example, video encoder 200 and video decoder 300 may be configured to operate according to VVC. According to VVC, a video coder (such as video encoder 200) partitions a picture into a plurality of CTUs. Video encoder 200 may partition a CTU according to a tree structure, such as a quadtree-binary tree (QTBT) structure or Multi-Type Tree (MTT) structure. The QTBT structure removes the concepts of multiple partition types, such as the separation between CUs, PUs, and TUs of HEVC. A QTBT structure includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to CUs.

[0053] In an MTT partitioning structure, blocks may be partitioned using a quadtree (QT) partition, a binary tree (BT) partition, and one or more types of triple tree (TT) (also called ternary tree (TT)) partitions. A triple or ternary tree partition is a partition where a block is split into three sub-blocks. In some examples, a triple or ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partitioning types in MTT (e.g., QT, BT, and TT), may be symmetrical or asymmetrical.

[0054] When operating according to the AV1 codec, video encoder 200 and video decoder 300 may be configured to code video data in blocks. In AV1, the largest coding block that can be processed is called a superblock. In AV1, a superblock can be either 128x128 luma samples or 64x64 luma samples. However, in successor video coding formats (e.g., AV2), a superblock may be defined by different (e.g., larger) luma sample sizes. In some examples, a superblock is the top level of a block quadtree. Video encoder 200 may further partition a superblock into smaller coding blocks. Video encoder 200 may partition a superblock and other coding blocks into smaller blocks using square or non-square partitioning. Non-square blocks may include N/2xN, NxN/2, N/4xN, and NxN/4 blocks. Video encoder 200 and video decoder 300 may perform separate prediction and transform processes on each of the coding blocks.

[0055] AV1 also defines a tile of video data. A tile is a rectangular array of superblocks that may be coded independently of other tiles. That is, video encoder 200 and video decoder 300 may encode and decode, respectively, coding blocks within a tile without using video data from other tiles. However, video encoder 200 and video decoder 300 may perform filtering across tile boundaries. Tiles may be uniform or non-uniform in size. Tile-based coding may enable parallel processing and / or multi-threading for encoder and decoder implementations.

[0056] In some examples, video encoder 200 and video decoder 300 may use a single QTBT or MTT structure to represent each of the luminance and chrominance components, while in other examples, video encoder 200 and video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT / MTT structure for the luminance component and another QTBT / MTT structure for both chrominance components (or two QTBT / MTT structures for respective chrominance components).

[0057] Video encoder 200 and video decoder 300 may be configured to use quadtree partitioning, QTBT partitioning, MTT partitioning, superblock partitioning, or other partitioning structures.

[0058] In some examples, a CTU includes a coding tree block (CTB) of luma samples, two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples. A CTB may be an NxN block of samples for some value of N such that the division of a component into CTBs is a partitioning. A component is an array or single sample from one of the three arrays (luma and two chroma) that compose a picture in 4:2:0, 4:2:2, or 4:4:4 color format or the array or a single sample of the array that compose a picture in monochrome format. In some examples, a coding block is an MxN block of samples for some values of M and N such that a division of a CTB into coding blocks is a partitioning.

[0059] The blocks (e.g., CTUs or CUs) may be grouped in various ways in a picture. As one example, a brick may refer to a rectangular region of CTU rows within a particular tile in a picture. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column refers to a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements (e.g., such as in a picture parameter set). A tile row refers to a rectangular region of CTUs having a height specified by syntax elements (e.g., such as in a picture parameter set) and a width equal to the width of the picture.

[0060] In some examples, a tile may be partitioned into multiple bricks, each of which may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. However, a brick that is a true subset of a tile may not be referred to as a tile. The bricks in a picture may also be arranged in a slice. A slice may be an integer number of bricks of a picture that may be exclusively contained in a single network abstraction layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a consecutive sequence of complete bricks of one tile.

[0061] This disclosure may use “NxN” and “N by N” interchangeably to refer to the sample dimensions of a block (such as a CU or other video block) in terms of vertical and horizontal dimensions, e.g., 16x16 samples or 16 by 16 samples. In general, a 16x16 CU will have 16 samples in a vertical direction (y = 16) and 16 samples in a horizontal direction (x = 16). Likewise, an NxN CU generally has N samples in a vertical direction and N samples in a horizontal direction, where N represents a nonnegative integer value. The samples in a CU may be arranged in rows and columns. Moreover, CUs need not necessarily have the same number of samples in the horizontal direction as in the vertical direction. For example, CUs may include NxM samples, where M is not necessarily equal to N.

[0062] Video encoder 200 encodes video data for CUs representing prediction and / or residual information, and other information. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information generally represents sample-by-sample differences between samples of the CU prior to encoding and the prediction block.

[0063] To predict a CU, video encoder 200 may generally form a prediction block for the CU through inter-prediction or intra-prediction. Inter-prediction generally refers to predicting the CU from data of a previously coded picture, whereas intra-prediction generally refers to predicting the CU from previously coded data of the same picture. To perform inter-prediction, video encoder 200 may generate the prediction block using one or more motion vectors. Video encoder 200 may generally perform a motion search to identify a reference block that closely matches the CU, e.g., in terms of differences between the CU and the reference block. Video encoder 200 may calculate a difference metric using a sum of absolute difference (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared differences (MSD), or other such difference calculations to determine whether a reference block closely matches the current CU. In some examples, video encoder 200 may predict the current CU using uni-directional prediction or bi-directional prediction.
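
As a non-limiting illustration of the difference metrics named above, the following Python sketch computes SAD and SSD between a current block and candidate reference blocks and picks the closest candidate. The helper names sad, ssd, and best_match are hypothetical, and an exhaustive comparison over a small candidate list stands in for an actual motion search.

```python
def sad(block, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b) for ra, rb in zip(block, ref) for a, b in zip(ra, rb))


def ssd(block, ref):
    """Sum of squared differences between two equally sized 2-D blocks."""
    return sum((a - b) ** 2 for ra, rb in zip(block, ref) for a, b in zip(ra, rb))


def best_match(block, candidates, metric=sad):
    """Index of the candidate reference block with the smallest metric value."""
    return min(range(len(candidates)), key=lambda i: metric(block, candidates[i]))
```

MAD and MSD follow by dividing SAD and SSD, respectively, by the number of samples in the block.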

[0064] Some examples of VVC also provide an affine motion compensation mode, which may be considered an inter-prediction mode. In affine motion compensation mode, video encoder 200 may determine two or more motion vectors that represent non-translational motion, such as zoom in or out, rotation, perspective motion, or other irregular motion types.

[0065] To perform intra-prediction, video encoder 200 may select an intra-prediction mode to generate the prediction block. Some examples of VVC provide sixty-seven intra-prediction modes, including various directional modes, as well as planar mode and DC mode. In general, video encoder 200 selects an intra-prediction mode that describes neighboring samples to a current block (e.g., a block of a CU) from which to predict samples of the current block. Such samples may generally be above, above and to the left, or to the left of the current block in the same picture as the current block, assuming video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom).

[0066] Video encoder 200 encodes data representing the prediction mode for a current block. For example, for inter-prediction modes, video encoder 200 may encode data representing which of the various available inter-prediction modes is used, as well as motion information for the corresponding mode. For uni-directional or bi-directional inter-prediction, for example, video encoder 200 may encode motion vectors using advanced motion vector prediction (AMVP) or merge mode. Video encoder 200 may use similar modes to encode motion vectors for affine motion compensation mode.

[0067] AV1 includes two general techniques for encoding and decoding a coding block of video data. The two general techniques are intra prediction (e.g., intra frame prediction or spatial prediction) and inter prediction (e.g., inter frame prediction or temporal prediction). In the context of AV1, when predicting blocks of a current frame of video data using an intra prediction mode, video encoder 200 and video decoder 300 do not use video data from other frames of video data. For most intra prediction modes, video encoder 200 encodes blocks of a current frame based on the difference between sample values in the current block and predicted values generated from reference samples in the same frame. Video encoder 200 determines predicted values generated from the reference samples based on the intra prediction mode.

[0068] Following prediction, such as intra-prediction or inter-prediction of a block, video encoder 200 may calculate residual data for the block. The residual data, such as a residual block, represents sample by sample differences between the block and a prediction block for the block, formed using the corresponding prediction mode. Video encoder 200 may apply one or more transforms to the residual block, to produce transformed data in a transform domain instead of the sample domain. For example, video encoder 200 may apply a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to residual video data. Additionally, video encoder 200 may apply a secondary transform following the first transform, such as a mode-dependent non-separable secondary transform (MDNSST), a signal dependent transform, a Karhunen-Loeve transform (KLT), or the like. Video encoder 200 produces transform coefficients following application of the one or more transforms.
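
As a non-limiting illustration of residual calculation followed by a transform, the following Python sketch forms a sample-by-sample residual and applies a separable two-dimensional DCT-II. The floating-point orthonormal DCT and the helper names residual, dct2_1d, and dct2_2d are assumptions for illustration; practical codecs generally use integer approximations of such transforms.

```python
import math


def residual(block, pred):
    """Sample-by-sample difference between a block and its prediction block."""
    return [[s - p for s, p in zip(sr, pr)] for sr, pr in zip(block, pred)]


def dct2_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (floating point, for illustration)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(x))
        out.append(scale * acc)
    return out


def dct2_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct2_1d(r) for r in block]
    cols = [dct2_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order
```

A flat residual concentrates all of its energy in the single lowest-frequency (DC) coefficient, which is the compaction property that makes transform coding effective.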

[0069] As noted above, following any transforms to produce transform coefficients, video encoder 200 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. By performing the quantization process, video encoder 200 may reduce the bit depth associated with some or all of the transform coefficients. For example, video encoder 200 may round an n-bit value down to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, video encoder 200 may perform a bitwise right-shift of the value to be quantized.
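
As a non-limiting illustration of quantization by a bitwise right-shift, the following Python sketch reduces the magnitude of a coefficient by a given number of bits and shows the corresponding approximate reconstruction. The sign-magnitude handling and the function names quantize_shift and dequantize_shift are assumptions for illustration only.

```python
def quantize_shift(coeff, shift):
    """Quantize by right-shifting the magnitude; the sign is kept separately."""
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) >> shift)


def dequantize_shift(level, shift):
    """Approximate inverse: scale the level back up by 2**shift."""
    return level * (1 << shift)
```

For instance, a coefficient of 1000 quantized with a shift of 4 becomes level 62 and is reconstructed as 992; the difference of 8 is the quantization error.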

[0070] Following quantization, video encoder 200 may scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) transform coefficients at the front of the vector and to place lower energy (and therefore higher frequency) transform coefficients at the back of the vector. In some examples, video encoder 200 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector, and then entropy encode the quantized transform coefficients of the vector. In other examples, video encoder 200 may perform an adaptive scan. After scanning the quantized transform coefficients to form the one-dimensional vector, video encoder 200 may entropy encode the one-dimensional vector, e.g., according to context-adaptive binary arithmetic coding (CABAC). Video encoder 200 may also entropy encode values for syntax elements describing metadata associated with the encoded video data for use by video decoder 300 in decoding the video data.
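
As a non-limiting illustration of a predefined scan, the following Python sketch serializes a two-dimensional block of quantized coefficients along anti-diagonals, starting at the top-left (lowest-frequency) position. The particular diagonal order and the function name diagonal_scan are assumptions for illustration and do not reproduce the scan order of any particular standard.

```python
def diagonal_scan(block):
    """Serialize a 2-D block along anti-diagonals, top-left position first."""
    h, w = len(block), len(block[0])
    order = sorted(((r, c) for r in range(h) for c in range(w)),
                   key=lambda p: (p[0] + p[1], p[0]))
    return [block[r][c] for r, c in order]
```

With typical coefficient distributions, such an order tends to place the nonzero values at the front of the vector and a run of zeros at the back.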

[0071] To perform CABAC, video encoder 200 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are zero-valued or not. The probability determination may be based on a context assigned to the symbol.

[0072] Video encoder 200 may further generate syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, to video decoder 300, e.g., in a picture header, a block header, a slice header, or other syntax data, such as a sequence parameter set (SPS), picture parameter set (PPS), or video parameter set (VPS). Video decoder 300 may likewise decode such syntax data to determine how to decode corresponding video data.

[0073] In this manner, video encoder 200 may generate a bitstream including encoded video data, e.g., syntax elements describing partitioning of a picture into blocks (e.g., CUs) and prediction and / or residual information for the blocks. Ultimately, video decoder 300 may receive the bitstream and decode the encoded video data.

[0074] In general, video decoder 300 performs a reciprocal process to that performed by video encoder 200 to decode the encoded video data of the bitstream. For example, video decoder 300 may decode values for syntax elements of the bitstream using CABAC in a manner substantially similar to, albeit reciprocal to, the CABAC encoding process of video encoder 200. The syntax elements may define partitioning information for partitioning of a picture into CTUs, and partitioning of each CTU according to a corresponding partition structure, such as a QTBT structure, to define CUs of the CTU. The syntax elements may further define prediction and residual information for blocks (e.g., CUs) of video data.

[0075] The residual information may be represented by, for example, quantized transform coefficients. Video decoder 300 may inverse quantize and inverse transform the quantized transform coefficients of a block to reproduce a residual block for the block. In corresponding fashion, the video encoder generates the residual data, applies the transform to produce the transform block, and signals information indicative of the resulting transform coefficients. Video decoder 300 uses a signaled prediction mode (intra- or inter-prediction) and related prediction information (e.g., motion information for inter-prediction) to form a prediction block for the block. Video decoder 300 may then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block. Video decoder 300 may perform additional processing, such as performing a deblocking process to reduce visual artifacts along boundaries of the block.
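
As a non-limiting illustration of combining a prediction block and a residual block on a sample-by-sample basis, the following Python sketch adds the two blocks and clips the result to the valid sample range. The clipping step, the bit_depth parameter, and the function name reconstruct are assumptions for illustration.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add residual to prediction per sample and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, resid)]
```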

[0076] Any of the video encoding or video decoding processes described above may be performed using a neural network (NN). Additionally or alternatively, a neural network may be trained to efficiently compress video data without necessarily separately performing prediction and residual coding. Embedding neural networks into the hybrid video coding framework of video encoder 200 and video decoder 300 can improve compression efficiency. Neural networks may be used for intra prediction and inter prediction to improve the prediction efficiency. NN-based in-loop filtering and / or postfiltering have also performed well in heuristic testing.

[0077] For example, video encoder 200 and video decoder 300 may use one or more NN-based filters for existing filters, such as deblocking filters, sample adaptive offset (SAO), and / or adaptive loop filtering (ALF). NN-based filters can also be applied exclusively, where NN-based filters are designed to replace all of the existing filters. Additionally or alternatively, NN-based filters may be designed to supplement, enhance, or replace any or all of the other filters.

[0078] In some examples, an NN-based filter may be a convolutional neural network (CNN)-based filter with multiple layers. An NN-based filtering process may take reconstructed samples as inputs, and may add the intermediate outputs back to the inputs to refine the input samples. The NN-based filter may use all color components (e.g., Y, U, and V, or Y, Cb, and Cr) as inputs 172 to exploit cross-component correlations. Different color components may share the same filters (including network structure and model parameters) or each component may have its own specific filters.

[0079] The filtering process can also be generalized as follows:

R’(i, j) = R(i, j) + NN_filter_residual_output(R)

Here, R(i, j) represents a reconstructed sample at position (i, j) in the picture, R’(i, j) represents the filtered version of the reconstructed sample, and NN_filter_residual_output(R) represents the intermediate samples discussed above that are calculated by the NN filter. The model structure and model parameters of NN-based filter(s) can be pre-defined and be stored at video encoder 200 and video decoder 300. The filters can also be signaled in the bitstream.
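
As a non-limiting illustration of the residual form of the filtering process, the following Python sketch adds the output of a residual function back to the reconstructed samples. The function toy_residual is a hypothetical stand-in for a trained NN filter and is not a neural network; it merely shows how the intermediate output is combined with the input.

```python
def toy_residual(recon):
    """Hypothetical stand-in for the NN residual output.

    Moves each sample roughly halfway toward the block mean (floor division).
    """
    flat = [v for row in recon for v in row]
    mean = sum(flat) // len(flat)
    return [[(mean - v) // 2 for v in row] for row in recon]


def nn_filter(recon, residual_fn=toy_residual):
    """Filtered sample = reconstructed sample + residual output, per position."""
    delta = residual_fn(recon)
    return [[r + d for r, d in zip(rr, dr)] for rr, dr in zip(recon, delta)]
```

Because the filter only predicts a correction, a residual function that outputs all zeros leaves the reconstructed samples unchanged.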

[0080] In some examples, an NN-based filter may include a series of feature extraction layers, followed by an output convolution. The feature extraction layers may include a 3x3 convolution (conv) layer followed by a parametric rectified linear unit (PReLU) layer. The convolutional layer applies a convolution operation to the input data, which involves a filter or kernel processing the input data (e.g., the reconstruction samples) in a sliding window fashion and computing dot products at each position. The convolution operation essentially captures local patterns within the input data. For example, in the context of image processing, these patterns could be edges, textures, or other visual features. The filter or kernel is a small matrix of weights that gets updated during the training process. By sliding this filter across the input data (or feature map from a previous layer) and computing the dot product at each position, the convolutional layer creates a feature map that encodes spatial hierarchies and patterns detected in the input. The output of a convolutional layer is a set of feature maps, each corresponding to one filter, capturing different aspects of the input data. This layer helps the neural network to learn increasingly complex and abstract features as the data passes through deeper layers of the network.
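
As a non-limiting illustration of the sliding-window operation described above, the following Python sketch applies a single 3x3 kernel to a two-dimensional input and produces one feature map of the same size. Zero padding at the borders, the absence of a bias term, and the function name conv3x3 are assumptions for illustration; the kernel weights would ordinarily be learned during training.

```python
def conv3x3(image, kernel):
    """Slide a 3x3 kernel over the input and compute a dot product per position.

    Positions outside the input are treated as zero (zero padding).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out
```

A kernel such as [[0, 0, 0], [-1, 0, 1], [0, 0, 0]] responds to horizontal intensity changes, which is one simple example of the local patterns a convolutional layer can capture.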

[0081] The PReLU layer is an activation function used in neural networks, and is a variant of the ReLU (Rectified Linear Unit) activation function. As described above, the convolution layer outputs feature maps, each corresponding to one filter, representing detected features in the input. Following the convolution layer, the PReLU layer applies the PReLU activation function to each element of the feature maps produced by the convolution layer. For positive values, the PReLU layer acts like a standard ReLU, passing the value through. For negative values, instead of setting them to zero (e.g., as ReLU does), the PReLU layer allows a small, linear, negative output. This keeps neurons of the NN active and maintains the gradient flow, which can be beneficial for learning in deep networks.
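
As a non-limiting illustration of the difference between ReLU and PReLU, the following Python sketch applies both activation functions to feature map values. The slope value 0.25 and the function names are assumptions for illustration; in a PReLU layer the negative slope is a parameter learned during training.

```python
def relu(x):
    """Standard ReLU: negative inputs are set to zero."""
    return x if x > 0 else 0.0


def prelu(x, alpha=0.25):
    """PReLU: positive inputs pass through, negative inputs are scaled by alpha."""
    return x if x > 0 else alpha * x


def prelu_map(feature_map, alpha=0.25):
    """Apply PReLU to every element of a 2-D feature map."""
    return [[prelu(v, alpha) for v in row] for row in feature_map]
```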

[0082] When NN-based filtering is applied in video coding, the whole video signal (pixel data) may be split into multiple processing units (e.g., 2D blocks), and each processing unit can be processed separately or be combined with other information associated with this block of pixels. For example, a processing unit may be a frame, a slice / tile, a CTU, or any pre-defined or signaled shapes and sizes. Typically, NN-based filtering is performed on reconstructed blocks of video data. Here, reconstructed blocks and samples may refer to both decoded blocks produced by video decoder 300, as well as blocks reconstructed in a reconstruction loop of video encoder 200.

[0083] To further improve the performance of NN-based filtering, different types of input data can be processed jointly to produce the filtered output. Input data may include, but is not limited to, reconstruction pixels / samples, prediction pixels / samples, pixels / samples after the loop filter(s), partitioning structure information, deblocking parameters (e.g., boundary strength (BS)), quantization parameter (QP) values, slice or picture types, or a filter applicability or coding mode map. Input data can be provided at different granularities. Luma reconstruction and prediction samples may be provided at the original resolution, whereas chroma samples may be provided at lower resolution, e.g., for 4:2:0 representation, or can be up-sampled to the luma resolution to achieve per-pixel representation. Similarly, QP, BS, partitioning, or coding mode information can be provided at lower resolution, including cases with a single value per frame, slice or processing block (e.g., QP). In other examples, QP, BS, partitioning, or coding mode information can be expanded (e.g., replicated) to achieve per-pixel / sample representation.
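
As a non-limiting illustration of bringing inputs of different granularity to a per-sample representation, the following Python sketch up-samples a 4:2:0 chroma plane to luma resolution by sample replication and expands a single QP value into a plane. Nearest-neighbor replication and the function names upsample2x and constant_plane are assumptions for illustration; other interpolation methods may be used.

```python
def upsample2x(plane):
    """Replicate every sample 2x horizontally and vertically (nearest neighbor)."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def constant_plane(value, height, width):
    """Expand a single value (e.g., a per-block QP) to one value per sample."""
    return [[value] * width for _ in range(height)]
```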

[0084] To further improve the performance of NN-based filtering, multi-mode solutions can be used. For example, for each processing unit, video encoder 200 may select a mode from a set of modes based on rate-distortion optimization and signal the selected mode in the bit-stream. The different modes may include different NN models, different values that may be used as the input information of the NN models, etc. In one example, video encoder 200 and video decoder 300 may use an NN-based filtering solution with multiple modes based on a single NN model by using different QP values as input to the NN model for different modes.

[0085] In summary, the neural network-based filtering techniques described above may be implemented within or alongside the hybrid coding architecture of video encoder 200 and video decoder 300, operating in cooperation with prediction, transform, and quantization modules of processing circuitry 202A and 202B. These NN-based filters may improve reconstruction quality while preserving compatibility with syntax-based signaling and implicit transform operations. Accordingly, the techniques for implicit multiple transform selection described below may be used independently of, or in conjunction with, the NN-based filtering processes to further enhance coding efficiency without requiring additional signaling in the bitstream.

[0086] This disclosure may generally refer to “signaling” certain information, such as syntax elements. The term “signaling” may generally refer to the communication of values for syntax elements and / or other data used to decode encoded video data. That is, video encoder 200 may signal values for syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As noted above, source device 102 may transport the bitstream to destination device 116 substantially in real time, or not in real time, such as might occur when storing syntax elements to storage device 112 for later retrieval by destination device 116.

[0087] In accordance with the techniques of this disclosure, a decoder apparatus for decoding video data may include processing circuitry configured to determine an intra-prediction type for a current block of video data, derive a primary characteristic for the current block based at least on the intra-prediction type, and derive a secondary characteristic based at least in part on the primary characteristic. The processing circuitry may select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to apply to a transform block representing residual data. The processing circuitry may apply the selected inverse transform to the transform block to generate residual data and reconstruct the current block based on the residual data and a prediction block.

[0088] The following describes examples of video coding standards to assist with understanding. Video coding standards encompass a variety of specifications, including ITU-T H.261, ISO / IEC MPEG-1 Visual, ITU-T H.262 or ISO / IEC MPEG-2 Visual, ITU-T H.263, and ISO / IEC MPEG-4 Visual (MPEG-4 Part 2). These standards also include ITU-T H.264, commonly referred to as ISO / IEC MPEG-4 Advanced Video Coding (AVC), along with its extensions, Scalable Video Coding (SVC) and Multiview Video Coding (MVC). Additionally, the ITU-T H.265 standard, also known as ISO / IEC MPEG-H Part 2 High Efficiency Video Coding (HEVC), includes various extensions. In April 2018, during the meeting of the Joint Video Experts Team (JVET), the standardization activity for Versatile Video Coding (VVC), also known as ITU-T H.266, was initiated. This effort began with the evaluation of video compression technologies submitted in response to the Call for Proposals.

[0089] The operations and signaling processes described above establish the functional framework within which the present techniques operate. Building on that foundation, the following sections describe how explicit and implicit multiple transform selection are incorporated into modern coding models, such as the Enhanced Compression Model (ECM), to enable transform adaptation based on prediction type, block structure, and coefficient behavior. These examples illustrate the context in which the disclosed implicit transform selection techniques are applied.

[0090] The following describes examples of explicit multiple transform selection in the Enhanced Compression Model (ECM) to assist with understanding. Multiple primary transform selection is incorporated into the ECM through two distinct methods: explicit and implicit. In explicit Multiple Transform Selection (MTS), there are several options available for selecting a transform, and these choices are signaled within the bitstream. These choices are further influenced by the transform block shape and intra-prediction mode. In the current version of ECM, the blocks are categorized into 16 different size groups, based on their width (W) and height (H), as set forth by Table 1, as follows:

Table 1

Size group - {WxH} =

Where N >= 32.

[0091] Within each size group, blocks are further classified into 5 mode groups based on mode information. In total, 16*5 = 80 groups are considered, as set forth by Table 2, as follows:

Table 2

[0092] Given a prediction mode and block size, the block corresponds to a specific group. Additionally, the number of transform choices may be determined by the sum of the absolute levels of the coefficients. The number of available choices can be 1, 4, or 6. A higher number of choices is provided for blocks where the sum of the absolute coefficient levels is greater, while a lower number of choices is assigned to blocks with a smaller sum of absolute coefficient levels.
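
As a non-limiting illustration of the grouping described above, the following Python sketch maps a block to one of 80 groups and chooses the number of transform candidates from the sum of absolute coefficient levels. The bucketing of each dimension into 4, 8, 16, and N >= 32, the ordering of the group indices, and the thresholds 6 and 32 are assumptions for illustration and are not taken from Table 1 or Table 2. The mode group is passed in as an input because its derivation is defined by Table 2.

```python
def size_bucket(n):
    """Bucket one block dimension: 4 -> 0, 8 -> 1, 16 -> 2, N >= 32 -> 3 (assumed)."""
    if n >= 32:
        return 3
    buckets = {4: 0, 8: 1, 16: 2}
    if n not in buckets:
        raise ValueError("unsupported block dimension: %r" % n)
    return buckets[n]


def size_group(width, height):
    """One of 16 size groups; the index ordering is a hypothetical choice."""
    return 4 * size_bucket(width) + size_bucket(height)


def group_index(width, height, mode_group):
    """Combine a size group (0..15) with a mode group (0..4) into 0..79."""
    if not 0 <= mode_group < 5:
        raise ValueError("mode_group must be in 0..4")
    return 5 * size_group(width, height) + mode_group


def num_transform_choices(levels, low_thr=6, high_thr=32):
    """1, 4, or 6 candidates by sum of absolute levels (placeholder thresholds)."""
    total = sum(abs(v) for v in levels)
    if total <= low_thr:
        return 1
    if total <= high_thr:
        return 4
    return 6
```

The design intent reflected here is that blocks carrying more coefficient energy can better amortize the cost of signaling a choice among more candidates.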

[0093] The following describes examples of Implicit Multiple Transform Selection (Implicit MTS) in the Enhanced Compression Model (ECM) to assist with understanding. Video encoder 200 and video decoder 300 are configured to perform implicit transform selection. For example, video encoder 200 and video decoder 300 may determine the transform implicitly, without the need for signaling. In the Versatile Video Coding (VVC) / Enhanced Compression Model (ECM), implicit transform selection uses two distinct transform kernels: Discrete Cosine Transform (DCT2) and Discrete Sine Transform (DST7), with the choice of kernel depending on the size. DST7 is applied when the block size falls within the range of [4, 16], while DCT2 is used for other sizes. Furthermore, for Matrix-Based Intra Prediction (MIP), implicit transform selection is not used; only DCT2 is applied for both horizontal and vertical directions. In JVET-AI0223, entitled “EE1: Summary report of exploration experiment on neural network-based video coding,” Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 35th meeting, Sapporo, JP, 12-19 July 2024, implicit multiple transform selection is enhanced by incorporating block shape and intra-directional mode dependencies. For block shape, the same 16 block shapes are used as in explicit MTS, and for intra-directional mode, 35 different classes are considered.
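The size-based kernel rule described above may be sketched as follows. This is a minimal illustration only, not the ECM implementation: the function name and the is_mip flag are hypothetical, and applying the [4, 16] range test separately to the width (horizontal kernel) and the height (vertical kernel) is an assumption.

```python
def implicit_kernels(width, height, is_mip=False):
    """Return (horizontal, vertical) kernel names for an intra block.

    Assumption: the [4, 16] size test is applied per dimension.
    """
    if is_mip:
        # MIP blocks do not use implicit selection; DCT2 in both directions.
        return ("DCT2", "DCT2")

    def pick(size):
        # DST7 for sizes in [4, 16], DCT2 otherwise.
        return "DST7" if 4 <= size <= 16 else "DCT2"

    return (pick(width), pick(height))
```

For instance, under these assumptions an 8x32 block would use DST7 horizontally and DCT2 vertically.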

[0094] The following describes examples of equivalent mode derivation to assist with understanding. Video encoder 200 and video decoder 300 may be configured to derive the directionality of a current block by analyzing the Histogram of Gradients (HoG), which represents the distribution of gradient orientations of neighboring pixels. The HoG-based directionality may be interpreted as a regular directional intra mode and, therefore, can serve as an equivalent intra mode description for prediction and transform selection.

[0095] One example of equivalent mode derivation is a Decoder-Side Intra Mode Derivation (DIMD) process. The DIMD process analyzes gradients of neighboring reconstructed pixels to derive multiple equivalent intra modes that may represent similar directional characteristics. These equivalent modes may be sorted according to a measure such as the amplitude of the gradient, and one or more of the modes may be selected as the representative equivalent mode. The pixels from a prediction block may also be analyzed to derive a corresponding equivalent mode.
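A simplified sketch of a gradient histogram of this kind is given below. It is illustrative only: the function name hog_top_two, the 3x3 Sobel operator, the amplitude measure |gx| + |gy|, and the use of 8 orientation bins are all assumptions, and the mapping from an orientation bin to an actual intra mode number is omitted.

```python
import math


def hog_top_two(samples, num_bins=8):
    """Return (best_bin, second_bin, histogram) for a 2-D list of samples.

    Each interior sample contributes its gradient amplitude to the bin of
    its gradient orientation; bins are then sorted by accumulated amplitude.
    """
    hist = [0.0] * num_bins
    rows, cols = len(samples), len(samples[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Horizontal and vertical 3x3 Sobel responses.
            gx = (samples[y - 1][x + 1] + 2 * samples[y][x + 1] + samples[y + 1][x + 1]
                  - samples[y - 1][x - 1] - 2 * samples[y][x - 1] - samples[y + 1][x - 1])
            gy = (samples[y + 1][x - 1] + 2 * samples[y + 1][x] + samples[y + 1][x + 1]
                  - samples[y - 1][x - 1] - 2 * samples[y - 1][x] - samples[y - 1][x + 1])
            if gx == 0 and gy == 0:
                continue  # flat area, no orientation
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            bin_idx = int(angle / math.pi * num_bins) % num_bins
            hist[bin_idx] += abs(gx) + abs(gy)
    order = sorted(range(num_bins), key=lambda b: hist[b], reverse=True)
    return order[0], order[1], hist
```

The two returned bins play the role of the first and second HoG modes; a codec would additionally translate each bin into a directional intra mode.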

[0096] The derived equivalent mode may then be applied to mode-dependent transforms and look-up tables, such as Low Frequency Non-Separable Transforms (LFNST), Non-Separable Prediction Tables (NSPT), and Multiple Transform Selection (MTS). The same technique may also be extended to inter-predicted blocks and used for other intra prediction types, including Intra Block Copy (IBC). In some examples, the equivalent intra mode may be derived either from the reconstructed neighborhood (external HoG) or from the current prediction block itself (internal HoG).

[0097] In the context of this disclosure, the term “characteristic” may refer to any parameter, descriptor, or feature derived from spatial, directional, or transform-domain analysis of reconstructed or predicted samples. In some examples, such characteristics correspond directly to intra prediction modes, such as Decoder-Side Intra Mode Derivation (DIMD), Template-based Intra Mode Derivation (TIMD), Histogram of Gradients (HoG)-derived modes, or other equivalent directional modes described herein. In other examples, a characteristic may represent a statistical quantity computed from local gradients or prediction data — such as covariance, orientation variance, or edge intensity distribution — that conveys similar directional or transform-domain behavior. Accordingly, the techniques described herein for deriving and applying modes to transform selection and prediction may likewise be viewed as examples of characteristic derivation, providing a generalized framework for implicit multiple transform selection based on spatial or directional properties of the video data.

[0098] In some examples, the encoder and decoder may derive a primary characteristic and a secondary characteristic for a current block consistent with the techniques claimed herein. For instance, the primary characteristic may correspond to one of the directional or statistical descriptors described above, such as a DIMD mode, a TIMD mode, or a HoG-derived orientation histogram that represents dominant spatial directionality. The secondary characteristic may then be derived from the primary characteristic, for example by refining or quantizing the directional representation, computing a measure of difference between alternative derived modes, or mapping the primary characteristic into a transform-domain descriptor. In this way, the encoder and decoder utilize the derived characteristics to adaptively select transforms without explicit signaling, consistent with the adaptive transform selection framework of this disclosure. The encoder may apply the selected transform to residual data to generate transformed coefficients for coding, while the decoder may apply the corresponding inverse transform to the transform block to generate residual data for reconstruction.

[0099] Some examples are provided in the co-pending case US Patent Application No. 63/711,038, filed October 23, 2024, entitled “IMPLICIT MULTIPLE TRANSFORM SELECTION FOR VIDEO CODING,” and PCT Application No. PCT/US2025/046910, filed September 18, 2025, entitled “IMPLICIT MULTIPLE TRANSFORM SELECTION FOR VIDEO CODING.” This disclosure provides additional examples, including refinements, based on Intra Prediction Type.

[0100] The following describes examples of relevant Intra Prediction Types and their associated processes to assist with understanding. Video encoder 200 and video decoder 300 may utilize relevant Intra Prediction Types, including DIMD, Template-based Intra Mode Derivation (TIMD), TIMD Merge, TIMD Sum of Absolute Differences (TIMD SAD), Occurrence-based Intra Coding (OBIC), Spatial Geometric Partitioning Mode (SGPM), Extrapolation filter-based Intra Prediction (EIP), Multi-Model Extrapolation filter-based Intra Prediction (Multi-Model EIP), Intra Prediction via Template Matching (IntraTMP), Neural Network-Based Intra Prediction (NN Intra), Regular Directional Intra, Position-dependent Prediction (PDP), Matrix-Based Intra Prediction (MIP), Template-based Multiple Reference Lines (TMRL), horizontal and vertical PLANAR modes, and Intra Block Copy (IBC). Video encoder 200 and video decoder 300 may derive an equivalent intra mode by analyzing reconstructed pixels.

[0101] DIMD, TIMD, TIMD Merge, TIMD SAD, OBIC, Regular Directional Intra, PDP, TMRL, and horizontal and vertical PLANAR modes utilize external Histogram of Gradients (HoG) to derive a first and second HoG mode. On the other hand, SGPM, EIP, Multi-Model EIP, IntraTMP, NN Intra, MIP, and IBC utilize internal HoG to derive a first and second HoG mode.

[0102] The following aspects can be considered independently or in combination. In some examples, multiple classification techniques may be used to derive implicit transform selection. In certain implementations, the number of mode difference classes used for transform selection may be mode-dependent to reduce runtime complexity. For example, simpler prediction types, such as Position-dependent Prediction and regular directional intra prediction, may utilize a single difference class, while more complex prediction types, such as Template-based Intra Mode Derivation or neural network-based intra prediction, may utilize multiple difference classes to achieve improved coding performance.

[0103] The use of multiple lookup tables for implicit transform selection is motivated by the observation that different intra-prediction types produce residual signals with distinct statistical characteristics. For instance, neural network-based intra prediction, matrix-based intra prediction, and Template-based Intra Mode Derivation generate residual distributions that differ in spatial frequency composition, edge orientation concentration, and coefficient variance. By maintaining separate lookup tables for each intra-prediction type, transform selection can be adaptively optimized for the unique residual behavior of each type, improving transform-domain energy compaction and reducing coding loss compared to a single global table. This separation therefore provides a data-driven mechanism for more accurately mapping mode differences to transform indices without introducing signaling overhead.

[0104] In one example configuration, the lookup tables used for implicit transform selection can each include up to eight indexed entries corresponding to distinct intra-prediction categories. These categories can include Template-based Intra Mode Derivation (TIMD), Matrix-Based Intra Prediction (MIP), Template Matching Prediction (TMP), Spatial Geometric Partitioning Mode (SGPM), Extrapolation filter-based Intra Prediction (EIP), Neural Network-Based Intra Prediction (NN Intra), and a combined “Other” category representing remaining intra-prediction types. This example defines a maximum lookup table dimension of eight while allowing the encoder or decoder to utilize fewer tables depending on the implementation and mode availability.

[0105] For all Intra Prediction Types, four default directional modes are defined: PLANAR, Direct Current (DC), horizontal, and vertical modes. Video encoder 200 and video decoder 300 are configured to utilize these default modes if a secondary mode distinct from the primary mode cannot be identified. The process for each Intra Prediction Type is outlined below. If the first and second modes are identical at the end of each respective process, the aforementioned default modes are tested as candidates for the second mode. In this context, the primary mode and secondary mode respectively correspond to the primary characteristic and secondary characteristic described herein.
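The default-mode fallback described above may be sketched as follows. The mode numbers (PLANAR = 0, DC = 1, horizontal = 18, vertical = 50) follow VVC-style numbering and are an assumption here, as are the function name and the order in which the default candidates are tested.

```python
# Assumed VVC-style intra mode numbers; not stated in this disclosure.
PLANAR, DC, HOR, VER = 0, 1, 18, 50
DEFAULT_MODES = (PLANAR, DC, HOR, VER)


def resolve_second_mode(first_mode, second_mode):
    """Return a second mode that differs from the first mode.

    If the derived second mode is missing or identical to the first mode,
    the default modes are tested in order as replacement candidates.
    """
    if second_mode is not None and second_mode != first_mode:
        return second_mode
    for candidate in DEFAULT_MODES:
        if candidate != first_mode:
            return candidate
```

Because the four default modes are distinct, at least one candidate always differs from the first mode, so the sketch always returns a usable second mode.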

[0106] The following describes examples of various Intra Prediction Types and their mode selection processes to assist with understanding.

[0107] For DIMD, the first mode is the Decoder-Side Intra Mode Derivation (DIMD) mode, and the second mode is the second external Histogram of Gradients (HoG) mode.

[0108] For Occurrence-based Intra Coding (OBIC), the first mode is the DIMD mode, while the second mode is the OBIC mode. If the OBIC mode matches the first mode, the second external HoG mode is used instead.
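The DIMD and OBIC rules above may be sketched as follows. The function names and the representation of modes as plain integers are hypothetical; the HoG modes are assumed to have been derived beforehand.

```python
def dimd_modes(dimd_mode, ext_hog_second):
    """DIMD: first mode is the DIMD mode, second is the second external HoG mode."""
    return dimd_mode, ext_hog_second


def obic_modes(dimd_mode, obic_mode, ext_hog_second):
    """OBIC: first mode is the DIMD mode, second is the OBIC mode.

    If the OBIC mode matches the first mode, the second external HoG
    mode is used instead.
    """
    second = obic_mode if obic_mode != dimd_mode else ext_hog_second
    return dimd_mode, second
```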

[0109] For Template-based Intra Mode Derivation (TIMD), the first mode is the TIMD mode. The second mode is determined by the secondary TIMD mode if TIMD is blended. If TIMD is not blended, the secondary TIMD mode is unavailable, and therefore, the second mode is the second external HoG mode.

[0110] For Template-based Intra Mode Derivation Sum of Absolute Differences (TIMD SAD), the first mode is the TIMD mode, and the second mode is the secondary TIMD mode for SAD if TIMD is blended. If TIMD SAD is not blended, the secondary TIMD mode for SAD is unavailable, and thus, the second mode is the second external HoG mode.

[0111] For TIMD Merge, the first mode is the TIMD mode, and the second mode is derived from the merge candidate's secondary TIMD mode if the corresponding candidate is a blended TIMD mode. If the candidate is not a blended TIMD mode, the secondary TIMD mode for that candidate is unavailable, and the second mode becomes the second external HoG mode. The TIMD Merge process can inherit transform information from the merge candidate to maintain coding consistency. However, when implicit Multiple Transform Selection (MTS) is enabled, this inheritance is intentionally bypassed. In such cases, transform derivation is performed independently using the Look-up Table (LUT) associated with the active intra-prediction type, ensuring that transform selection remains data-driven rather than inherited. This bypass prevents potential mismatches between merge candidates having different intra-prediction contexts and allows for consistent implicit MTS behavior across merge and non-merge blocks.

[0112] The following describes examples of alternative processes for all TIMD modes to assist with understanding. In cases where TIMD, TIMD SAD, or TIMD Merge are not blended modes, an alternative derivation can be used to save runtime. Specifically, the secondary mode is set to the first mode incremented by one (i.e., secondary mode = first mode + 1). This rule provides a deterministic secondary-mode assignment without invoking an additional Histogram of Gradients (HoG) derivation, enabling consistent transform-selection behavior while avoiding the computational cost of gradient analysis.
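The TIMD-family rules, including the shortcut for unblended modes, may be sketched as follows. The function name, the use of None to signal an unavailable secondary TIMD mode (i.e., an unblended block), and the use_shortcut flag are assumptions made for illustration.

```python
def timd_modes(timd_mode, secondary_timd_mode, ext_hog_second=None, use_shortcut=False):
    """Return (first mode, second mode) for TIMD, TIMD SAD, or TIMD Merge.

    secondary_timd_mode is None when the block is not blended.
    """
    if secondary_timd_mode is not None:
        # Blended: the secondary TIMD mode is available and used directly.
        return timd_mode, secondary_timd_mode
    if use_shortcut:
        # Unblended shortcut: no HoG derivation, second mode = first mode + 1.
        return timd_mode, timd_mode + 1
    # Unblended without shortcut: fall back to the second external HoG mode.
    return timd_mode, ext_hog_second
```

With the shortcut enabled, the difference between the first and second modes is always 1 for unblended blocks, so the secondary characteristic is fixed without any gradient analysis.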

[0113] This shortcut process eliminates the need to derive an additional external Histogram of Gradients (HoG) for unblended TIMD modes, thereby reducing runtime complexity during encoding and decoding. Because the unblended modes already contain sufficient directional information from their first intra-prediction type, recomputing a secondary HoG contributes minimal accuracy gain relative to the processing cost. Avoiding this redundant computation can reduce encoder and decoder execution time while maintaining coding efficiency.

[0114] The following describes examples of regular directional intra and PDP to assist with understanding. The first mode is set as the directional intra mode. The second mode is set as the first outside HoG mode. If both modes are identical, the second outside HoG mode is used instead. For Position-dependent Prediction (PDP) and Regular Directional Intra Prediction, no additional internal or external Histogram of Gradients (HoG) computation is performed. Omitting redundant HoG derivation for these modes can reduce runtime and simplify implementation without affecting transform selection accuracy.

[0115] The following describes examples of TMRL to assist with understanding. The first mode is set as the directional intra mode. The second mode is set as the first outside HoG mode. If both modes are identical, the second outside HoG mode is used instead.

[0116] The following describes examples of horizontal and vertical planar modes to assist with understanding. In the case of Horizontal Planar, the first mode is set to Horizontal Mode. In the case of Vertical Planar, the first mode is set to Vertical Mode. The second mode is set as the first outside HoG mode. If both modes are identical, the second outside HoG mode is used instead.

[0117] The following describes examples of SGPM to assist with understanding. If regression-based SGPM is used, the first mode is set to the first internal HoG mode. If regular SGPM is used, the first mode is set based on the split direction of the SGPM mode utilizing the following LUTs: g_geoAngle2IntraAng[g_geoParams[g_sgpmSplitDir[tu.cu->sgpmSplitDir]][0]]. For both regression-based SGPM and regular SGPM, the second mode is set to the second internal HoG mode. Optionally, if an external HoG is additionally derived, the second mode may be set to the first or second external HoG mode if the internal HoG returns modes identical to the first mode.

[0118] The following describes examples of EIP, multi-model EIP, MIP, Template Matching Prediction (TMP), IBC, and NN intra, to assist with understanding. The first mode is set to the first internal HoG mode. The second mode is set to the second internal HoG mode. Optionally, if an external HoG is additionally derived, the second mode may be set to the first or second external HoG mode if the internal HoG returns modes identical to the first mode.
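The internal-HoG rule with the optional external fallback may be sketched as follows. The function name and the tuple representation of the HoG results are hypothetical; testing the first external HoG mode before the second is an assumption.

```python
def internal_hog_modes(int_hog, ext_hog=None):
    """Return (first mode, second mode) from internal HoG results.

    int_hog and ext_hog are (first HoG mode, second HoG mode) pairs.
    ext_hog is None when no external HoG was derived.
    """
    first, second = int_hog
    if second == first and ext_hog is not None:
        # Internal HoG gave identical modes: try the external HoG modes.
        for candidate in ext_hog:
            if candidate != first:
                second = candidate
                break
    return first, second
```

If no distinct mode is found, the pair is returned unchanged, and a default-mode fallback of the kind described for all Intra Prediction Types would then apply.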

[0119] The following describes examples of mode difference to assist with understanding. In order to save runtime, the number of difference-classes may depend on the mode. In one example of video encoder 200 and video decoder 300, Position-dependent Prediction (PDP) and Regular Intra Prediction do not use any difference, resulting in only one class. Consequently, any additional internal or external Histogram of Gradients (HoG) computation may be omitted. This approach minimizes the computational load by avoiding the calculation of additional histograms of gradients for these modes, thereby reducing runtime during both encoding and decoding.

[0120] The following describes examples of implicit MTS lookup tables to assist with understanding. In one example, the lookup table has the following shape: g_aucIpmToTrSetMod[IntraPredictionType][predModeDiff][blockSizeIdx][predMode].

[0121] IntraPredictionType refers to the intra prediction type, such as MIP, DIMD, NN Intra, etc. Some modes may be merged into a single “Other” class. In one example, the maximum dimension is 8, consisting of the modes TIMD, MIP, TMP, SGPM, EIP, NNIntra, and all “Other” modes. In another example, “Others” refers to the implicit MTS algorithm described in JVET-AI0223.
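Mapping an intra prediction type to its lookup table class, with unlisted types merged into “Other,” may be sketched as follows. The class order (and therefore the index values) and the string labels are assumptions; the disclosure lists the classes but does not fix their numbering.

```python
# Assumed ordering of the table classes; only the set of classes is from the text.
LUT_CLASSES = ("TIMD", "MIP", "TMP", "SGPM", "EIP", "NNIntra", "Other")


def lut_class_index(intra_prediction_type):
    """Return the IntraPredictionType index used to select a lookup table.

    Types without a dedicated table share the "Other" class.
    """
    if intra_prediction_type in LUT_CLASSES:
        return LUT_CLASSES.index(intra_prediction_type)
    return LUT_CLASSES.index("Other")
```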

[0122] The term predModeDiff represents the index derived using the first and second modes described above for the corresponding mode: diff = abs(first mode - second mode).

[0123] In some examples, each implicit multiple transform selection lookup table may be trained or generated offline using statistical data obtained from representative video sequences. The lookup tables may be populated with transform selection results derived from training data corresponding to each intra-prediction type, such that each table reflects the transform distribution and directional behavior associated with that type. This offline training allows accurate implicit transform selection during runtime without requiring additional signaling in the bitstream.

[0124] In certain implementations, related intra-prediction types may be grouped or merged to share a common lookup table to reduce memory requirements and improve processing efficiency. For example, Template-based Intra Mode Derivation and Template-based Intra Mode Derivation merge may be combined into a single class when their statistical distributions are sufficiently similar. Other intra-prediction types with comparable transform characteristics may likewise be merged into a shared category for lookup table generation and application.

[0125] In one example, predModeDiff is set according to the first of the following conditions that diff satisfies:
• predModeDiff = 0 if diff <= 1
• predModeDiff = 1 if diff <= 2
• predModeDiff = 2 if diff <= 4
• predModeDiff = 3 if diff <= 8
• predModeDiff = 4 if diff <= 16
• predModeDiff = 5 if diff <= 32
• predModeDiff = 6 if diff > 32
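The classification of the mode difference into a predModeDiff index may be sketched as follows. The function name is hypothetical; the thresholds are those of the example above, evaluated in order so that the first satisfied condition determines the class, with differences above 32 falling into the last class.

```python
def pred_mode_diff(first_mode, second_mode):
    """Map diff = abs(first mode - second mode) to a predModeDiff class (0..6)."""
    diff = abs(first_mode - second_mode)
    # Thresholds are tested in increasing order; the first match wins.
    for cls, limit in enumerate((1, 2, 4, 8, 16, 32)):
        if diff <= limit:
            return cls
    return 6  # diff greater than 32
```

For example, a horizontal first mode (18) and a vertical second mode (50), using VVC-style numbering as an assumption, differ by 32 and fall into class 5.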

[0126] The term blockSizeIdx indicates the current block shape based on the formula:
• log2BlockWidth = floorLog2(blockSym ? height : width) - 2;
• log2BlockHeight = floorLog2(blockSym ? width : height) - 2;
• if (log2BlockWidth < 4 && log2BlockHeight < 4)
  o blIndSize = log2BlockHeight * 4 + log2BlockWidth;
• else
  o blIndSize = 16 + (log2BlockWidth - 4);

[0127] Due to symmetry, some blockSizeIdx values may not have a corresponding table entry (e.g., block shape 8x4 will utilize tables from 4x8 through symmetry). An additional LUT is defined to derive the final blockSizeIdx: blIndSize = idxMap[blIndSize].

[0128] In one example: int idxMap[18] = { 0, 1, 2, 3, -1, 4, 5, 6, -1, -1, 7, 8, -1, -1, -1, 9, 10, 11 };
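The block size index derivation, including the symmetry mapping through the 18-entry table, may be sketched as follows. The disclosure does not define blockSym; treating it as true when the height exceeds the width is an assumption, chosen because it makes every resulting index land on a non-negative table entry. The function names are hypothetical.

```python
# 18-entry symmetry map from the example above; -1 marks unused shapes.
IDX_MAP = [0, 1, 2, 3, -1, 4, 5, 6, -1, -1, 7, 8, -1, -1, -1, 9, 10, 11]


def floor_log2(value):
    """Integer floor of log2 for a positive integer."""
    return value.bit_length() - 1


def block_size_idx(width, height):
    """Return the final block size index for a width x height block."""
    block_sym = height > width  # assumption: swap so the longer side acts as width
    log2_w = floor_log2(height if block_sym else width) - 2
    log2_h = floor_log2(width if block_sym else height) - 2
    if log2_w < 4 and log2_h < 4:
        idx = log2_h * 4 + log2_w
    else:
        idx = 16 + (log2_w - 4)
    return IDX_MAP[idx]
```

Under this assumption, 8x4 and 4x8 blocks share the same index, and block dimensions from 4 to 128 are covered by the 18-entry table; larger dimensions would index past the end of the table and are not handled by this sketch.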

[0129] The term predMode refers to the first mode derived through the description above and depends on the intra-prediction type.

[0130] It will be understood that the specific derivation processes described for each intra-prediction type (e.g., DIMD, TIMD, SGPM, PDP, MIP, and others) represent distinct example embodiments. Unless explicitly stated, the described steps, parameters, or operational rules for one intra-prediction type are not intended to apply to or be combined with another. Each embodiment may be practiced independently, or together with other features only where such combinations are expressly described in this disclosure. This separation preserves the integrity of each individual embodiment and avoids implying combinations of features that are not directly and unambiguously derivable from the described examples.

[0131] The above description of FIG. 1 sets forth an overview of system 100 and its interaction between source device 102 and destination device 116, together with examples of coding standards and transform selection mechanisms suitable for implementing the disclosed techniques. To further illustrate these operations, FIG. 2 provides a more detailed block diagram of video encoder 200, showing functional modules and processing paths within processing circuitry 202A that perform the transform selection, prediction, and encoding processes described above.

[0132] FIG. 2 is a block diagram illustrating an example video encoder 200, in accordance with aspects of the disclosure. FIG. 2 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 200 according to the techniques of VVC and HEVC. However, the techniques of this disclosure may be performed by video encoding devices that are configured according to other video coding standards and video coding formats, such as AV1 and successors to the AV1 video coding format.

[0133] In the example of FIG. 2, video encoder 200 includes video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, decoded picture buffer (DPB) 218, and entropy encoding unit 220. Any or all of video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, DPB 218, and entropy encoding unit 220 may be implemented in one or more processors or in processing circuitry. For instance, the units of video encoder 200 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, video encoder 200 may include additional or alternative processors or processing circuitry to perform these and other functions.

[0134] Video data memory 230 is an example of a memory system that may store video data to be encoded by the components of video encoder 200. Video encoder 200 may receive the video data stored in video data memory 230 from, for example, video source 104 (FIG. 1). DPB 218 is an example of a memory system that may act as a reference picture memory that stores reference video data for use in prediction of subsequent video data by video encoder 200. Video data memory 230 and DPB 218 may each be formed by any of a variety of one or more memory devices or memory units, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 230 and DPB 218 may be provided by the same memory device or separate memory devices. In various examples, video data memory 230 may be on-chip with other components of video encoder 200, as illustrated, or off-chip relative to those components.

[0135] In this disclosure, reference to video data memory 230 should not be interpreted as being limited to memory internal to video encoder 200, unless specifically described as such, or memory external to video encoder 200, unless specifically described as such. Rather, reference to video data memory 230 should be understood as reference memory that stores video data that video encoder 200 receives for encoding (e.g., video data for a current block that is to be encoded). Memory 106 of FIG. 1 may also provide temporary storage of outputs from the various units of video encoder 200.

[0136] The various units of FIG. 2 are illustrated to assist with understanding the operations performed by video encoder 200. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits refer to circuits that provide particular functionality, and are preset on the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks, and provide flexible functionality in the operations that can be performed. For instance, programmable circuits may execute software or firmware that cause the programmable circuits to operate in the manner defined by instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

[0137] Video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits, and/or programmable cores, formed from programmable circuits. In examples where the operations of video encoder 200 are performed using software executed by the programmable circuits, memory 106 (FIG. 1) may store the instructions (e.g., object code) of the software that video encoder 200 receives and executes, or another memory within video encoder 200 (not shown) may store such instructions.

[0138] Video data memory 230 is configured to store received video data. Video encoder 200 may retrieve a picture of the video data from video data memory 230 and provide the video data to residual generation unit 204 and mode selection unit 202. Video data in video data memory 230 may be raw video data that is to be encoded.

[0139] Mode selection unit 202 includes a motion estimation unit 222, a motion compensation unit 224, and an intra-prediction unit 226. Mode selection unit 202 may include additional functional units to perform video prediction in accordance with other prediction modes. As examples, mode selection unit 202 may include a palette unit, an intra-block copy unit (which may be part of motion estimation unit 222 and/or motion compensation unit 224), an affine unit, a linear model (LM) unit, or the like.

[0140] Mode selection unit 202 generally coordinates multiple encoding passes to test combinations of encoding parameters and resulting rate-distortion values for such combinations. The encoding parameters may include partitioning of CTUs into CUs, prediction modes for the CUs, transform types for residual data of the CUs, quantization parameters for residual data of the CUs, and so on. Mode selection unit 202 may ultimately select the combination of encoding parameters having rate-distortion values that are better than the other tested combinations.

[0141] Video encoder 200 may partition a picture retrieved from video data memory 230 into a series of CTUs, and encapsulate one or more CTUs within a slice. Mode selection unit 202 may partition a CTU of the picture in accordance with a tree structure, such as the MTT structure, QTBT structure, superblock structure, or the quadtree structure described above. As described above, video encoder 200 may form one or more CUs from partitioning a CTU according to the tree structure. Such a CU may also be referred to generally as a “video block” or “block.”

[0142] In general, mode selection unit 202 also controls the components thereof (e.g., motion estimation unit 222, motion compensation unit 224, and intra-prediction unit 226) to generate a prediction block for a current block (e.g., a current CU, or in HEVC, the overlapping portion of a PU and a TU). For inter-prediction of a current block, motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously coded pictures stored in DPB 218). In particular, motion estimation unit 222 may calculate a value representative of how similar a potential reference block is to the current block, e.g., according to sum of absolute difference (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared differences (MSD), or the like. Motion estimation unit 222 may generally perform these calculations using sample-by-sample differences between the current block and the reference block being considered. Motion estimation unit 222 may identify a reference block having a lowest value resulting from these calculations, indicating a reference block that most closely matches the current block.
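The sample-by-sample matching costs and the selection of the lowest-cost reference block may be sketched as follows. The function names and the representation of blocks as 2-D lists are hypothetical, and the candidate list stands in for the output of a motion search that is not shown.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2 for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_reference(current, candidates, cost=sad):
    """Return (index, cost) of the candidate block with the lowest cost."""
    costs = [cost(current, candidate) for candidate in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return best, costs[best]
```

MAD and MSD differ from SAD and SSD only by division by the number of samples, so they rank equally sized candidates in the same order.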

[0143] Motion estimation unit 222 may form one or more motion vectors (MVs) that define the positions of the reference blocks in the reference pictures relative to the position of the current block in a current picture. Motion estimation unit 222 may then provide the motion vectors to motion compensation unit 224. For example, for unidirectional inter-prediction, motion estimation unit 222 may provide a single motion vector, whereas for bi-directional inter-prediction, motion estimation unit 222 may provide two motion vectors. Motion compensation unit 224 may then generate a prediction block using the motion vectors. For example, motion compensation unit 224 may retrieve data of the reference block using the motion vector. As another example, if the motion vector has fractional sample precision, motion compensation unit 224 may interpolate values for the prediction block according to one or more interpolation filters. Moreover, for bi-directional inter-prediction, motion compensation unit 224 may retrieve data for two reference blocks identified by respective motion vectors and combine the retrieved data, e.g., through sample-by-sample averaging or weighted averaging.

[0144] When operating according to the AV1 video coding format, motion estimation unit 222 and motion compensation unit 224 may be configured to encode coding blocks of video data (e.g., both luma and chroma coding blocks) using translational motion compensation, affine motion compensation, overlapped block motion compensation (OBMC), and/or compound inter-intra prediction.

[0145] As another example, for intra-prediction, or intra-prediction coding, intra-prediction unit 226 may generate the prediction block from samples neighboring the current block. For example, for directional modes, intra-prediction unit 226 may generally mathematically combine values of neighboring samples and populate these calculated values in the defined direction across the current block to produce the prediction block. As another example, for DC mode, intra-prediction unit 226 may calculate an average of the neighboring samples to the current block and generate the prediction block to include this resulting average for each sample of the prediction block.
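DC mode prediction as described above may be sketched as follows. The function name is hypothetical, and averaging all supplied above and left neighbors with integer rounding is an assumption; particular standards may restrict which neighbors enter the average.

```python
def dc_prediction(above, left, width, height):
    """Fill a width x height prediction block with the rounded neighbor mean."""
    neighbors = list(above) + list(left)
    count = len(neighbors)
    # Integer mean with rounding to nearest.
    dc = (sum(neighbors) + count // 2) // count
    return [[dc] * width for _ in range(height)]
```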

[0146] When operating according to the AV1 video coding format, intra-prediction unit 226 may be configured to encode coding blocks of video data (e.g., both luma and chroma coding blocks) using directional intra prediction, non-directional intra prediction, recursive filter intra prediction, chroma-from-luma (CFL) prediction, intra block copy (IBC), and/or color palette mode. Mode selection unit 202 may include additional functional units to perform video prediction in accordance with other prediction modes.

[0147] Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives a raw, unencoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block. The resulting sample-by-sample differences define a residual block for the current block. In some examples, residual generation unit 204 may also determine differences between sample values in the residual block to generate a residual block using residual differential pulse code modulation (RDPCM). In some examples, residual generation unit 204 may be formed using one or more subtractor circuits that perform binary subtraction.
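
By way of illustration only, residual generation and a differential (RDPCM-style) step may be sketched in Python as below. The function names and the choice of a horizontal differencing direction are assumptions for this sketch; the disclosure does not fix a direction or the point in the pipeline at which the differences are taken.

```python
def residual_block(current, prediction):
    """Sample-by-sample difference between the current and prediction blocks."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def rdpcm_horizontal(residual):
    """Replace each sample (except the first in a row) with its difference
    from the sample to its left. Horizontal direction is an assumption."""
    return [[row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
            for row in residual]


def inverse_rdpcm_horizontal(diffs):
    """Undo rdpcm_horizontal with a running sum along each row."""
    out = []
    for row in diffs:
        acc, rebuilt = 0, []
        for d in row:
            acc += d
            rebuilt.append(acc)
        out.append(rebuilt)
    return out


current = [[52, 55, 61], [60, 62, 70]]
prediction = [[50, 50, 50], [60, 60, 60]]
resid = residual_block(current, prediction)
diffs = rdpcm_horizontal(resid)
```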

[0148] In examples where mode selection unit 202 partitions CUs into PUs, each PU may be associated with a luma prediction unit and corresponding chroma prediction units. Video encoder 200 and video decoder 300 may support PUs having various sizes. As indicated above, the size of a CU may refer to the size of the luma coding block of the CU and the size of a PU may refer to the size of a luma prediction unit of the PU. Assuming that the size of a particular CU is 2Nx2N, video encoder 200 may support PU sizes of 2Nx2N or NxN for intra prediction, and symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, NxN, or similar for inter prediction. Video encoder 200 and video decoder 300 may also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N for inter prediction.
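
The partition names above may be easier to follow with concrete dimensions. The Python sketch below lists the (width, height) of each PU for a CU of size 2Nx2N, assuming the HEVC convention in which the asymmetric modes split one side at one quarter of the CU size. The function name `pu_partitions` is hypothetical, and the quarter split is an assumption, since this disclosure does not define the asymmetric shapes.

```python
def pu_partitions(mode, n):
    """Return the list of (width, height) PU sizes for a 2Nx2N CU.

    Asymmetric modes are assumed to split at one quarter of the CU size,
    following the HEVC convention."""
    s = 2 * n          # CU side length
    q = n // 2         # one quarter of the CU side
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
        "NxN": [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # small PU on top
        "2NxnD": [(s, s - q), (s, q)],   # small PU on the bottom
        "nLx2N": [(q, s), (s - q, s)],   # small PU on the left
        "nRx2N": [(s - q, s), (q, s)],   # small PU on the right
    }
    return table[mode]


amp_top = pu_partitions("2NxnU", 16)  # PUs of a 32x32 CU
```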

[0149] In examples where mode selection unit 202 does not further partition a CU into PUs, each CU may be associated with a luma coding block and corresponding chroma coding blocks. As above, the size of a CU may refer to the size of the luma coding block of the CU. The video encoder 200 and video decoder 300 may support CU sizes of 2Nx2N, 2NxN, or Nx2N.

[0150] For other video coding techniques such as an intra-block copy mode coding, an affine-mode coding, and linear model (LM) mode coding, as some examples, mode selection unit 202, via respective units associated with the coding techniques, generates a prediction block for the current block being encoded. In some examples, such as palette mode coding, mode selection unit 202 may not generate a prediction block, and instead generate syntax elements that indicate the manner in which to reconstruct the block based on a selected palette. In such modes, mode selection unit 202 may provide these syntax elements to entropy encoding unit 220 to be encoded.

[0151] As described above, residual generation unit 204 receives the video data for the current block and the corresponding prediction block. Residual generation unit 204 then generates a residual block for the current block. To generate the residual block, residual generation unit 204 calculates sample-by-sample differences between the prediction block and the current block. In some examples, transform processing unit 206 may select the transform based on a primary characteristic and a secondary characteristic derived from the intra-prediction type of the current block, consistent with the characteristic-based transform selection techniques described herein.

[0152] Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a “transform coefficient block”). Transform processing unit 206 may apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loeve transform (KLT), or a conceptually similar transform to a residual block. In some examples, transform processing unit 206 may perform multiple transforms to a residual block, e.g., a primary transform and a secondary transform, such as a rotational transform. In some examples, transform processing unit 206 does not apply transforms to a residual block.

[0153] When operating according to AV1, transform processing unit 206 may apply one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a “transform coefficient block”). Transform processing unit 206 may apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a horizontal / vertical transform combination that may include a discrete cosine transform (DCT), an asymmetric discrete sine transform (ADST), a flipped ADST (e.g., an ADST in reverse order), and an identity transform (IDTX). When using an identity transform, the transform is skipped in one of the vertical or horizontal directions. In some examples, transform processing may be skipped.
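
The horizontal / vertical combination described above may be illustrated with a separable transform, in which one 1-D kernel is applied along rows and another along columns. The Python sketch below is a simplified illustration that uses a floating-point orthonormal DCT-II and an identity kernel; it is not the integer kernel set of any particular codec, the ADST variants are omitted, and all function names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal floating-point DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def identity_1d(x):
    """Identity kernel: the transform is skipped in this direction."""
    return list(x)


def separable_transform(block, row_tx, col_tx):
    """Apply row_tx along each row (horizontal pass), then col_tx along
    each column (vertical pass)."""
    rows = [row_tx(r) for r in block]
    cols = [col_tx(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


flat = [[10.0] * 4 for _ in range(4)]
dct_dct = separable_transform(flat, dct_1d, dct_1d)        # 2-D DCT
idt_dct = separable_transform(flat, identity_1d, dct_1d)   # vertical DCT only
```

For a constant block, the 2-D DCT concentrates all energy in the single DC coefficient, whereas skipping the horizontal pass leaves one nonzero coefficient per column.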

[0154] Quantization unit 208 may quantize the transform coefficients in a transform coefficient block, to produce a quantized transform coefficient block. Quantization unit 208 may quantize transform coefficients of a transform coefficient block according to a quantization parameter (QP) value associated with the current block. Video encoder 200 (e.g., via mode selection unit 202) may adjust the degree of quantization applied to the transform coefficient blocks associated with the current block by adjusting the QP value associated with the CU. Quantization may introduce loss of information, and thus, quantized transform coefficients may have lower precision than the original transform coefficients produced by transform processing unit 206.
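
As a simplified illustration of how the QP value controls the degree of quantization and the resulting loss of precision, the following Python sketch uses a step size that doubles every six QP values, as in H.264/AVC and HEVC. The use of that relation here, the floating-point arithmetic, the rounding rule, and the function names are all assumptions for illustration and are not taken from this disclosure.

```python
import math


def qstep(qp):
    """Quantization step size; doubles every 6 QP values (assumed relation)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Divide each coefficient by the step size and round half away from zero."""
    step = qstep(qp)

    def q(c):
        level = math.floor(abs(c) / step + 0.5)
        return level if c >= 0 else -level

    return [[q(c) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Scale quantized levels back by the step size."""
    step = qstep(qp)
    return [[level * step for level in row] for row in levels]


coeffs = [[40.0, -7.0], [3.0, 0.0]]
levels = quantize(coeffs, 22)        # step size is 8 at QP 22
recovered = dequantize(levels, 22)   # differs from coeffs: information was lost
```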

[0155] Inverse quantization unit 210 and inverse transform processing unit 212 may apply inverse quantization and inverse transforms to a quantized transform coefficient block, respectively, to reconstruct a residual block from the transform coefficient block. Reconstruction unit 214 may produce a reconstructed block corresponding to the current block (albeit potentially with some degree of distortion) based on the reconstructed residual block and a prediction block generated by mode selection unit 202. For example, reconstruction unit 214 may add samples of the reconstructed residual block to corresponding samples from the prediction block generated by mode selection unit 202 to produce the reconstructed block. In some examples, inverse transform processing unit 212 determines a transform to apply to a quantized transform coefficient block using any of, or any combination of, the implicit transform selection techniques of this disclosure.

[0156] Filter unit 216 may perform one or more filter operations on reconstructed blocks. For example, filter unit 216 may perform deblocking operations to reduce blockiness artifacts along edges of CUs. Operations of filter unit 216 may be skipped, in some examples.

[0157] When operating according to AV1, filter unit 216 may perform one or more filter operations on reconstructed blocks. For example, filter unit 216 may perform deblocking operations to reduce blockiness artifacts along edges of CUs. In other examples, filter unit 216 may apply a constrained directional enhancement filter (CDEF), which may be applied after deblocking, and may include the application of non-separable, non-linear, low-pass directional filters based on estimated edge directions. Filter unit 216 may also include a loop restoration filter, which is applied after CDEF, and may include a separable symmetric normalized Wiener filter or a dual self-guided filter.

[0158] Video encoder 200 stores reconstructed blocks in DPB 218. For instance, in examples where operations of filter unit 216 are not performed, reconstruction unit 214 may store reconstructed blocks to DPB 218. In examples where operations of filter unit 216 are performed, filter unit 216 may store the filtered reconstructed blocks to DPB 218. Motion estimation unit 222 and motion compensation unit 224 may retrieve a reference picture from DPB 218, formed from the reconstructed (and potentially filtered) blocks, to inter-predict blocks of subsequently encoded pictures. In addition, intra-prediction unit 226 may use reconstructed blocks in DPB 218 of a current picture to intra-predict other blocks in the current picture.

[0159] In general, entropy encoding unit 220 may entropy encode syntax elements received from other functional components of video encoder 200. For example, entropy encoding unit 220 may entropy encode quantized transform coefficient blocks from quantization unit 208. As another example, entropy encoding unit 220 may entropy encode prediction syntax elements (e.g., motion information for inter-prediction or intra-mode information for intra-prediction) from mode selection unit 202. Entropy encoding unit 220 may perform one or more entropy encoding operations on the syntax elements, which are another example of video data, to generate entropy-encoded data. For example, entropy encoding unit 220 may perform a context-adaptive variable length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, an Exponential-Golomb encoding operation, or another type of entropy encoding operation on the data. In some examples, entropy encoding unit 220 may operate in bypass mode where syntax elements are not entropy encoded.
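
Of the entropy coding operations listed above, Exponential-Golomb coding is simple enough to show in a few lines. The Python sketch below implements the order-0 code for unsigned integers and represents the bitstream as a string of "0" and "1" characters for readability; the function names and the string representation are illustrative assumptions.

```python
def exp_golomb_encode(value):
    """Order-0 Exp-Golomb code of a non-negative integer: the binary form
    of value + 1, preceded by one fewer zero bits than its length."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def exp_golomb_decode(bits):
    """Decode a single order-0 Exp-Golomb codeword from the start of bits."""
    zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[zeros:2 * zeros + 1], 2) - 1


codewords = [exp_golomb_encode(v) for v in range(5)]
```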

[0160] Video encoder 200 may output a bitstream that includes the entropy encoded syntax elements needed to reconstruct blocks of a slice or picture. In particular, entropy encoding unit 220 may output the bitstream.

[0161] In accordance with AV1, entropy encoding unit 220 may be configured as a symbol-to-symbol adaptive multi-symbol arithmetic coder. A syntax element in AV1 includes an alphabet of N elements, and a context (e.g., probability model) includes a set of N probabilities. Entropy encoding unit 220 may store the probabilities as n-bit (e.g., 15-bit) cumulative distribution functions (CDFs). Entropy encoding unit 220 may perform recursive scaling, with an update factor based on the alphabet size, to update the contexts.
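
The context update described above may be sketched as follows. This Python illustration stores a 15-bit CDF for an alphabet of N symbols and, after each coded symbol, moves every entry a fraction of the way toward the distribution in which that symbol is certain. The rate formula used here depends only on the alphabet size and is a simplifying assumption; an actual coder may also adapt the rate to how many symbols have been coded. The function name `update_cdf` is hypothetical.

```python
CDF_ONE = 1 << 15  # probability 1.0 in 15-bit fixed point


def update_cdf(cdf, symbol):
    """Return an adapted copy of cdf after observing symbol.

    cdf[i] holds P(X <= i) scaled by CDF_ONE, so cdf[-1] == CDF_ONE.
    Entries below the symbol shrink toward 0; entries at or above it
    grow toward CDF_ONE. The shift 'rate' sets the adaptation speed."""
    n = len(cdf)
    rate = 3 + min(n.bit_length() - 1, 2)  # assumed: depends only on alphabet size
    out = list(cdf)
    for i in range(n - 1):
        target = CDF_ONE if i >= symbol else 0
        if target < out[i]:
            out[i] -= (out[i] - target) >> rate
        else:
            out[i] += (target - out[i]) >> rate
    return out


uniform = [8192, 16384, 24576, 32768]  # four equally likely symbols
adapted = update_cdf(uniform, 2)        # symbol 2 becomes more probable
```

In this example the probability mass assigned to symbol 2 rises from 8192 to 8960 (out of 32768), and the CDF remains monotonic.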

[0162] The operations described above are described with respect to a block. Such description should be understood as being operations for a luma coding block and / or chroma coding blocks. As described above, in some examples, the luma coding block and chroma coding blocks are luma and chroma components of a CU. In some examples, the luma coding block and the chroma coding blocks are luma and chroma components of a PU.

[0163] In some examples, operations performed with respect to a luma coding block need not be repeated for the chroma coding blocks. As one example, operations to identify a motion vector (MV) and reference picture for a luma coding block need not be repeated for identifying a MV and reference picture for the chroma blocks. Rather, the MV for the luma coding block may be scaled to determine the MV for the chroma blocks, and the reference picture may be the same. As another example, the intra-prediction process may be the same for the luma coding block and the chroma coding blocks.

[0164] Video encoder 200 represents an example of a device configured to encode video data, including a memory configured to store video data, and one or more processing units implemented in circuitry and configured to determine an intra-prediction type for a current block of the video data. According to certain examples, video encoder 200 may derive a primary characteristic for the current block based at least on the determined intra-prediction type and may further derive a secondary characteristic for the current block based at least in part on the primary characteristic. Video encoder 200 may select a transform for the current block based on the intra-prediction type, the primary characteristic, and the secondary characteristic. In some examples, when the primary and secondary characteristics are identical, video encoder 200 may instead select a default directional transform. The selected transform may then be applied to residual data of the current block to generate a transform block representing transformed coefficients for coding, consistent with the implicit transform selection techniques described herein.

[0165] FIG. 3 is a block diagram illustrating an example video decoder 300, in accordance with aspects of the disclosure. FIG. 3 is provided for purposes of explanation and is not limiting on the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 300 according to the techniques of VVC and HEVC. However, the techniques of this disclosure may be performed by video coding devices that are configured according to other video coding standards.

[0166] In the example of FIG. 3, video decoder 300 includes coded picture buffer (CPB) memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and DPB 314. Any or all of CPB memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and DPB 314 may be implemented in one or more processors or in processing circuitry. For instance, the units of video decoder 300 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, video decoder 300 may include additional or alternative processors or processing circuitry to perform these and other functions.

[0167] Prediction processing unit 304 includes motion compensation unit 316 and intra-prediction unit 318. Prediction processing unit 304 may include additional units to perform prediction in accordance with other prediction modes. As examples, prediction processing unit 304 may include a palette unit, an intra-block copy unit (which may form part of motion compensation unit 316), an affine unit, a linear model (LM) unit, or the like. In other examples, video decoder 300 may include more, fewer, or different functional components.

[0168] When operating according to AV1, motion compensation unit 316 may be configured to decode coding blocks of video data (e.g., both luma and chroma coding blocks) using translational motion compensation, affine motion compensation, OBMC, and / or compound inter-intra prediction, as described above. Intra-prediction unit 318 may be configured to decode coding blocks of video data (e.g., both luma and chroma coding blocks) using directional intra prediction, non-directional intra prediction, recursive filter intra prediction, CFL, IBC, and / or color palette mode, as described above.

[0169] CPB memory 320 is an example of a memory system that may store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 300. The video data stored in CPB memory 320 may be obtained, for example, from computer-readable medium 110 (FIG. 1). CPB memory 320 may include a CPB that stores encoded video data (e.g., syntax elements) from an encoded video bitstream. Also, CPB memory 320 may store video data other than syntax elements of a coded picture, such as temporary data representing outputs from the various units of video decoder 300. DPB 314 is an example of a memory system that generally stores decoded pictures, which video decoder 300 may output and / or use as reference video data when decoding subsequent data or pictures of the encoded video bitstream. CPB memory 320 and DPB 314 may each be formed by any of a variety of memory devices or memory units, such as DRAM, including SDRAM, MRAM, RRAM, or other types of memory devices. CPB memory 320 and DPB 314 may be provided by the same memory device or separate memory devices. In various examples, CPB memory 320 may be on-chip with other components of video decoder 300, or off-chip relative to those components.

[0170] Additionally or alternatively, in some examples, video decoder 300 may retrieve coded video data from memory 120 (FIG. 1). That is, memory 120 may store data as discussed above with CPB memory 320. Likewise, memory 120 may store instructions to be executed by video decoder 300, when some or all of the functionality of video decoder 300 is implemented in software to be executed by processing circuitry of video decoder 300.

[0171] The various units shown in FIG. 3 are illustrated to assist with understanding the operations performed by video decoder 300. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Similar to FIG. 2, fixed-function circuits refer to circuits that provide particular functionality, and are preset on the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks, and provide flexible functionality in the operations that can be performed. For instance, programmable circuits may execute software or firmware that cause the programmable circuits to operate in the manner defined by instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

[0172] Video decoder 300 may include ALUs, EFUs, digital circuits, analog circuits, and / or programmable cores formed from programmable circuits. In examples where the operations of video decoder 300 are performed by software executing on the programmable circuits, on-chip or off-chip memory may store instructions (e.g., object code) of the software that video decoder 300 receives and executes.

[0173] Entropy decoding unit 302 may receive encoded video data from CPB memory 320 and entropy decode the encoded video data to reproduce syntax elements. Prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, and filter unit 312 generate decoded video data based on the syntax elements extracted from the encoded video bitstream.

[0174] In general, video decoder 300 reconstructs a picture on a block-by-block basis. Video decoder 300 may perform a reconstruction operation on each block individually (where the block currently being reconstructed, i.e., decoded, may be referred to as a “current block”).

[0175] Entropy decoding unit 302 may entropy decode syntax elements defining quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as a quantization parameter (QP) and / or transform mode indication(s). Inverse quantization unit 306 may use the QP associated with the quantized transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for inverse quantization unit 306 to apply. Inverse quantization unit 306 may, for example, perform a bitwise left-shift operation to inverse quantize the quantized transform coefficients. Inverse quantization unit 306 may thereby form a transform coefficient block including transform coefficients.
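
The bitwise left-shift form of inverse quantization mentioned above may be sketched as below. The mapping from QP to a shift amount (`qp // 6`) and the function name are assumptions made for this illustration; a practical decoder would typically also multiply by a scaling factor and apply rounding and clipping.

```python
def inverse_quantize_shift(levels, qp):
    """Scale each quantized level by a power of two using a left shift.

    The shift amount qp // 6 is an assumed mapping for illustration."""
    shift = qp // 6
    return [[level << shift for level in row] for row in levels]


levels_in = [[5, -1], [0, 2]]
dequantized = inverse_quantize_shift(levels_in, 12)  # shift of 2, i.e. scale by 4
```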

[0176] After inverse quantization unit 306 forms the transform coefficient block, inverse transform processing unit 308 may apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, inverse transform processing unit 308 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the transform coefficient block. In some examples, inverse transform processing unit 308 determines a transform to apply to a quantized transform coefficient block using any of, or any combination of, the implicit transform selection techniques of this disclosure.

[0177] Furthermore, prediction processing unit 304 generates a prediction block according to prediction information syntax elements that were entropy decoded by entropy decoding unit 302. For example, if the prediction information syntax elements indicate that the current block is inter-predicted, motion compensation unit 316 may generate the prediction block. In this case, the prediction information syntax elements may indicate a reference picture in DPB 314 from which to retrieve a reference block, as well as a motion vector identifying a location of the reference block in the reference picture relative to the location of the current block in the current picture. Motion compensation unit 316 may generally perform the inter-prediction process in a manner that is substantially similar to that described with respect to motion compensation unit 224 (FIG. 2).

[0178] As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, intra-prediction unit 318 may generate the prediction block according to an intra-prediction mode indicated by the prediction information syntax elements. Again, intra-prediction unit 318 may generally perform the intra-prediction process in a manner that is substantially similar to that described with respect to intra-prediction unit 226 (FIG. 2). Intra-prediction unit 318 may retrieve data of neighboring samples to the current block from DPB 314.

[0179] Reconstruction unit 310 may reconstruct the current block using the prediction block and the residual block. For example, reconstruction unit 310 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the current block.
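
A minimal Python sketch of this reconstruction step follows. Clipping the sum to the valid sample range for a given bit depth is an assumption added for the illustration, as are the function name and the default bit depth of 8.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add residual samples to prediction samples and clip the result to
    the range [0, 2**bit_depth - 1] (clipping is an assumed detail)."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


pred_blk = [[250, 10], [128, 128]]
res_blk = [[10, -20], [5, -5]]
recon = reconstruct(pred_blk, res_blk)
```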

[0180] Filter unit 312 may perform one or more filter operations on reconstructed blocks. For example, filter unit 312 may perform deblocking operations to reduce blockiness artifacts along edges of the reconstructed blocks. Operations of filter unit 312 are not necessarily performed in all examples.

[0181] Video decoder 300 may store the reconstructed blocks in DPB 314. For instance, in examples where operations of filter unit 312 are not performed, reconstruction unit 310 may store reconstructed blocks to DPB 314. In examples where operations of filter unit 312 are performed, filter unit 312 may store the filtered reconstructed blocks to DPB 314. As discussed above, DPB 314 may provide reference information, such as samples of a current picture for intra-prediction and previously decoded pictures for subsequent motion compensation, to prediction processing unit 304. Moreover, video decoder 300 may output decoded pictures (e.g., decoded video) from DPB 314 for subsequent presentation on a display device, such as display device 118 of FIG. 1.

[0182] Video decoder 300 represents an example of a device configured to decode video data, including a memory configured to store video data, and one or more processing units implemented in circuitry and configured to determine an intra-prediction type for a current block of the video data. According to certain examples, video decoder 300 may derive a primary characteristic for the current block based at least on the determined intra-prediction type and may further derive a secondary characteristic for the current block based at least in part on the primary characteristic. Video decoder 300 may select an inverse transform for the current block based on the intra-prediction type, the primary characteristic, and the secondary characteristic. In some examples, when the primary and secondary characteristics are identical, video decoder 300 may instead select a default directional inverse transform. The selected inverse transform may then be applied to a transform block representing residual data to generate reconstructed residual data, which is combined with a prediction block to reconstruct the current block, consistent with the implicit transform selection techniques described herein.

[0183] FIG. 4 is a flowchart illustrating an example method for encoding a current block, in accordance with aspects of the disclosure. The current block may be or include a current CU. Although described with respect to video encoder 200 (FIGS. 1 and 2), it should be understood that other devices may be configured to perform a method similar to that of FIG. 4.

[0184] In this example, video encoder 200 initially predicts the current block (400). For example, video encoder 200 may form a prediction block for the current block. Video encoder 200 then calculates a residual block for the current block (402) by determining a difference between the original, unencoded block and the prediction block. Video encoder 200 next derives a primary characteristic for the current block based at least on an intra-prediction type and derives a secondary characteristic for the current block based at least in part on the primary characteristic. Video encoder 200 selects a transform for the residual block based on the intra-prediction type, the primary characteristic, and the secondary characteristic, and applies the selected transform to the residual block (404). Video encoder 200 then quantizes the resulting transform coefficients, scans the quantized transform coefficients (406), and during or following the scan, entropy encodes the transform coefficients (408), for example using context-adaptive variable-length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC). Finally, video encoder 200 outputs entropy-encoded data of the block (410).

[0185] FIG. 5 is a flowchart illustrating an example method for decoding a current block of video data, in accordance with aspects of the disclosure. The current block may be or include a current CU. Although described with respect to video decoder 300 (FIGS. 1 and 3), it should be understood that other devices may be configured to perform a method similar to that of FIG. 5.

[0186] Video decoder 300 receives entropy-encoded data for the current block, such as entropy-encoded prediction information and entropy-encoded data for transform coefficients of a residual block corresponding to the current block (500). Video decoder 300 entropy decodes the received data to determine prediction information for the current block and to reproduce transform coefficients of the residual block (502). Video decoder 300 predicts the current block (504), for example using an intra-prediction or inter-prediction mode as indicated by the prediction information, to generate a prediction block for the current block. Video decoder 300 inverse scans the reproduced transform coefficients to create a block of quantized transform coefficients (506). Video decoder 300 derives a primary characteristic for the current block based at least on an intra-prediction type and derives a secondary characteristic for the current block based at least in part on the primary characteristic. Video decoder 300 selects an inverse transform for the current block based on the intra-prediction type, the primary characteristic, and the secondary characteristic, and applies the inverse transform to the quantized transform coefficients to produce a residual block (508). Finally, video decoder 300 combines the prediction block and the residual block to decode the current block (510).

[0187] FIG. 6 is a flow diagram illustrating an example method for decoding video data (600), in accordance with aspects of this disclosure. FIG. 6 is described with respect to computing device 100, video decoder 300, and memory 120 of FIG. 1. However, the techniques of FIG. 6 may be performed by different components of computing device 100, by other video decoding devices, or by additional or alternative systems configured for decoding video data.

[0188] Processing circuitry of video decoder 300 may be configured to determine an intra-prediction type (602). For example, video decoder 300 may be configured to determine an intra-prediction type for a current block of video data, such as a directional, planar, or non-directional prediction mode identified from syntax elements or reconstructed neighboring samples.

[0189] Processing circuitry of video decoder 300 may be configured to derive a primary characteristic (604). For example, video decoder 300 may derive a primary characteristic for the current block based at least on the determined intra-prediction type, such as a directional, spatial, or transform-domain descriptor representing dominant orientation or statistical structure of the block.

[0190] Processing circuitry of video decoder 300 may be configured to derive a secondary characteristic (606). For example, video decoder 300 may derive a secondary characteristic for the current block based at least in part on the primary characteristic. The secondary characteristic may refine or quantize the primary descriptor, express a measure of difference between alternative derived modes, or otherwise represent a complementary property of the block.

[0191] Processing circuitry of video decoder 300 may be configured to select an inverse transform based on type and characteristics (608). For example, video decoder 300 may select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data. The selection may occur implicitly, without explicit signaling in the bitstream.

[0192] Processing circuitry of video decoder 300 may be configured to apply an inverse transform to a transform block (610). For example, video decoder 300 may apply the selected inverse transform to the transform block to generate residual data corresponding to the current block.

[0193] Processing circuitry of video decoder 300 may be configured to reconstruct the current block based on residual and prediction blocks (612). For example, video decoder 300 may reconstruct the current block based on residual data and a prediction block by combining the reconstructed residual data with predicted samples to reproduce pixel values of the current block.

[0194] In this way, FIG. 6 illustrates an example method for decoding video data through adaptive inverse-transform selection based on intra-prediction-derived characteristics, enabling efficient reconstruction of image blocks without explicit transform signaling.
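
The selection steps of FIG. 6 (602 through 608) may be sketched in Python as shown below. Every concrete choice in this sketch is a hypothetical assumption made for illustration: the primary characteristic is taken to be the prediction angle in degrees, the secondary characteristic is that angle quantized to the nearest multiple of 45 degrees, and the lookup table, kernel names, and default are invented. The disclosure does not prescribe these particular descriptors or mappings.

```python
DEFAULT_TRANSFORM = "DEFAULT_DIRECTIONAL"  # hypothetical default kernel name

# Hypothetical lookup table keyed by (intra-prediction type, secondary characteristic).
TRANSFORM_TABLE = {
    ("directional", 0): "DCT_ADST",
    ("directional", 45): "ADST_ADST",
    ("directional", 90): "ADST_DCT",
    ("directional", 135): "ADST_ADST",
}


def derive_primary(intra_type, angle_deg):
    """Primary characteristic: the prediction angle, or -1 if non-directional."""
    return angle_deg % 180 if intra_type == "directional" else -1


def derive_secondary(primary):
    """Secondary characteristic: the primary angle quantized to a 45-degree grid."""
    if primary < 0:
        return -1
    return ((primary + 22) // 45 * 45) % 180


def select_inverse_transform(intra_type, primary, secondary):
    """Select from type and both characteristics; fall back to the default
    when the two characteristics are identical or no table entry exists."""
    if primary == secondary:
        return DEFAULT_TRANSFORM
    return TRANSFORM_TABLE.get((intra_type, secondary), DEFAULT_TRANSFORM)


def implicit_selection(intra_type, angle_deg=0):
    """Run the derivations and the selection with no signaled transform index."""
    primary = derive_primary(intra_type, angle_deg)
    secondary = derive_secondary(primary)
    return select_inverse_transform(intra_type, primary, secondary)


near_vertical = implicit_selection("directional", 100)
exact_vertical = implicit_selection("directional", 90)
```

Because the selection depends only on information available to both sides, an encoder running the same derivation would arrive at the matching forward transform, which is why no transform index needs to be carried in the bitstream.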

[0195] FIG. 7 is a flow diagram illustrating an example method for encoding video data (700), in accordance with aspects of this disclosure. FIG. 7 is described with respect to computing device 100, video encoder 200, and memory 106 of FIG. 1. However, the techniques of FIG. 7 may be performed by different components of computing device 100, by other video encoding devices, or by additional or alternative systems configured for coding video data.

[0196] Processing circuitry of video encoder 200 may be configured to determine an intra-prediction type (702). For example, video encoder 200 may determine an intra-prediction type for a current block of video data by analyzing neighboring reconstructed pixels or syntax elements to identify a directional, planar, or non-directional prediction mode.

[0197] Processing circuitry of video encoder 200 may be configured to derive a primary characteristic (704). For example, video encoder 200 may derive a primary characteristic for the current block based at least on the determined intra-prediction type. The primary characteristic may correspond to a directional, statistical, or transform-domain descriptor representing dominant spatial orientation or frequency distribution of the block.

[0198] Processing circuitry of video encoder 200 may be configured to derive a secondary characteristic (706). For example, video encoder 200 may derive a secondary characteristic for the current block based at least in part on the primary characteristic. The secondary characteristic may represent a complementary or refined descriptor obtained from additional contextual or spatial information of the block.
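As one hedged example of how directional primary and secondary characteristics might be derived, the Python sketch below builds a histogram of gradients over a small array of reconstructed samples and reads off the two strongest orientation bins. The 3x3 Sobel operator, the eight orientation bins, the function names, and the rule that the strongest bin is the primary characteristic and the next strongest bin is the secondary characteristic are all assumptions for illustration, not requirements of the techniques described herein.

```python
import math


def histogram_of_gradients(pixels, num_bins=8):
    """Accumulate gradient magnitude into orientation bins over interior samples."""
    hist = [0.0] * num_bins
    height, width = len(pixels), len(pixels[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # 3x3 Sobel responses (assumed gradient operator).
            gx = (pixels[y - 1][x + 1] + 2 * pixels[y][x + 1] + pixels[y + 1][x + 1]
                  - pixels[y - 1][x - 1] - 2 * pixels[y][x - 1] - pixels[y + 1][x - 1])
            gy = (pixels[y + 1][x - 1] + 2 * pixels[y + 1][x] + pixels[y + 1][x + 1]
                  - pixels[y - 1][x - 1] - 2 * pixels[y - 1][x] - pixels[y - 1][x + 1])
            magnitude = abs(gx) + abs(gy)
            if magnitude == 0:
                continue  # flat area contributes nothing
            angle = math.atan2(gy, gx) % math.pi  # orientation folded into [0, pi)
            bin_index = min(int(angle / math.pi * num_bins), num_bins - 1)
            hist[bin_index] += magnitude
    return hist


def primary_and_secondary(hist):
    """Return (primary, secondary) bin indices; secondary falls back to primary
    when no second orientation has any gradient energy."""
    order = sorted(range(len(hist)), key=lambda b: (-hist[b], b))
    primary = order[0]
    secondary = order[1] if hist[order[1]] > 0 else primary
    return primary, secondary
```

In this sketch, a neighborhood containing a single dominant edge yields identical primary and secondary characteristics, which corresponds to the case in which a default transform may be selected.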

[0199] Processing circuitry of video encoder 200 may be configured to select a transform based on type and characteristics (708). For example, video encoder 200 may select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, a transform to be applied to residual data of the current block. The selection may be performed implicitly from multiple transform lookup tables without explicit signaling in the bitstream.

[0200] Processing circuitry of video encoder 200 may be configured to apply a transform to residual data (710). For example, video encoder 200 may apply the selected transform to the residual data to generate a transform block representing transformed coefficients for coding.

[0201] Processing circuitry of video encoder 200 may be configured to output data indicative of transformed coefficients (712). For example, video encoder 200 may output quantized and entropy-encoded coefficients corresponding to the transformed block for inclusion in an encoded bitstream.

[0202] In this way, FIG. 7 illustrates an example method for encoding video data through adaptive transform selection based on intra-prediction-derived characteristics, enabling efficient coding performance and improved compression efficiency without explicit transform signaling.
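The absence of explicit transform signaling can be illustrated with a short Python sketch in which the encoder and the decoder each run the same deterministic selection function on the intra-prediction type and the two characteristics, so that the encoded payload carries only quantized levels. The one-dimensional residual, the DCT-II and DST-VII candidates, the toy selection rule, the rounding quantizer, and all function names are assumptions for this example; entropy coding is omitted.

```python
import math


def dct2_basis(n):
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def dst7_basis(n):
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
             for i in range(n)] for k in range(n)]


CANDIDATES = {"DCT2": dct2_basis, "DST7": dst7_basis}


def select_transform(intra_type, primary, secondary):
    """Toy rule (assumption): differing characteristics on an angular block pick DST7."""
    if intra_type == "angular" and primary != secondary:
        return "DST7"
    return "DCT2"


def encode_block(residual, intra_type, primary, secondary, qstep=1.0):
    basis = CANDIDATES[select_transform(intra_type, primary, secondary)](len(residual))
    coeffs = [sum(b * r for b, r in zip(row, residual)) for row in basis]
    # Only quantized levels are output; no transform index is written.
    return {"levels": [round(c / qstep) for c in coeffs]}


def decode_block(payload, intra_type, primary, secondary, qstep=1.0):
    levels = payload["levels"]
    n = len(levels)
    # The decoder re-derives the same transform from the same inputs.
    basis = CANDIDATES[select_transform(intra_type, primary, secondary)](n)
    coeffs = [level * qstep for level in levels]
    # Both candidate bases are orthonormal, so the inverse is the transpose.
    return [sum(basis[k][i] * coeffs[k] for k in range(n)) for i in range(n)]
```

The sketch relies on the characteristics being derivable at both ends from data available to the decoder, such as previously reconstructed samples; if the encoder used information unavailable to the decoder, the two selections could diverge.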

[0203] Additional aspects of the disclosure are detailed in numbered clauses below.

[0204] Clause 1 - A method of decoding video data, comprising: determining an intra-prediction type for a current block of video data; deriving a primary characteristic for the current block based at least on the intra-prediction type; deriving a secondary characteristic for the current block based at least in part on the primary characteristic; selecting, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data; applying the inverse transform to the transform block to generate the residual data; and reconstructing the current block based on residual data and a prediction block.

[0205] Clause 2 - The method of clause 1, wherein selecting the inverse transform comprises selecting, based on the determined intra-prediction type, and a measure of difference between the primary characteristic and the secondary characteristic, the inverse transform.

[0206] Clause 3 - The method of clause 1, wherein the selecting of the inverse transform is performed without signaling transform information in a bitstream.

[0207] Clause 4 - The method of clause 1, wherein deriving the primary characteristic comprises deriving the primary characteristic using directional analysis of reconstructed pixels in a reconstructed neighborhood surrounding the current block, the directional analysis including computation of a histogram of gradients representing orientation magnitudes in the reconstructed neighborhood.

[0208] Clause 5 - The method of clause 1, wherein deriving the primary characteristic comprises deriving the primary characteristic using a statistical descriptor of pixel variation within the current block, including covariance or variance of pixel values, the secondary characteristic being derived using pixel information distinct from that used for the primary characteristic.

[0209] Clause 6 - The method of clause 1, further comprising determining a measure of difference between the primary characteristic and the secondary characteristic based on a directional difference magnitude that is quantized into one of multiple difference classes, each difference class corresponding to a transform-selection condition, wherein selecting the inverse transform comprises selecting, based on the determined intra-prediction type and the measure of difference between the primary characteristic and the secondary characteristic, the inverse transform.

[0210] Clause 7 - The method of clause 1, wherein selecting the inverse transform comprises selecting the inverse transform from among multiple transform lookup tables respectively associated with different intra-prediction types, each transform lookup table storing transform indices as a function of a measure of difference between the primary characteristic and the secondary characteristic and of a block-size index of the current block, the block-size index being derived from logarithmic functions of block width and height and mapped to a symmetric block shape using a predefined index map.

[0211] Clause 8 - The method of clause 1, further comprising determining that the primary characteristic and the secondary characteristic are identical, wherein selecting the inverse transform comprises selecting the inverse transform from one or more default directional transforms including planar, DC, horizontal, or vertical inverse transforms.

[0212] Clause 9 - The method of clause 1, wherein selecting the inverse transform comprises bypassing transform inheritance from a merge candidate.

[0213] Clause 10 - The method of clause 1, wherein derivation of the primary characteristic, the secondary characteristic, or both omits computation of an external histogram of gradients for unblended intra-prediction modes.

[0214] Clause 11 - The method of clause 1, further comprising classifying the current block into one of a plurality of block-size groups and one of a plurality of intra-mode groups before selecting the inverse transform, wherein selecting the inverse transform comprises selecting the inverse transform based on the classification and on a sum of absolute transform-coefficient levels.

[0215] Clause 12 - A decoder apparatus for decoding video data, the apparatus comprising: a memory configured to store video data; and processing circuitry in communication with the memory, wherein the processing circuitry is configured to: determine an intra-prediction type for a current block of video data; derive a primary characteristic for the current block based at least on the intra-prediction type; derive a secondary characteristic for the current block based at least in part on the primary characteristic; select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data; apply the inverse transform to the transform block to generate residual data; and reconstruct the current block based on residual data and a prediction block.

[0216] Clause 13 - The decoder apparatus of clause 12, wherein the processing circuitry is configured to select the inverse transform based on the determined intra-prediction type and a measure of difference between the primary characteristic and the secondary characteristic.

[0217] Clause 14 - The decoder apparatus of clause 12, wherein the processing circuitry is configured to select the inverse transform without signaling transform information in a bitstream.

[0218] Clause 15 - The decoder apparatus of clause 12, wherein the processing circuitry is configured to derive the primary characteristic using directional analysis of reconstructed pixels in a reconstructed neighborhood surrounding the current block, the directional analysis including computation of a histogram of gradients representing orientation magnitudes in the reconstructed neighborhood.

[0219] Clause 16 - The decoder apparatus of clause 12, wherein the processing circuitry is configured to select the inverse transform from among multiple transform lookup tables respectively associated with different intra-prediction types, each transform lookup table storing transform indices as a function of a measure of difference between the primary characteristic and the secondary characteristic and of a block-size index of the current block.

[0220] Clause 17 - The decoder apparatus of clause 12, wherein the processing circuitry is configured to determine that the primary characteristic and the secondary characteristic are identical and, responsive thereto, to select the inverse transform from one or more default directional transforms including planar, DC, horizontal, or vertical inverse transforms.

[0221] Clause 18 - The decoder apparatus of clause 12, wherein the processing circuitry is configured to bypass transform inheritance from a merge candidate when selecting the inverse transform.

[0222] Clause 19 - The decoder apparatus of clause 12, wherein the processing circuitry is configured to classify the current block into one of a plurality of block-size groups and one of a plurality of intra-mode groups before selecting the inverse transform, the selection being further based on the classification and on a sum of absolute transform-coefficient levels.

[0223] Clause 20 - An encoder apparatus for coding video data, the apparatus comprising: a memory configured to store video data; and processing circuitry in communication with the memory, wherein the processing circuitry is configured to: determine an intra-prediction type for a current block of video data; derive a primary characteristic for the current block based at least on the intra-prediction type; derive a secondary characteristic for the current block based at least in part on the primary characteristic; select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, a transform to be applied to residual data of the current block; apply the transform to the residual data to generate a transform block representing transformed coefficients for coding; and output data indicative of the transformed coefficients.

[0224] Clause 21 - A computer program product comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to perform any of the methods of clauses 1-11.

[0225] Clause 22 - A device comprising means for performing any of the methods of clauses 1-11.
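By way of example only, the table-based selection recited in clauses 6 and 7 may be sketched in Python as follows. The sketch quantizes a circular directional difference into difference classes, derives a block-size index from base-2 logarithms of block width and height through a symmetric index map, and reads a transform index from a lookup table associated with the intra-prediction type. The number of directions, the class thresholds, the supported block sizes, the table contents, and every identifier are hypothetical placeholders chosen for the example.

```python
import bisect

NUM_DIRECTIONS = 8          # assumed number of orientation indices
CLASS_THRESHOLDS = (0, 2)   # assumed: class 0 for diff 0, class 1 for 1..2, class 2 above

# Assumed predefined index map: (min log2, max log2) -> block-size index,
# so that W x H and H x W blocks share one entry.
SYMMETRIC_INDEX_MAP = {(2, 2): 0, (2, 3): 1, (3, 3): 2}

# Placeholder lookup tables indexed as [difference class][block-size index].
TRANSFORM_LUTS = {
    "angular": [[0, 0, 0], [1, 1, 0], [2, 1, 1]],
    "planar": [[0, 0, 0], [0, 1, 0], [1, 1, 0]],
}


def difference_class(primary, secondary):
    """Quantize the circular directional difference magnitude into a class."""
    raw = abs(primary - secondary) % NUM_DIRECTIONS
    magnitude = min(raw, NUM_DIRECTIONS - raw)
    return bisect.bisect_left(CLASS_THRESHOLDS, magnitude)


def log2_exact(n):
    if n <= 0 or n & (n - 1):
        raise ValueError("block dimension must be a positive power of two")
    return n.bit_length() - 1


def block_size_index(width, height):
    lw, lh = log2_exact(width), log2_exact(height)
    return SYMMETRIC_INDEX_MAP[(min(lw, lh), max(lw, lh))]


def select_transform_index(intra_type, primary, secondary, width, height):
    table = TRANSFORM_LUTS[intra_type]
    return table[difference_class(primary, secondary)][block_size_index(width, height)]
```

Mapping both orientations of a rectangular block to one index keeps each table small in this sketch, at the cost of treating an 8x4 block and a 4x8 block identically.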

[0226] It is to be recognized that depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.

[0227] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

[0228] By way of example, and not limitation, such computer-readable storage media may include one or more of RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0229] Instructions may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the terms “processor” and “processing circuitry,” as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

[0230] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

[0231] Various examples have been described. These and other examples are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:

1. A method of decoding video data, comprising: determining an intra-prediction type for a current block of video data; deriving a primary characteristic for the current block based at least on the intra-prediction type; deriving a secondary characteristic for the current block based at least in part on the primary characteristic; selecting, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data; applying the inverse transform to the transform block to generate the residual data; and reconstructing the current block based on residual data and a prediction block.

2. The method of claim 1, wherein selecting the inverse transform comprises selecting, based on the determined intra-prediction type, and a measure of difference between the primary characteristic and the secondary characteristic, the inverse transform.

3. The method of claim 1, wherein the selecting of the inverse transform is performed without signaling transform information in a bitstream.

4. The method of claim 1, wherein deriving the primary characteristic comprises deriving the primary characteristic using directional analysis of reconstructed pixels in a reconstructed neighborhood surrounding the current block, the directional analysis including computation of a histogram of gradients representing orientation magnitudes in the reconstructed neighborhood.

5. The method of claim 1, wherein deriving the primary characteristic comprises deriving the primary characteristic using a statistical descriptor of pixel variation within the current block, including covariance or variance of pixel values, the secondary characteristic being derived using pixel information distinct from that used for the primary characteristic.

6. The method of claim 1, further comprising determining a measure of difference between the primary characteristic and the secondary characteristic based on a directional difference magnitude that is quantized into one of multiple difference classes, each difference class corresponding to a transform-selection condition, wherein selecting the inverse transform comprises selecting, based on the determined intra-prediction type and the measure of difference between the primary characteristic and the secondary characteristic, the inverse transform.

7. The method of claim 1, wherein selecting the inverse transform comprises selecting the inverse transform from among multiple transform lookup tables respectively associated with different intra-prediction types, each transform lookup table storing transform indices as a function of a measure of difference between the primary characteristic and the secondary characteristic and of a block-size index of the current block, the block-size index being derived from logarithmic functions of block width and height and mapped to a symmetric block shape using a predefined index map.

8. The method of claim 1, further comprising determining that the primary characteristic and the secondary characteristic are identical, wherein selecting the inverse transform comprises selecting the inverse transform from one or more default directional transforms including planar, DC, horizontal, or vertical inverse transforms.

9. The method of claim 1, wherein selecting the inverse transform comprises bypassing transform inheritance from a merge candidate.

10. The method of claim 1, wherein derivation of the primary characteristic, the secondary characteristic, or both omits computation of an external histogram of gradients for unblended intra-prediction modes.

11. The method of claim 1, further comprising classifying the current block into one of a plurality of block-size groups and one of a plurality of intra-mode groups before selecting the inverse transform, wherein selecting the inverse transform comprises selecting the inverse transform based on the classification and on a sum of absolute transform-coefficient levels.

12. A decoder apparatus for decoding video data, the apparatus comprising: a memory configured to store video data; and processing circuitry in communication with the memory, wherein the processing circuitry is configured to: determine an intra-prediction type for a current block of video data; derive a primary characteristic for the current block based at least on the intra-prediction type; derive a secondary characteristic for the current block based at least in part on the primary characteristic; select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, an inverse transform to be applied to a transform block representing residual data; apply the inverse transform to the transform block to generate residual data; and reconstruct the current block based on residual data and a prediction block.

13. The decoder apparatus of claim 12, wherein the processing circuitry is configured to select the inverse transform based on the determined intra-prediction type and a measure of difference between the primary characteristic and the secondary characteristic.

14. The decoder apparatus of claim 12, wherein the processing circuitry is configured to select the inverse transform without signaling transform information in a bitstream.

15. The decoder apparatus of claim 12, wherein the processing circuitry is configured to derive the primary characteristic using directional analysis of reconstructed pixels in a reconstructed neighborhood surrounding the current block, the directional analysis including computation of a histogram of gradients representing orientation magnitudes in the reconstructed neighborhood.

16. The decoder apparatus of claim 12, wherein the processing circuitry is configured to select the inverse transform from among multiple transform lookup tables respectively associated with different intra-prediction types, each transform lookup table storing transform indices as a function of a measure of difference between the primary characteristic and the secondary characteristic and of a block-size index of the current block.

17. The decoder apparatus of claim 12, wherein the processing circuitry is configured to determine that the primary characteristic and the secondary characteristic are identical and, responsive thereto, to select the inverse transform from one or more default directional transforms including planar, DC, horizontal, or vertical inverse transforms.

18. The decoder apparatus of claim 12, wherein the processing circuitry is configured to bypass transform inheritance from a merge candidate when selecting the inverse transform.

19. The decoder apparatus of claim 12, wherein the processing circuitry is configured to classify the current block into one of a plurality of block-size groups and one of a plurality of intra-mode groups before selecting the inverse transform, the selection being further based on the classification and on a sum of absolute transform-coefficient levels.

20. An encoder apparatus for coding video data, the apparatus comprising: a memory configured to store video data; and processing circuitry in communication with the memory, wherein the processing circuitry is configured to: determine an intra-prediction type for a current block of video data; derive a primary characteristic for the current block based at least on the intra-prediction type; derive a secondary characteristic for the current block based at least in part on the primary characteristic; select, based on the determined intra-prediction type, the primary characteristic, and the secondary characteristic, a transform to be applied to residual data of the current block; apply the transform to the residual data to generate a transform block representing transformed coefficients for coding; and output data indicative of the transformed coefficients.