A posteriori prediction mode determination for a RAHT node
The a posteriori prediction mode determination for RAHT nodes in the RAHT transformation framework addresses the challenge of large point cloud data size by enhancing compression efficiency and reducing bitrate, achieving lossless performance for diverse point cloud densities.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- OFINNO LLC
- Filing Date
- 2025-12-29
- Publication Date
- 2026-07-02
AI Technical Summary
The large data size of point clouds, comprising millions or billions of points with geometry and attribute information, poses challenges for efficient storage and transmission, necessitating advanced compression techniques to reduce data size while maintaining visual quality or ensuring lossless compression for critical applications.
Implementing a posteriori prediction mode determination for RAHT nodes within a RAHT transformation framework, which includes dynamic reduction of occupancy configurations using a dynamic OBUF scheme to enhance compression efficiency and maintain context relevance.
Achieves lossless compression performance of approximately 0.7 bits per point for dense point clouds, potentially reducing bitrate by over 25% compared to traditional methods, while preserving visual quality and suitability for various point cloud densities.
Abstract
Description
Docket No.: 24-2055PCT
TITLE
A Posteriori Prediction Mode Determination for a RAHT Node
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 63/739,366, filed December 27, 2024, which is hereby incorporated by reference in its entirety.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Examples of several of the various embodiments of the present disclosure are described herein with reference to the drawings.
[0003] FIG. 1 illustrates an exemplary point cloud coding/decoding system in which embodiments of the present disclosure may be implemented.
[0004] FIG. 2 illustrates the Morton order of eight sub-cuboids split from a cuboid.
[0005] FIG. 3 illustrates an example processing or scanning order for the first three levels of an occupancy tree.
[0006] FIG. 4 illustrates an example of already-coded occupancies of cuboids that may be used to code the occupancy of a current child cuboid.
[0007] FIG. 5 illustrates an example of a dynamic reduction function DR that may be used in dynamic OBUF.
[0008] FIG. 6 illustrates a flowchart of an example method for coding the occupancy (e.g., as indicated by a single bit) of a current child cuboid using dynamic OBUF.
[0009] FIG. 7 illustrates an example of an occupied cube of size NxNxN (where N > 1) that corresponds to a TriSoup node of an occupancy tree.
[0010] FIG. 8A illustrates an example cube corresponding to a TriSoup node with a number K of TriSoup vertices Vk.
[0011] FIG. 8B illustrates an example refinement to the TriSoup model by coding a centroid residual vector Cres into the bitstream such as to use C+Cres instead of C as pivoting vertex for the triangles.
[0012] FIG. 8C illustrates an example of coding a centroid residual vector Cres in/from the bitstream such that an adjusted centroid C+Cres is used instead of centroid C for generating TriSoup triangles of a cuboid corresponding to a portion of a point cloud.
[0013] FIG. 9A and FIG. 9B illustrate examples of voxelization.
[0014] FIG. 10 illustrates an example process for encoding geometry and attributes of a current point cloud.
[0015] FIG. 11 illustrates an example process for encoding attributes associated with a portion of geometry of the decoded geometry.
[0016] FIG. 12 illustrates an example process for decoding geometry and attributes of a current point cloud.
[0017] FIG. 13 illustrates an example process for decoding attributes associated with a portion of geometry of the decoded geometry.
[0018] FIG. 13A illustrates an example of point-to-point projection distance between a point of a portion of geometry and its nearest neighbor points of a motion compensated geometry.
[0019] FIG. 13B illustrates an example of point-to-point projection distance between a point of the reference point cloud for attributes and its nearest neighbor point of a motion compensated geometry.
[0020] FIG. 14 illustrates an example process for determining attribute predictors of attributes associated with portion of geometry.
[0021] FIG. 15 illustrates an example process for encoding attributes based on attribute predictors associated with portion of geometry.
[0022] FIG. 16 illustrates another example process for encoding attributes based on attribute predictors associated with portion.
[0023] FIG. 17 illustrates an example process for decoding attributes based on attribute predictors associated with portion of geometry.
[0024] FIG. 18 illustrates another example process for decoding attributes based on attribute predictors associated with portion of geometry.
[0025] FIG. 19 illustrates an example RAHT transformation applied on child nodes of an octree parent node along three successive directions.
[0026] FIG. 20 illustrates an example RAHT transformation being applied to all octree nodes at depth ‘d’ to determine DC coefficients at depth d-1 and AC coefficients.
[0027] FIG. 21A illustrates an example process for encoding attribute information for child RAHT nodes at depth d of a parent RAHT node at depth d-1 using top-down coding and inter-depth prediction, according to some embodiments.
[0028] FIG. 21B illustrates an example process for decoding attribute information for child RAHT nodes at depth d of a parent RAHT node at depth d-1 using top-down decoding and inter-depth prediction, according to some embodiments.
[0029] FIG. 22 illustrates an example process for up-sampling the mean sums of attributes values of RAHT nodes at depth d-1, such as including the parent RAHT node and the already-coded neighboring RAHT nodes, according to some embodiments.
[0030] FIG. 23 illustrates an example process for encoding attributes of a RAHT node using a prediction mode, according to some embodiments.
[0031] FIG. 24 illustrates an example process for performing block related to obtaining the prediction mode for encoding attributes of the RAHT node, according to embodiments.
[0032] FIG. 25 illustrates an example process for decoding attributes of a transform node using a prediction mode, according to some embodiments.
[0033] FIG. 26 illustrates an example process for obtaining a prediction mode for decoding attributes of the transform node, according to embodiments.
[0034] FIG. 27A illustrates an example of the top-down traversal of the transform process when a maximum depth for prediction mode coding is used.
[0035] FIG. 27B illustrates an example of the top-down traversal of the transform process when the maximum depth for prediction mode coding, the upper depth for average prediction mode, and the lower depth for average prediction mode are used.
[0036] FIG. 28 illustrates an example process for encoding attributes of the transform node and determining an a posteriori prediction mode for the transform node, according to some embodiments.
[0037] FIG. 29 illustrates an example process for decoding attributes of the transform node and determining an a posteriori prediction mode for the transform node, according to some embodiments.
[0038] FIG. 30A illustrates an example of the top-down traversal of the transform process when a maximum depth for prediction mode coding is activated and when the a posteriori prediction mode is activated, according to some embodiments.
[0039] FIG. 30B illustrates an example of the top-down traversal of the transform process when a maximum depth for prediction mode coding, an upper depth for average mode and a lower depth for average mode are used and when the a posteriori prediction mode is activated, according to some embodiments.
[0040] FIG. 31 illustrates a flowchart of an example method for associating an a posteriori prediction mode with a transform node of a transform tree, according to some embodiments.
[0041] FIG. 32 illustrates a block diagram of an example computer system in which embodiments of the present disclosure may be implemented.
DETAILED DESCRIPTION
[0042] In the following description, numerous specific details are set forth in order to provide a thorough understanding of the disclosure. However, it will be apparent to those skilled in the art that the disclosure, including structures, systems, and methods, may be practiced without these specific details. The description and representation herein are the common means used by those experienced or skilled in the art to most effectively convey the substance of their work to others skilled in the art. In other instances, well-known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the disclosure.
[0043] References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
[0044] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0045] The term “computer-readable medium” includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, memory or memory devices. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like.
[0046] Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium. A processor(s) may perform the necessary tasks.
[0047] Traditional visual data describes an object or scene using a series of points that each comprise a position in two dimensions (x and y) and one or more optional attributes like color. Volumetric visual data adds another positional dimension to this traditional visual data. Volumetric visual data describes an object or scene using a series of points that each comprise a position in three dimensions (x, y, and z) and one or more optional attributes like color, reflectance, time stamp, etc. Compared to traditional visual data, volumetric visual data may provide a more immersive way to experience visual data.
[0048] For example, an object or scene described by volumetric visual data may be viewed from any (or multiple) angles, whereas traditional visual data may generally only be viewed from the angle in which it was captured or rendered. Volumetric visual data may be used in many applications, including Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR). Sparse volumetric visual data may be used in the automotive industry for the representation of 3D maps (cartography) or as input to assisted driving systems. In the latter use case, volumetric visual data is typically input to driving decision algorithms. In another example, volumetric visual data may be used to store valuable objects in digital form. In applications for preserving cultural heritage, the goal is to keep a representation of objects that may be threatened by natural disasters. For example, statues, vases, and temples may be entirely scanned and stored as volumetric visual data having several billions of samples. This use case for volumetric visual data may be particularly relevant for valuable objects in locations where earthquakes, tsunamis, and typhoons are frequent. Volumetric visual data may be in the form of a volumetric frame that describes an object or scene captured at a particular time instance or in the form of a sequence of volumetric frames (referred to as a volumetric sequence or volumetric video) that describes an object or scene captured at multiple different time instances.
[0049] One format for storing volumetric visual data is point clouds. A point cloud comprises a collection of points in three-dimensional (3D) space. Each point in a point cloud may comprise geometry information that indicates the point’s position in 3D space. For example, the geometry information may indicate the point’s position in 3D space using three Cartesian coordinates (x, y, and z) or using spherical coordinates (r, phi, theta) (e.g., when acquired by a rotating sensor). The positions of points in a point cloud may be quantized according to a space precision, which may be the same or different in each dimension. The quantization process may create a grid in 3D space. One or more points residing within each sub-grid volume may be mapped to the sub-grid center coordinates, referred to as voxels. A voxel (also referred to as a volumetric pixel) may be considered as a 3D extension of pixels corresponding to the 2D image grid coordinates. For example, similar to a pixel being the smallest unit when dividing the 2D space (or 2D image) into discrete, uniform (e.g., equally sized) regions, a voxel may be the smallest unit of volume when dividing 3D space into discrete, uniform regions. The sub-grid center coordinates (which correspond to voxels) may be referred to as a voxelized grid. A point in a point cloud may further comprise one or more types of attribute information. Attribute information may indicate a property of a point’s visual appearance. For example, attribute information may indicate a texture (e.g., color) of the point, a material type of the point, transparency information of the point, reflectance information of the point, a normal vector to a surface of the point, a velocity at the point, an acceleration at the point, a time stamp indicating when the point was captured, or a modality indicating how the point was captured (e.g., running, walking, or flying). In another example, a point in a point cloud may comprise light field data in the form of multiple view-dependent texture information. Light field data may be another type of optional attribute information.
[0050] The points in a point cloud may describe an object or a scene. For example, the points in a point cloud may describe the external surface and/or the internal structure of an object or scene. The object or scene may be synthetically generated by a computer or may be generated from the capture of a real-world object or scene. The geometry information of a real-world object or scene may be obtained by 3D scanning and/or photogrammetry. 3D scanning may include laser scanning, structured light scanning, and/or modulated light scanning. 3D scanning may obtain geometry information by moving one or more laser heads, structured light cameras, and/or modulated light cameras relative to an object or scene being scanned. Photogrammetry may obtain geometry information by triangulating the same feature or point in different spatially shifted 2D photographs. Point cloud data may be in the form of a point cloud frame that describes an object or scene captured at a particular time instance or in the form of a sequence of point cloud frames (referred to as a point cloud sequence or point cloud video) that describes an object or scene captured at multiple different time instances.
[0051] The data size of a point cloud frame or sequence may be too large for storage and/or transmission in many applications. For example, a single point cloud may comprise over a million points or even billions of points, where each point may comprise geometry information and one or more optional types of attribute information. The geometry information of each point may comprise three Cartesian coordinates (x, y, and z) or spherical coordinates (r, phi, theta) that are each represented, for example, using at least 10 bits per component or 30 bits in total. The attribute information of each point may comprise a texture corresponding to three color components (e.g., R, G, and B color components) that are each represented, for example, using 8-10 bits per component or 24-30 bits in total. A single point therefore comprises at least 54 bits of information in this example, with at least 30 bits of geometry information and at least 24 bits of texture. If a point cloud frame includes a million such points, each point cloud frame would require 54 million bits or 54 megabits to represent. In case of dynamic point clouds that change over time, at a frame rate of 30 frames per second, a data rate of 1.62 gigabits per second would be required to transmit the points of the point cloud sequence. Therefore, raw representations of point clouds may require a large amount of data, and the practical deployment of point-cloud-based technologies may need compression technologies that enable the storage and distribution of point clouds with reasonable cost.
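The arithmetic in the preceding paragraph can be reproduced with a short sketch. The function names below are illustrative only and are not part of the disclosure; the default bit depths are the minimum values quoted in the example (10 bits per geometry component, 8 bits per color component).

```python
def raw_bits_per_point(geometry_bits_per_component=10, color_bits_per_component=8):
    """Bits for one uncompressed point: three coordinates plus three color components."""
    geometry_bits = 3 * geometry_bits_per_component   # x, y, z (or r, phi, theta)
    texture_bits = 3 * color_bits_per_component       # e.g., R, G, B
    return geometry_bits + texture_bits


def raw_frame_bits(num_points, bits_per_point):
    """Bits needed to store one uncompressed point cloud frame."""
    return num_points * bits_per_point


def raw_bitrate_bps(frame_bits, frames_per_second):
    """Bits per second needed to transmit uncompressed frames at a given frame rate."""
    return frame_bits * frames_per_second


bits_per_point = raw_bits_per_point()                    # 30 + 24 bits
frame_bits = raw_frame_bits(1_000_000, bits_per_point)   # one million points
bitrate = raw_bitrate_bps(frame_bits, 30)                # 30 frames per second

print(f"{bits_per_point} bits/point, "
      f"{frame_bits / 1e6:.0f} Mbit/frame, "
      f"{bitrate / 1e9:.2f} Gbit/s")
```

With these minimum bit depths the sketch yields the 54 bits per point, 54 megabits per frame, and 1.62 gigabits per second stated above; using 10-bit color components instead would raise each figure accordingly.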
[0052] Encoding may be used to compress and/or reduce the data size of a point cloud frame or sequence to provide for more efficient storage and/or transmission. Decoding may be used to decompress a compressed point cloud frame or sequence for display and/or other forms of consumption (e.g., by a machine learning-based device, neural network-based device, artificial intelligence-based device, or other forms of consumption by other types of machine-based processing algorithms and/or devices). Compression of point clouds may be lossy (introducing differences relative to the original data) for the distribution to and visualization by an end-user, for example, on AR or VR glasses or any other 3D-capable device. Lossy compression may allow for a high ratio of compression but may imply a tradeoff between compression and visual quality perceived by an end-user. Other frameworks, like medical applications or autonomous driving, may require lossless compression to avoid altering the results of a decision obtained based on the analysis of the transmitted and decompressed point cloud frame.
[0053] FIG. 1 illustrates an exemplary point cloud coding system 100 in which embodiments of the present disclosure may be implemented. Point cloud coding system 100 comprises a source device 102, a transmission medium 104, and a destination device 106. Source device 102 encodes a point cloud sequence 108 into a bitstream 110 for more efficient storage and/or transmission. Source device 102 may store and/or transmit bitstream 110 to destination device 106 via transmission medium 104. Destination device 106 decodes bitstream 110 to display point cloud sequence 108 or for other forms of consumption. Destination device 106 may receive bitstream 110 from source device 102 via a storage medium or transmission medium 104. Source device 102 and destination device 106 may be any one of a number of different devices, including a cluster of interconnected computer systems acting as a pool of seamless resources (also referred to as a cloud of computers or cloud computer), a server, a desktop computer, a laptop computer, a tablet computer, a smart phone, a wearable device, a television, a camera, a video gaming console, a set-top box, a video streaming device, an autonomous vehicle, or a head mounted display. A head mounted display may allow a user to view a VR, AR, or MR scene and adjust the view of the scene based on movement of the user's head. A head mounted display may be tethered to a processing device (e.g., a server, desktop computer, set-top box, or video gaming console) or may be fully self-contained.
[0054] To encode point cloud sequence 108 into bitstream 110, source device 102 may comprise a point cloud source 112, an encoder 114, and an output interface 116. Point cloud source 112 may provide or generate point cloud sequence 108 from a capture of a natural scene and/or a synthetically generated scene. A synthetically generated scene may be a scene comprising computer generated graphics. Point cloud source 112 may comprise one or more point cloud capture devices (e.g., one or more laser scanning devices, structured light scanning devices, modulated light scanning devices, and/or passive scanning devices), a point cloud archive comprising previously captured natural scenes and/or synthetically generated scenes, a point cloud feed interface to receive captured natural scenes and/or synthetically generated scenes from a point cloud content provider, and/or a processor to generate synthetic point cloud scenes.
[0055] As shown in FIG. 1, a point cloud sequence 108 may comprise a series of point cloud frames 124. A point cloud frame may describe an object or scene captured at a particular time instance. Point cloud sequence 108 may achieve the impression of motion when a constant or variable time is used to successively present point cloud frames 124 of point cloud sequence 108. A point cloud frame may comprise a collection of points 126 in 3D space. Each of points 126 may comprise geometry information that indicates the point’s position in 3D space. For example, the geometry information may indicate the point’s position in 3D space using three Cartesian coordinates (x, y, and z). One or more of points 126 may further comprise one or more types of attribute information. Attribute information may indicate a property of a point’s visual appearance. For example, attribute information may indicate a texture (e.g., color) of a point, a material type of a point, transparency information of a point, reflectance information of a point, a normal vector to a surface of a point, a velocity at a point, an acceleration at a point, a time stamp indicating when a point was captured, or a modality indicating how a point was captured (e.g., running, walking, or flying). In another example, one or more of points 126 may comprise light field data in the form of multiple view-dependent texture information. Light field data may be another type of optional attribute information. Color attribute information of one or more of points 126 may comprise a luminance value and two chrominance values. The luminance value may represent the brightness (or luma component, Y) of the point. The chrominance values may respectively represent the blue and red components of the point (or chroma components, Cb and Cr) separate from the brightness. Other color attribute values are possible based on different color schemes (e.g., an RGB or monochrome color scheme).
[0056] Encoder 114 may encode point cloud sequence 108 into bitstream 110. To encode point cloud sequence 108, encoder 114 may apply one or more lossy compression techniques and/or prediction techniques to reduce redundant information in point cloud sequence 108. Redundant information is information that may be predicted at a decoder and therefore may not be needed to be transmitted to the decoder for accurate decoding of point cloud sequence 108. For example, the Moving Picture Experts Group (MPEG) introduced a geometry-based point cloud compression (G-PCC) standard (ISO/IEC standard 23090-9: Geometry-based point cloud compression). G-PCC specifies the encoded bitstream syntax and semantics for transmission and/or storage of a compressed point cloud frame and the decoder operation for reconstructing the compressed point cloud frame from the bitstream. During standardization of G-PCC, a reference software (ISO/IEC standard 23090-21: Reference Software for G-PCC) was developed to encode the geometry and attribute information of a point cloud frame. To encode geometry information of a point cloud frame, the G-PCC reference software encoder may perform voxelization by quantizing positions of points in a point cloud, which creates a grid in 3D space. The G-PCC reference software encoder may map the points to the center coordinates of the sub-grid volume (or voxel) in which their quantized locations reside. The G-PCC reference software encoder may perform geometry analysis using an occupancy tree to compress the geometry information. The G-PCC reference software encoder may entropy encode the result of the geometry analysis to further compress the geometry information. To encode attribute information of a point cloud, the G-PCC reference software encoder may apply a transform tool, such as Region Adaptive Hierarchical Transform (RAHT), the Predicting Transform, and/or the Lifting Transform. The Lifting Transform may be built on top of the Predicting Transform but with an extra update/lifting step. Consequently, these two transforms may be referred to as Predicting/Lifting Transform or pred lift. Encoder 114 may operate in a same or similar manner to an encoder provided by the G-PCC reference software.
[0057] Output interface 116 may be configured to write and/or store bitstream 110 onto transmission medium 104 for transmission to destination device 106. In addition, or alternatively, output interface 116 may be configured to transmit, upload, and/or stream bitstream 110 to destination device 106 via transmission medium 104. Output interface 116 may comprise a wired and/or wireless transmitter configured to transmit, upload, and/or stream bitstream 110 according to one or more proprietary and/or standardized communication protocols, such as Digital Video Broadcasting (DVB) standards, Advanced Television Systems Committee (ATSC) standards, Integrated Services Digital Broadcasting (ISDB) standards, Data Over Cable Service Interface Specification (DOCSIS) standards, 3rd Generation Partnership Project (3GPP) standards, Institute of Electrical and Electronics Engineers (IEEE) standards, Internet Protocol (IP) standards, and Wireless Application Protocol (WAP) standards.
[0058] Transmission medium 104 may comprise a wireless, wired, and/or computer readable medium. For example, transmission medium 104 may comprise one or more wires, cables, air interfaces, optical discs, flash memory, and/or magnetic memory. In addition, or alternatively, transmission medium 104 may comprise one or more networks (e.g., the Internet) or file servers configured to store and/or transmit encoded video data.
[0059] To decode bitstream 110 into point cloud sequence 108 for display or other forms of consumption, destination device 106 may comprise an input interface 118, a decoder 120, and a point cloud display 122. Input interface 118 may be configured to read bitstream 110 stored on transmission medium 104 by source device 102. In addition, or alternatively, input interface 118 may be configured to receive, download, and/or stream bitstream 110 from source device 102 via transmission medium 104. Input interface 118 may comprise a wired and/or wireless receiver configured to receive, download, and/or stream bitstream 110 according to one or more proprietary and/or standardized communication protocols, such as those mentioned above.
[0060] Decoder 120 may decode point cloud sequence 108 from encoded bitstream 110. For example, decoder 120 may operate in a same or similar manner to a decoder provided by G-PCC reference software. In some examples, decoder 120 may decode a point cloud sequence that approximates point cloud sequence 108 due to, for example, lossy compression of point cloud sequence 108 by encoder 114 and/or errors introduced into encoded bitstream 110 during transmission to destination device 106.
[0061] Point cloud display 122 may display point cloud sequence 108 to a user. Point cloud display 122 may comprise a cathode ray tube (CRT) display, a liquid crystal display (LCD), a plasma display, a light emitting diode (LED) display, a 3D display, a holographic display, a head mounted display, or any other display device suitable for displaying point cloud sequence 108.
[0062] It should be noted that point cloud coding/decoding system 100 is presented by way of example and not limitation. In the example of FIG. 1, point cloud coding/decoding system 100 may have other components and/or arrangements. For example, point cloud source 112 may be external to source device 102. Similarly, point cloud display 122 may be external to destination device 106 or omitted altogether where point cloud sequence is intended for consumption by a machine and/or storage device. In another example, source device 102 may further comprise a point cloud decoder and destination device 106 may comprise a point cloud encoder. In such an example, source device 102 may be configured to further receive an encoded bitstream from destination device 106 to support two-way point cloud transmission between the devices.
[0063] As mentioned above, an encoder may quantize the positions of points in a point cloud according to a space precision, which may be the same or different in each dimension of the points. The quantization process may create a grid in 3D space. The encoder may map any points residing within each sub-grid volume to the sub-grid center coordinates, referred to as a voxel (or a volumetric pixel). A voxel may be considered as a 3D extension of pixels corresponding to 2D image grid coordinates.
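The voxelization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the G-PCC reference software: the function name `voxelize` and its parameters are hypothetical, points are plain (x, y, z) tuples, and points that fall in the same sub-grid volume are simply merged (handling of their attributes is not shown).

```python
import math


def voxelize(points, precision):
    """Map each (x, y, z) point to the center of the sub-grid volume containing it.

    `precision` is a (px, py, pz) tuple giving the grid step in each dimension,
    which may be the same or different per dimension.
    """
    px, py, pz = precision
    voxels = set()
    for x, y, z in points:
        # Integer index of the sub-grid volume along each axis.
        ix, iy, iz = math.floor(x / px), math.floor(y / py), math.floor(z / pz)
        # Center coordinates of that sub-grid volume (the voxel).
        voxels.add(((ix + 0.5) * px, (iy + 0.5) * py, (iz + 0.5) * pz))
    return sorted(voxels)


points = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.2, 0.0, 0.0)]
print(voxelize(points, (1.0, 1.0, 1.0)))
```

In this example the first two points reside in the same unit cell and therefore map to the same voxel, while the third point maps to the neighboring voxel along x.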
[0064] The encoder may represent or code the point cloud using an occupancy tree. For example, the encoder may split the initial volume or cuboid (also referred to as a bounding box) containing the point cloud into sub-cuboids. The encoder may then recursively split each sub-cuboid that contains at least one point of the point cloud. The encoder may not further split sub-cuboids that do not contain at least one point of the point cloud. A sub-cuboid that contains at least one point of the point cloud may be referred to as an occupied sub-cuboid. A sub-cuboid that does not contain at least one point of the point cloud may be referred to as an unoccupied sub-cuboid. The encoder may split an occupied cuboid into, for example, two sub-cuboids (to form a binary tree), four sub-cuboids (to form a quadtree), or eight sub-cuboids (to form an octree). The encoder may split an occupied cuboid to obtain sub-cuboids all with the same size and shape at a given depth level of the occupancy tree by splitting following a plane passing through the middle of edges of the cuboid.
[0065] The initial volume or cuboid containing the point cloud may correspond to the root node of the occupancy tree. Each occupied sub-cuboid, split from the initial volume / cuboid, may correspond to a node (off the root node) in a second level of the occupancy tree. Each occupied sub-cuboid, split from an occupied sub-cuboid in the second level, may correspond to a node (off the occupied sub-cuboid in the second level from which it was split) in a third level of the occupancy tree. The occupancy tree structure may continue to form in this manner for each recursive split iteration until, for example, a maximum depth level of the occupancy tree is reached or each occupied sub-cuboid has a volume corresponding to one voxel.
[0066] Each non-leaf node of the occupancy tree may comprise or be associated with an occupancy word representing an occupancy state of the cuboid corresponding to the node. For example, a node of the occupancy tree corresponding to a cuboid that is split into 8 sub-cuboids may comprise or be associated with a 1-byte occupancy word. Each bit (referred to as an occupancy bit) of the 1-byte occupancy word may represent or indicate the occupancy of a different one of the eight sub-cuboids. Occupied sub-cuboids may be represented or indicated by a binary value of 1 in the 1-byte occupancy word and unoccupied sub-cuboids may be represented or indicated by a binary value of 0 in the 1-byte occupancy word. In other examples, occupied and unoccupied sub-cuboids may be represented or indicated by opposite 1-bit binary values in the 1-byte occupancy word.
[0067] Each bit of an occupancy word may represent or indicate the occupancy of a different one of the eight sub-cuboids following the so-called Morton order. For example, the least significant bit of an occupancy word may represent or indicate the occupancy of a first one of the eight sub-cuboids following the Morton order, the second least significant bit of an occupancy word may represent or indicate the occupancy of a second one of the eight sub-cuboids following the Morton order, etc.
[0068] FIG. 2 illustrates the Morton order of eight sub-cuboids 202-216 split from a cuboid 200. Sub-cuboids 202-216 are labeled based on their Morton order, with child node 202 being the first in Morton order and child node 216 being the last in Morton order. The Morton order for sub-cuboids 202-216 is a local lexicographic order in xyz.
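As a minimal sketch of how an occupancy word may be assembled in Morton order, the following assumes that each child is identified by a local offset (x, y, z) with each coordinate equal to 0 or 1, and that "lexicographic order in xyz" means x is the most significant coordinate. The helper names are illustrative only.

```python
def morton_index(x, y, z):
    """Index (0..7) of a child at local offset (x, y, z), each 0 or 1.

    Lexicographic order in xyz: x is the most significant coordinate.
    """
    return (x << 2) | (y << 1) | z

def occupancy_word(occupied_children):
    """Build a 1-byte occupancy word from occupied (x, y, z) child offsets.

    Bit i (counting from the least significant bit) is 1 when the i-th child
    in Morton order is occupied.
    """
    word = 0
    for x, y, z in occupied_children:
        word |= 1 << morton_index(x, y, z)
    return word
```

For instance, a cuboid whose first and last children in Morton order are occupied would, under these assumptions, produce the word with only its least and most significant bits set.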
[0069] The geometry of the point cloud is represented by, and therefore may be determined from, the initial volume and the occupancy words of the nodes in the occupancy tree. The encoder may therefore transmit the initial volume and the occupancy words of the nodes in the occupancy tree in a bitstream to a decoder for reconstructing the point cloud. Before transmitting the initial volume and the occupancy words of the nodes in the occupancy tree, the encoder may entropy encode the occupancy words. For example, the encoder may encode an occupancy bit of an occupancy word of a node corresponding to a cuboid, based on one or more occupancy bits of occupancy words of other nodes corresponding to cuboids that are adjacent or spatially close to the cuboid of the occupancy bit being encoded.
[0070] An encoder and / or decoder may code occupancy bits of occupancy words in sequence of a scan order. For example, an encoder and / or decoder may scan an occupancy tree in breadth-first order: all the occupancy words of the nodes of a given depth (or level) within the occupancy tree may be scanned before scanning the occupancy words of the nodes of the next depth (or level). Within a depth, the encoder and / or decoder may scan the occupancy words of nodes in the Morton order. Within a node, the encoder and / or decoder may scan the occupancy bits of the occupancy word of the node further in the Morton order.
[0071] FIG. 3 illustrates an example of this scanning order for the first three levels of an occupancy tree 300. At each level of occupancy tree 300, a plurality of cuboids (e.g., cubes) are generated. In FIG. 3, a cube 302 corresponding to the root node of occupancy tree 300 is divided into eight sub-cubes. Two sub-cubes 304 and 306 of the eight sub-cubes are occupied, while the other six sub-cubes are unoccupied. Following the Morton order, a first eight-bit occupancy word $occW_{1,1}$ is constructed to represent the occupancy word of the root node. The least significant occupancy bit of the first eight-bit occupancy word $occW_{1,1}$ represents or indicates the occupancy of the first sub-cube of the eight sub-cubes in Morton order, the second least significant occupancy bit of the first eight-bit occupancy word $occW_{1,1}$ represents or indicates the occupancy of the second sub-cube of the eight sub-cubes in Morton order, etc.
[0072] Each of the two occupied sub-cubes 304 and 306 corresponds to a node off the root node in a second level of occupancy tree 300. The two occupied sub-cubes 304 and 306 are each further split into eight sub-cubes. One of the sub-cubes 308 of the eight sub-cubes split from sub-cube 304 is occupied, while the other seven sub-cubes are unoccupied. Three of the sub-cubes 310, 312, and 314 of the eight sub-cubes split from sub-cube 306 are occupied, while the other five sub-cubes of the eight sub-cubes split from sub-cube 306 are unoccupied. Two second eight-bit occupancy words $occW_{2,1}$ and $occW_{2,2}$ are constructed in this order to respectively represent the occupancy word of the node corresponding to sub-cube 304 and the occupancy word of the node corresponding to sub-cube 306.
[0073] Each of the four occupied sub-cubes 308, 310, 312, and 314 corresponds to a node in a third level of occupancy tree 300. The four occupied sub-cubes 308, 310, 312, and 314 are each further split into eight sub-cubes or 32 sub-cubes in total. Four third eight-bit occupancy words $occW_{3,1}$, $occW_{3,2}$, $occW_{3,3}$, and $occW_{3,4}$ are constructed in this order to respectively represent the occupancy word of the node corresponding to sub-cube 308, the occupancy word of the node corresponding to sub-cube 310, the occupancy word of the node corresponding to sub-cube 312, and the occupancy word of the node corresponding to sub-cube 314.
[0074] Following the scanning order discussed above, the occupancy words of this exemplary occupancy tree 300 may be entropy coded (e.g., entropy encoded by an encoder and entropy decoded by a decoder) as the succession of the seven occupancy words $occW_{1,1}$ to $occW_{3,4}$. As a consequence of the breadth-first scanning order, when entropy coding the occupancy word of a current child node belonging to a current parent node, the occupancy words of all nodes having the same depth (or level) as the current parent node have already been entropy coded. In addition, the occupancy words of all nodes having the same depth (or level) as the current child node and having a lower Morton order than the current child node have also already been entropy coded. Part of these already coded occupancy words may be used to entropy code the occupancy word of the current child node. For example, the already coded occupancy words of neighboring parent and child nodes may be used to entropy code the occupancy word of the current child node. When entropy coding a particular occupancy bit of the occupancy word of the current child node, the occupancy bits of the occupancy word having a lower Morton order than the particular occupancy bit have also already been entropy coded and may be used to code the occupancy bit of the occupancy word of the current child node.
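The breadth-first scan described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: voxels are integer coordinates in a cube of edge 2**depth, children are indexed lexicographically in xyz, and the function name `occupancy_words_bfs` is hypothetical. The linear search over voxels is kept for brevity rather than efficiency.

```python
def occupancy_words_bfs(voxels, depth):
    """Return the occupancy words of an octree in breadth-first order.

    `voxels` is an iterable of integer (x, y, z) coordinates in [0, 2**depth).
    Within a level, nodes are visited in Morton order because each parent
    emits its occupied children in Morton order.
    """
    voxels = set(voxels)
    words = []
    level = [(0, 0, 0)]  # origins of the occupied nodes at the current level
    for d in range(depth):
        half = 1 << (depth - d - 1)  # edge length of the children
        next_level = []
        for ox, oy, oz in level:
            word = 0
            for i in range(8):  # children in Morton order
                cx = ox + ((i >> 2) & 1) * half
                cy = oy + ((i >> 1) & 1) * half
                cz = oz + (i & 1) * half
                if any(cx <= x < cx + half and cy <= y < cy + half
                       and cz <= z < cz + half for x, y, z in voxels):
                    word |= 1 << i
                    next_level.append((cx, cy, cz))
            words.append(word)
        level = next_level
    return words
```

With two voxels at opposite corners of a 4x4x4 volume, the sketch emits one word for the root followed by one word per occupied child, mirroring the level-by-level succession of occupancy words in FIG. 3.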
[0075] FIG. 4 illustrates an example neighborhood of cuboids with already-coded (previously-coded) occupancy bits that may be used to entropy code the occupancy bit of a current child cuboid 400. The neighborhood of cuboids with already-coded occupancy bits may be determined based on the scanning order of an occupancy tree representing the geometry of the cuboids in FIG. 4 as discussed above. As illustrated in FIG. 4, current child cuboid 400 belongs to a current parent cuboid 402. Following the scanning order of the occupancy words and occupancy bits of nodes of the occupancy tree, the occupancy bits of four child cuboids 404, 406, 408, and 410, belonging to the same current parent cuboid 402, have already been coded. Also, the occupancy bits of child cuboids 412 of preceding parent cuboids have already been coded. Furthermore, the occupancy bits of parent cuboids 414, for which the occupancy bits of child cuboids have not already been coded, have already been coded. Therefore, the already-coded occupancy bits of cuboids 404, 406, 408, 410, 412, and 414 may be used to code the occupancy bit of the current child cuboid 400.
[0076] The number of possible occupancy configurations for a neighborhood of a current child cuboid may be $2^N$, where N is the number of cuboids in the neighborhood of the current child cuboid with already-coded occupancy bits. The neighborhood of the current child cuboid may comprise several dozens of cuboids, among them the 26 adjacent parent cuboids sharing a face, an edge, or a vertex with the parent cuboid of the current child cuboid and also several adjacent child cuboids (with occupancy bits already coded) sharing a face, an edge, or a vertex with the current child cuboid. Even limited to a subset of the adjacent cuboids, the occupancy configuration for a neighborhood of the current child cuboid may have billions of possible occupancy configurations, making its direct use impractical. The occupancy configuration for a neighborhood of the current child cuboid may be used by an encoder and / or decoder to select the context (or equivalently the probability model), among a set of contexts, of a binary entropy coder (e.g., binary arithmetic coder) that codes the occupancy bit of the current child cuboid. The context-based binary entropy coding may be similar to the Context Adaptive Binary Arithmetic Coder (CABAC) used in MPEG-H Part 2 (also known as High Efficiency Video Coding (HEVC)).
[0077] Several methods may be used by an encoder and / or decoder to reduce the occupancy configurations for a neighborhood of a current child cuboid being coded to a practical number of reduced occupancy configurations. Firstly, the $2^6$ or 64 occupancy configurations of the six adjacent parent cuboids sharing a face with the parent cuboid of the current child cuboid may be reduced to 9 occupancy configurations by using geometry invariance. Secondly, an occupancy score for the current child cuboid may be obtained from the $2^{26}$ occupancy configurations of the 26 adjacent parent cuboids. The score may be further reduced into a ternary occupancy prediction (“predicted occupied”, “unsure”, “predicted unoccupied”) by applying score thresholds. Thirdly, the number of occupied and the number of unoccupied adjacent child cuboids may be used instead of the individual occupancies of these child cuboids.
[0078] An encoder and / or decoder employing one or more of the above methods may reduce the number of possible occupancy configurations for a neighborhood of a current child cuboid to a more manageable number (e.g., a few thousand). However, it has been observed that instead of associating a reduced number of contexts (or probability models) directly to the reduced occupancy configurations, another mechanism may be used, namely Optimal Binary Coders with Update on the Fly (OBUF). An encoder and / or decoder may implement OBUF to limit the number of contexts to a lower number (e.g., 32 contexts).
[0079] OBUF may use a limited number (e.g., 32) of contexts that may be fixed. These contexts may be ordered from a lowest virtual probability to a highest virtual probability to code a 1, and each may be referred to by a context index (e.g., a context index in the range of 0 to 31). A Look-Up Table (LUT) of context indices may be initialized at the beginning of a point cloud coding process. For example, the LUT may initially point to a context (e.g., context with context index 15), among the limited number of contexts, with the median virtual probability to code a 1 for all inputs. This LUT may take an occupancy configuration for a neighborhood of a current child cuboid as input and output the context index associated with the occupancy configuration. Consequently, the LUT may have as many entries as reduced occupancy configurations (e.g., around a few thousand). The coding of the occupancy bit of a current child cuboid may follow the steps of determining the reduced occupancy configuration of the current child node, obtaining a context index by applying the reduced occupancy configuration as an entry to the LUT, coding the occupancy bit of the current child cuboid by using the context pointed to (or indicated) by the context index, and finally updating the LUT entry corresponding to the reduced occupancy configuration depending on the value of the coded occupancy bit of the current child cuboid. If a binary 0 (e.g., indicating the current child cuboid is unoccupied) is coded, the LUT entry may be decreased to a lower context index value, and if a binary 1 (e.g., indicating the current child cuboid is occupied) is coded, the LUT entry may be increased to a higher context index value. The update process of the context index may be based on a theoretical model of optimal distribution for virtual probabilities associated with the limited number of contexts.
This virtual probability for a context may be fixed by a model and may be different from the internal probability of the context that evolves during the coding of bits of data. The evolution of the internal context may follow a well-known process similar to the process in CABAC.
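The OBUF steps described above (look up a context index, code the bit with that context, then move the LUT entry) can be sketched as follows. Several details are assumptions made for illustration only: the unit step of the LUT update, the evenly spaced virtual probabilities, the exponential adaptation of the internal probability, and the use of an ideal code length in place of a real arithmetic coder. The class name `Obuf` is hypothetical.

```python
import math

class Obuf:
    """Toy OBUF: a LUT maps reduced configurations to one of a few contexts."""

    def __init__(self, num_configs, num_contexts=32, rate=0.05):
        self.num_contexts = num_contexts
        # Internal probability of coding a 1, per context, initialized to an
        # assumed virtual probability evenly spread over (0, 1).
        self.p_one = [(i + 0.5) / num_contexts for i in range(num_contexts)]
        # Every configuration initially points at the median context (15 of 32).
        self.lut = [num_contexts // 2 - 1] * num_configs
        self.rate = rate  # assumed adaptation rate of the internal probability

    def code_bit(self, config, bit):
        """Code `bit` for `config` and return its ideal cost in bits."""
        ctx = self.lut[config]
        p = self.p_one[ctx] if bit else 1.0 - self.p_one[ctx]
        cost = -math.log2(p)
        # Internal context adaptation (CABAC-like in spirit only), clamped so
        # that neither symbol ever gets probability zero.
        updated = self.p_one[ctx] + self.rate * (bit - self.p_one[ctx])
        self.p_one[ctx] = min(0.99, max(0.01, updated))
        # LUT update: higher context index after a 1, lower after a 0.
        step = 1 if bit else -1
        self.lut[config] = min(self.num_contexts - 1, max(0, ctx + step))
        return cost
```

In this sketch, a configuration that keeps seeing occupied children drifts toward contexts with a high probability of coding a 1, so the cost of each subsequent 1 tends to fall.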
[0080] An encoder and / or decoder may implement a “dynamic OBUF” scheme that may handle a much larger number of occupancy configurations for a neighborhood of a current child cuboid than can be handled by general OBUF, while maintaining complexity within reasonable bounds. The use of a larger number of occupancy configurations for a neighborhood of a current child cuboid may lead to improved compression capabilities. By using an occupancy tree compressed by OBUF, an encoder and / or decoder may reach a lossless compression performance as good as 1 bit per point (bpp) for coding the geometry of dense point clouds. An encoder and / or decoder may implement dynamic OBUF to potentially further reduce the bitrate by more than 25% to 0.7 bpp.
[0081] OBUF may not take as input a large variety of reduced occupancy configurations for a neighborhood of a current child cuboid, thus potentially leading to a loss of useful correlation. The size of the LUT of context indices may be increased to handle a wider variety of occupancy configurations for a neighborhood of a current child cuboid as input. However, by doing so, statistics may be diluted, and compression performance may be reduced. For example, if the LUT has millions of entries and the point cloud has a hundred thousand points, then most of the entries are never visited. Worse yet, many entries may be visited only a few times and their associated context indices may not be updated enough times to reflect any meaningful correlation between the occupancy configuration value and the probability of occupancy of the current child cuboid. Dynamic OBUF may be implemented to mitigate the dilution of statistics due to the increase in the number of occupancy configurations for a neighborhood of a current child cuboid. This mitigation is performed by a “dynamic reduction” of occupancy configurations in dynamic OBUF.
[0082] Dynamic OBUF may add an extra step of reduction of occupancy configurations for a neighborhood of a current child cuboid before applying the LUT of context indices. This step may be called a dynamic reduction because it evolves based on the progress of the coding of the point cloud or, more precisely, based on already visited occupancy configurations.
[0083] As discussed above, many possible occupancy configurations for a neighborhood of a current child cuboid are potentially involved but only a subset may be visited during the coding of a point cloud. This subset may characterize the type of the point cloud. For example, when coding AR or VR dense point clouds, most of the visited occupancy configurations may exhibit occupied adjacent cuboids of a current child cuboid. On the other hand, when coding sensor-acquired sparse point clouds, most of the visited occupancy configurations may exhibit only a few occupied adjacent cuboids of a current child cuboid. The role of the dynamic reduction may be to obtain a more precise correlation based on the most visited occupancy configuration while putting aside (or reducing aggressively) other occupancy configurations that are much less visited. The dynamic reduction may be updated on-the-fly, as detailed below, after each visit of an occupancy configuration during the coding of occupancy data.
[0084] FIG. 5 illustrates an example of a dynamic reduction function DR that may be used in dynamic OBUF. The dynamic reduction function DR may be obtained by masking bits $\beta_i$ of occupancy configurations 500: $\beta = \beta_1 \ldots \beta_K$ made of K bits. The size of the mask may decrease when occupancy configurations are visited a certain number of times. The initial dynamic reduction function $DR^0$ may mask all bits for all occupancy configurations such that it is a constant function $DR^0(\beta) = 0$ for all occupancy configurations $\beta$. After each coding of an occupancy bit, the dynamic reduction function may evolve from a function $DR^n$ to an updated function $DR^{n+1}$. The function may be defined by: $\beta' = DR^n(\beta) = \beta_1 \ldots \beta_{k_n(\beta)}$, where $k_n(\beta)$ 510 is the number of non-masked bits. The initialization of $DR^0$ may correspond to $k_0(\beta) = 0$, and the natural evolution of the reduction function towards finer statistics may lead to an increasing number of non-masked bits $k_n(\beta) \le k_{n+1}(\beta)$. The dynamic reduction function may be entirely determined by the values of $k_n$ for all occupancy configurations $\beta$.
[0085] The visits to occupancy configurations may be tracked by a variable $NV(\beta')$ for all dynamically reduced occupancy configurations $\beta' = DR^n(\beta)$. After the coding of an occupancy bit based on an occupancy configuration $\beta^v$, the corresponding number of visits $NV(\beta^{v'})$ may be increased by one. If this number of visits $NV(\beta^{v'})$ is greater than a threshold $th_V$, $NV(\beta^{v'}) > th_V$, then the number of unmasked bits $k_n(\beta)$ may be increased by one for all occupancy configurations $\beta$ being dynamically reduced to $\beta^{v'}$. Practically, this corresponds to replacing the dynamically reduced occupancy configuration $\beta^{v'}$ by the two new dynamically reduced occupancy configurations $\beta^{0'}$ and $\beta^{1'}$ defined by $\beta^{0'} = \beta^{v'}0 = \beta^v_1 \ldots \beta^v_{k_n(\beta)} 0$ and $\beta^{1'} = \beta^{v'}1 = \beta^v_1 \ldots \beta^v_{k_n(\beta)} 1$. In other words, the number of unmasked bits has been increased by one, $k_{n+1}(\beta) = k_n(\beta) + 1$, for all occupancy configurations $\beta$ such that $DR^n(\beta) = \beta^{v'}$. The number of visits of the two new dynamically reduced occupancy configurations may then be initialized to zero: $NV(\beta^{0'}) = NV(\beta^{1'}) = 0$. (I) At the start of the coding, the initial number of visits for the initial dynamic reduction function $DR^0$ may be set to $NV(DR^0(\beta)) = NV(0) = 0$, and the evolution of NV on dynamically reduced occupancy configurations may now be entirely defined.
[0086] When a dynamically reduced occupancy configuration $\beta^{v'}$ is replaced by the two new dynamically reduced occupancy configurations $\beta^{0'}$ and $\beta^{1'}$, the corresponding LUT entry $LUT[\beta^{v'}]$ may be replaced by the two new entries $LUT[\beta^{0'}]$ and $LUT[\beta^{1'}]$ that are initialized by the context index associated with $\beta^{v'}$, $LUT[\beta^{0'}] = LUT[\beta^{1'}] = LUT[\beta^{v'}]$, (II) and then evolve separately. The evolution of the LUT of context indices on dynamically reduced occupancy configurations may thus be entirely defined.
[0087] The reduction function $DR^n$ may be modeled by a series of growing binary trees $T^n$ 520 whose leaf nodes 530 are the reduced occupancy configurations $\beta' = DR^n(\beta)$. The initial tree may be the single root node associated with $0 = DR^0(\beta)$. The replacement of the dynamically reduced occupancy configuration $\beta^{v'}$ by $\beta^{0'}$ and $\beta^{1'}$ corresponds to growing the tree $T^n$ from the leaf node associated with $\beta^{v'}$ by attaching to it two new nodes associated with $\beta^{0'}$ and $\beta^{1'}$. The tree $T^{n+1}$ may be obtained by this growth. The number of visits NV and the LUT of context indices may be defined on the leaf nodes and evolve with the growth of the tree through equations (I) and (II).
[0088] In some examples, dynamic OBUF may be practically implemented by storage of the array $NV[\beta']$ and the $LUT[\beta']$ of context indices, as well as the trees $T^n$ 520. An alternative to the storage of the trees may be to store the array $k_n[\beta]$ 510 of the number of non-masked bits.
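The dynamic reduction and the growth of the tree through equations (I) and (II) can be sketched as follows. This is a minimal illustration under stated assumptions: a configuration is an integer of K bits whose most significant bit plays the role of the first bit to be unmasked, a leaf is stored as a pair (number of unmasked bits, prefix), the LUT entry moves by an assumed unit step, and the LUT entry is updated before a possible split. The class name `DynamicObuf` and its parameters are hypothetical.

```python
class DynamicObuf:
    """Toy dynamic OBUF: leaves of a growing binary tree hold NV and LUT."""

    def __init__(self, num_bits, threshold=2, num_contexts=32):
        self.K = num_bits
        self.th = threshold
        self.num_contexts = num_contexts
        # Leaves: (k, prefix) -> [number of visits NV, context index].
        # Initially all bits are masked: one root leaf with k = 0.
        self.leaves = {(0, 0): [0, num_contexts // 2 - 1]}

    def reduce(self, beta):
        """Return the unique leaf (k, first k bits of beta) covering beta."""
        for k in range(self.K + 1):
            key = (k, beta >> (self.K - k))
            if key in self.leaves:
                return key
        raise AssertionError("leaves always partition the configurations")

    def context_index(self, beta):
        """Context index stored for the reduced configuration of beta."""
        return self.leaves[self.reduce(beta)][1]

    def update(self, beta, bit):
        """Update NV and the LUT entry after coding `bit`; split if needed."""
        key = self.reduce(beta)
        entry = self.leaves[key]
        entry[0] += 1
        step = 1 if bit else -1  # assumed unit step of the context index
        entry[1] = min(self.num_contexts - 1, max(0, entry[1] + step))
        k, prefix = key
        if entry[0] > self.th and k < self.K:
            # Unmask one more bit: both children start with NV = 0 (I) and
            # inherit the parent's context index (II).
            del self.leaves[key]
            self.leaves[(k + 1, prefix << 1)] = [0, entry[1]]
            self.leaves[(k + 1, (prefix << 1) | 1)] = [0, entry[1]]
```

A coder built on this sketch would call `context_index` before coding an occupancy bit and `update` afterwards; rarely visited configurations then keep sharing a coarse leaf, while frequently visited ones are progressively refined.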
[0089] A limitation for implementing dynamic OBUF may be its memory footprint. In some applications, a few million occupancy configurations may be practically handled, leading to about 20 bits $\beta_i$ constituting an entry configuration $\beta$ to the reduction function DR. Each bit $\beta_i$ may correspond to the occupancy status of a neighboring cuboid of a current child cuboid or a set of neighboring cuboids of a current child cuboid.
[0090] Higher bits $\beta_i$ (e.g., $\beta_0$, $\beta_1$, etc.) may be the first bits to be unmasked during the evolution of the dynamic reduction function DR. Therefore, the order of neighbor-based information put in the bits $\beta_i$ may impact the compression performance. In some examples, neighboring information may be ordered from highest priority to lower priority and put in this order into the bits $\beta_i$, from higher to lower weight. For example, the priority may be, from the most important to the least important, occupancy of sets of adjacent neighboring child cuboids, then occupancy of adjacent neighboring child cuboids, then occupancy of adjacent neighboring parent cuboids, then occupancy of non-adjacent neighboring child nodes, and finally occupancy of non-adjacent neighboring parent nodes. Adjacent nodes sharing a face with the current child node may also have higher priority than adjacent nodes sharing an edge or, worse, only a vertex with the current child node.
[0091] FIG. 6 illustrates a flowchart of an exemplary method for coding the occupancy bit of a current child cuboid using dynamic OBUF. The method of the flowchart begins at block 602. At block 602, an encoder and / or decoder may determine the occupancy configuration $\beta$ of already-coded cuboids in a neighborhood of the current child cuboid. At block 604, the encoder and / or decoder may dynamically reduce the occupancy configuration $\beta$ into a reduced occupancy configuration $\beta' = DR^n(\beta)$. At block 606, the encoder and / or decoder may look up context index $LUT[\beta']$ in the LUT of the dynamic OBUF. At block 608, the encoder and / or decoder may select the context (or probability model) pointed to by the context index. At block 610, the encoder and / or decoder may entropy code (e.g., arithmetic code) the occupancy bit of the current child cuboid based on the context. Thus, the occupancy bit of the current child cuboid may be coded based on occupancy bits of the already-coded cuboids neighboring the current child cuboid.
[0092] Although not shown in FIG. 6, the encoder and / or decoder may further update the reduction function $DR^n$ into $DR^{n+1}$ and update the context index $LUT[\beta']$ based on the occupancy bit of the current child cuboid. In addition, the method of FIG. 6 may be repeated for additional or all child cuboids of parent cuboids corresponding to nodes of the occupancy tree in a scan order, such as the scan order discussed above with respect to FIG. 3.
[0093] In general, the occupancy tree is a lossless compression technique. The occupancy tree may be adapted to provide lossy compression by modifying the point cloud on the encoder side (e.g., down-sampling, removing points, moving points, etc.) but the lossy compression performance may be reduced / weak. However, the use of the occupancy tree as a lossless compression technique may be very useful for dense point clouds.
[0094] One approach to lossy compression for point cloud geometry may be to set the maximum depth of the occupancy tree to not reach the smallest volume size of one voxel but instead to stop at a bigger volume size (e.g., NxNxN cubes, where N > 1). The geometry of the points belonging to each occupied leaf node associated with the bigger volumes may then be modeled. This approach may be particularly suited for dense and smooth point clouds that may be locally modeled by smooth functions like planes or polynomials. The coding cost may become the cost of the occupancy tree plus the cost of the local model in each of the occupied leaf nodes.
[0095] A scheme for modeling the geometry of the points belonging to each occupied leaf node, associated with a volume size larger than one voxel, may use sets of triangles as local models. This scheme may be referred to as the “TriSoup” scheme. TriSoup is short for “Triangle Soup” because the connectivity between triangles may not be part of the models. An occupied leaf node, of an occupancy tree, that corresponds to a cuboid with a volume greater than one voxel may be referred to as a TriSoup node. An edge belonging to at least one cuboid corresponding to a TriSoup node may be referred to as a TriSoup edge. A TriSoup node may comprise a presence flag ($s_k$) for each TriSoup edge of its corresponding occupied cuboid. A presence flag ($s_k$) of a TriSoup edge may indicate whether a TriSoup vertex ($V_k$) is present or not on the TriSoup edge. At most one TriSoup vertex ($V_k$) may be present on a TriSoup edge. For each vertex ($V_k$) present on a TriSoup edge of an occupied cuboid, the TriSoup node corresponding to the occupied cuboid may further comprise a position ($p_k$) of the vertex ($V_k$) along the TriSoup edge.
[0096] In addition to the occupancy words of an occupancy tree, an encoder may entropy encode, for each TriSoup node of the occupancy tree, a TriSoup vertex presence flag (and a position of a TriSoup vertex, if present, along a TriSoup edge) of each TriSoup edge belonging to the TriSoup node. A decoder may similarly entropy decode the TriSoup vertex presence flags and positions of each TriSoup vertex along a respective TriSoup edge belonging to a TriSoup node of the occupancy tree, in addition to the occupancy words of the occupancy tree.
[0097] FIG. 7 illustrates an example of an occupied cube 700 of size NxNxN (where N > 1) that corresponds to a TriSoup node of an occupancy tree. Occupied cube 700 comprises TriSoup edges 710-721. The TriSoup node, corresponding to occupied cube 700, comprises a presence flag ($s_k$) for each TriSoup edge of TriSoup edges 710-721. The presence flag of TriSoup edge 714 indicates that a TriSoup vertex $V_1$ is present on TriSoup edge 714. The presence flag of TriSoup edge 715 indicates that a TriSoup vertex $V_2$ is present on TriSoup edge 715. The presence flag of TriSoup edge 716 indicates that a TriSoup vertex $V_3$ is present on TriSoup edge 716. The presence flag of TriSoup edge 717 indicates that a TriSoup vertex $V_4$ is present on TriSoup edge 717. The presence flags of the remaining TriSoup edges each indicate that a TriSoup vertex is not present on their corresponding TriSoup edge. The TriSoup node, corresponding to occupied cube 700, further comprises a position ($p_k$) for each TriSoup vertex present along one of its TriSoup edges 710-721. More specifically, the TriSoup node (corresponding to occupied cube 700) further comprises a position $p_1$ for TriSoup vertex $V_1$, a position $p_2$ for TriSoup vertex $V_2$, a position $p_3$ for TriSoup vertex $V_3$, and a position $p_4$ for TriSoup vertex $V_4$. The TriSoup vertices may be shared among TriSoup nodes along TriSoup edge(s) in common.
[0098] In some examples, a presence flag ($s_k$) and, if the presence flag ($s_k$) indicates the presence of a vertex, a position ($p_k$) (the presence flag ($s_k$) and position ($p_k$) individually or collectively referred to as vertex information) of the vertex along a current TriSoup edge may be entropy coded based on already-coded presence flags and positions (of present TriSoup vertices) of TriSoup edges that neighbor the current TriSoup edge. A presence flag ($s_k$) and, if the presence flag ($s_k$) indicates the presence of a vertex, a position ($p_k$) on (e.g., indicating a position of the vertex along) a current TriSoup edge may be additionally or alternatively entropy coded based on occupancies of cuboids that neighbor the current TriSoup edge. Similar to the entropy coding of the occupancy bits of the occupancy tree, a configuration $\beta_{TS}$ for a neighborhood (also referred to as a neighborhood configuration $\beta_{TS}$) of a current TriSoup edge may be obtained and dynamically reduced into a reduced configuration $\beta_{TS}' = DR^n(\beta_{TS})$ by using a dynamic OBUF scheme for TriSoup. A context index $LUT[\beta_{TS}']$ may be obtained from the OBUF LUT and at least a part of the vertex information of the current TriSoup edge may be entropy coded using the context (or probability model) pointed to by the context index.
[0099] In order to use a binary entropy coder to entropy code at least part of the vertex information of the current TriSoup edge, the TriSoup vertex position ($p_k$) (if present) along its TriSoup edge may be binarized. A number of bits $N_b$ may be set for the quantization of the TriSoup vertex position ($p_k$) along the TriSoup edge of length N that is uniformly divided into $2^{N_b}$ quantization intervals. By doing so, the TriSoup vertex position ($p_k$) may be represented by $N_b$ bits ($p_{k,j}$, $j = 1, \ldots, N_b$) that may be individually coded by the dynamic OBUF scheme as well as the bit corresponding to the presence flag ($s_k$). The neighborhood configuration $\beta_{TS}$, the OBUF reduction function $DR^n$, and thus the context index may depend on the nature / characteristic / property of the coded bit (presence flag ($s_k$), highest position bit ($p_{k,1}$), second highest position bit ($p_{k,2}$), etc.). Therefore, there may be several dynamic OBUF schemes implemented, with each dedicated to a specific bit of information (presence flag ($s_k$) or position bit ($p_{k,j}$)) of the vertex information.
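The binarization of a vertex position into $N_b$ bits can be sketched as follows. The sketch assumes that the position lies in [0, N) along the edge, that the highest bit is emitted first, and that a decoder reconstructs the position at the center of the quantization interval; the function names are illustrative.

```python
def binarize_position(p, edge_length, num_bits):
    """Quantize position p in [0, edge_length) into num_bits bits, MSB first."""
    if not 0 <= p < edge_length:
        raise ValueError("position must lie on the edge")
    # Index of the quantization interval, in [0, 2**num_bits).
    q = int(p * (1 << num_bits) / edge_length)
    return [(q >> (num_bits - 1 - j)) & 1 for j in range(num_bits)]

def dequantize_position(bits, edge_length):
    """Reconstruct the position at the center of the quantization interval."""
    q = 0
    for b in bits:
        q = (q << 1) | b
    return (q + 0.5) * edge_length / (1 << len(bits))
```

Each returned bit could then be coded by its own dynamic OBUF instance, in line with the idea of one scheme per bit of vertex information.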
[0100] FIG. 8A illustrates a cuboid 800 (e.g., a cube) corresponding to a TriSoup node with a number K of TriSoup vertices $V_k$. Within cuboid 800, TriSoup triangles may be constructed from the TriSoup vertices $V_k$ if at least three ($K \ge 3$) TriSoup vertices are present on the TriSoup edges of cuboid 800. In the example of FIG. 8A, 4 TriSoup vertices are present and therefore TriSoup triangles are constructed. The TriSoup triangles may be constructed around the centroid vertex C defined as the mean of the TriSoup vertices $V_k$. In some examples, to construct the TriSoup triangles, a dominant direction may first be determined, then vertices $V_k$ may be ordered by turning around this direction, and finally the following K TriSoup triangles (listed as triples of vertices) are constructed: $V_1V_2C$, $V_2V_3C$, ..., $V_KV_1C$. The dominant direction may be chosen among the three directions parallel to the axes of the 3D space to increase or maximize the 2D surface of the triangles when projected along the dominant direction. By doing so, the dominant direction may be somewhat perpendicular to a local surface defined by the points of the point cloud belonging to the TriSoup node.
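The triangle construction described above can be sketched as follows. The selection rule used here is an assumption: for each of the three axes, the vertices are ordered by angle around that axis through the centroid, and the axis giving the largest projected area of the resulting triangle fan is retained. The function name `trisoup_triangles` is hypothetical.

```python
import math

def trisoup_triangles(vertices):
    """Return (centroid, triangles) for K >= 3 TriSoup vertices (3-tuples)."""
    K = len(vertices)
    if K < 3:
        return None, []
    # Centroid C: mean of the vertices.
    C = tuple(sum(v[a] for v in vertices) / K for a in range(3))
    best = None
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]  # the two projected axes
        # Order the vertices by turning around `axis` through the centroid.
        order = sorted(vertices,
                       key=lambda v: math.atan2(v[w] - C[w], v[u] - C[u]))
        # Projected area of the fan of triangles (V_k, V_{k+1}, C).
        area = 0.0
        for k in range(K):
            a, b = order[k], order[(k + 1) % K]
            area += abs((a[u] - C[u]) * (b[w] - C[w])
                        - (a[w] - C[w]) * (b[u] - C[u])) / 2
        if best is None or area > best[0]:
            best = (area, order)
    order = best[1]
    return C, [(order[k], order[(k + 1) % K], C) for k in range(K)]
```

For four vertices lying in a horizontal plane of a cube, the sketch selects the vertical axis as the dominant direction, which is consistent with the dominant direction being roughly perpendicular to the local surface.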
[0101] FIG. 8B illustrates a refinement to the TriSoup model by coding a centroid residual vector $C_{res}$ into the bitstream such as to use $C + C_{res}$ instead of C as a pivoting vertex for constructing / generating the triangles. By doing so, the vertex $C + C_{res}$ may be closer to the points of the point cloud than the centroid C used to model the points, which reduces the reconstruction error and leads to lower distortion at the cost of a small increase in bitrate needed for coding $C_{res}$.
[0102] FIG. 8C illustrates a more detailed example of coding a centroid residual vector $C_{res}$ in / from the bitstream such that an adjusted centroid $C + C_{res}$ is used instead of centroid C for generating TriSoup triangles of a cuboid 800 (corresponding to a TriSoup node) corresponding to a portion of a point cloud, according to some embodiments. For example, the triangles may be generated based on adjusted centroid $C + C_{res}$ and adjacent pairs of vertices of an ordering of the vertices $V_1$-$V_4$, determined as described above with respect to FIG. 8A. Further, as described above, the TriSoup triangles of the cuboid may be voxelized at the decoder to generate voxels representing (or modeling) the portion, of the point cloud, corresponding to the cuboid. A unit vector $n$ (also referred to as a normalized vector) may be determined as a normalized mean vector of normal vectors to the triangles ($V_1V_2C$, $V_2V_3C$, ..., $V_KV_1C$) constructed by centroid C and pairs of the vertices of the cuboid by pivoting around the centroid C (e.g., as described in FIG. 8A). For example, the unit vector $n$ may be determined as the normalized vector based on a mean of cross products representing areas of the triangles: $\bar{n} = (\overrightarrow{V_1C} \times \overrightarrow{V_2C} + \overrightarrow{V_2C} \times \overrightarrow{V_3C} + \cdots + \overrightarrow{V_KC} \times \overrightarrow{V_1C}) / K$. For example, the unit vector $n$ may be determined by dividing the mean vector ($\bar{n}$) by the norm (or length) of the mean vector (i.e., $n = \bar{n} / \|\bar{n}\|$).
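The mean of cross products and its normalization can be sketched as follows. The sketch assumes that the vertices are already ordered as described with respect to FIG. 8A and that each vector from a vertex to the centroid is computed as C minus the vertex; the helper names are illustrative.

```python
import math

def cross(a, b):
    """Cross product of two 3D vectors given as 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(ordered_vertices, C):
    """Normalized mean of the cross products of consecutive vertex-to-C vectors."""
    K = len(ordered_vertices)
    # Vectors from each vertex V_k to the centroid C.
    to_c = [tuple(C[a] - v[a] for a in range(3)) for v in ordered_vertices]
    mean = [0.0, 0.0, 0.0]
    for k in range(K):
        c = cross(to_c[k], to_c[(k + 1) % K])
        for a in range(3):
            mean[a] += c[a] / K
    norm = math.sqrt(sum(m * m for m in mean))
    if norm == 0.0:
        raise ValueError("degenerate triangles: mean normal is zero")
    return tuple(m / norm for m in mean)
```

For four vertices ordered around a horizontal square, every cross product points along the vertical axis, so the resulting unit vector is vertical.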
[0103] The magnitude of each cross product is equal to an area of a parallelogram formed by the two vectors in the cross product. Therefore, the magnitude may be representative of an area of a triangle formed by the two vectors because the area of the triangle is equal to half of the magnitude. Accordingly, since the vector n indicates a direction of the triangles (e.g., TriSoup triangles) representing (e.g., modeling) the portion of the point cloud, the vector n may be indicative of the direction normal to a local surface representative of the portion of the point cloud. In some examples, to maximize the effect of the centroid residual while minimizing its coding cost, a one-component residual αres along the line (C, n) 810 may be coded instead of a 3D residual vector. The residual value αres may be determined by the encoder as the intersection between the current point cloud and the line (C, n), which is along the same direction of the normalized vector n. For example, a set of points, of the portion of the point cloud, closest (e.g., within a threshold distance, a threshold number of points) to the line may be determined. The set of points may be projected on the line and the residual value αres may be determined as the mean component along the line of the projected points. In some examples, the mean may be determined as a weighted mean whose weights depend on the distance of the set of points from the line. For example, a point from the set closer to the line may have a higher weight than another point from the set farther from the line.
[0104] In some examples, the residual value αres may be quantized. For example, it may be quantized by a uniform quantization function having a quantization step similar to the quantization precision of the TriSoup vertices Vk. By doing so, the quantization error may be maintained to be uniform over all vertices Vk and C+Cres such that the local surface is uniformly approximated.
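The encoder-side estimation and quantization of the one-component centroid residual described above may be sketched as follows. This is only an illustration: the function names, the distance threshold `max_dist`, and the weight `1 / (1 + distance)` are assumptions chosen for the example; the description only states that closer points may receive higher weights.

```python
import math

def centroid_residual(points, c, n, max_dist=1.0):
    """Weighted mean, along the line (C, n), of the points close to that line.

    n is assumed to be a unit vector; max_dist is a hypothetical threshold.
    """
    num = 0.0
    den = 0.0
    for p in points:
        d = tuple(p[i] - c[i] for i in range(3))
        t = sum(d[i] * n[i] for i in range(3))           # signed component along n
        perp = tuple(d[i] - t * n[i] for i in range(3))  # offset from the line
        dist = math.sqrt(sum(x * x for x in perp))
        if dist <= max_dist:
            w = 1.0 / (1.0 + dist)                       # closer points weigh more
            num += w * t
            den += w
    return num / den if den > 0.0 else 0.0

def quantize(alpha, step=1.0):
    """Uniform quantization to the nearest multiple of `step` (returns the index)."""
    return math.floor(alpha / step + 0.5)
```

With a centroid at (4, 4, 4), a normal along z, and points (4, 4, 6), (4, 5, 5), and (0, 0, 9), only the first two points are close enough to the line, giving a weighted mean of 5/3 that quantizes to 2 with a unit step.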
[0105] In some examples, the residual value αres may be binarized and entropy coded into the bitstream, e.g., by using a unary-based coding scheme. In some examples, the residual value αres may be coded using a set of flags. For example, a flag f0 may be coded to indicate if the residual value αres is equal to zero. If the flag f0 indicates the residual value αres is zero, no further syntax elements may be needed. If the flag f0 indicates the residual value αres is not zero, a sign bit indicating a sign may be coded and the residual magnitude |αres| - 1 may be coded using an entropy code. For example, the residual magnitude may be coded using a unary coding scheme that codes successive flags fi (i ≥ 1) indicating if the residual value magnitude |αres| is equal to i. A binary entropy coder may binarize the residual value αres into the flags fi (i ≥ 0) and entropy code the binarized residual value as well as the sign bit.
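The flag-based binarization described above may be sketched as follows, before any entropy coding is applied. The function names and the bit polarity (a flag value of 1 meaning "equal", a sign bit of 1 meaning negative) are assumptions for this example. The unary part emits |residual| - 1 zero flags followed by a terminating flag.

```python
def binarize_residual(alpha):
    """Binarize an integer residual into (name, bit) pairs: f0, sign, f1, f2, ..."""
    bins = [("f0", int(alpha == 0))]      # f0 = 1 signals a zero residual
    if alpha == 0:
        return bins                        # no further syntax elements needed
    bins.append(("sign", int(alpha < 0)))  # assumed polarity: 1 means negative
    i = 1
    while True:
        hit = int(abs(alpha) == i)         # fi = 1 when the magnitude equals i
        bins.append((f"f{i}", hit))
        if hit:
            return bins
        i += 1

def debinarize_residual(bins):
    """Inverse of binarize_residual."""
    bits = [b for _, b in bins]
    if bits[0] == 1:
        return 0
    negative = bits[1] == 1
    magnitude = len(bits) - 2              # number of fi flags, the last one being 1
    return -magnitude if negative else magnitude
```

For example, a residual of -3 is binarized as f0 = 0, sign = 1, f1 = 0, f2 = 0, f3 = 1.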
[0106] In some examples, compression of the residual value αres may be improved by determining bounds as shown in FIG. 8C. As shown, the line (C, n) 810 intersects the current cuboid 800 (corresponding to a TriSoup node) at two bounding points 820 and 821 and the encoder may impose that the adjusted centroid vertex C+Cres is located between the two bounding points 820 and 821. These bounding points 820 and 821 also bound the residual value αres (which may be quantized) as belonging to an integral interval [m, M] where m ≤ 0 ≤ M. By doing so, some bits of the binarized residual value αres may be inferred. For example, if m=M=0, then residual value αres is necessarily equal to zero. In another example, if m=0<M, then the sign bit is necessarily positive. More generally, if the residual value αres is not equal to zero and its sign is known, its magnitude |αres| may be determined to be bounded by either |m| or M such that the magnitude may be coded by a truncated unary coding scheme that may infer the value of the last of successive flags fi (i ≥ 1).
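The inference of bits from the bounds m and M may be sketched as follows. The function name `binarize_bounded` and the bit polarity are assumptions for this example; only the inference rules (zero inferred when m = M = 0, sign inferred when one bound is zero, last unary flag inferred at the bound) follow the description above.

```python
def binarize_bounded(alpha, m, M):
    """Binarize an integer residual in [m, M], skipping bits a decoder can infer."""
    if not (m <= 0 <= M and m <= alpha <= M):
        raise ValueError("expected m <= 0 <= M and m <= alpha <= M")
    bins = []
    if m == 0 and M == 0:
        return bins                            # residual is necessarily zero
    bins.append(("f0", int(alpha == 0)))
    if alpha == 0:
        return bins
    if m < 0 and M > 0:
        bins.append(("sign", int(alpha < 0)))  # coded only when both signs are possible
    limit = -m if alpha < 0 else M             # bound on the magnitude for this sign
    for i in range(1, limit):                  # the flag at i == limit is inferred
        hit = int(abs(alpha) == i)
        bins.append((f"f{i}", hit))
        if hit:
            break
    return bins
```

For example, with bounds [0, 2] a residual of 2 needs only f0 = 0 and f1 = 0, since the sign and the final flag are both inferred.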
[0107] In some examples, the binary entropy coder used to code the binarized residual value αres may be a context-adaptive binary arithmetic coder (CABAC) such that the probability model (also referred to as a context or an entropy coder) used to code at least one bit (e.g., fi or sign bit) of the binarized residual value αres is updated depending on previously coded bits. In some examples, the probability model of the binary entropy coder may be determined based on contextual information such as the values of the bounds m and M, the position of vertices Vk, or the size of the cuboid. In some examples, the selection of the probability model (i.e., also referred to equivalently as an entropy coder or context) may be performed by a dynamic OBUF scheme with the contextual information described above as inputs.
[0108] The reconstruction of a decoded point cloud from the set of TriSoup triangles may be referred to as "voxelization" and may be performed, e.g., by ray tracing or rasterization, for each triangle individually before duplicate voxels from the voxelized triangles are removed.
[0109] FIG. 9A illustrates an example of voxelization using ray tracing, according to some embodiments. For example, ray-triangle intersection algorithms, such as the Möller-Trumbore algorithm, rely on launching rays to determine whether rays intersect with TriSoup triangles and if so, at what points of the TriSoup triangles. Rays may be launched from integral coordinates that correspond to the centers of voxels. As illustrated by FIG. 9A, rays such as ray 900 may be launched parallel to one of the three coordinate axes of the 3D space, starting from integral coordinates (sometimes referred to as integer coordinates) such as an origin point 905 (shown as origin or starting point Pstart).
[0110] An intersection point 904 (shown as Pint), if any, between ray 900 and a TriSoup triangle 901 belonging to a cube 902, corresponding to a TriSoup node, may be rounded (e.g., quantized) to obtain a decoded point corresponding to a voxel. For example, a ray, launched parallel to a coordinate axis in 3D space, may intersect a TriSoup triangle if and only if the projection, along the ray direction, of the center of a voxel belongs to the TriSoup triangle. In other words, the ray may be determined to intersect the TriSoup triangle if the point of intersection corresponds to the center of the voxel. In some examples, this intersection may be determined by applying a ray-triangle intersection algorithm (e.g., a ray tracing or ray casting technique) such as the Möller-Trumbore algorithm to generate voxels representing the triangle.
[0111] Ray tracing techniques such as the Möller-Trumbore algorithm are based on generating, with respect to a triangle, barycentric coordinates of points of intersection between rays and a plane of the triangle. Then, points of the triangle may be determined from the barycentric coordinates.
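Ray-based voxelization as described above may be sketched as follows. The intersection routine follows the standard Möller-Trumbore formulation; the function names, the choice to launch rays only along the +z axis from integer (x, y, 0) origins, and the rounding rule are simplifying assumptions for this example. A fuller implementation might launch rays along all three axes.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def moller_trumbore(origin, direction, a, b, c, eps=1e-9):
    """Return (point, (wa, wb, wc)) if the ray hits triangle ABC, else None."""
    e1, e2 = sub(b, a), sub(c, a)
    h = cross(direction, e2)
    det = dot(e1, h)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, a)
    u = inv * dot(s, h)                  # barycentric weight of B
    q = cross(s, e1)
    v = inv * dot(direction, q)          # barycentric weight of C
    t = inv * dot(e2, q)                 # distance along the ray
    if u < 0 or v < 0 or u + v > 1 or t < 0:
        return None                      # intersection lies outside the triangle
    point = tuple(o + t * d for o, d in zip(origin, direction))
    return point, (1 - u - v, u, v)

def voxelize(triangles, size):
    """Launch +z rays from integer (x, y, 0) origins; a set removes duplicates."""
    voxels = set()
    for a, b, c in triangles:
        for x in range(size):
            for y in range(size):
                hit = moller_trumbore((x, y, 0), (0, 0, 1), a, b, c)
                if hit is not None:
                    # Round the intersection point to the nearest voxel.
                    voxels.add(tuple(math.floor(p + 0.5) for p in hit[0]))
    return voxels
```

Because the voxels are collected in a set, voxelizing two triangles that cover the same positions yields each voxel only once, which mirrors the duplicate removal step.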
[0112] FIG. 9B illustrates an example of voxelization using barycentric coordinates (u, v, w) of a point 912 (P) relative to a TriSoup triangle 910 having vertices labeled A, B, and C in the 3D space, according to some embodiments. In some examples, point 912 may be determined as an intersection between a ray and a plane of TriSoup triangle 910 (e.g., containing or passing through the three vertices A, B, and C of TriSoup triangle 910). For example, the ray may be launched parallel to one of the three coordinate axes in 3D space. In some examples, this intersection point 912 may be uniquely represented as a weighted sum of the three vertices of TriSoup triangle 910: P = uA + vB + wC under the condition u + v + w = 1. Therefore, any point P of the plane (containing TriSoup triangle 910) has unique coordinates (u,v,w) in the barycentric coordinate system. A point with barycentric coordinates (u,v,w) includes an ordered triple of numbers u, v, and w. Barycentric coordinates (u,v,w) that sum to 1 (i.e., u + v + w = 1) are known as homogeneous barycentric coordinates or normalized barycentric coordinates. The barycentric coordinates of the intersection point with respect to TriSoup triangle 910 may be determined using, e.g., the well-known Möller-Trumbore algorithm.
[0113] By converting points with Cartesian coordinates in 3D space to homogeneous barycentric coordinates, the three vertices A, B, C of TriSoup triangle 910 have respective barycentric coordinates A(1,0,0), B(0,1,0) and C(0,0,1). In some examples, the convex hull (i.e., TriSoup triangle 910) of the three vertices A, B, and C is equal to the set of all points such that the barycentric coordinates u, v, and w are each greater than or equal to zero: 0 ≤ u, v, w.
[0114] Therefore, in some examples, the intersection point may be determined to belong to TriSoup triangle 910 based on the intersection point having barycentric coordinates with an ordered triple of values that are each greater than or equal to zero. Relatedly, if at least one of the barycentric coordinates (i.e., one of u, v, or w) is negative, then the intersection point may be determined to not belong to TriSoup triangle 910 because it will be on the plane, but not on an edge or within the TriSoup triangle. In some examples, a point determined to belong to TriSoup triangle 910 may indicate that the ray intersects TriSoup triangle 910 (e.g., within or at an edge of TriSoup triangle 910).
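The barycentric membership test described above may be sketched as follows for a point P assumed to lie in the plane of the triangle. The function names are hypothetical, and the coordinates are obtained here by solving the 2x2 system formed from dot products of edge vectors, which is one common way to compute them rather than the only one.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def barycentric(p, a, b, c):
    """Return (u, v, w) with P = u*A + v*B + w*C and u + v + w = 1.

    P is assumed to lie in the plane containing A, B, and C.
    """
    v0, v1, v2 = sub(b, a), sub(c, a), sub(p, a)
    d00, d01, d11 = dot(v0, v0), dot(v0, v1), dot(v1, v1)
    d20, d21 = dot(v2, v0), dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    if denom == 0:
        raise ValueError("degenerate triangle")
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return (1.0 - v - w, v, w)

def in_triangle(p, a, b, c):
    """A point belongs to the triangle when no barycentric coordinate is negative."""
    return all(coord >= 0 for coord in barycentric(p, a, b, c))
```

With A = (0, 0, 0), B = (4, 0, 0), and C = (0, 4, 0), the point (1, 1, 0) has coordinates (0.5, 0.25, 0.25) and belongs to the triangle, whereas (5, 5, 0) has a negative first coordinate and does not.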
[0115] Attribute coding is a process to code attributes of a current point cloud, e.g., attributes associated with the geometry of the current point cloud. Attribute coding may be performed globally on the decoded (e.g., reconstructed) geometry of a current point cloud, but such global coding induces high memory traffic and footprint as well as high computational complexity. A two-pass encoding / decoding on the geometry and then on the attributes, after completion of geometry encoding / decoding, is required and induces even higher memory traffic and footprint as well as overall latency before outputting geometry and attributes of a first point of the decoded point cloud.
[0116] Attribute Coding Units (ACUs) have been introduced to enable local coding of attributes. ACUs may be determined by segmenting an overall decoded geometry (e.g., decoded geometry 1013 of FIG. 10) of a current point cloud (e.g., current point cloud 1011 of FIG. 10) into a set of ACUs. Each ACU comprises (e.g., contains) a portion of geometry of the decoded geometry, which indicates 3D positions in the 3D space of a subset of points of the decoded geometry (e.g., as decoded by the decoder or encoded and then decoded by the encoder). As used herein, points of the decoded geometry may refer to voxels, as described above.
[0117] The attribute encoding / decoding is then localized to portions of the overall decoded geometry and the attribute coding of each ACU (associated with each portion of geometry of the current point cloud) may be processed locally. Memory traffic and footprint as well as computation complexity are then reduced compared to global attribute coding.
[0118] For example, the geometry of the current point cloud may be restricted to a subset of nodes of the occupancy tree, and the restricted geometry and the associated attributes may be encoded / decoded locally by segmenting the restricted geometry into ACUs.
[0119] For example, the 3D space encompassing a point cloud may be split into regions defined by subsets of nodes of the occupancy tree. The geometry of a first region of the 3D space may be encoded to obtain a first part of the decoded geometry of the current point cloud that is segmented into a first set of ACUs that are attribute encoded. Then, the geometry of a second region of the 3D space may be encoded to obtain a second part of the decoded geometry of the current point cloud that is segmented into a second set of ACUs that are attribute encoded, etc. The memory footprint and traffic are thus limited within a few regions, due to some neighborhood prediction between regions, and the latency of the point cloud codec is reduced to the time needed for encoding geometry and attributes of a few regions. Smaller regions will lead to smaller memory footprint and traffic, and to shorter latency. In some examples, the regions may be slices (also referred to as bricks) of a volume encompassing / containing the point cloud.
[0120] FIG. 10 illustrates an example process 1000 for encoding geometry and attributes of a current point cloud (1011), according to some embodiments.
[0121] For example, process 1000 may be performed by an encoder (e.g., encoder 114 of FIG. 1). In some examples, blocks 1010-1030 may represent components within the encoder.
[0122] The current point cloud (1011) may be a point cloud frame of a sequence of point cloud frames of a dynamic point cloud.
[0123] At block 1010, an encoder may encode into a bitstream (1090) geometry information (1012) representative of the geometry of the current point cloud (1011). The encoder may obtain a decoded (e.g., reconstructed) geometry (1013) of the point cloud (1011) as discussed above.
[0124] At block 1020, the encoder may determine at least one portion of geometry (1022) of the decoded geometry (1013). Each portion of geometry (1022) comprises a subset of points of the decoded geometry (1013), e.g., comprising positions in the 3D space of the subset of points.
[0125] In some examples, the decoded geometry (1013) may be segmented into a set of ACUs, and each ACU of the set of ACUs comprises a respective portion of geometry (1022) of the decoded geometry (1013).
[0126] For example, the encoder may further encode, in the bitstream (1090), portion information (1021), for example, as part of the attribute information (1032).
[0127] For example, the portion information (1021) may indicate segmentation selections of the decoded geometry (1013) into the set of ACUs, e.g., the portion information (1021) indicates how the decoded geometry (1013) is segmented into the set of ACUs.
[0128] Attributes of the current point cloud (1011), e.g., attributes associated with the points of the current point cloud, are typically coded after the coding (e.g., including encoding and / or decoding) of the underlying geometry has been performed. If the geometry coding is a lossless coding (e.g., by using an octree scheme), the encoder has direct access to the attribute values associated with the decoded geometry. The attributes associated with each point of the current point cloud are the attributes associated with the corresponding point of the decoded geometry (1013).
[0129] In some examples, if the geometry coding is a lossy coding (e.g., by using a TriSoup scheme), the decoded geometry (1013) differs from the geometry of the current point cloud. In these examples, the attributes (10111) of the current point cloud may be mapped by the encoder from the geometry of the current point cloud (1011) to the decoded geometry (1013) such as to determine mapped attributes (1031) associated with the decoded geometry (1013). For example, the mapped attributes are assigned to each point of the decoded geometry (1013).
[0130] At block 1025, the encoder may determine mapped attributes (1031) of the decoded geometry (1013) by mapping the attributes (10111) of the current point cloud (1011) to the decoded geometry (1013).
[0131] As discussed above, attributes may indicate a property of a point's visual appearance such as texture, color, material, transparency, reflectance, time stamp, velocity, etc. For attributes that are colors, this attribute mapping performed by the encoder is known as a recoloring process because the colors of the original geometry are used to color (e.g., recolor) the decoded geometry.
[0132] In some examples, attributes comprise colors and the mapped attributes may be determined based on recoloring the attributes.
[0133] In some examples, mapped attributes may be determined based on a k nearest neighbor (KNN) search algorithm (e.g., using a space partitioning algorithm such as a KD Tree search, a Ball / metric Tree search, brute force search, etc.) to determine nearest points from the geometry of the current point cloud (1011) to the decoded geometry (1013). For example, a mapped attribute of a point of the decoded geometry (1013) may be the average of the attribute values associated with the nearest points of the current point cloud (1011) relative to the point of the decoded geometry (1013).
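The KNN-based attribute mapping described above may be sketched with a brute-force search as follows. The function name `map_attributes`, the default `k`, and the unweighted average are assumptions for this example; a space partitioning structure such as a KD tree would typically replace the sort for large point clouds.

```python
def map_attributes(original_points, original_attrs, decoded_points, k=3):
    """For each decoded point, average the attributes of its k nearest original points."""
    dims = len(original_attrs[0])
    mapped = []
    for q in decoded_points:
        # Brute-force KNN: rank original points by squared distance to q.
        order = sorted(
            range(len(original_points)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(original_points[i], q)),
        )
        nearest = order[:k]
        mapped.append(tuple(
            sum(original_attrs[i][d] for i in nearest) / len(nearest)
            for d in range(dims)
        ))
    return mapped
```

For color attributes, this corresponds to a simple recoloring: each decoded point receives the mean color of its nearest original points.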
[0134] In some examples, when the geometry compression is lossless, the decoded geometry (1013) is the same as the geometry of the current point cloud (1011). In these examples, the attribute mapping associates attributes of each point of the current point cloud (1011) to the same point of the decoded geometry (1013).
[0135] At block 1030, the encoder may encode, in the bitstream (1090), the mapped attributes (1031) associated with each portion of geometry (1022). For example, the attributes of points associated with a portion of geometry (1022) may be encoded before encoding the attributes of points belonging to another portion of the decoded geometry (1013). Thus, attribute encoding may be performed locally. The encoder may encode attribute information (1032) representing the encoded attributes.
[0136] For example, attributes associated with portions of the decoded geometry (1013) are encoded based on a scanning order. The encoder may thus distinguish already coded / decoded portions of the decoded geometry (1013), i.e., portions of the decoded geometry (1013) whose attributes have been encoded / decoded, from other portions of the decoded geometry (1013) whose attributes have not been encoded / decoded yet.
[0137] FIG. 11 illustrates an example process 1100 for encoding attributes associated with a portion of geometry (1022) of the decoded geometry (1013), according to some embodiments.
[0138] For example, process 1100 may be performed by an encoder (e.g., encoder 114 of FIG. 1). In some examples, blocks 1110-1130 may represent components within the encoder.
[0139] Process 1100 comprises operations of block 1030 that encodes, in a bitstream (1190), mapped attributes (1031) associated with the portion of geometry (1022) as attribute information (1032).
[0140] At block 1110, an encoder selects an attribute coding mode (1112) for encoding the attributes of the portion of geometry (1022). The encoder may further encode, in the bitstream (1190), mode information (1042) that indicates the selected attribute coding mode (1112).
[0141] For example, the attribute information (1032) encoded in bitstream (1190) may comprise the mode information (1042).
[0142] In some implementations of point cloud coding, the selection (block 1110) of the attribute coding mode (1112) by the process 1100 is performed by a Rate-Distortion Optimization (RDO) such as to select the attribute coding mode (1112) as being a candidate attribute coding mode with a smallest cost (Cmode) from a list of candidate attribute coding modes. The cost (Cmode) of a candidate attribute coding mode may be computed as a Lagrange cost, which is a combination of a bitrate Rmode and a distortion Dmode as Cmode = Dmode + λ·Rmode, where λ > 0 is a fixed Lagrange parameter. The bitrate Rmode may be obtained for a candidate attribute coding mode as being a sum of bitrates caused by coding both the mode information (1042) and the attribute information (1032) in the bitstream (1190). The distortion Dmode may be obtained for the candidate attribute coding mode by comparing the attributes associated with portion of geometry (1022) against decoded attributes associated with the decoded portion of geometry (1222) and obtained based on the candidate attribute coding mode.
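The Lagrangian mode selection described above may be sketched as follows. The function name `select_mode`, the dictionary layout, and the numeric values in the usage note are hypothetical; the distortion and bit counts would in practice come from trial encoding each candidate mode.

```python
def select_mode(candidates, lam):
    """Pick the candidate mode with the smallest Lagrange cost D + lam * R.

    candidates maps a mode name to (distortion, mode_info_bits, attribute_info_bits);
    the rate R is the sum of the two bit counts.
    """
    if lam <= 0:
        raise ValueError("the Lagrange parameter must be positive")
    costs = {
        name: d + lam * (mode_bits + attr_bits)
        for name, (d, mode_bits, attr_bits) in candidates.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]
```

With an intra candidate at (distortion 10, 100 bits total) and an inter candidate at (distortion 14, 40 bits total), a parameter of 0.1 favors the inter mode (cost 18 versus 20), while a parameter of 0.01 favors the intra mode (cost 11 versus 14.4). A larger parameter therefore weights bitrate savings more heavily.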
[0143] RDO may put in competition many candidate attribute coding modes including, usually, at least one inter-prediction-based attribute coding mode and at least one intra-prediction-based attribute coding mode.
[0144] An inter-prediction-based attribute coding mode is an attribute coding mode using attribute predictors obtained based on an inter-prediction mode.
[0145] An intra-prediction-based attribute coding mode is an attribute coding mode using attribute predictors obtained based on an intra-prediction mode.
[0146] At block 1120, the encoder determines attribute predictors (1124) of attributes associated with the portion of geometry (1022). For example, for each point of the portion of geometry (1022), one respective attribute predictor is determined for predicting the attribute associated with that point.
[0147] When an inter-prediction-based attribute coding mode is selected, the encoder may obtain, at block 1120, attribute predictors (1124) from an inter-prediction attribute parametric model based on Motion Vector (MV) field (1121), reference point cloud for attributes (1122) and the portion of geometry (1022). Parameters of the inter-prediction attribute parametric model may be set by the selected attribute coding mode (1112).
[0148] When the intra-prediction-based attribute coding mode is selected, the encoder may determine, at block 1120, attribute predictors (1124) from an intra-prediction attribute parametric model based on at least one already-decoded portion (1123) of the decoded geometry (1013). Parameters of the intra-prediction attribute parametric model may be set by the selected attribute coding mode (1112).
[0149] At block 1130, the encoder encodes, in the bitstream (1190), the mapped attributes (1031) associated with the portion of geometry (1022) based on the attribute predictors (1124) as attribute information (1032).
[0150] FIG. 12 illustrates an example process 1200 for decoding geometry and attributes of a current point cloud, according to some embodiments.
[0151] For example, process 1200 may be performed by a decoder (e.g., decoder 120 of FIG. 1). In some examples, blocks 1210-1230 may represent components within the decoder.
[0152] The decoded point cloud may be a point cloud frame of a sequence of point cloud frames of a dynamic point cloud. The decoded point cloud may comprise decoded geometry (1212) and decoded attributes (1233), each decoded attribute (1233) being associated with one point of the decoded geometry (1212).
[0153] At block 1210, a decoder may obtain a decoded geometry (1212) by decoding geometry information (1211) from a bitstream (1290), as discussed above. For example, the bitstream (1290) is generated by the encoding method of FIG. 10.
[0154] At block 1220, the decoder may determine at least one portion of the decoded geometry (1212).
[0155] For example, the decoded geometry (1212) may be segmented into a set of ACUs and each ACU of the set of ACUs comprises a respective portion of geometry (1222) of the decoded geometry (1212), e.g., comprising positions in the 3D space of a subset of points of the decoded geometry (1212).
[0156] For example, the decoder may further decode, from the bitstream (1290), portion information (1221), for example, from part of the attribute information (1232).
[0157] For example, the portion information (1221) may indicate segmentation selections of the decoded geometry (1212) into the set of ACUs, e.g., the portion information (1221) indicates how the decoded geometry (1212) is segmented into the set of ACUs.
[0158] At block 1230, the decoder may obtain decoded attributes (1233) associated with each portion of geometry (1212) by decoding attribute information (1232). For example, the attributes of points associated with a portion of geometry (1222) may be decoded before decoding the attributes of points belonging to another portion of the decoded geometry (1212).
[0159] For example, attributes associated with portions of the decoded geometry (1212) are decoded according to a scanning order. The decoder may thus distinguish already decoded portions of the decoded geometry (1212), e.g., portions of the decoded geometry (1212) whose attributes have been decoded, from other portions of the decoded geometry (1212) whose attributes have not yet been decoded.
[0160] FIG. 13 illustrates an example process 1300 for decoding attributes associated with a portion of geometry (1222) of the decoded geometry (1212), according to some embodiments.
[0161] For example, process 1300 may be performed by a decoder (e.g., decoder 120 of FIG. 1). In some examples, blocks 1310-1330 may represent components within the decoder.
[0162] Process 1300 comprises operations of block 1230 that obtains decoded attributes (1233) associated with portion of geometry (1222) by decoding attribute information (1232) from the bitstream (1290).
[0163] At block 1310, a decoder selects an attribute coding mode (1311) for decoding the attributes of the portion of geometry (1222). The decoder may further decode, from the bitstream (1290), mode information (1231) that indicates the selected attribute coding mode (1311).
[0164] For example, the mode information (1231) may indicate either intra-prediction attribute mode or inter-prediction attribute mode.
[0165] At block 1320, the decoder determines attribute predictors (1324) of attributes associated with the portion of geometry (1222). For example, for each point of the portion of geometry (1222), one respective attribute predictor is determined for predicting the attribute associated with that point.
[0166] When an inter-prediction attribute mode is selected, the decoder may obtain, at block 1320, attribute predictors (1324) from an inter-prediction attribute parametric model based on Motion Vector (MV) field (1321), reference point cloud for attributes (1322) and the portion of geometry (1222). Parameters of the inter-prediction attribute parametric model may be set by the selected attribute coding mode (1311).
[0167] For example, FIGS. 13A and 13B, described below, show two examples of the inter-prediction attribute parametric model that may be applied to project attributes of the reference point cloud for attributes (1322) onto the portion of geometry (1222) to determine attribute predictors (1324) for attributes of points / vertices of the portion of geometry (1222).
[0168] FIG. 13A illustrates an example of point-to-point projection distance between a point of the portion of (decoded) geometry (1313) and its nearest neighbor points of a motion compensated geometry (1319), according to some embodiments.
[0169] The motion compensated geometry (1319) is obtained by motion compensation of the reference point cloud for attributes (1316) based on MV field (1312).
[0170] In some examples, the reference point cloud for attributes (1316) may be the reference point cloud for attributes (1122, 1322). The portion of the geometry (1313) may be the portion of geometry (1022, 1222).
[0171] For example, MV field (1312) may include a set of motion vectors including a motion vector (MV) used to translate reference point (1315) (r1) of the reference point cloud for attributes (1316) to a reference point (1314) (r1’) of the motion compensated geometry (1319). Then the nearest point of the motion compensated geometry (1319), which may include reference point (1314), in a neighborhood of the point (1317) of the portion of geometry (1313) may be selected in a search (1318) by minimizing a point-to-point projection distance between each candidate point of the motion compensated geometry (1319) in the neighborhood of the point (1317) and the point (1317). Discontinuities of the motion compensated geometry (1319) may be introduced due to MV field (1312), which may not provide a granular translation of points. For example, MV field (1312) may include one MV applied to reference point cloud for attributes (1316) or an MV determined for a set of cuboids or per cuboid of reference point cloud for attributes (1316), but not per point of reference point cloud for attributes (1316) due to high complexity. Also, since search (1318) is performed on points of the motion compensated geometry (1319), the entirety of the motion compensated geometry (1319) needs to be generated and maintained before search (1318) for each point of the portion of geometry (1313) can be performed.
[0172] For example, the point-to-point projection distance may be calculated between a point of the portion of geometry (1313) and its nearest neighbor point of a motion compensated geometry (1319).
[0173] In some embodiments, each point-to-point projection distance may be a difference between a point of the reference point cloud for attributes (1316) and its nearest neighbor point of a motion compensated geometry (1319).
[0174] For example, the motion compensated geometry (1319) may be obtained by performing motion compensation of the portion of the geometry (1313).
[0175] In some embodiments, the motion compensated geometry (1319) may be obtained by motion compensation of the decoded geometry of the current point cloud based on MV field.
[0176] FIG. 13B illustrates an example of point-to-point projection distance between a point (1329) of the reference point cloud for attributes (1316) and its nearest neighbor point of a motion compensated geometry (1328), according to some embodiments.
[0177] The motion compensated geometry (1328) is obtained by motion compensation of the portion of geometry (1313) based on MV field (1327). For example, MV field (1327) may include a motion vector (-MV) that is the inverse (e.g., having an opposite sign) of that in MV field (1312).
[0178] For example, point (1317) of the portion of geometry (1313) may be translated by a motion vector (-MV) of MV field (1327) to determine a motion-compensated position for point (1317) and shown as point (1326) (p1) of the motion compensated geometry (1328). The nearest point of the motion compensated geometry (1328), which may include point (1326), in a neighborhood of the point (1329) of the reference point cloud for attributes (1316) may be selected in a search (1325) by minimizing a point-to-point projection distance between each candidate point of the motion compensated geometry (1328) in the neighborhood of the point (1329) and the point (1329) of the reference point cloud for attributes (1316).
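The two projection directions described for FIGS. 13A and 13B may be sketched as follows. This is a simplified illustration: a single global motion vector stands in for the MV field, Euclidean distance stands in for the point-to-point projection distance, `radius` is a hypothetical neighborhood size, and all function names are assumptions. In the second variant, if several reference points select the same point, the last one processed overwrites the earlier ones.

```python
def nearest_index(target, candidates, radius):
    """Index of the candidate closest to target within radius, or None."""
    best, best_d2 = None, radius * radius
    for idx, cand in enumerate(candidates):
        d2 = sum((t - c) ** 2 for t, c in zip(target, cand))
        if d2 <= best_d2:
            best, best_d2 = idx, d2
    return best

def project_from_reference(current_points, ref_points, ref_attrs, mv, radius):
    """FIG. 13A style: move reference points by MV, then search from each current point."""
    compensated = [tuple(r + m for r, m in zip(ref, mv)) for ref in ref_points]
    predictors = []
    for p in current_points:
        i = nearest_index(p, compensated, radius)
        predictors.append(ref_attrs[i] if i is not None else None)
    return predictors

def project_from_current(current_points, ref_points, ref_attrs, mv, radius):
    """FIG. 13B style: move current points by -MV, then search from each reference point."""
    compensated = [tuple(p - m for p, m in zip(pt, mv)) for pt in current_points]
    predictors = [None] * len(current_points)
    for ref, attr in zip(ref_points, ref_attrs):
        i = nearest_index(ref, compensated, radius)
        if i is not None:
            predictors[i] = attr
    return predictors
```

The first variant needs the whole motion compensated reference geometry before any search can start, whereas the second only translates the points of the current portion of geometry.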
[0179] In some embodiments, the attribute projection quality may be determined based on a comparison of the mapped attributes and the corresponding projected attributes. The attribute projection quality determining may be performed by the comparison of attributes expressed on a same geometry (i.e., the portion of the (decoded) geometry (1313)) to avoid any geometry discrepancy. In some examples, the attribute projection quality may be based on differences between compared attributes. An attribute may be represented as a vector, so an attribute difference may be computed as an attribute distance between vectors representing attributes that are compared. Thus, attribute distance and attribute difference are used interchangeably herein.
[0180] In some embodiments, the attribute projection quality may be based on point-to-point attribute distances calculated for the set of points of the portion of geometry.
[0181] In some embodiments, the attribute projection quality may be an average or a maximum of the point-to-point attribute distances.
[0182] For example, a point-to-point attribute distance may be a difference between attributes associated with a point of the set of points of the portion of geometry and corresponding projected attributes associated with this point.
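The attribute projection quality described above may be sketched as follows, assuming each attribute is a vector (e.g., a color triple) and that the Euclidean norm is used as the attribute distance. The function name and the `use_max` switch are hypothetical.

```python
import math

def projection_quality(mapped_attrs, projected_attrs, use_max=False):
    """Average (or maximum) point-to-point attribute distance over a portion of geometry.

    Both lists are indexed by the same points, so no geometry mismatch is involved.
    """
    if len(mapped_attrs) != len(projected_attrs) or not mapped_attrs:
        raise ValueError("expected two non-empty lists of equal length")
    distances = [math.dist(a, b) for a, b in zip(mapped_attrs, projected_attrs)]
    return max(distances) if use_max else sum(distances) / len(distances)
```

A lower value indicates that the projected attributes are closer to the mapped attributes on the same points.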
[0183] Returning to FIG. 13, when an intra-prediction attribute mode is selected, the decoder may determine, at block 1320, attribute predictors (1324) from an intra-prediction attribute parametric model based on at least one already-decoded portion (1323) of the geometry (1222). Parameters of the intra-prediction attribute parametric model may be set by the selected attribute coding mode (1311).
[0184] At block 1330, the decoder decodes attribute information (1232) from the bitstream (1290) and obtains the decoded attributes (1233) associated with the portion of geometry (1222) based on the decoded attribute information and the attribute predictors (1324).
[0185] FIG. 14 illustrates an example process 1400 for determining attribute predictors (1431) of attributes associated with a portion of geometry (1421), according to some embodiments.
[0186] For example, process 1400 may be performed by a coder such as an encoder (e.g., encoder 114 of FIG. 1) or a decoder (e.g., decoder 120 of FIG. 1). In some examples, blocks 1410-1430 may represent components within the coder.
[0187] The process 1400 is performed identically at both the encoder and the decoder to determine the same attribute predictors (1431). The operations of FIG. 14 may be performed separately by the encoder and the decoder.
[0188] Blocks 1410-1420 relate to a process for determining inter-prediction-based attribute predictors and block 1430 relates to a process for determining intra-prediction-based attribute predictors.
[0189] In some examples, the process for determining inter-prediction-based attribute predictors corresponds to block 1120 of FIG. 11 when the selected attribute coding mode (1112) indicates an inter-prediction attribute mode (inter mode) and to block 1320 of FIG. 13 when the selected attribute coding mode (1311) indicates an inter-prediction attribute mode (inter mode). The process for determining intra-prediction-based attribute predictors corresponds to block 1120 of FIG. 11 when the selected attribute coding mode (1112) indicates an intra-prediction attribute mode (intra mode) and to block 1320 of FIG. 13 when the selected attribute coding mode (1311) indicates an intra-prediction attribute mode (intra mode).
[0190] For example, the portion of geometry (1421) may be a portion of geometry (1022) such as a portion of the decoded geometry (1013) of FIG. 10, and the attribute predictors (1431) are attribute predictors (1124) of FIG. 11. For example, the portion of geometry (1421) may be the portion of geometry (1222) of the decoded geometry (1212) of FIG. 12, and the attribute predictors (1431) are the attribute predictors (1324) of FIG. 13. Reference point cloud for attributes (1411) may correspond to the reference point cloud for attributes (1122) of FIG. 11 and the reference point cloud for attributes (1322) of FIG. 13.
[0191] Blocks 1410-1420 show operations of the process for determining inter-prediction-based attribute predictors. The process obtains projected attributes (1422) of the decoded geometry (1013, 1212). Specifically, part of the projected attributes (1422) is used as attribute predictors (1431) for attributes associated with points of the portion of geometry (1421).
[0192] As further explained in FIG. 13A and FIG. 13B, in some examples, the process for determining inter-prediction-based attribute predictors includes motion compensation of the geometry of the reference point cloud for attributes (1411) to generate a motion compensated geometry (1413).
[0193] In these implementations, reference point cloud for attributes (1411) may correspond to reference point cloud for attributes (1122) of FIG. 11 and / or to reference point cloud for attributes (1322) of FIG. 13.
[0194] At block 1410, the coder (e.g., encoder / decoder) obtains the motion compensated geometry (1413) by performing motion compensation of the geometry of the reference point cloud for attributes (1411) based on MV field (1412) that corresponds to MV field (1121) of FIG. 11 and MV field (1321) of FIG. 13.
[0195] The encoder may obtain the MV field (1121) by performing a motion search such that MV field (1121) approximates the 3D motion field of attributes from the reference point cloud for attributes (1411) to the attributes of the current point cloud (1011). The motion search is typically an iterative method that tests locally multiple candidate motion vectors, and selects the candidate motion vector, among the candidate motion vectors, that minimizes a distortion (e.g., cost) between the attributes of the current point cloud (1011) and attributes of the motion compensated geometry (1413) using the candidate motion vector.
[0196] In some embodiments, the distortion, used by the motion search, for a point of the current point cloud (1011) may be determined by comparing the attribute of this point and the attribute of (one of) its closest neighbors in the motion compensated geometry (1413).
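As a non-limiting illustration, a motion search over a list of candidate motion vectors may be sketched as follows. This sketch makes several assumptions that are not stated above: the candidates are tested exhaustively rather than by a local iterative refinement, attributes are scalars, the distortion is a sum of absolute attribute differences, and each candidate vector is added to the reference positions. All names are hypothetical.

```python
def nearest_attribute(point, cloud):
    """Attribute of the point in `cloud` closest to `point` (squared L2)."""
    best = min(cloud, key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q[0])))
    return best[1]

def motion_search(current, reference, candidates):
    """Return the candidate motion vector minimizing the attribute distortion.

    current, reference: lists of (position, attribute) pairs, with positions
    as 3-tuples and attributes as scalars. candidates: list of 3-tuples.
    """
    best_mv, best_cost = None, float("inf")
    for mv in candidates:
        # Motion compensate the reference geometry with the candidate vector.
        compensated = [(tuple(c + m for c, m in zip(pos, mv)), attr)
                       for pos, attr in reference]
        # Distortion: attribute differences to the closest compensated points.
        cost = sum(abs(attr - nearest_attribute(pos, compensated))
                   for pos, attr in current)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost

# Hypothetical data: the current cloud is the reference shifted by 2 along x.
reference = [((0, 0, 0), 100), ((2, 0, 0), 200)]
current = [((2, 0, 0), 100), ((4, 0, 0), 200)]
best_mv, best_cost = motion_search(current, reference,
                                   [(0, 0, 0), (2, 0, 0), (-2, 0, 0)])
```

In this toy input, only the candidate (2, 0, 0) aligns every current point with a reference point carrying the same attribute, so it yields the lowest distortion.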
[0197] The encoder may encode information that indicates the reference point cloud for attributes (1411) among multiple candidate reference point clouds for attributes. The encoder may further encode the MV field (1121) as geometry information (1012). The decoder may obtain the reference point cloud for attributes (1411) and MV field (1321) by decoding geometry information (1211) and attribute information (1232) from the bitstream (1290).
[0198] At block 1420, the coder determines projected attributes (1422) associated with the decoded geometry (1013, 1212) based on attributes of the reference point cloud for attributes (1411).
[0199] Projected attributes (1422) and attributes of the decoded geometry (1013, 1212) belong to the same geometry, and the prediction of the attributes of the portion of geometry (1022, 1222) based on the projected attributes is much more efficient because the geometry discrepancy has been removed.
[0200] In some examples, the encoder / decoder may generate (e.g., build or configure) an attribute projection model that may be generated from a motion compensated geometry (1413). The attribute projection model may be used to perform attribute projection associated with the motion compensated geometry (1413). For example, the motion compensated geometry (1413) is obtained by performing motion compensation of the geometry of the reference point cloud for attributes (1411) and the attribute projection determines a projection of the motion compensated geometry (1413) onto the decoded geometry (1421). For example, the motion compensated geometry (1413) is obtained by performing motion compensation of the decoded geometry (1421) and the attribute projection determines a projection of the motion compensated geometry (1413) onto the geometry of the reference point cloud for attributes (1411).
[0201] In other words, since points of motion compensated geometry (1413) represent motion compensated points of reference point cloud for attributes (1411) (or decoded geometry (1421)), points of motion compensated geometry (1413) may correspond respectively to points of decoded geometry (1421) (or reference point cloud for attributes (1411), respectively). Therefore, the result of the attribute projection process at block 1420 may be a form of projection of attributes of reference point cloud for attributes (1411) (after applying MV field 1412) onto decoded geometry (1013, 1212).
[0202] In some examples, the attribute projection model may be a data structure used to efficiently perform attribute projection onto a specific position of a point. For example, the data structure may be a spatial-partitioning data structure such as a tree data structure (e.g., a KD tree or an octree). In some examples, the data structure is used to search for a set of one or more points Npts, belonging to the motion compensated geometry (1413), with positions in the neighborhood of a particular point p from the decoded geometry (1421) according to an example and from the reference point cloud for attributes (1411) according to another example. In some examples, the set of points Npts may be determined to be within the neighborhood for point p based on distances of the set of points Npts from the point p being within or less than a threshold value. The threshold value may be a predetermined value or computed by the encoder and signaled to the decoder in the bitstream. In some other examples, the set of points Npts may be determined as a number (or quantity) of points with positions that are closest to the point p. The number may be a predetermined quantity or computed by the encoder and signaled to the decoder in the bitstream. The distance may be a Manhattan distance (i.e., L1 norm), a Euclidean distance (i.e., L2 norm), a Chebyshev distance (i.e., L-infinity or L∞ norm), or a Minkowski distance.
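As a non-limiting illustration, the two neighborhood definitions above (a distance threshold, or a fixed number of closest points) may be sketched as follows. For brevity, this sketch uses a brute-force scan instead of a KD tree or octree, which is an assumption of the sketch; the function names are hypothetical.

```python
def minkowski(p, q, order=2.0):
    """Minkowski distance; order=1 is Manhattan, order=2 is Euclidean,
    and order=float('inf') is Chebyshev."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if order == float("inf"):
        return max(diffs)
    return sum(d ** order for d in diffs) ** (1.0 / order)

def neighbors_within(p, points, threshold, order=2.0):
    """Points whose distance to p is at most the threshold."""
    return [q for q in points if minkowski(p, q, order) <= threshold]

def k_nearest(p, points, k, order=2.0):
    """The k points closest to p (ties broken by input order)."""
    return sorted(points, key=lambda q: minkowski(p, q, order))[:k]

# Hypothetical motion compensated positions and a query point p.
points = [(0, 0, 0), (1, 1, 0), (3, 0, 0), (0, 2, 2)]
p = (0, 0, 0)
by_threshold = neighbors_within(p, points, threshold=2, order=1)
by_count = k_nearest(p, points, k=2, order=float("inf"))
```

A spatial-partitioning structure would return the same sets while avoiding a scan of every point.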
[0203] The attribute values associated with each point within the set of points Npts may be used at block 1420 for determining projected attribute values for the point p. For example, the projected attribute values may be attribute predictors of attributes for the point p. In some examples, a value of each attribute of point p may be predicted or projected based on values of corresponding attributes (with the same type as that attribute) of the set of one or more points Npts.
[0204] In some examples, a projected attribute (e.g., attribute predictor) for the point p may be determined as an average of the attribute values, of the set of one or more points Npts, weighted by respective distance between the points Npts and the point p. The distance may be a Manhattan distance (i.e., L1 norm), a Euclidean distance (i.e., L2 norm), a Chebyshev distance (i.e., L-infinity or L∞ norm), or a Minkowski distance.
[0205] For example, the projected attributes representing the attribute predictors for the point p may be determined to be the corresponding values of the attributes of the point from the motion compensated geometry (1413) closest to the point p. These examples enable faster projection because only one point is searched and selected from motion compensated geometry (1413) to determine an attribute predictor for point p from decoded geometry (1013, 1212).
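As a non-limiting illustration, the two projection variants described above (a distance-weighted average over the set Npts, and a copy from the single closest point) may be sketched as follows. The inverse-distance weighting, the scalar attributes, and the names are assumptions of this sketch; the description above only states that the average is weighted by distance.

```python
import math

def project_attribute(p, compensated, k=3, eps=1e-9):
    """Project an attribute onto point p from its k nearest points.

    compensated: list of (position, attribute) pairs of the motion
    compensated geometry. Weights are inverse Euclidean distances
    (an assumption); eps avoids a division by zero for coincident points.
    """
    ranked = sorted(compensated, key=lambda item: math.dist(p, item[0]))[:k]
    weights = [1.0 / (math.dist(p, pos) + eps) for pos, _ in ranked]
    total = sum(weights)
    return sum(w * attr for w, (_, attr) in zip(weights, ranked)) / total

def project_nearest(p, compensated):
    """Faster variant: copy the attribute of the single closest point."""
    return min(compensated, key=lambda item: math.dist(p, item[0]))[1]

# Hypothetical motion compensated points with scalar attributes.
compensated = [((0.0, 0.0, 0.0), 10.0), ((2.0, 0.0, 0.0), 30.0)]
midpoint_value = project_attribute((1.0, 0.0, 0.0), compensated, k=2)
nearest_value = project_nearest((0.5, 0.0, 0.0), compensated)
```

The nearest-point variant trades some prediction smoothness for a single search per point.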
[0206] At block 1430, the coder may determine attribute predictors (1431) for the portion of geometry (1022, 1222) based on intra-prediction attribute mode using at least one already-coded portion (1432).
[0207] For example, the already-coded portions (1432) may be the already-coded portions (1123) of FIG. 11 or the already-coded portions (1323) of FIG. 13.
[0208] The encoder / decoder may generate (e.g., determine) the intra-prediction attribute mode based on the intra prediction mode (selected attribute coding mode 1112, 1311).
[0209] For example, the attribute predictors (1431) may be obtained by extrapolating at least one attribute associated with at least one already-coded portion (1432).
[0210] For example, the intra prediction mode (selected attribute coding mode (1112, 1311)) may indicate that the at least one already-coded portion (1432) may comprise at least one spatial neighbor portion of the portion of geometry (1421).
[0211] In some examples, the portion of geometry (1421) may be encompassed by a current ACU and the spatial neighbor portion of the portion of geometry (1421) may comprise points encompassed by an ACU having a part of its boundary overlapping with at least a portion of a boundary of the current ACU. For example, when ACUs are cuboids in shape, a boundary may be defined as the faces, edges and vertices of the cuboid. Sharing a part of the boundary may be defined as having a common face, a common edge, or a common vertex (e.g., corner).
[0212] In some examples, the intra prediction mode (selected attribute coding mode 1112, 1311) may indicate that the attribute predictors (1431) may be obtained based on extrapolation of at least one attribute associated with the at least one spatial neighbor portion of the portion of geometry (1421).
[0213] For example, extrapolation of at least one attribute associated with the at least one spatial neighbor portion of the portion of geometry (1421) may be determined by fitting a 3D attribute model for the at least one attribute associated with the at least one spatial neighbor portion of the portion of geometry (1421), and by extending (e.g., extrapolating) the fitted 3D attribute model to the portion of geometry (1421). A 3D attribute model may take spatial coordinates as input and provide modeled attributes as output; model parameters are fit (e.g., learned) on at least one attribute associated with the at least one spatial neighbor portion of the portion of geometry (1421).
[0214] In some examples, the intra prediction mode (selected attribute coding mode 1112, 1311) may indicate attribute predictors (1431) are determined based on averages of attributes associated with the spatial neighbor portion of the portion of geometry (1421).
[0215] In some examples, the intra prediction mode (selected attribute coding mode 1112, 1311) may indicate attribute predictors (1431) are determined based on a maximum of attributes associated with the spatial neighbor portion of the portion of geometry (1421).
[0216] In some examples, the intra prediction mode (selected attribute coding mode 1112, 1311) may indicate a spatial direction along which the spatial neighbor portions of the portion of geometry (1421) are selected.
[0217] In some examples, the intra prediction mode (selected attribute coding mode 1112, 1311) may indicate a maximum number of spatial neighbor portions of the portion of geometry (1421) that are selected.
[0218] In some examples, the intra prediction mode (selected attribute coding mode 1112, 1311) may indicate a maximum distance between the spatial neighbor portion of the portion of geometry (1421) and the portion of geometry (1421). For example, the maximum distance may be between any point of the geometry (1421) and any other point of the portion of geometry (1421). For example, the maximum distance may be between centers (or opposite corners) of the spatial neighbor portion of the portion of geometry (1421) and the portion of geometry (1421).
[0219] In some examples, the intra prediction mode (selected attribute coding mode 1112, 1311) may indicate an index of / to a list of indices, each index indicating one of the above intra-prediction attribute modes or one of their combinations.
[0220] FIG. 15 illustrates an example process 1500 for encoding attributes based on attribute predictors (1124) associated with portion of geometry (1022), according to some embodiments.
[0221] For example, process 1500 may be performed by an encoder (e.g., encoder 114 of FIG. 1). In some examples, blocks 1510-1530 may represent components within the encoder.
[0222] The encoder may determine residual attributes (1511) by subtracting the attribute predictors (1124) from the mapped attributes (1031) associated with portion of geometry (1022). At block 1510, the encoder may determine transformed coefficients (1512) by applying a 3D transform (e.g., RAHT transform, Haar transform) to the residual attributes (1511) based on portion of geometry (1022).
[0223] At block 1520, the encoder may determine quantized residual attributes or quantized coefficients (1521) by quantizing the residual attributes (1511) or the transformed coefficients (1512), respectively.
[0224] At block 1530, the encoder may entropy encode the residual attributes (1511) or the transformed coefficients (1512) or the quantized residual attributes or the quantized coefficients (1521) as attribute information (1032).
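As a non-limiting illustration, the residual path of process 1500 in which the residual attributes are quantized directly (without the 3D transform) may be sketched as follows, together with the matching reconstruction. The uniform scalar quantizer, the scalar attributes, and the names are assumptions of this sketch; entropy coding is omitted.

```python
def encode_residuals(mapped, predictors, step):
    """Quantized residuals for scalar attributes (transform stage omitted)."""
    return [round((m - p) / step) for m, p in zip(mapped, predictors)]

def decode_attributes(quantized, predictors, step):
    """Reconstruct attributes by inverse quantizing and adding predictors."""
    return [p + q * step for q, p in zip(quantized, predictors)]

# Hypothetical mapped attributes and attribute predictors for three points.
mapped = [100, 104, 97]
predictors = [98, 105, 97]
lossless = decode_attributes(encode_residuals(mapped, predictors, 1), predictors, 1)
lossy = decode_attributes(encode_residuals(mapped, predictors, 2), predictors, 2)
```

With a quantization step of 1 and integer attributes, the reconstruction is exact; with a larger step, the reconstruction error per attribute is bounded by half the step.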
[0225] FIG. 16 illustrates another example process 1600 for encoding attributes based on attribute predictors (1124) associated with portion of geometry (1022), according to some embodiments.
[0226] For example, process 1600 may be performed by an encoder (e.g., encoder 114 of FIG. 1). In some examples, blocks 1610-1640 may represent components within the encoder.
[0227] At block 1610, the encoder may determine attribute coefficients (1611) by applying a 3D transform to the attribute predictors (1124) based on portion of geometry (1022).
[0228] At block 1620, the encoder may determine mapped attribute coefficients (1621) by applying a 3D transform (e.g., RAHT transform, Haar transform) to the mapped attributes (1031) based on portion of geometry (1022).
[0229] The encoder may determine residual coefficients (1631) by subtracting the attribute coefficients (1611) from the mapped attribute coefficients (1621).
[0230] At block 1630, the encoder may determine quantized coefficients (1641) by quantizing the residual coefficients (1631).
[0231] At block 1640, the encoder may entropy encode the residual coefficients (1631) or the quantized coefficients (1641) as attribute information (1032).
[0232] FIG. 17 illustrates an example process 1700 for decoding attributes based on attribute predictors (1324) associated with portion of geometry (1222), according to some embodiments.
[0233] For example, process 1700 (e.g., corresponding to block 1330 of FIG. 13) may be performed by a decoder (e.g., decoder 120 of FIG. 1). In some examples, blocks 1710-1730 may represent components within the decoder.
[0234] At block 1710, the decoder may entropy decode from the bitstream (1290) quantized coefficients (1711) from attribute information (1232).
[0235] At block 1720, the decoder may determine residual coefficients (1721) by inverse quantizing the quantized coefficients (1711).
[0236] In some examples, the decoder may entropy decode residual coefficients (1721) from the bitstream (1290).
[0237] At block 1730, the decoder may determine residual attributes (1731) by applying an inverse 3D transform (e.g., an inverse RAHT transform or an inverse Haar transform) to the residual coefficients (1721) based on portion of geometry (1222).
[0238] The decoder may determine decoded attributes (1233) by adding the residual attributes (1731) to the attribute predictors (1324).
[0239] FIG. 18 illustrates another example process 1800 for decoding attributes based on attribute predictors (1324) associated with portion of geometry (1222), according to some embodiments.
[0240] For example, process 1800 (e.g., corresponding to block 1330 of FIG. 13) may be performed by a decoder (e.g., decoder 120 of FIG. 1). In some examples, blocks 1810-1840 may represent components within the decoder.
[0241] At block 1810, the decoder may entropy decode from the bitstream (1290) quantized coefficients (1811) from attribute information (1232).
[0242] At block 1820, the decoder may determine residual coefficients (1821) by inverse quantizing the quantized coefficients (1811).
[0243] In some examples, the decoder may entropy decode from the bitstream (1290) residual coefficients (1821).
[0244] At block 1830, the decoder may determine attribute coefficients (1831) by applying a 3D transform (e.g., RAHT transform, Haar transform) to the attribute predictors (1324) based on the portion of geometry (1222).
[0245] The decoder may determine decoded attribute coefficients (1841) by adding the residual coefficients (1821) to the attribute coefficients (1831).
[0246] At block 1840, the decoder may determine decoded attributes (1233) by applying an inverse 3D transform to the decoded attribute coefficients (1841) based on portion of geometry (1222).
[0247] In some examples (e.g., used in G-PCC), the 3D transform that may be used / selected for transforming residual attributes (1511) of FIG. 15, mapped attributes (1031) and attribute predictors (1124) of FIG. 16, and attribute predictors (1324) of FIG. 18 may be the region-adaptive hierarchical transform (RAHT) scheme. The inverse of this 3D transform may be used / selected for inverse transforming residual coefficients (1721) of FIG. 17 and decoded attribute coefficients (1841) of FIG. 18. The 3D transform may be a RAHT transform or a Haar transform.
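[0248] The RAHT scheme is based on the iterative use of a two-point transform. In the framework of point cloud attribute coding, the two-point RAHT transform is to be understood as being applied to two sets $A_1$ and $A_2$ of attributes having respectively $w_1$ and $w_2$ number of attributes and respective associated coefficients $c_{A_1}$ and $c_{A_2}$ representative of the sum of attribute values over their respective set divided by the square root of the number of attributes:
$$c_{A_i} = \frac{1}{\sqrt{w_i}} \sum_{a \in A_i} a \qquad (*)$$
[0249] The two-point RAHT transform depends on the weights $w_1$ and $w_2$ and is defined by a 2x2 matrix as follows:
$$\mathrm{RAHT}(w_1, w_2) = \frac{1}{\sqrt{w_1 + w_2}} \begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix}$$
[0250] When applied to the two coefficients $c_{A_1}$ and $c_{A_2}$, two new coefficients DC and AC are determined:
$$\begin{bmatrix} DC \\ AC \end{bmatrix} = \mathrm{RAHT}(w_1, w_2) \begin{bmatrix} c_{A_1} \\ c_{A_2} \end{bmatrix}$$
[0251] As illustrated below, the above property (*) on coefficients still holds for the DC coefficient:
$$DC = \frac{\sqrt{w_1}\, c_{A_1} + \sqrt{w_2}\, c_{A_2}}{\sqrt{w_1 + w_2}} = \frac{1}{\sqrt{w_1 + w_2}} \sum_{a \in A_1 \cup A_2} a = c_{A_1 \cup A_2}$$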
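As a non-limiting illustration, the forward two-point RAHT transform and the property (*) for the DC coefficient may be sketched as follows. The sketch assumes the commonly used RAHT matrix, whose first row is proportional to (sqrt(w1), sqrt(w2)) and whose second row is proportional to (-sqrt(w2), sqrt(w1)); the function name and sample attribute values are hypothetical.

```python
import math

def raht_two_point(c1, w1, c2, w2):
    """Forward two-point RAHT: returns (DC, AC, merged weight)."""
    norm = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / norm, math.sqrt(w2) / norm
    dc = a * c1 + b * c2
    ac = -b * c1 + a * c2
    return dc, ac, w1 + w2

# Set A1 holds the attributes {4, 6}; set A2 holds the attribute {10}.
c1 = (4 + 6) / math.sqrt(2)   # sum over A1 divided by sqrt(w1)
c2 = 10 / math.sqrt(1)        # sum over A2 divided by sqrt(w2)
dc, ac, w = raht_two_point(c1, 2, c2, 1)

# Property (*): DC equals the sum over the union divided by sqrt(w1 + w2).
expected_dc = (4 + 6 + 10) / math.sqrt(3)
```

Because the matrix is orthonormal, the transform also preserves the sum of squared coefficients.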
[0252] The two-point RAHT transform may be applied iteratively to DC coefficients. This is the RAHT iterative method. Once determined, AC coefficients do not undergo any further transformation. At the start of the RAHT iterative method, there are as many initial sets A of attributes as there are points in the coded geometry S. Each initial set A of attributes thus contains one attribute ($w = 1$) and the coefficient $c_A$ is equal to the value of this one attribute, thus fulfilling the property (*). By induction, the property (*) holds for all subsequent DC coefficients determined after iterative application of the two-point RAHT transform. In some examples, the points of the coded geometry S may refer to voxels generated / determined to represent points of the coded geometry S.
[0253] Therefore, at any stage of the RAHT iterative method, determined coefficients are the union of a set of DC coefficients fulfilling the property (*) and a set of AC coefficients. The RAHT iterative method may continue until DC coefficients are depleted and only one DC coefficient is left. In this case, this one DC coefficient is equal to $c_A$ where A is the set of all attributes to be transformed. The RAHT iterative method may a priori follow any order among pairs of DC coefficients.
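As a non-limiting illustration, the RAHT iterative method may be sketched as follows for a non-empty list of scalar attributes. The pairing order (list order) is one arbitrary choice, and the matrix convention and names are assumptions of this sketch.

```python
import math

def raht_iterative(attributes):
    """Merge DC coefficients pairwise until one remains.

    Returns the final DC coefficient and the list of AC coefficients.
    Pairs are taken in list order, which is one arbitrary choice.
    """
    # Each initial set holds one attribute: coefficient = value, weight = 1.
    dcs = [(float(a), 1) for a in attributes]
    acs = []
    while len(dcs) > 1:
        (c1, w1), (c2, w2) = dcs.pop(0), dcs.pop(0)
        norm = math.sqrt(w1 + w2)
        a, b = math.sqrt(w1) / norm, math.sqrt(w2) / norm
        dcs.append((a * c1 + b * c2, w1 + w2))  # new DC, transformed again later
        acs.append(-b * c1 + a * c2)            # AC coefficients are final
    return dcs[0][0], acs

final_dc, final_acs = raht_iterative([1, 2, 3, 4, 5])
```

For N attributes, the method produces one DC coefficient, equal to the sum of all attributes divided by sqrt(N), and N-1 AC coefficients.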
[0254] For example, the attributes to be transformed may be residual attributes (1511), attribute predictors (1124), mapped attributes (1031) or attribute predictors (1324), and the one obtained DC coefficient $c_A$ may be transformed coefficients (1512), attribute coefficients (1611), mapped attribute coefficients (1621) or attribute coefficients (1831), respectively.
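[0255] The two-point inverse RAHT transform may be defined by a 2x2 matrix as follows:
$$\mathrm{RAHT}^{-1}(w_1, w_2) = \frac{1}{\sqrt{w_1 + w_2}} \begin{bmatrix} \sqrt{w_1} & -\sqrt{w_2} \\ \sqrt{w_2} & \sqrt{w_1} \end{bmatrix}$$
[0256] and is applied to DC and AC coefficients such as to obtain (e.g., recover or reconstruct) the two coefficients $c_{A_1}$ and $c_{A_2}$:
$$\begin{bmatrix} c_{A_1} \\ c_{A_2} \end{bmatrix} = \mathrm{RAHT}^{-1}(w_1, w_2) \begin{bmatrix} DC \\ AC \end{bmatrix}$$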
[0257] The inverse iterative RAHT method applies the inverse two-point RAHT to DC and AC coefficients in reverse order relative to their obtainment by the iterative RAHT scheme. At the end of the inverse iterative RAHT scheme, coefficients $c_A$ associated with the initial sets A of attributes are obtained. These coefficients $c_A$ are equal to the values of the single attributes associated with the initial sets A. For example, attributes associated with the initial sets A may be the residual attributes (1731) or decoded attributes (1233).
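As a non-limiting illustration, a round trip through the two-point RAHT transform and its inverse may be sketched as follows. The sketch assumes the commonly used orthonormal RAHT matrix, so that the inverse matrix is the transpose of the forward matrix; the function names and sample values are hypothetical.

```python
import math

def raht_forward(c1, w1, c2, w2):
    """Forward two-point RAHT: returns (DC, AC)."""
    norm = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / norm, math.sqrt(w2) / norm
    return a * c1 + b * c2, -b * c1 + a * c2

def raht_inverse(dc, ac, w1, w2):
    """Inverse two-point RAHT: recovers the two input coefficients."""
    norm = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / norm, math.sqrt(w2) / norm
    return a * dc - b * ac, b * dc + a * ac

# Round trip with hypothetical coefficients and weights.
dc, ac = raht_forward(7.0, 3, -2.5, 5)
r1, r2 = raht_inverse(dc, ac, 3, 5)
```

The decoder needs only the weights, which are derived from the geometry, to invert the transform.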
[0258] In some examples (e.g., such as in G-PCC), the RAHT iterative scheme may follow an octree in a specific iterative order. Basically, the up to eight DC coefficients associated with the up to eight occupied child nodes of a parent node in the octree undergo a cascade of two-point RAHT transformations until one DC coefficient remains, together with up to seven AC coefficients. This one DC coefficient is pushed at the parent node level, and the scheme is repeated at upper octree depth (e.g., at lower depth indexes) until the root node is reached.
[0259] FIG. 19 illustrates an example RAHT transformation, of the RAHT scheme, applied on child nodes of an octree parent node along three successive directions, according to some embodiments.
[0260] The parent node 1900 has five occupied child nodes with associated coefficients $c_i$ and weights $w_i$. A first RAHT transformation 1910 is performed along a first direction 1911. If there are two adjacent occupied child nodes 1913 along this direction, they undergo a two-point RAHT transform to determine a new DC coefficient 1914 and an AC coefficient 1915 pushed to a set 1950 of AC coefficients. If there is only one occupied child node 1916 along this direction, the node is left as is and its DC coefficient is kept 1917. By doing so, the child nodes are collapsed along the first direction to determine a new set 1919 of nodes, here a set of three nodes, with associated new DC coefficients. Then, a second RAHT transformation 1920 is performed along a second direction 1921 in a similar way to determine child nodes 1922, that have been collapsed along the first two directions 1911 and 1921, together with AC coefficients 1923 pushed to the set 1950 of AC coefficients. Finally, a third RAHT transformation 1930 is performed along a third direction 1931 in a similar way to determine a unique child node 1932, resulting from the collapse along all three directions, together with AC coefficients 1933 pushed to the set 1950 of AC coefficients.
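As a non-limiting illustration, the collapse of the occupied child nodes of one parent node along three successive directions may be sketched as follows. The direction order (x, then y, then z), the child layout, the matrix convention, and the names are assumptions of this sketch and need not match FIG. 19.

```python
import math

def collapse_axis(nodes, axis):
    """Collapse occupied nodes along one axis.

    nodes maps (x, y, z) child positions, each coordinate in {0, 1}, to
    (coefficient, weight). Returns the collapsed nodes and new AC coefficients.
    """
    groups = {}
    for pos in sorted(nodes):
        key = tuple(0 if i == axis else v for i, v in enumerate(pos))
        groups.setdefault(key, []).append(nodes[pos])
    merged, acs = {}, []
    for key, members in groups.items():
        if len(members) == 1:
            merged[key] = members[0]          # single node: DC kept as is
        else:
            (c1, w1), (c2, w2) = members      # two adjacent occupied nodes
            norm = math.sqrt(w1 + w2)
            a, b = math.sqrt(w1) / norm, math.sqrt(w2) / norm
            merged[key] = (a * c1 + b * c2, w1 + w2)
            acs.append(-b * c1 + a * c2)
    return merged, acs

def raht_parent(children):
    """Apply the transform along x, then y, then z; return (DC, weight, ACs)."""
    nodes, all_acs = dict(children), []
    for axis in (0, 1, 2):
        nodes, acs = collapse_axis(nodes, axis)
        all_acs.extend(acs)
    (dc, weight), = nodes.values()
    return dc, weight, all_acs

# Five hypothetical occupied child nodes, each holding one point (weight 1).
children = {(0, 0, 0): (10.0, 1), (1, 0, 0): (12.0, 1), (0, 1, 0): (8.0, 1),
            (1, 1, 1): (20.0, 1), (0, 0, 1): (5.0, 1)}
dc, weight, acs = raht_parent(children)
```

Five occupied child nodes yield one DC coefficient pushed to the parent node and four AC coefficients.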
[0261] The unique collapsed child node 1932 has an associated DC coefficient that is pushed to the parent node as illustrated in FIG. 20.
[0262] FIG. 20 illustrates an example RAHT transformation being applied to all octree nodes at depth ‘d’ to determine DC coefficients at depth d-1 and AC coefficients, according to some embodiments. Occupied nodes 2000 of an octree at depth ‘d’ are illustrated. These nodes undergo a RAHT transformation along the three directions such as to push DC coefficients up to their occupied parent nodes 2010 belonging to the octree at depth d-1. For example, the three DC coefficients of the child nodes 2001 undergo a RAHT transformation along the three directions to determine a unique DC coefficient associated with their parent node 2011 and two AC coefficients 2021 pushed to a set 2020 of AC coefficients. By performing this method for all occupied nodes 2000 of the octree at depth ‘d’, the DC coefficients associated with occupied nodes of the octree at depth ‘d’ are transformed into DC coefficients associated with occupied nodes 2010 of the octree at depth d-1 and a set 2020 of AC coefficients.
[0263] This bottom-up method may be repeated depth per depth until reaching the minimum depth (the root node) and the result of the RAHT transformation over the complete octree is a set of coefficients comprising a unique DC coefficient and a set of (many) AC coefficients.
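As a non-limiting illustration, the bottom-up repetition over a complete octree may be sketched as follows. Halving one coordinate at a time collapses sibling nodes along that direction, so that three passes map nodes at one depth to their parent nodes. The voxel coordinates, the direction order, the matrix convention, and the names are assumptions of this sketch.

```python
import math

def merge(n1, n2):
    """Two-point RAHT on (coefficient, weight) pairs; returns (DC node, AC)."""
    (c1, w1), (c2, w2) = n1, n2
    norm = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / norm, math.sqrt(w2) / norm
    return (a * c1 + b * c2, w1 + w2), -b * c1 + a * c2

def raht_level(nodes):
    """Transform all nodes at one depth into parent nodes plus AC coefficients."""
    acs = []
    for axis in (0, 1, 2):
        merged = {}
        for pos in sorted(nodes):
            # Halving the coordinate along `axis` pairs adjacent siblings.
            key = tuple(v >> 1 if i == axis else v for i, v in enumerate(pos))
            if key in merged:
                merged[key], ac = merge(merged[key], nodes[pos])
                acs.append(ac)
            else:
                merged[key] = nodes[pos]
        nodes = merged
    return nodes, acs

def raht_octree(voxels, depth):
    """Bottom-up RAHT from leaf voxels (weight 1) to the root node."""
    nodes = {pos: (float(a), 1) for pos, a in voxels.items()}
    all_acs = []
    for _ in range(depth):
        nodes, acs = raht_level(nodes)
        all_acs.extend(acs)
    (dc, weight), = nodes.values()
    return dc, weight, all_acs

# Five hypothetical voxels in a 4x4x4 volume (octree depth 2).
voxels = {(0, 0, 0): 10, (1, 0, 0): 12, (3, 3, 3): 7, (2, 3, 3): 9, (0, 2, 1): 4}
root_dc, root_weight, root_acs = raht_octree(voxels, depth=2)
```

The result is a unique DC coefficient at the root node and as many AC coefficients as there are voxels minus one.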
[0264] The RAHT transformation method typically starts from the highest / deepest depth (e.g., farthest from the root node, which has a low depth index) where occupied child nodes correspond to a unique point (voxel) of the coded geometry S associated with a unique attribute among the set 'a' of attributes. The DC coefficient at highest / deepest depth is thus set as the value of the unique attribute associated with each occupied node and the weights 'w' are set to 1.
[0265] The inverse RAHT method on an octree is a top-down method from the root node (with lowest depth index or with most shallow depth) down to the last depth (with highest depth index or with deepest depth) made of leaf nodes that each contain only one point (voxel) of the point cloud, thus only one associated attribute. The DC coefficients of occupied nodes 2010 of the octree at depth d-1 are inverse transformed into DC coefficients of occupied nodes 2000 of the octree at depth ‘d’ by applying the inverse two-point RAHT transform to the DC coefficient of each of the occupied nodes of the octree at depth d-1 and to the related AC coefficients from set 2020 of AC coefficients. The inverse two-point RAHT transform is applied along the three directions, in reverse order, such as to invert the node transformation process of FIG. 19. By doing so, DC coefficients of the leaf nodes are obtained, and their values correspond to the attributes associated with the unique point of each of the leaf nodes.
[0266] Like geometry coding of a point cloud, coding of attributes associated with the points of a current point cloud may benefit from inter-frame prediction using a motion compensated point cloud. The motion compensated point cloud inherits attributes from a reference point cloud that has been motion compensated. For example, during motion, points keep their associated attributes. The motion compensated attributes, e.g., the attributes associated with the points of the motion compensated point cloud, may be used to better compress the attributes of the coded geometry of the current point cloud.
[0267] The inter RAHT scheme may be used at blocks 1610, 1620, 1730, 1830, and 1840. The inter RAHT scheme defines an inter prediction mode that uses inter prediction for predicting the values of the DC and the AC coefficients determined by the RAHT iterative method or the inverse RAHT iterative method. In some examples, because the generation of DC and AC coefficients follows an octree, it may be beneficial to maintain a common attribute octree structure for both the portion of geometry (1022, 1222) and a motion compensated portion of geometry. A common bounding box encompassing both portion geometries may be determined, and an octree partitioning may be performed, from a root node associated with the common bounding box, for both portion geometries. This leads to two octree partitionings that are different when the point geometries are not equal, which is likely. The two octrees have a common subtree starting from the root node. On this subtree, occupied node topology is the same and a common set of DC and AC coefficients is determined for both portion geometries. Thus, the subset of DC and AC coefficients associated with nodes of the common subtree and determined from the attributes of the portion of geometry (1022, 1222) may be predicted from DC and AC coefficients determined from the attributes of the motion compensated portion of geometry. Practically, the encoder and / or decoder may determine coefficient residual values by subtracting the DC and AC coefficients determined from the attributes of the motion compensated portion of geometry from the DC and AC coefficients associated with nodes of the common subtree and determined from the attributes of the portion of geometry (1022, 1222).
[0268] The DC and AC coefficients that are not associated with nodes of the common subtree may not be predicted and may be transformed directly in a similar way as performed for the case without inter prediction.
[0269] In some examples, instead of predicting AC coefficients, predicted DC coefficients of the portion of geometry (1022, 1222) may be determined at some depth, assuming both the octree of the portion of geometry (1022, 1222) and the octree of the motion compensated portion of geometry have a same occupancy of a node at this depth. The predicted DC coefficients may be determined from their co-located DC coefficients of the motion compensated portion of geometry. DC residual values may be determined by subtracting the predicted DC coefficients from the DC coefficients of the portion of geometry (1022, 1222). The RAHT transformation then goes up in the octree starting from DC residual values replacing the DC coefficients of the portion of geometry (1022, 1222).
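As a non-limiting illustration, the determination of DC residual values from co-located DC coefficients may be sketched as follows. The sketch simplifies the occupancy condition above to a test of whether a co-located node exists in the motion compensated portion at the same depth; this simplification, the dictionary layout, and the names are assumptions.

```python
def predict_dc_inter(current_dc, reference_dc):
    """Split current DC coefficients into residuals and unpredicted values.

    current_dc, reference_dc: dicts mapping node positions at one depth to
    DC coefficients for the current and the motion compensated portions.
    """
    residuals, unpredicted = {}, {}
    for node, dc in current_dc.items():
        if node in reference_dc:
            # Co-located occupied node: keep only the DC residual value.
            residuals[node] = dc - reference_dc[node]
        else:
            # No co-located node: the coefficient is kept without prediction.
            unpredicted[node] = dc
    return residuals, unpredicted

# Hypothetical DC coefficients at one depth of the two octrees.
current_dc = {(0, 0, 0): 41.0, (1, 0, 1): 18.5, (1, 1, 1): 7.0}
reference_dc = {(0, 0, 0): 40.0, (1, 0, 1): 20.0}
residuals, unpredicted = predict_dc_inter(current_dc, reference_dc)
```

When the prediction is good, the residual values are small and cheaper to code than the original DC coefficients.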
[0270] A RAHT scheme process that does not use information from a reference point cloud different from the current point cloud is called an intra RAHT scheme. Intra RAHT scheme defines an intra prediction mode for predicting the values of the DC and the AC coefficients determined by the RAHT iterative method or the inverse RAHT iterative method.
[0271] Intra prediction may be performed between DC and AC coefficients of an intra RAHT scheme and is referred to as inter-depth prediction. In some implementations of point cloud coding, this inter-depth prediction within portion of geometry (1022, 1222) has been integrated into the RAHT scheme as being an intra prediction mode.
[0272] The inter-depth prediction mode may predict the DC coefficients associated with a current RAHT node of a RAHT tree at depth ‘d’ by using interpolation of DC coefficients associated with nodes of the octree at lower depth d-1 (e.g., lower depth index corresponding to shallower depths of the RAHT tree). A lower depth indicates a depth level closer to the root node of the RAHT tree and a higher depth indicates a depth level closer to the leaf nodes of the RAHT tree (i.e., farther from the root node). The inter-depth prediction mode may also predict AC coefficients of a current RAHT node by using attribute information from a parent node and attribute information from already-coded neighboring RAHT nodes of the parent node of the current RAHT node.
[0273] A RAHT tree comprises RAHT nodes that are linked together according to parent-child relationships defined as an occupancy tree over the decoded geometry. A RAHT node of the RAHT tree is a node of the occupancy tree that is associated with a DC coefficient and possibly one or more AC coefficients obtained by applying the iterative RAHT method (encoder) or the inverse iterative RAHT method (decoder) to the points belonging to the sub-volumes associated with the occupied nodes of the occupancy tree.
[0274] For example, already-coded neighboring RAHT nodes of the parent node of the current RAHT node may include nodes that are siblings to the parent node, e.g., the seven siblings in an octree structure. In some examples, already-coded neighboring RAHT nodes may include nodes that share a face (e.g., a portion of a face), an edge (e.g., a portion of an edge), or a vertex (e.g., corner) with the parent node.
[0275] In some examples, the inter-depth prediction as discussed above may be implemented in a bounded domain such as, for example, the mean attribute domain which is naturally bounded by the attribute value range. The bounded property of the mean attribute domain is advantageous as it correlates to a more physical meaning and provides better numerical stability of the inter-depth prediction, thus leading to a more efficient prediction.
[0276] Mean sums of attribute values $a_{i,d}$ calculated for RAHT nodes at a parent RAHT node depth ‘d’ may then be used to predict a mean sum of attribute values $a_{j,c}$ associated with each child RAHT node (at depth d+1) of the parent RAHT node. The mean sums of attribute values $a_{i,d}$ calculated for RAHT nodes at a parent node depth d may comprise a mean sum of attribute values $a_{i,d}$ calculated for the parent RAHT node and possibly a mean sum of attribute values $a_{i,d}$ calculated from one or more already-coded neighboring RAHT nodes of the parent RAHT node.
[0277] FIG. 21A illustrates an example process 2100 for encoding attribute information for child RAHT node(s) (2110) (at depth d) of a parent RAHT node (2111) (at depth d-1) using top-down coding and inter-depth prediction, according to some embodiments.
[0278] For example, process 2100 may be performed by an encoder (e.g., encoder 114 of FIG. 1).
[0279] In this example, a set of already-coded neighboring RAHT nodes (2112) of the parent RAHT node (2111) may include RAHT nodes of the RAHT tree at depth d-1 that share at least a vertex with the parent RAHT node (2111).
[0280] The encoder determines the DC coefficients $C_{A_{i,d}}$ for the parent RAHT node (2111) and each of the already-coded neighboring RAHT nodes (2112), and calculates a mean sum of attribute values $a_{i,d}$ for the parent RAHT node (2111) and for each of the already-coded neighboring RAHT nodes (2112) by dividing the coefficient $C_{A_{i,d}}$ of each of those RAHT nodes by the square root of its corresponding number of attributes. For example, a mean sum of attributes $a_{i,d}$ may be obtained as $a_{i,d} = C_{A_{i,d}} / \sqrt{\text{number of attributes of the node}}$.
[0281] The encoder may then obtain a predicted mean sum of attributes $a_{i,c,\mathrm{up}}$ for each child RAHT node (2110) by up-sampling the mean sums of attribute values $a_{i,d}$ calculated for the RAHT nodes at the parent node depth ‘d’.
[0282] Then, the encoder determines a predicted DC coefficient $C_{A_{i,c,\mathrm{pred}}}$ for each child RAHT node (2110) by multiplying the predicted mean sum of attributes $a_{i,c,\mathrm{up}}$ calculated for each child RAHT node (2110) by the square root of its corresponding number of attributes.
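As a non-limiting illustration, the normalization between DC coefficients and mean sums of attributes may be sketched in Python as follows. The function names `mean_attribute` and `predicted_dc` and the numeric values are hypothetical and are not part of the disclosure; the number of attributes of each node is assumed to be known from the decoded geometry.

```python
import math

def mean_attribute(dc_coeff, num_attrs):
    """Mean sum of attributes: DC coefficient / sqrt(number of attributes)."""
    return dc_coeff / math.sqrt(num_attrs)

def predicted_dc(upsampled_mean, num_attrs_child):
    """Predicted DC coefficient: up-sampled mean * sqrt(number of attributes of the child)."""
    return upsampled_mean * math.sqrt(num_attrs_child)

# Hypothetical parent node: DC coefficient 40.0 covering 16 attributes.
parent_mean = mean_attribute(40.0, 16)
# For illustration only, the parent's mean is reused directly as the
# up-sampled value for a child node covering 4 attributes.
child_pred_dc = predicted_dc(parent_mean, 4)
print(parent_mean, child_pred_dc)
```

Working in the mean attribute domain keeps the intermediate values within the attribute value range regardless of how many attributes a node covers.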
[0283] The encoder then determines, based on equation 1, predicted AC coefficients from the predicted DC coefficient $C_{A_{i,c,\mathrm{pred}}}$ and determines residual AC coefficients by subtracting the predicted AC coefficients from original AC coefficients associated with each child RAHT node (2110). Original AC coefficients associated with a child RAHT node (2110) are determined by the usual RAHT transform from original attributes to be coded. The residual AC coefficients may be encoded in a bitstream, for example through quantization (blocks 1520, 1630) and entropy coding (blocks 1530, 1640).
[0284] FIG. 21B illustrates an example process 2100B for decoding attribute information for child RAHT nodes, at depth d, of a parent RAHT node at depth d-1 using top-down decoding and inter-depth prediction, according to some embodiments. When inter-depth prediction is used in the RAHT transform and coding, the decoder performs top-down decoding because the inter-depth prediction of nodes at depth ‘d’ relies on already-decoded information at depth d-1.
[0285] For example, process 2100B may be performed by a decoder (e.g., decoder 120 of FIG. 1).
[0286] The decoder employs the same inter-depth prediction process to generate the predicted DC coefficient $C_{A_{i,c,\mathrm{pred}}}$ for each child node (2110) and the predicted AC coefficients. It also reconstructs the residual AC coefficients from a bitstream, for example through entropy decoding (blocks 1710, 1810) and inverse quantization (blocks 1720, 1820). The decoder then determines decoded AC coefficients by adding the predicted AC coefficients to the reconstructed residual AC coefficients.
[0287] FIG. 22 illustrates an example process 2200 for up-sampling the mean sums of attribute values of RAHT nodes at depth ‘d-1’, such as including the parent RAHT node (2111) and the already-coded neighboring RAHT nodes (2112), according to some embodiments. For clarity and ease of explanation, this example is illustrated in two dimensions, but extension to three dimensions will be understood in light of the description herein.
[0288] In this example, the up-sampling operation considers (e.g., uses) a distance metric $d_k$ relating the child RAHT node (2110) to a RAHT node k at depth ‘d-1’, e.g., either the parent RAHT node (2111) or each of the three (in this example) already-coded neighboring RAHT nodes (2112). This distance metric $d_k$ may represent a geometric distance between a center point of the sub-volume corresponding to the child RAHT node (2110) and a center point of the sub-volume corresponding to the RAHT node k (2111, 2112). The inverse $d_k^{-1}$ of the distance metric $d_k$ may represent the relative weight of correlation between attribute information from the child RAHT node (2110) and the RAHT node k. Other weighting factors, or additional weighting factors, may be used in other implementations of the up-sampling operation.
[0289] In some examples, the predicted mean sum of attributes $a_{i,c,\mathrm{up}}$ for the child RAHT node (2110) may be given by a sum of the values $a_k$ weighted by the inverse distances $d_k^{-1}$, where $a_k$ indicates the mean sum of attributes calculated for the RAHT node k.
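As a non-limiting illustration, an inverse-distance weighted up-sampling may be sketched in Python as follows. The sketch assumes that the weights are normalized so that they sum to one, which is one common choice and is not confirmed by the text above; the function name `upsample_mean`, the node centers, and the attribute values are hypothetical.

```python
import math

def upsample_mean(child_center, nodes):
    """Inverse-distance weighted mean over (center, mean_attribute) pairs.

    Assumption: weights are normalized to sum to 1. A child center never
    coincides with a center at the parent depth, so d_k is nonzero.
    """
    num = 0.0
    den = 0.0
    for center, a_k in nodes:
        d_k = math.dist(child_center, center)  # distance between center points
        w_k = 1.0 / d_k                        # inverse distance as relative weight
        num += w_k * a_k
        den += w_k
    return num / den

# Two-dimensional example: parent cell centered at (1, 1), three neighbors.
child = (0.5, 0.5)
nodes = [((1.0, 1.0), 12.0),    # parent RAHT node
         ((-1.0, 1.0), 8.0),    # already-coded neighboring RAHT nodes
         ((1.0, -1.0), 8.0),
         ((-1.0, -1.0), 4.0)]
a_up = upsample_mean(child, nodes)
print(a_up)
```

Because the parent center is the closest one, its mean sum of attributes contributes the largest weight to the result.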
[0290] When neither intra prediction (e.g., inter-depth prediction) nor inter prediction is used, AC coefficients are directly coded. The direct coding of AC coefficients is equivalent to coding residual AC coefficients relative to a null predictor and therefore may be referred to as a null prediction mode for coding the current RAHT node.
[0291] As described herein, transformed coefficients of transform nodes (e.g., RAHT nodes, Haar nodes) may be (e.g., more efficiently) encoded and / or decoded using prediction modes for coding attributes of the transform nodes. Various types of transform may be applied such as the RAHT transform or a Haar transform. RAHT transform may be considered a lossy variation of the Haar transform, which may be implemented as a lossless transform. One main difference between the RAHT and Haar transforms is that the Haar transform may use equal weights (e.g., set weights to 1) as an equal average and the RAHT transform may use weights of different values to result in a weighted average. Both types of transforms use attribute values of a node at a lower level of the octree to predict attributes of the nodes at the next level. RAHT coefficients of RAHT nodes may be (e.g., more efficiently) encoded and / or decoded using prediction modes for coding attributes of the RAHT nodes. A prediction mode for a RAHT node of the RAHT tree may be a null prediction mode or one of an intra or inter prediction mode, as described herein with respect to FIG. 14. Indications of prediction modes may be entropy coded in a bitstream, for example, to improve coding performance.
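As a non-limiting illustration of the difference between equal and unequal weights, a two-point weighted transform step may be sketched in Python as follows. The sketch uses the commonly published orthonormal form of the RAHT two-point step and is an assumption rather than the transform defined elsewhere in this disclosure; the function name `raht_butterfly` and the values are hypothetical. The inputs are the DC coefficients of two sub-nodes (for a single point of weight 1, the attribute value itself).

```python
import math

def raht_butterfly(c1, w1, c2, w2):
    """Two-point weighted transform returning (dc, ac, merged_weight)."""
    s = math.sqrt(w1 + w2)
    dc = (math.sqrt(w1) * c1 + math.sqrt(w2) * c2) / s
    ac = (-math.sqrt(w2) * c1 + math.sqrt(w1) * c2) / s
    return dc, ac, w1 + w2

# Equal weights (set to 1): Haar-like equal average.
dc_h, ac_h, _ = raht_butterfly(10.0, 1, 14.0, 1)
# Unequal weights: RAHT-like weighted average.
dc_r, ac_r, w = raht_butterfly(10.0, 3, 14.0, 1)
# Two sub-nodes with the same mean attribute (5.0) but different weights:
# the AC coefficient vanishes and the DC coefficient equals sqrt(4) * 5.0.
dc_m, ac_m, _ = raht_butterfly(math.sqrt(3) * 5.0, 3, 5.0, 1)
```

In this form the step is orthonormal, so the sum of squares of the outputs equals the sum of squares of the inputs.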
[0292] FIG. 23 shows an example method for encoding attributes. More specifically, FIG. 23 shows an example process 2300 of a method for encoding attributes of a RAHT node (2301) using a prediction mode. Although the process illustrated in FIG. 23 shows an example method for encoding attributes of a RAHT node, the example method may be applied for encoding attributes of a transform node (e.g., RAHT node, Haar node). Accordingly, the RAHT node (2301) may be an example of a transform node. The example process 2300 may be generally used for a transform (e.g., RAHT transform, Haar transform). One or more steps of the example process 2300 may be performed and / or implemented by an encoder (e.g., encoder 114 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. In some examples, blocks 2310-2340 may represent components within the encoder.
[0293] The encoder may transform attributes of a transform node (e.g., RAHT node, Haar node) to obtain sets of residual transform coefficients. The encoder may transform attributes (2305) of the RAHT node (2301) to obtain sets (e.g., {Ck}) of residual RAHT coefficients (2331). The sets (e.g., {Ck}) of residual transform coefficients (e.g., RAHT coefficients (2331)) may result from applying / using a prediction mode (2321). The sets (e.g., {Ck}) of residual transform coefficients (e.g., RAHT coefficients 2331) may be encoded into a bitstream (2390).
[0294] The attributes (2305) of a transform node may be transformed (e.g., RAHT, Haar) into sets of transform coefficients. At block 2310, for example, the attributes (2305) of a RAHT node (2301) may be RAHT transformed into sets of RAHT coefficients (2311). For example, the attributes (2305) of a RAHT node (2301) may be forward RAHT transformed (or use a forward RAHT transform to be transformed) into sets of RAHT coefficients (2311).
[0295] At block 2320, a prediction mode (2321) may be obtained for the current node (e.g., the RAHT node (2301)). The prediction mode (2321) may be determined (e.g., inferred, selected), for example, based on (e.g., from) a set of prediction modes. The set of prediction modes may be from a group including an inter prediction mode and an intra prediction mode. The set of prediction modes may be from a group including an intra prediction mode, an inter prediction mode, and a null prediction mode.
[0296] FIG. 24 shows an example method for encoding attributes. More specifically, FIG. 24 shows an example process 2400 of a method for performing block 2320 related to obtaining the prediction mode (2321) for encoding attributes of the RAHT node (2301). Although the process illustrated in FIG. 24 shows an example method for performing block 2320 related to obtaining the prediction mode (2321) for encoding attributes of the RAHT node (2301), the example method may be generally applied for encoding attributes of a transform node (e.g., RAHT node, Haar node). Accordingly, the RAHT node (2301) may be an example of a transform node. The example process 2400 may be generally used for a transform (e.g., RAHT transform, Haar transform). One or more steps of the example process 2400 may be performed and / or implemented by an encoder (e.g., encoder 114 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. In some examples, blocks 2410-2440 may represent components within the encoder.
[0297] The encoder may perform block 2320, for example, after sets of transform coefficients (e.g., RAHT coefficients (2311)) have been obtained at block 2310. More specifically, the encoder may make an inference decision (e.g., at block 2410). The inference decision may indicate whether the prediction mode (2321) is either an inferred prediction mode (e.g., p_inf (2441), output at block 2440) or a selected prediction mode (e.g., p_sel (2421), output at block 2420). The selected prediction mode may be selected from a set of prediction modes (e.g., at block 2420). At block 2410, the encoder may make the inference decision. The encoder may make the inference decision, for example, based on neighboring residual transform coefficients that may be associated with at least one already-coded neighboring transform node. The encoder may make the inference decision, for example, based on neighboring residual RAHT coefficients (2351). The neighboring residual RAHT coefficients (2351) may comprise, for example, residual RAHT coefficients associated with at least one already-coded neighboring RAHT node (2350) of the RAHT node (2301). The inference decision may be, for example, based on information indicating the magnitudes (e.g., mk.Y, mk.u, and mk.v) of the neighboring residual transform coefficients (e.g., residual RAHT coefficients (2351)).
[0298] The residual transformed coefficients (e.g., RAHT coefficients, Haar coefficients) may comprise sets of component coefficients. The residual transformed coefficients may comprise sets of component coefficients, for example, to represent attributes associated with the transform node (e.g., RAHT node, Haar node). In some examples, residual transformed coefficients (e.g., RAHT coefficients) may comprise a series {Ck}, k = 1, ..., K, of K sets of component coefficients to represent attributes associated with the transform node (e.g., RAHT node (2301)). Each set of component coefficients may comprise three component coefficients, for example, if the attributes to be coded are color attributes. The three component coefficients may correspond to three color components of a color attribute of a voxel or a point. For example, one component coefficient may be for luma, and two component coefficients may be for chroma. For example, the color space may be YCbCr or YUV. The luma component of set Ck of coefficients may be denoted as Ck.Y. The two chroma components may be denoted as Ck.u and Ck.v, respectively. The set Ck may be represented as: Ck = {Ck.Y, Ck.u, Ck.v}.
[0299] Each component coefficient (e.g., Ck.Y, Ck.u and Ck.v) may be represented by a sign (e.g., Sk.Y, Sk.u and Sk.v) and a series of bits representing its magnitude (or absolute value) (e.g., mk.Y, mk.u and mk.v). For example, a component coefficient Ck.Y may be represented by a sign Sk.Y and a series of bits representing its magnitude (or absolute value) mk.Y.
[0300] The inference decision may be made. The inference decision may be made, for example, based on the information indicating the magnitudes being less than or equal to a threshold (e.g., small magnitudes). Small magnitudes may indicate that neighboring prediction modes (2352) may be accurate and that the prediction mode (2321) may be inferred. The neighboring prediction modes (2352) may be, for example, prediction modes associated with at least one already-coded neighboring transform node (e.g., already-coded neighboring RAHT nodes (2350) of the RAHT node (2301)). The prediction mode (2321) may be inferred, for example, at block 2440, from the neighboring prediction modes (2352).
[0301] At block 2440, an inferred prediction mode (e.g., p_inf) (2441) may be obtained. An inferred prediction mode for the current node may be obtained, for example, based on determining a neighboring prediction mode (2352). The neighboring prediction mode (2352) may be a most common / used prediction mode among the neighboring prediction modes. The determination / selection of the most common / used prediction mode may be referred to as a majority voting process.
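As a non-limiting illustration, the inference decision and the majority voting process may be sketched in Python as follows. The sketch assumes that the magnitudes of the neighboring residual coefficients are aggregated by taking their maximum and that ties in the vote are resolved in favor of the mode seen first; neither choice is specified above. The names `can_infer` and `majority_vote` and the mode labels are hypothetical.

```python
from collections import Counter

def can_infer(neighbor_residuals, threshold):
    """True if every neighboring residual magnitude is <= threshold.

    neighbor_residuals: list of (Y, U, V) residual coefficient sets.
    Assumption: magnitudes are aggregated by their maximum; with no
    neighbors available, no inference is made.
    """
    magnitudes = [abs(c) for ck in neighbor_residuals for c in ck]
    return bool(magnitudes) and max(magnitudes) <= threshold

def majority_vote(neighbor_modes):
    """Most common neighboring prediction mode (ties: first seen wins)."""
    return Counter(neighbor_modes).most_common(1)[0][0]

residuals = [(0, 1, 0), (1, 0, 0), (0, 0, -1)]
modes = ["inter", "intra", "inter"]
p_inf = majority_vote(modes) if can_infer(residuals, threshold=1) else None
print(p_inf)
```

Both inputs are available to the encoder and the decoder before the current node is coded, which is what allows the decision to be reproduced without signaling.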
[0302] At block 2420, a prediction mode for a current node may be selected. The prediction mode (e.g., p_sel) (2421) may be selected from a set of prediction modes. The prediction mode (e.g., p_sel) (2421) may be selected from a set of prediction modes, for example, based on rate distortion optimization (RDO) costs of the set of prediction modes. As an example, the prediction mode with the lowest RDO cost may be selected.
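As a non-limiting illustration, the selection of the prediction mode with the lowest RDO cost may be sketched in Python as follows. The sketch assumes the usual Lagrangian cost, distortion plus lambda times rate, which is not spelled out above; the function name `select_mode_rdo` and the distortion and rate figures are hypothetical.

```python
def select_mode_rdo(candidates, lam):
    """Return the mode minimizing distortion + lam * rate.

    candidates: dict mapping a mode name to (distortion, rate_in_bits).
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

# Hypothetical measurements for one node.
candidates = {"null": (100.0, 10.0), "intra": (40.0, 30.0), "inter": (35.0, 60.0)}
p_sel = select_mode_rdo(candidates, lam=1.0)
print(p_sel)
```

A larger lambda favors modes that cost fewer bits, while a lambda of zero selects purely on distortion.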
[0303] The inference decision may indicate the prediction mode (2321) being either the inferred prediction mode (e.g., p_inf) (2441) or the selected prediction mode (e.g., p_sel) (2421). At block 2430, the encoder may encode (e.g., entropy encode) the selected prediction mode. The encoder may encode (e.g., entropy encode) the selected prediction mode, for example, as part of prediction mode information in a bitstream. The encoder may encode (e.g., entropy encode) an indication of the selected prediction mode (e.g., p_sel) (2421) as part of prediction mode information (2322) in a bitstream (2390), for example, if the inference decision indicates that the prediction mode (2321) is the selected prediction mode (e.g., p_sel) (2421). The indication of the selected prediction mode may be skipped or omitted from being (e.g., signaling as) part of prediction mode information (2322). The indication may be skipped or omitted from being (e.g., signaling as) part of prediction mode information (2322), for example, if the inference decision indicates that the prediction mode (2321) is the inferred prediction mode (e.g., p_inf) (2441). In some examples, the indication may not be needed in bitstream (2390). The indication may not be needed in bitstream (2390), for example, because the decoder may make the same inference decision and infer the same inferred prediction mode (e.g., p_inf) (2441) as the encoder (e.g., as further described herein with respect to FIG. 25).
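As a non-limiting illustration, the conditional signaling of the prediction mode may be sketched in Python as follows. The sketch replaces entropy coding with a plain list of symbols and takes the inference decision and the inferred mode as precomputed inputs; the function names `encode_modes` and `decode_modes` and the mode labels are hypothetical.

```python
def encode_modes(nodes):
    """nodes: list of (can_infer, p_inf, p_sel) tuples, one per node.

    Returns the signaled symbols and the mode actually used per node.
    """
    bitstream, used = [], []
    for can_infer, p_inf, p_sel in nodes:
        if can_infer:
            used.append(p_inf)        # inferred: nothing is written
        else:
            bitstream.append(p_sel)   # selected: the indication is written
            used.append(p_sel)
    return bitstream, used

def decode_modes(bitstream, decisions):
    """decisions: list of (can_infer, p_inf) derived by the decoder itself."""
    symbols = iter(bitstream)
    return [p_inf if can_infer else next(symbols) for can_infer, p_inf in decisions]

nodes = [(True, "inter", "intra"), (False, "inter", "null"), (True, "intra", "intra")]
bitstream, used = encode_modes(nodes)
decoded = decode_modes(bitstream, [(c, p) for c, p, _ in nodes])
print(bitstream, decoded)
```

Only one of the three nodes contributes a symbol to the bitstream, yet the decoder recovers the same mode for every node because it repeats the same decisions.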
[0304] The encoder may encode (e.g., entropy encode) the indication of the selected prediction mode (e.g., p_sel) (2421). The encoder may entropy encode the indication of the selected prediction mode (e.g., p_sel) (2421), for example, based on a context selected from a set of contexts. For example, the context selection may be based on neighboring prediction modes (2352). The neighboring prediction modes (2352) may be, for example, prediction modes (2352) associated with at least one already-coded neighboring transform node. The neighboring prediction modes (2352) may be, for example, prediction modes (2352) associated with at least one already-coded neighboring RAHT node (2350) of the RAHT node (2301).
[0305] The obtained prediction mode (2321) may be associated with the transform node (e.g., RAHT node (2301)). The obtained prediction mode (2321) may be used for encoding attributes of (e.g., further) transform nodes, such as RAHT nodes or Haar nodes.
[0306] Referring back to FIG. 23, the encoder may determine residual transformed coefficients (e.g., RAHT coefficients, Haar coefficients). A prediction mode may be applied to (e.g., used for) the transform node to obtain (e.g., compute or derive) sets (e.g., {Ck}) of residual transformed coefficients from sets of transformed coefficients. For example, at block 2330, the encoder may determine residual RAHT coefficients. The prediction mode (2321) may be applied to (e.g., used for) the RAHT node (2301) to obtain (e.g., compute or derive) sets (e.g., {Ck}) of residual RAHT coefficients (2331) from the sets of RAHT coefficients (2311). For example, the encoder may use the prediction mode (2321) to generate a predictor that is subtracted from the sets of RAHT coefficients (2311) to obtain the sets {Ck} of residual RAHT coefficients (2331).
[0307] Generating (e.g., constructing) the predictor may be performed. Generating (e.g., constructing) the predictor may be performed, for example, based on the already-coded neighboring nodes (2350) and their neighboring prediction modes (2352). For example, the constructed predictor may be an average predictor. The average predictor may be a linear combination of an intra predictor derived from an intra prediction mode and an inter predictor derived from an inter prediction mode. The weights in the linear combination may be determined. The weights in the linear combination may be determined, for example, based on neighboring prediction modes (2352).
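As a non-limiting illustration, an average predictor formed as a linear combination of an intra predictor and an inter predictor may be sketched in Python as follows. The rule used to derive the weights (the share of neighboring nodes that used each mode, with equal weights when neither mode appears) is an assumption chosen for the example and is not specified above; the function name `average_predictor` and the values are hypothetical.

```python
def average_predictor(intra_pred, inter_pred, neighbor_modes):
    """Blend intra and inter predictors with weights from neighboring modes.

    Assumption: the inter weight is the fraction of intra/inter neighbors
    that used inter prediction; 0.5 if no neighbor used either mode.
    """
    n_intra = neighbor_modes.count("intra")
    n_inter = neighbor_modes.count("inter")
    total = n_intra + n_inter
    w_inter = 0.5 if total == 0 else n_inter / total
    return [(1.0 - w_inter) * a + w_inter * b for a, b in zip(intra_pred, inter_pred)]

intra_pred = [10.0, 20.0]
inter_pred = [20.0, 40.0]
blended = average_predictor(intra_pred, inter_pred, ["intra", "inter", "inter", "null"])
even = average_predictor(intra_pred, inter_pred, [])
print(blended, even)
```

Because the weights depend only on already-coded neighbors, the same predictor can be rebuilt at the decoder.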
[0308] At block 2340, the encoder may encode (e.g., entropy encode) residual coefficients. The sets (e.g., {Ck}) of residual transformed coefficients (2331) may be entropy encoded into the bitstream as coefficient information. For example, the sets (e.g., {Ck}) of residual RAHT coefficients (2331) may be entropy encoded into the bitstream (2390) as coefficient information (2342).
[0309] In some examples, the encoding (e.g., at block 2340) may comprise encoding a series of flags (e.g., {fk}) indicating if the sets (e.g., {Ck}) of residual transformed coefficients (e.g., residual RAHT coefficients (2331)) are zero. The encoding may comprise encoding values of sets (e.g., Ck.Y, Ck.u and Ck.v), for example, after encoding the series of flags. The encoding values of sets (e.g., Ck.Y, Ck.u and Ck.v) may be performed, for example, based on (e.g., by) encoding flags (e.g., fk.Y, fk.u and fk.v) indicating if respective values are zero, signs (e.g., Sk.Y, Sk.u and Sk.v) and magnitudes (or absolute values) (e.g., mk.Y, mk.u and mk.v) for non-zero component coefficients in each of the sets.
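As a non-limiting illustration, the decomposition of the sets of residual coefficients into flags, signs, and magnitudes may be sketched in Python as follows. The sketch lists the symbols that would be handed to an entropy coder and does not perform entropy coding. It assumes that a flag value of 1 means zero, that a sign value of 1 means negative, and that component values are skipped for sets flagged as entirely zero; these conventions and the name `symbolize` are hypothetical.

```python
def symbolize(coeff_sets):
    """Turn (Y, U, V) integer residual sets into a list of (name, value) symbols."""
    symbols = []
    # First pass: one flag per set telling whether the whole set is zero.
    for ck in coeff_sets:
        symbols.append(("f_k", int(all(c == 0 for c in ck))))
    # Second pass: per-component flags, signs, and magnitudes.
    for ck in coeff_sets:
        if all(c == 0 for c in ck):
            continue  # assumption: nothing more is coded for an all-zero set
        for name, c in zip("YUV", ck):
            is_zero = int(c == 0)
            symbols.append((f"f_k.{name}", is_zero))
            if not is_zero:
                symbols.append((f"s_k.{name}", int(c < 0)))
                symbols.append((f"m_k.{name}", abs(c)))
    return symbols

symbols = symbolize([(0, 0, 0), (3, 0, -2)])
print(symbols)
```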
[0310] The encoder may entropy encode the series of flags (e.g., {fk}) and / or the sets (e.g., {Ck}) of residual transformed coefficients (e.g., residual RAHT coefficients (2331)), for example, based on a context selected from a set of contexts. For example, the context selection may be based on neighboring prediction modes (2352).
[0311] FIG. 25 shows an example method for decoding attributes. More specifically, FIG. 25 shows an example process 2500 for decoding attributes of a RAHT node (2501) using a prediction mode. Although the process illustrated in FIG. 25 shows an example method for decoding attributes of a RAHT node (2501) using a prediction mode, the example method may be generally applied for decoding attributes of a transform node (e.g., RAHT node, Haar node). Accordingly, the RAHT node (2501) may be an example of a transform node. The example process 2500 may be generally used for a transform (e.g., RAHT, Haar). One or more steps of the example process 2500 may be performed and / or implemented by a decoder (e.g., decoder 120 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. In some examples, blocks 2510-2540 may represent components within the decoder. The bitstream (2590) may be (e.g., typically) obtained from the encoding method as shown in FIG. 23. The decoder may decode sets (e.g., {Ck}) of residual transformed coefficients (e.g., RAHT coefficients (2531)) from the bitstream (2590). The decoder may apply / use an inverse transform (e.g., RAHT, Haar), for example at block 2510, to obtain decoded attributes of a transform node. For instance, the decoder may apply / use an inverse RAHT transform (e.g., at block 2510) to obtain decoded attributes (2505) of a RAHT node (2501).
[0312] A prediction mode may be obtained for the transform node (e.g., current node). Referring for example to FIG. 25, at block 2520, a prediction mode (2521) may be obtained for the RAHT node (2501) (e.g., current node). The prediction mode (2521) may be inferred or selected from a set of prediction modes.
[0313] FIG. 26 shows an example method for decoding attributes. More specifically, FIG. 26 shows an example process 2600 for performing block 2520 in FIG. 25. Although the process illustrated in FIG. 26 shows an example method for decoding attributes of a RAHT node (2501) using a prediction mode, the example method may be generally applied for decoding attributes of a transform node (e.g., RAHT node, Haar node). Accordingly, the block 2520 may be related to obtaining the prediction mode (e.g., prediction mode (2521)) for decoding attributes of the transform node (e.g., RAHT node (2501)). The RAHT node (2501) may be an example of a transform node. The example process 2600 may be generally used for a transform (e.g., RAHT, Haar). One or more steps of the example process 2600 may be performed and / or implemented by a decoder (e.g., decoder 120 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. In some examples, blocks 2610-2630 may represent components within the decoder.
[0314] At block 2610, the decoder may make an inference decision. More specifically, the decoder may make an inference decision that may indicate whether the prediction mode (2521) is either the inferred prediction mode (e.g., p_inf) 2631 (e.g., obtained at block 2630) or the selected prediction mode (e.g., p_sel) 2621 selected from a set of prediction modes (e.g., selected at block 2620). The decoder may make the inference decision, for example, based on neighboring residual transformed coefficients, which may be associated with at least one already-coded neighboring transform node of the transform node. Referring to FIG. 26, at block 2610, the decoder may make the inference decision, for example, based on neighboring residual transformed coefficients (e.g., residual RAHT coefficients (2351)). The neighboring residual RAHT coefficients (2351) may be, for example, residual RAHT coefficients associated with at least one already-coded neighboring RAHT node (2350) of the RAHT node (2501). For example, the inference decision may be based on information indicating the magnitudes (e.g., mk.Y, mk.u, and mk.v) of neighboring residual transformed coefficients (e.g., residual RAHT coefficients (2351)). For example, the inference decision may be made based on the information indicating the magnitudes being less than or equal to a threshold (e.g., small magnitudes). Small magnitudes may indicate that neighboring prediction modes (2352) (e.g., the prediction modes associated with at least one already-coded neighboring RAHT node (2350) of the RAHT node (2501)) may be (e.g., are likely to be) accurate and the prediction mode (2521) may be inferred (e.g., at block 2630) from the neighboring prediction modes (2352).
[0315] At block 2630, the decoder may obtain an inferred prediction mode for the current node. The inferred prediction mode (e.g., p_inf) 2631 may be obtained, for example, based on determining a (e.g., most) common / used prediction mode among neighboring prediction modes (2352). The determination / selection of the (e.g., most) common / used prediction mode may be referred to as a majority voting process. Block 2630 may be performed identically as block 2440 of FIG. 24.
[0316] The inference decision (e.g., made at block 2610) may indicate the prediction mode (2521) being either the inferred prediction mode (e.g., p_inf) 2631 or the selected prediction mode (e.g., p_sel) 2621. The decoder may decode (e.g., entropy decode) (e.g., at block 2620) an indication of the selected prediction mode (e.g., p_sel) 2621, for example, if the inference decision indicates that the prediction mode (2521) is the selected prediction mode (e.g., p_sel) 2621. The decoder may decode (e.g., entropy decode) the indication of the selected prediction mode (e.g., p_sel) 2621, for example, as part of prediction mode information (2522), from bitstream (2590).
[0317] At block 2620, the decoder may decode a prediction mode for the current node. More specifically, the decoder may decode (e.g., entropy decode) the indication of the selected prediction mode (e.g., p_sel) 2621, for example, based on a context selected from a set of contexts. The context selection may be based on, for example, neighboring prediction modes (2352). The neighboring prediction modes (2352) may be, for example, prediction modes (2352) associated with at least one already-coded neighboring transform node (e.g., already-coded neighboring RAHT node (2350) of the RAHT node (2501)). The prediction mode may be associated with the transform node and may be used for decoding attributes of (e.g., further) transform nodes. For example, the prediction mode (2521) may be associated with the RAHT node (2501) and may be used for decoding attributes of (e.g., further) RAHT nodes.
[0318] Referring back to FIG. 25, at block 2540, the decoder may decode (e.g., entropy decode) the residual coefficients. More specifically, sets (e.g., {Ck}) of residual RAHT coefficients (2531) may be obtained by entropy decoding coefficient information (2542) from the bitstream (2590).
[0319] In some examples, the decoding (e.g., at block 2540) may comprise decoding a series of flags (e.g., {fk}) indicating if the sets (e.g., {Ck}) of residual transformed coefficients (e.g., residual RAHT coefficients 2531) are zero. The decoding may comprise decoding values of sets (e.g., Ck.Y, Ck.u and Ck.v), for example, after decoding the series of flags. The decoding values of sets (e.g., Ck.Y, Ck.u and Ck.v) may be performed, for example, based on (e.g., by) decoding flags (e.g., fk.Y, fk.u and fk.v) indicating if respective values are zero, signs (e.g., Sk.Y, Sk.u and Sk.v) and magnitudes (or absolute values) (e.g., mk.Y, mk.u and mk.v) for non-zero component coefficients in each of the sets.
[0320] The decoder may entropy decode the series of flags (e.g., {fk}) and / or the sets (e.g., {Ck}) of residual transformed coefficients (e.g., residual RAHT coefficients 2531), for example, based on a context selected from a set of contexts. The context selection may be based on neighboring prediction modes (2352).
[0321] The decoder may determine transformed coefficients (e.g., RAHT coefficients, Haar coefficients). The prediction mode may be applied to (e.g., used for) the transform node to obtain (e.g., compute or derive) sets of transformed coefficients. Referring back to FIG. 25, at block 2530, the decoder may determine RAHT coefficients. The prediction mode (2521) may be applied to (e.g., used for) the RAHT node (2501) to obtain (e.g., compute or derive) sets of RAHT coefficients (2511). For example, the decoder may use the prediction mode (2521) to generate a predictor that may be added to the sets (e.g., {Ck}) of residual RAHT coefficients (2531) to obtain sets of RAHT coefficients (2511).
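As a non-limiting illustration, the symmetry between subtracting the predictor at the encoder and adding it back at the decoder may be sketched in Python as follows. The sketch assumes a uniform scalar quantizer with a hypothetical step size, which stands in for the quantization and inverse quantization blocks and is not specified above; the function names and values are likewise hypothetical.

```python
def encode_residual(coeffs, predictor, step):
    """Quantized residual levels: round((coefficient - predictor) / step)."""
    return [round((c - p) / step) for c, p in zip(coeffs, predictor)]

def decode_coeffs(levels, predictor, step):
    """Reconstructed coefficients: predictor + dequantized residual."""
    return [p + q * step for q, p in zip(levels, predictor)]

coeffs = [17.0, -3.0, 8.4]
predictor = [15.0, 0.0, 8.0]   # must be identical at encoder and decoder
levels = encode_residual(coeffs, predictor, step=1.0)
recon = decode_coeffs(levels, predictor, step=1.0)
print(levels, recon)
```

The reconstruction error is bounded by half the step size as long as both sides build the same predictor; a good predictor makes the levels small, which is what reduces the bitrate.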
[0322] Generating (e.g., constructing) the predictor may be performed. Generating (e.g., constructing) the predictor may be performed, for example, based on the already-coded neighboring nodes (2350) and their neighboring prediction modes (2352). For example, the constructed predictor may be an average predictor. The average predictor may be a linear combination of an intra predictor derived from an intra prediction mode and an inter predictor derived from an inter prediction mode. The weights in the linear combination may be determined. The weights in the linear combination may be determined, for example, based on neighboring prediction modes (2352).
[0323] The sets of transformed coefficients may be (e.g., all) inverse transformed (e.g., RAHT transformed, Haar transformed). At block 2510, for example, the sets of RAHT coefficients (2511) may be (e.g., all) inverse (e.g., RAHT) transformed. The sets of RAHT coefficients (2511) may be (e.g., all) inverse (e.g., RAHT) transformed, for example, to obtain decoded attributes (2505) associated with the RAHT node (2501).
[0324] Iterative transform (e.g., RAHT transform, Haar transform) may be performed. For example, the iterative RAHT transform may be performed in two phases. In a first phase, a bottom-up (or ascend) traversal of a RAHT tree (e.g., octree) may be performed. The RAHT tree may be an example of a transform tree. A bottom-up (or ascend) traversal of a RAHT tree (e.g., octree) may be performed, for example, to determine (e.g., compute) information (e.g., number / quantity of attributes, weights, inter predictor, etc.) of each of the RAHT nodes. In a second phase, the transform and coding may be performed according to a top-down (descend) traversal of the RAHT tree. Encoding and decoding of RAHT coefficients may be performed, for example, if the descend phase occurs. Encoding and decoding of RAHT coefficients may be performed during the descend (traversal) phase, for example, because the inter-depth prediction may use information (e.g., weights) determined (e.g., computed) during the ascend phase and AC coefficients associated with already-coded (e.g., already-predicted) neighboring RAHT nodes at the parent node depth of the current node. Inter-depth prediction may be coded down from the root node of the RAHT tree. Encoding and decoding of RAHT coefficients may start from the root RAHT node of the RAHT tree and may be performed depth per depth descending from the root node until leaf RAHT nodes (corresponding to voxels) may be reached. For each depth, the traversal of RAHT nodes may be in a specific order. For example, the traversal of RAHT nodes may follow a Morton order or a raster scan order.
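As a non-limiting illustration, a depth-per-depth descending traversal that visits the nodes of each depth in Morton order may be sketched in Python as follows. The bit interleaving convention (x in the least significant position) is an assumption, as codecs differ on this point; the names `morton3d` and `descend_order` and the toy tree are hypothetical.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of x, y, z into a single Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def descend_order(nodes_by_depth):
    """Visit depths from the root downward; within a depth, follow Morton order."""
    order = []
    for depth in sorted(nodes_by_depth):
        for pos in sorted(nodes_by_depth[depth], key=lambda p: morton3d(*p)):
            order.append((depth, pos))
    return order

# Toy tree: occupied node positions per depth (depth 0 is the root).
tree = {1: [(1, 1, 0), (0, 0, 1), (1, 0, 0)], 0: [(0, 0, 0)]}
order = descend_order(tree)
print(order)
```

Using a fixed order at every depth is what lets the decoder know which neighboring nodes are already coded when it reaches the current node.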
[0325] The RAHT bitstream resulting from the encoding of RAHT coefficients may be constructed based on (e.g., according to) the traversal of the RAHT tree. A prediction mode may be selected and / or encoded in the RAHT bitstream, for example, for each RAHT node. The prediction mode may be a null prediction mode, an intra prediction mode, or an inter prediction mode. A prediction mode (e.g., null, intra, inter) may be selected. A prediction mode (e.g., null, intra, inter) may be selected, for example, based on (e.g., by) minimizing a Rate Distortion Optimization (RDO) metric (e.g., selecting one prediction mode with the minimum RDO cost). The prediction mode (e.g., null, intra, inter) may be encoded in the RAHT bitstream, for example, by the RAHT encoding process. AC coefficients may be determined and / or encoded in the RAHT bitstream, for example, based on the selected prediction mode.
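As a minimal sketch of the RDO-based selection described above, the following assumes the common Lagrangian cost J = D + lambda * R. The disclosure only refers to "an RDO metric", so the cost formula, the function name, and the input layout are all illustrative assumptions.

```python
def select_prediction_mode(candidates, lam):
    """Return the mode with the minimum assumed RDO cost D + lam * R.

    candidates: dict mapping a mode name to (distortion, rate_in_bits).
    lam: Lagrange multiplier trading rate against distortion (assumption).
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])
```

With a larger `lam` the selection favors cheap-to-code modes; with a smaller `lam` it favors modes with low distortion.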
[0326] Decoding of transformed coefficients (e.g., RAHT coefficients) may follow the same traversal order as the encoding of transformed coefficients (e.g., RAHT coefficients). Prediction modes and AC coefficients may be decoded in the same order as encoded by the encoding of transformed coefficients (e.g., RAHT coefficients).
[0327] A (e.g., large) portion of the bitstream (2390, 2590) may comprise the entropy coded indications of prediction modes (2321, 2521) associated with transform nodes (e.g., RAHT nodes (2301, 2501)). The cost of the coding of an indication of a prediction mode (2321, 2521) may be reduced, for example, by inferring the prediction mode associated with the transform nodes (e.g., RAHT node (2301, 2501)) to a predetermined prediction mode. For example, a fixed value may be used to indicate the predetermined prediction mode. The prediction mode associated with the RAHT node (2301, 2501) may be inferred to a predetermined prediction mode, for example, when / if the depth of the RAHT node (2301, 2501) is higher than a predetermined depth (e.g., a maximum depth) for prediction mode coding.
[0328] FIG. 27A shows an example of the top-down traversal of a transform process (e.g., RAHT process). More specifically, FIG. 27A shows the top-down traversal of the RAHT process 2700A when a threshold of a maximum depth for prediction mode coding (2710) is used. Although FIG. 27A shows the top-down traversal of a RAHT process, the concepts illustrated in FIG. 27A may be applied to a top-down traversal of a transform process (e.g., RAHT process, Haar process). The RAHT process 2700A may be an example of a transform process. One or more steps of the example process 2700A may be performed and / or implemented by an encoder or a decoder (e.g., encoder 114 or decoder 120 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. The encoder (or decoder) may associate an average prediction mode with the RAHT node (2301, 2501). The encoder (or decoder) may associate an average prediction mode with a transform node, for example, by comparing a transform depth (depth in the transform tree) of the transform node with the threshold of a maximum depth for prediction mode coding. For example, the encoder (or decoder) may associate an average prediction mode with the RAHT node (2301, 2501), for example, by comparing a RAHT depth (2702) (depth in the RAHT tree) of the RAHT node with the threshold of a maximum depth for prediction mode coding (2710).
[0329] The top-down traversal of the transform process (e.g., RAHT process) may start from a transform root node (e.g., RAHT root node (2700)), for example, at depth zero. The transform root node (e.g., RAHT root node (2700)) may be associated with an initial volume encompassing a portion of the decoded geometry of the point cloud. The decoded geometry may be partitioned into a plurality of portions. Each of the plurality of portions may correspond to an initial volume and an associated transform root node (e.g., RAHT root node). Transform leaf nodes (e.g., RAHT leaf nodes (2701)), for example at a final depth, corresponding to voxels may be reached. Transform leaf nodes (e.g., RAHT leaf nodes (2701)) corresponding to voxels may be (e.g., eventually) reached, for example, following a breadth-first traversal of the transform tree (e.g., RAHT tree) (e.g., an octree) by proceeding depth per depth of the transform tree. Information representative of the selection of a prediction mode associated with the transform node (e.g., RAHT node (2301, 2501)) may be coded (e.g., at step 2711) in the bitstream (e.g., bitstream 2390, 2590). Information representative of the selection of a prediction mode associated with the transform node (e.g., RAHT node 2301, 2501) may be coded in the bitstream 2390, 2590, for example, if / when the transform depth (e.g., RAHT depth 2702) of the transform node (e.g., RAHT node 2301, 2501) is lower than or equal to the maximum depth for prediction mode coding (2710). An indication of the selected prediction mode may be coded. No such information may be coded and, instead, the prediction mode associated with the transform node (e.g., RAHT node 2301, 2501) may be inferred to a predetermined prediction mode 2712, for example, if / when the transform depth (e.g., RAHT depth 2702) is higher than the maximum depth for prediction mode coding (2710). 
The prediction mode associated with the transform node (e.g., RAHT node 2301, 2501) may be inferred to a value indicating the intra prediction mode as shown, for example, in FIG. 27A. The choice of intra prediction mode may be reasonable because inter prediction tends to become less accurate for smaller nodes.
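The depth rule of FIG. 27A, as described above, can be sketched as follows. The function name, the string mode labels, and the use of `None` for "nothing coded" are hypothetical; only the comparison against the maximum depth for prediction mode coding (2710) and the intra default come from the text.

```python
from typing import Optional


def prediction_mode_for_node(
    depth: int,
    max_coding_depth: int,
    coded_mode: Optional[str],
    inferred_mode: str = "intra",
) -> str:
    """Return the coded mode at shallow depths, the inferred mode otherwise.

    depth <= max_coding_depth: the mode indication is read from the bitstream.
    depth >  max_coding_depth: nothing is coded and the mode is inferred.
    """
    if depth <= max_coding_depth:
        if coded_mode is None:
            raise ValueError("a coded prediction mode is expected at this depth")
        return coded_mode
    return inferred_mode
```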
[0330] Inter prediction information may be relevant for transform nodes (e.g., RAHT nodes) with depths higher than the maximum depth for prediction mode coding 2710. In some examples, an average prediction mode may be used to propagate inter predictive information at depths higher than the maximum depth for prediction mode coding 2710.
[0331] For the transform node (e.g., RAHT node 2301, 2501), an average predictor derived based on the average prediction mode may be a linear combination of an intra predictor and an inter predictor as follows:
$\mathrm{Pred}_{\mathrm{average}} = p \cdot \mathrm{Pred}_{\mathrm{intra}} + (1 - p) \cdot \mathrm{Pred}_{\mathrm{inter}}, \quad 0 \le p \le 1$
The weight p of the linear combination may be determined based on the neighboring prediction modes (2352) of the transform node (e.g., RAHT node (2301, 2501)).
[0332] Neighboring already-coded transform nodes (e.g., already-coded RAHT nodes) may include the parent node, the grand-parent node, spatially close already-coded transform nodes (e.g., RAHT nodes) having the same depth (e.g., cousin nodes), and / or spatially close already-coded transform nodes (e.g., already-coded RAHT nodes) having a depth minus one (e.g., uncle / aunt nodes), relative to the transform node (e.g., RAHT node 2301, 2501). The weight p may be related to the proportion of intra prediction modes among the neighboring prediction modes 2352.
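A minimal sketch of the average predictor follows. It assumes that the weight p is set exactly equal to the proportion of intra modes among the neighboring prediction modes, whereas the text only states that p "may be related to" that proportion. The fallback of 0.5 for an empty neighborhood and all function names are likewise assumptions.

```python
def intra_weight(neighbor_modes):
    """Weight p as the fraction of neighbors that use the intra mode (assumption)."""
    if not neighbor_modes:
        return 0.5  # assumed fallback when no neighbor is available
    return sum(1 for m in neighbor_modes if m == "intra") / len(neighbor_modes)


def average_predictor(pred_intra, pred_inter, p):
    """Blend the two predictors coefficient-wise: p * intra + (1 - p) * inter."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("weight p must lie in [0, 1]")
    return [p * a + (1.0 - p) * b for a, b in zip(pred_intra, pred_inter)]
```

When every neighbor is intra, p equals 1 and the average predictor collapses to the intra predictor, which is the convergence effect discussed for deeper nodes.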
[0333] An average predictor may be associated with a transform node (e.g., RAHT node (2301, 2501)). An average predictor may be associated with a RAHT node (2301, 2501), for example, depending on an activation status of the average prediction mode. The average predictor may be associated with the RAHT node (2301, 2501), for example, if / when (e.g., only if / when) the average prediction mode is activated. The average prediction mode may be activated, for example, based on an upper depth for average prediction mode (2720) and a lower depth for average prediction mode (2730) (see FIG. 27B). The upper depth for average prediction mode (2720) may be lower than or equal to the maximum depth for prediction mode coding (2710), and the lower depth for average prediction mode (2730) may be higher than the maximum depth for prediction mode coding (2710).
[0334] FIG. 27B shows an example of the top-down traversal of the RAHT process. More specifically, FIG. 27B shows the top-down traversal of the RAHT process 2700B when the maximum depth for prediction mode coding (2710), the upper depth for average prediction mode (2720), and the lower depth for average prediction mode (2730) are used. Although FIG. 27B shows the top-down traversal of a RAHT process, the concepts illustrated in FIG. 27B may be applied to a top-down traversal of a transform process (e.g., RAHT process, Haar process). The RAHT process 2700B may be an example of a transform process. One or more steps of the example process 2700B may be performed and / or implemented by an encoder or a decoder (e.g., encoder 114 or decoder 120 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32.
[0335] The inter predictor derived based on the inter prediction mode may be replaced by the average predictor 2721. The inter predictor derived based on the inter prediction mode may be replaced by the average predictor 2721, for example, when / if the transform depth (e.g., RAHT depth 2702) of the transform node (e.g., RAHT node 2301, 2501) is between the upper depth for average prediction mode (2720) inclusive and the maximum depth for prediction mode coding (2710) inclusive, and the prediction mode (2321, 2521) obtained at blocks 2320 and 2520 is an inter prediction mode. The average predictor 2721 may be derived, for example, based on the average prediction mode. The inter prediction mode may be replaced by an average prediction mode. The inter prediction mode may be replaced by an average prediction mode, for example, when / if the transform depth (e.g., RAHT depth 2702) of the transform node (e.g., RAHT node 2301, 2501) is between the upper depth for average prediction mode 2720 inclusive and the maximum depth for prediction mode coding 2710 inclusive, and the prediction mode 2321, 2521 obtained at steps 2320 and 2520 is an inter prediction mode. The indication of the prediction mode (2321, 2521) that is signaled in the bitstream (2390, 2590) as prediction mode information (2322, 2522) may still indicate an inter prediction mode for transform nodes (e.g., RAHT nodes, Haar nodes) at a transform depth (e.g., RAHT depth, Haar depth) between the upper depth 2720 inclusive and maximum depth 2710 inclusive.
[0336] The intra predictor derived based on the intra prediction mode may be replaced by the average predictor 2731. The intra predictor derived based on the intra prediction mode may be replaced by the average predictor 2731, for example, when / if the transform depth (e.g., RAHT depth 2702) of the transform node (e.g., RAHT node 2301, 2501) is between the maximum depth for prediction mode coding 2710 exclusive and the lower depth for average prediction mode (2730) inclusive, and the prediction mode (2321, 2521) obtained at blocks 2320 and 2520 is an intra prediction mode. The average predictor (2731) may be derived, for example, based on the average prediction mode. The intra prediction mode may be replaced by an average prediction mode. The intra prediction mode may be replaced by an average prediction mode, for example, when / if the transform depth (e.g., RAHT depth (2702)) of the transform node (e.g., RAHT node (2301, 2501)) is between the maximum depth for prediction mode coding (2710) exclusive and the lower depth for average prediction mode (2730) inclusive, and the prediction mode (2321, 2521) obtained at blocks 2320 and 2520 is an intra prediction mode. The indication of the prediction mode (2321, 2521) that is signaled in the bitstream (2390, 2590) as prediction mode information (2322, 2522) may still indicate an intra prediction mode for transform nodes (e.g., RAHT nodes, Haar nodes) at a transform depth (e.g., RAHT depth, Haar depth) between the maximum depth (2710) exclusive and lower depth (2730) inclusive.
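The two depth ranges of FIG. 27B described above can be summarized in a short sketch. The function and parameter names are hypothetical; the inclusive and exclusive bounds follow the text, and the precondition upper depth <= maximum coding depth < lower depth is taken from the description of the thresholds (2720, 2710, 2730).

```python
def predictor_kind(depth, mode, upper_avg_depth, max_coding_depth, lower_avg_depth):
    """Return which predictor is applied: 'average' or the obtained mode itself.

    Inter mode at depths in [upper_avg_depth, max_coding_depth]  -> average.
    Intra mode at depths in (max_coding_depth, lower_avg_depth]  -> average.
    The signaled or inferred mode indication itself is left unchanged.
    """
    if not upper_avg_depth <= max_coding_depth < lower_avg_depth:
        raise ValueError("expected upper <= max coding depth < lower")
    if mode == "inter" and upper_avg_depth <= depth <= max_coding_depth:
        return "average"
    if mode == "intra" and max_coding_depth < depth <= lower_avg_depth:
        return "average"
    return mode
```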
[0337] The replacement of the inter prediction mode and / or intra prediction mode implies that the best prediction mode as signaled in the bitstream (e.g., best prediction mode selected by the encoder) has been overwritten. More specifically, at higher transform (e.g., RAHT, Haar) depths such as at transform depths higher than the maximum depth for prediction mode coding (2710, FIG. 27A) or higher than the lower depth for average mode (2730, FIG. 27B), all prediction modes are inferred to be “intra prediction mode” even if an inter prediction mode would have been better. This process may induce the average predictor to become closer (e.g., with higher p) to the intra predictor than to the inter predictor, for example, because most indications of neighboring prediction modes (2352) may indicate intra prediction mode. One approach to mitigate this effect may be to extend the neighborhood of transform nodes (e.g., RAHT nodes (2301, 2501)) to parent and grand-parent nodes in order to propagate inter prediction mode being the selected prediction mode from the top (lower transform depths) of the transform tree. Nevertheless, the average predictor quickly converges to be close to the intra predictor at transform (e.g., RAHT, Haar) depths higher than the maximum depth for prediction mode coding (2710, FIG. 27A) or higher than the lower depth for average prediction mode (2730, FIG. 27B). The accuracy of information of the reference frame used by inter prediction may be reduced, and a mostly-intra average predictor may be suboptimal when motion compensation is very accurate, for example, in static portions of the point cloud.
[0338] Accordingly, when the average prediction mode is activated, the obtained prediction mode (2321, 2521) may not reflect the best prediction mode for this transform (e.g., RAHT, Haar) node. Therefore, coding a next transform node also may not be efficient because this coding may be based on neighboring prediction modes (2352) that do not necessarily indicate the best prediction modes. There are at least two implications of applying the average prediction mode explained above that negatively impact the compression performance: firstly, the choice of a context, based on the neighboring prediction modes (2352), for entropy coding the indication of the prediction mode (2321, 2521) or the residual transform (e.g., RAHT, Haar) coefficients (2351) may be suboptimal; and secondly, the construction of an average predictor (blocks 2330, 2530) or the average predictor of FIG. 27A or FIG. 27B, may be inaccurate because the weight p is deduced based on likely non-optimal (i.e., resulting from using replaced prediction modes) neighboring prediction modes (2352).
[0339] Embodiments described herein may relate to improved coding of prediction mode associated with a transform node (e.g., the RAHT node (2301, 2501), a Haar node) by decoding, based on a prediction mode associated with the transform node (e.g., RAHT node (2301, 2501), Haar node), transform (e.g., RAHT, Haar) coefficients representative of attributes of the transform node (e.g., RAHT node (2301, 2501), Haar node), and by associating an a posteriori prediction mode, determined (e.g., derived) based on the decoded transform coefficients, with the transform node (e.g., RAHT node (2301, 2501), Haar node).
[0340] The “a posteriori prediction mode” refers to a prediction mode that is determined once the transform (e.g., RAHT, Haar) coefficients of a transform node (e.g., RAHT node (2301, 2501), Haar node) are decoded (at the encoder and at the decoder) based on the prediction mode (2321, 2521) obtained in blocks 2320, 2520. The prediction mode obtained in blocks 2320, 2520 may be inferred (e.g., derived from neighboring prediction modes (2352) or equal to an average prediction mode) or selected (e.g., by RDO with an indication of the selected prediction mode being coded in a bitstream) without putting into competition all prediction modes of a set of prediction modes. Since the a posteriori prediction mode is derived from the decoded transform (e.g., RAHT, Haar) coefficients of the transform node (e.g., RAHT node, Haar node), the a posteriori prediction mode may be much more likely to indicate a true, best prediction mode for the transform coefficients of the transform node than the prediction mode (2321, 2521). Neighboring prediction modes (2352) may instead be neighboring respective a posteriori prediction modes that are more likely to be accurate and further be used to improve context selection for entropy coding (2320, 2530) an indication of prediction mode associated with the transform node (e.g., RAHT node (2301, 2501), Haar node) or for entropy coding (2340, 2540) the residual transform coefficients (e.g., RAHT coefficients (2351, 2551), Haar coefficients). Construction of an average predictor is also more accurate because the weight p is derived based on neighboring a posteriori prediction modes instead of neighboring prediction modes (2352) that are inferred or selected.
[0341] FIG. 28 illustrates an example process 2800 for encoding attributes. More specifically, FIG. 28 shows an example process 2800 for encoding, in a bitstream (2390), attributes of the RAHT node (2801) and determining an a posteriori prediction mode (2821) for the RAHT node (2801), according to some embodiments. Although FIG. 28 shows an example method for encoding attributes of a RAHT node, the example method may be generally applied for encoding attributes of a transform node (e.g., RAHT node, Haar node). Accordingly, the RAHT node (2801) may be an example of a transform node. The example process 2800 may be generally used for a transform (e.g., RAHT, Haar). One or more steps of process 2800 may be performed and / or implemented by an encoder (e.g., encoder 114 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. Process 2800 may include some of the same operations (shown as having the same labeled blocks) as those described with respect to FIG. 23 and FIG. 24. In some examples, blocks 2810-2830 may represent components within the encoder.
[0342] The encoder may perform a forward transform (e.g., forward RAHT transform, forward Haar transform) to obtain sets of transform (e.g., RAHT, Haar) coefficients. For example, referring to FIG. 28, at block 2310, the encoder may perform a forward RAHT transform. For example, the encoder transforms attributes (2805) of the RAHT node (2801) to obtain sets of RAHT coefficients (2311).
[0343] The encoder may determine the attribute prediction mode for encoding the attributes associated with the transform node (e.g., RAHT node 2801). At block 2320, the encoder obtains a prediction mode (2321) and, if the prediction mode (2321) is the selected prediction mode p_sel (2421), then the encoder entropy encodes an indication of the prediction mode (2321) as prediction mode information (2322) in the bitstream (2390).
[0344] At block 2330, the encoder determines sets of residual transform coefficients based on applying the prediction mode (2321) to the transform (e.g., RAHT, Haar) node. For example, the residual transform coefficients may be a difference between the sets of transform coefficients (e.g., sets of RAHT coefficients 2311) and corresponding sets of predicted transform coefficients obtained / derived from applying the prediction mode (2321). Referring to FIG. 28, at block 2330, the encoder applies the prediction mode (2321) to the RAHT node (2801) to obtain (e.g., compute or derive) sets {Ck} of residual RAHT coefficients (2331) from the sets of RAHT coefficients (2311). The encoder entropy encodes the sets {Ck} of residual transform coefficients (e.g., RAHT coefficients (2331), Haar coefficients) into the bitstream (2390) as coefficient information (2342).
[0345] The encoder may (e.g., further) determine an a posteriori prediction mode and may associate the a posteriori prediction mode with a transform (e.g., RAHT, Haar) node. For example, the encoder may (e.g., further) determine the a posteriori prediction mode (2821) and may associate the a posteriori prediction mode (2821) with the RAHT node (2801). The encoder may (e.g., further) determine the a posteriori prediction mode (2821) and may associate the a posteriori prediction mode (2821) with the transform node (e.g., RAHT node (2801)), for example, after the sets (e.g., {Ck}) of residual transformed coefficients (e.g., RAHT coefficients (2331)) have been entropy encoded (e.g., at block 2340) into the bitstream (2390).
[0346] The encoder may determine (e.g., decide) whether an a posteriori prediction mode is activated for a transform (e.g., RAHT, Haar) node or not. For example, at block 2810, the encoder may determine (e.g., decide) whether an a posteriori prediction mode is activated for the RAHT node (2801) or not. The transform node associated with the prediction mode (2321) may be considered as an already-coded neighboring transform node of a (e.g., further) transform node to be processed and the process 2800 may end. The RAHT node (2801) associated with the prediction mode (2321) may be considered as an already-coded neighboring RAHT node of a (e.g., further) RAHT node to be processed and the process 2800 may end, for example, if it is determined that the a posteriori prediction mode is not activated. At block 2820, the encoder may determine an a posteriori prediction mode (2821). The encoder may determine the a posteriori prediction mode (2821), for example, if it is determined that the a posteriori prediction mode is activated. The encoder may determine the a posteriori prediction mode (2821), for example, based on decoded transformed coefficients (e.g., RAHT coefficients (2832)) associated with the transform node (e.g., RAHT node (2801)) as described herein. The decoded transformed coefficients (e.g., RAHT coefficients (2832)) may be obtained as follows. First, sets of decoded transformed coefficients of the transform node (e.g., RAHT coefficients of the RAHT node (2801)) may be obtained. The sets of decoded transformed (e.g., RAHT) coefficients may be the sets of the transformed coefficients (e.g., RAHT coefficients 2311 ), for example, when / if the sets of transformed residual coefficients (e.g., RAHT residual coefficients (2841)) are not quantized before being entropy encoded. The sets of decoded transform coefficients of the transform node may be inverse quantized. 
The sets of decoded transformed coefficients (e.g., RAHT coefficients) of the transform node (e.g., RAHT node (2801)) may be inverse quantized, for example, when / if the sets of RAHT residual coefficients (2841) are quantized before being entropy encoded. Second, the prediction mode (2321) may be used to generate a predictor of the sets of RAHT coefficients of the RAHT node (2801). The predictor of the sets of transformed coefficients (e.g., RAHT coefficients) of the transform node (e.g., RAHT node 2801) may be added to the sets of residual transformed coefficients (e.g., residual RAHT coefficients 2531) to obtain the sets of decoded transformed coefficients (e.g., RAHT coefficients (2832)) of the transform node (e.g., RAHT node (2801)).
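The reconstruction of decoded coefficients described above (optional inverse quantization of the residual, then addition of the predictor) may be sketched as follows. Uniform scalar quantization with a single step size is an assumption made for illustration; the text does not specify the quantizer, and the function and parameter names are hypothetical.

```python
def decode_coefficients(residual_levels, predictor, qstep=None):
    """Rebuild decoded coefficients from residuals and a predictor.

    residual_levels: entropy-decoded residual values (quantized levels if
        quantization was used, plain residuals otherwise).
    predictor: predicted coefficients generated from the prediction mode.
    qstep: assumed uniform quantization step; None means no quantization.
    """
    if qstep is not None:
        residual = [level * qstep for level in residual_levels]  # inverse quantize
    else:
        residual = list(residual_levels)
    return [p + r for p, r in zip(predictor, residual)]
```

Because the same inputs are available at the encoder and at the decoder, both sides can run this step and obtain identical decoded coefficients.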
[0347] In some examples, a plurality of prediction modes (including the selected prediction mode (2321) at block 2320) are applied to generate a plurality of predictors of the sets of RAHT coefficients of the RAHT node (e.g., sets of RAHT coefficients (2311)), each of which may be added to the sets of residual RAHT coefficients (2331). Then, the prediction mode of the plurality of prediction modes that results in the combined result (e.g., after adding the predictor to the sets of residual RAHT coefficients (2331)) closest to the decoded RAHT coefficients (2832) may be selected as the a posteriori prediction mode (2821). In other words, the a posteriori prediction mode (2821) may be the prediction mode resulting in the smallest difference between the decoded RAHT coefficients (2832) and the reconstructed RAHT coefficients obtained by adding the predictor of the sets of RAHT coefficients (2311) to the decoded sets of residual RAHT coefficients (2331).
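One way to picture the a posteriori determination is the sketch below. It takes a simplified reading in which each candidate predictor is compared directly with the decoded coefficients and the closest one wins; the text expresses the comparison in terms of reconstructed coefficients and does not name a distance measure, so the sum of absolute differences and all names here are assumptions.

```python
def a_posteriori_mode(decoded_coeffs, predictors):
    """Pick the candidate mode whose predictor is closest to the decoded coefficients.

    decoded_coeffs: decoded transform coefficients of the node.
    predictors: dict mapping a mode name to its predicted coefficients.
    Distance: sum of absolute differences (assumption).
    """
    def distance(pred):
        return sum(abs(d - q) for d, q in zip(decoded_coeffs, pred))

    return min(predictors, key=lambda mode: distance(predictors[mode]))
```

Since only decoded data enters the comparison, the decoder can repeat the same computation without any extra signaling.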
[0348] At step 2830, the encoder may associate the a posteriori prediction mode (2821) with the RAHT node (2801). An indication of the a posteriori prediction mode (2821) may be saved as a parameter for the RAHT node (2801). The RAHT node (2801) associated with the a posteriori prediction mode (2821) may be considered as being an already-coded neighboring RAHT node of a further RAHT node to be processed and the process 2800 may end.
[0349] FIG. 29 illustrates an example method for decoding attributes. More specifically, FIG. 29 shows an example process 2900 for decoding, from the bitstream (2590), attributes of the RAHT node (2901) using an attribute prediction mode and determining an a posteriori prediction mode (2821) for the RAHT node (2901), according to some embodiments. Although FIG. 29 shows an example method for decoding attributes of a RAHT node, the example method may be generally applied for decoding attributes of a transform node (e.g., RAHT node, Haar node) using an attribute prediction mode and one a posteriori prediction mode associated with the transform node. Accordingly, the RAHT node (2901) may be an example of a transform node. The example process 2900 may be generally used for a transform (e.g., RAHT, Haar). One or more steps of process 2900 may be performed by a decoder (e.g., decoder 120 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. Process 2900 may include some of the same operations (shown as having the same labeled blocks) as those described with respect to FIG. 25, FIG. 26, and FIG. 28. The bitstream (2590) is typically obtained from the encoding method as shown herein with respect to FIG. 28.
[0350] At block 2520, the decoder obtains a prediction mode (2521) for the transform node (e.g., RAHT node (2901), Haar node).
[0351] At block 2540, the decoder obtains sets {Ck} of residual transform coefficients (e.g., RAHT coefficients (2531)) by entropy decoding coefficient information (2542) from the bitstream (2590). The decoder applies (block 2530) the prediction mode (2521) to the current transform node (e.g., RAHT node (2901)) to obtain (e.g., compute or derive) sets of transform coefficients (e.g., RAHT coefficients (2511)). The decoder obtains (block 2510) decoded attributes (2505) associated with the transform node (e.g., RAHT node (2901)) by inverse transforming the sets of transform coefficients (e.g., RAHT coefficients (2511)). As described herein, the inverse transform may be an inverse RAHT transform or an inverse Haar transform depending on which transform is selected by the encoder and signaled in the bitstream to the decoder.
[0352] The decoder may (e.g., further) determine an a posteriori prediction mode and may associate the a posteriori prediction mode with a transform (e.g., RAHT, Haar) node. The decoder may (e.g., further) determine the a posteriori prediction mode (2821) and may associate the a posteriori prediction mode (2821) with the transform node (e.g., RAHT node (2901)). The decoder may (e.g., further) determine the a posteriori prediction mode (2821) and may associate the a posteriori prediction mode (2821) with the transform node (e.g., RAHT node (2901)), for example, after the decoded attributes (2505) have been obtained (e.g., at block 2510).
[0353] At block 2810, the decoder may determine (e.g., decide) whether the a posteriori prediction mode is activated for the transform node (e.g., RAHT node (2901)) or not. The transform node (e.g., RAHT node (2901)) associated with the prediction mode (2521) may be considered as an already-coded neighboring transform node of a (e.g., further) transform node to be processed and the process 2900 may end, for example, if it is determined that the a posteriori prediction mode is not activated. At block 2820, the decoder may determine the a posteriori prediction mode (2821). The decoder may determine the a posteriori prediction mode (2821), for example, if the decoder determines that the a posteriori prediction mode is activated. The decoder may determine the a posteriori prediction mode (2821), for example, based on decoded transformed coefficients (e.g., RAHT coefficients (2832)) associated with the transform node (e.g., RAHT node (2801)) as described herein. The decoded transformed coefficients (e.g., RAHT coefficients (2832)) may be the sets of transformed coefficients (e.g., RAHT coefficients 2511). The sets of decoded transformed coefficients (e.g., RAHT coefficients 2511) of the transform node (e.g., RAHT node 2801) may be inverse quantized. The sets of decoded transformed coefficients (e.g., RAHT coefficients) of the transform node (e.g., RAHT node 2801) may be inverse quantized, for example, when / if the sets of transformed residual coefficients (e.g., RAHT residual coefficients 2531) are quantized before being entropy encoded. The prediction mode (2521) may be used to generate a predictor of the sets of transformed coefficients (e.g., RAHT coefficients (2511)) of the transform node (e.g., RAHT node (2901)). 
The predictor of the sets of transformed coefficients (e.g., RAHT coefficients) of the transform node (e.g., RAHT node 2901) may be added to the sets of (dequantized) residual transformed coefficients (e.g., residual RAHT coefficients 2531) to obtain the sets of decoded transformed coefficients (e.g., RAHT coefficients 2832) of the transform node (e.g., RAHT node 2801).
[0354] The decoder may associate an a posteriori prediction mode with the transform (e.g., RAHT, Haar) node. For example, at block 2830, the decoder associates the a posteriori prediction mode (2821) with the RAHT node (2901). An indication of the a posteriori prediction mode 2821 may be saved as a parameter for the transform node (e.g., RAHT node (2901)). The RAHT node (2901) associated with the a posteriori prediction mode (2821) may then be considered as an already-coded neighboring transform (e.g., RAHT, Haar) node of a further transform node to be processed and the process 2900 ends.
[0355] In some embodiments, the a posteriori prediction mode (2821) may be determined (e.g., at block 2820) and associated (e.g., block 2830) with the transform node (e.g., RAHT node (2801, 2901)) based on the a posteriori prediction mode being activated (block 2810).
[0356] In some embodiments, the a posteriori prediction mode is activated based on at least one prediction mode of a set of prediction modes being omitted from being considered in selecting the prediction mode used in decoding the transform (e.g., RAHT, Haar) coefficients.
[0357] In some examples, the determination (or decision) of whether the a posteriori prediction mode is activated depends on whether the prediction mode (2321, 2521) is reflective or highly likely to be indicative of a true, best prediction mode.
[0358] For example, when the prediction mode (2321, 2521) is the selected prediction mode p_sel (2421, 2621), then the prediction mode (2321, 2421) reflects or highly likely reflects the true, best prediction mode when all the prediction modes of the set of prediction modes have been put into competition. Accordingly, the a posteriori prediction mode may be activated when the prediction mode (2321, 2421) is not selected from all the prediction modes of a set of prediction modes, e.g., when at least one prediction mode of the set of prediction modes is pruned (e.g., omitted or skipped) when selecting the prediction mode.
[0359] In some embodiments, the a posteriori prediction mode may be activated (block 2810) based on the prediction mode (2121, 2321) being inferred.
[0360] For example, when the prediction mode (2321, 2521) is inferred, as discussed above in FIG. 24 and FIG. 26, then the prediction mode (2321, 2521) may not be highly reflective of a true, best prediction mode, and the a posteriori prediction mode is activated (block 2810).
[0361] In some embodiments, the inference of the prediction mode may be based on prediction modes associated with already-coded neighboring transform (e.g., RAHT, Haar) nodes of the transform (e.g., RAHT, Haar) node.
[0362] In some embodiments, the already-coded neighboring transform (e.g., RAHT, Haar) nodes (2350) of the transform (eg., RAHT, Haar) node (2801, 2901) may include nodes that share a face (e.g., a partofface), an edge (e.g., a part of an edge), ora vertex with the RAHT node (2801, 2901).
[0363] In some embodiments, the already-coded neighboring transform (e.g., RAHT, Haar) nodes (2350) of the transform (e.g., RAHT, Haar) node (2801, 2901) may include the parent node of the transform (e.g., RAHT, Haar) node (2801, 2901).
[0364] In some embodiments, the already-coded neighboring transform (e.g., RAHT, Haar) nodes (2350) of the transform (e.g., RAHT, Haar) node may include nodes that are siblings to the parent node of the transform (e.g., RAHT, Haar) node (2801, 2901).
[0365] In some embodiments, when the average prediction mode is activated and the prediction mode (2321, 2521) is determined as an average prediction mode, then the prediction mode (2321, 2521) does not necessarily reflect a true, best prediction mode, and the a posteriori prediction mode is activated (block 2810).
[0366] For example, when the average predictor as discussed in FIGS. 27A and 27B is activated, the prediction mode (2321, 2521) may be inferred to be a predetermined prediction mode (e.g., associated with a fixed value that indicates either an intra prediction mode or an inter prediction mode). The prediction mode (2321, 2521) is then considered as not reflecting the true, best prediction mode.
[0367] For example, the average prediction mode may be activated based on the RAHT depth (2702) of the transform (e.g., RAHT, Haar) node (2801, 2901) being higher than or equal to the maximum depth for prediction mode coding (2710) as discussed in FIG. 27A.
[0368] For example, the average prediction mode may be activated based on the depth (2702) of the transform (e.g., RAHT, Haar) node (2801, 2901) being between the upper depth for average prediction mode (2720) inclusive and the maximum depth for prediction mode coding (2710) inclusive.
[0369] For example, the average prediction mode may be activated based on the depth (2702) of the transform (e.g., RAHT, Haar) node (2801, 2901) being between the maximum depth for prediction mode coding (2710) exclusive and the lower depth for average prediction mode (2730) inclusive as discussed in FIG. 27B.
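The depth conditions in the three preceding examples can be read as simple range checks. The following is a minimal sketch only, under the assumptions that depth increases from the root toward the leaves of the transform tree and that the upper depth (2720) is not greater than the maximum depth (2710), which in turn is not greater than the lower depth (2730). The function and parameter names are hypothetical and do not appear in the disclosure.

```python
def average_mode_active_fig27a(depth: int, max_depth_pm_coding: int) -> bool:
    """FIG. 27A style check: depth is higher than or equal to the maximum depth (2710)."""
    return depth >= max_depth_pm_coding


def average_mode_active_fig27b(depth: int,
                               upper_depth_avg: int,
                               max_depth_pm_coding: int,
                               lower_depth_avg: int) -> bool:
    """FIG. 27B style check using two depth bands around the maximum depth (2710)."""
    # Band 1: upper depth (2720) inclusive .. maximum depth (2710) inclusive.
    in_upper_band = upper_depth_avg <= depth <= max_depth_pm_coding
    # Band 2: maximum depth (2710) exclusive .. lower depth (2730) inclusive.
    in_lower_band = max_depth_pm_coding < depth <= lower_depth_avg
    return in_upper_band or in_lower_band


if __name__ == "__main__":
    # Illustrative thresholds only: upper=3, maximum=5, lower=7.
    for d in range(1, 9):
        print(d, average_mode_active_fig27b(d, 3, 5, 7))
```

In this sketch the two bands of the FIG. 27B variant are contiguous, so a node at any depth from the upper depth through the lower depth would have the average prediction mode activated.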
[0370] In some embodiments, the a posteriori prediction mode (2821) may be determined (e.g., at block 2820) as one of the prediction modes of a set of prediction modes. For example, the set of prediction modes may include at least one intra prediction mode and / or at least one inter prediction mode and / or a null mode. Examples of intra and inter prediction modes are described herein.
[0371] Determining the a posteriori prediction mode (2821) (e.g., at block 2820) may include calculating distances (e.g., representing distortions) for respective prediction modes of the set of prediction modes, and determining the a posteriori prediction mode (2821) as being the prediction mode with the minimum distance. Each distance of the distances may be calculated, for example, based on (e.g., between) decoded attribute information of the transform node (e.g., RAHT node (2801, 2901)) derived from the decoded transformed coefficients (e.g., RAHT coefficients 2832) and a prediction of the decoded attribute information of the transform node (e.g., RAHT node (2801, 2901)) derived from a respective prediction mode of the set of prediction modes. Because the distances depend only on decoded information, both encoder and decoder may determine the same a posteriori prediction mode, so that synchronization between encoding and decoding processes may be realized (e.g., ensured).
[0372] The decoded attribute information of the transform node (e.g., RAHT node (2801, 2901)) may include decoded attributes associated with the transform node. The decoded attributes associated with the transform node (e.g., RAHT node (2801, 2901)) may be determined (e.g., derived), for example, based on (e.g., from) the decoded transformed coefficients (e.g., RAHT coefficients (2832)). The prediction of the decoded attribute information of the transform node (e.g., RAHT node (2801, 2901)) may include predicted decoded attributes associated with the transform node. The predicted decoded attributes associated with the transform node (e.g., RAHT node (2801, 2901)) may be determined (e.g., derived), for example, based on the prediction mode of the set of prediction modes. Each distance may be determined (e.g., calculated) based on (e.g., between) the decoded attributes of the transform node (e.g., RAHT node (2801, 2901)) and the predicted decoded attributes.
[0373] Process 2800 (FIG. 28) may obtain the decoded attribute information of the transform (e.g., RAHT, Haar) node by decoding attributes of the transform node using the same process as process 2900 in FIG. 29 as described herein. For example, process 2800 (FIG. 28) may obtain the decoded attribute information of the RAHT node (2801) by decoding attributes of the RAHT node (2801) using the same process as process 2900 in FIG. 29 as described herein. The prediction mode (2321), corresponding to prediction mode (2521) obtained at the decoder, may be used to generate (e.g., at block 2330) a prediction that may be added to the sets of residual transformed coefficients (e.g., residual RAHT coefficients 2331) to obtain sets of decoded transformed coefficients (e.g., RAHT coefficients). The prediction may be added to the sets of residual transformed coefficients, for example, after decoding the coefficient information 2342 encoding the sets of residual transformed coefficients (e.g., residual RAHT coefficients 2331). The decoded attributes associated with the transform node (e.g., RAHT node 2801) may be obtained, for example, by inverse transforming (e.g., block 2510) the sets of decoded transformed coefficients (e.g., RAHT coefficients).
[0374] The prediction of the decoded attribute information of the transform node (e.g., RAHT node 2801, 2901) may be determined (e.g., derived), for example, based on decoded attribute information of child transform nodes (e.g., nodes j) of the transform node (e.g., RAHT node 2801, 2901).
[0375] The distance may be a sum of errors (e.g., differences) over the child transform nodes of the transform node (e.g., RAHT node 2801, 2901). Each error may be defined between decoded attribute information of a respective child transform node (e.g., node j) of the child transform nodes (e.g., RAHT nodes) and the decoded attribute information associated with the transform node (e.g., RAHT node 2801, 2901). The sum of errors may be a sum of absolute differences (e.g., SAD) or a sum of squared differences (e.g., SSE).
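The minimum-distance selection over a set of prediction modes, with either a SAD or an SSE distance summed over child transform nodes, may be illustrated as follows. This is a sketch under stated assumptions rather than the disclosed implementation: the per-child values are plain floats, each candidate mode already has a per-child prediction available, and the mode labels (`intra_a`, `inter`) and function names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple


def sad(decoded: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of absolute differences over the child transform nodes."""
    return sum(abs(d - p) for d, p in zip(decoded, predicted))


def sse(decoded: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared differences over the child transform nodes."""
    return sum((d - p) ** 2 for d, p in zip(decoded, predicted))


def select_min_distance_mode(
    decoded_children: Sequence[float],
    predictions_by_mode: Dict[str, Sequence[float]],
    distance: Callable[[Sequence[float], Sequence[float]], float] = sad,
) -> Tuple[str, Dict[str, float]]:
    """Return the mode whose prediction is closest to the decoded values."""
    distances = {
        mode: distance(decoded_children, prediction)
        for mode, prediction in predictions_by_mode.items()
    }
    # min() keeps the first mode in insertion order when distances tie
    # (an arbitrary choice made for this sketch).
    best_mode = min(distances, key=distances.get)
    return best_mode, distances


if __name__ == "__main__":
    decoded = [10.0, 12.0, 8.0]
    candidates = {"intra_a": [9.0, 11.0, 9.0], "inter": [10.0, 14.5, 8.0]}
    print(select_min_distance_mode(decoded, candidates, sad))
    print(select_min_distance_mode(decoded, candidates, sse))
```

With these illustrative numbers the two metrics disagree: SAD favors the candidate with one large error, while SSE penalizes that error more heavily and favors the other candidate. Whichever metric is chosen, encoder and decoder would need to use the same one to stay synchronized.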
[0376] The decoded attribute information of a child transform node (e.g., node j) may be determined (e.g., derived). The sets of (decoded) transformed coefficients (e.g., RAHT coefficients) of the transform node (e.g., RAHT node 2801, 2901) may include decoded AC coefficients and a decoded DC coefficient. The decoded DC coefficient may be from (e.g., inherited from) the transform node (e.g., RAHT node 2801, 2901) as described herein. Decoded DC coefficients (e.g., q) may be obtained for each child transform node (e.g., node j) of the sets of decoded transformed coefficients (e.g., RAHT coefficients) of the transform node (e.g., RAHT node 2801, 2901), for example, by applying / using an inverse transform (e.g., RAHT transform, Haar transform) to / for the transform node (e.g., RAHT node 2801, 2901). The decoded attribute information of a child transform node (e.g., node j) may be the decoded transformed coefficient (e.g., q). The decoded transformed coefficient (e.g., q) may be the sum of attribute values, associated with voxels contained in the volume associated with the child transform node (e.g., node j), divided by the square root of a number of attributes (i.e., the square root of the transform (e.g., RAHT) weight of the child transform node j) as described herein.
[0377] The decoded attribute information of a child transform node (e.g., node j) of the transform node (e.g., RAHT node 2801, 2901) may be the mean attribute of the child transform node (e.g., node j) of the transform node (e.g., RAHT node 2801, 2901). The mean attribute of the child transform node (e.g., node j) of the transform node (e.g., RAHT node 2801, 2901) may be determined (e.g., derived) by dividing the decoded DC coefficient (e.g., q) by the square root of the number of attributes (i.e., the square root of the transform weight of the child node). A decoded transformed coefficient (e.g., c) associated with the transform node (e.g., RAHT node 2801, 2901) may be the sum of attribute values, associated with voxels contained in the volume associated with the transform node (e.g., RAHT node 2801, 2901), divided by the square root of a number of attributes (i.e., the square root of the RAHT weight of the RAHT node j) as described herein.
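The relationship between the DC coefficient, the weight, and the mean attribute described above can be checked numerically. The sketch below assumes scalar attributes and takes the weight to be the number of attributes (voxels) in the node volume, as stated in the text; the function names are hypothetical.

```python
import math
from typing import Sequence


def dc_coefficient(voxel_attributes: Sequence[float]) -> float:
    """Sum of attribute values divided by the square root of the weight."""
    weight = len(voxel_attributes)  # number of attributes in the node volume
    return sum(voxel_attributes) / math.sqrt(weight)


def mean_attribute(dc_coeff: float, weight: int) -> float:
    """Mean attribute: DC coefficient divided by the square root of the weight."""
    return dc_coeff / math.sqrt(weight)


if __name__ == "__main__":
    attributes = [4.0, 8.0, 6.0, 6.0]  # illustrative voxel attribute values
    dc = dc_coefficient(attributes)
    print(dc, mean_attribute(dc, len(attributes)))
```

Dividing by the square root of the weight twice amounts to dividing the attribute sum by the weight itself, which is why the result is the ordinary arithmetic mean of the attributes in the node volume.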
[0378] Each error may be determined (e.g., calculated) based on (e.g., between) the mean attribute (e.g., c) of the transform (e.g., RAHT, Haar) node and a prediction. The prediction may be determined (e.g., derived), for example, based on the mean attribute of a child transform node (e.g., node j) of the transform node (e.g., RAHT node 2801, 2901). An intra prediction of a child transform node (e.g., node j) of the transform node (e.g., RAHT node 2801, 2901) may be determined (e.g., derived), for example, by interpolation of the mean attribute associated with each child RAHT node (e.g., node j) (e.g., as depicted in FIG. 22). An inter prediction of a child transform node (e.g., node j) of the transform node (e.g., RAHT node 2801, 2901) may be obtained, for example, by motion compensation of a reference frame in the mean attribute domain and obtaining mean attribute values for each child node (e.g., node j) of the transform node (e.g., RAHT node 2801, 2901). A first distance may be determined (e.g., calculated) based on (e.g., between) the intra prediction and the mean attributes of the transform node (e.g., RAHT node 2801, 2901). A second distance may be determined (e.g., calculated) based on (e.g., between) the inter prediction and the mean attribute of the transform node (e.g., RAHT node 2801, 2901). The a posteriori prediction mode 2821 may be determined as the intra predictor, for example, if the first distance is lower than the second distance. The a posteriori prediction mode 2821 may be determined as the inter predictor, for example, if the first distance is higher than or equal to the second distance.
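The two-mode decision rule at the end of the preceding paragraph, including its handling of equal distances, may be sketched as follows. The use of SAD as the distance is an assumption made for illustration (the text also permits SSE), and the function name and string labels are hypothetical.

```python
from typing import Sequence


def choose_intra_or_inter(decoded_means: Sequence[float],
                          intra_prediction: Sequence[float],
                          inter_prediction: Sequence[float]) -> str:
    """Pick intra only when it is strictly closer to the decoded mean attributes."""
    # First distance: intra prediction vs. decoded mean attributes (SAD assumed).
    first = sum(abs(d - p) for d, p in zip(decoded_means, intra_prediction))
    # Second distance: inter prediction vs. decoded mean attributes (SAD assumed).
    second = sum(abs(d - p) for d, p in zip(decoded_means, inter_prediction))
    # A tie (first == second) resolves to inter, per the rule in the text.
    return "intra" if first < second else "inter"


if __name__ == "__main__":
    decoded = [6.0, 7.0]
    print(choose_intra_or_inter(decoded, [6.0, 8.0], [5.0, 5.0]))
    print(choose_intra_or_inter(decoded, [5.0, 7.0], [6.0, 8.0]))
```

Fixing the tie-breaking direction matters because encoder and decoder must reach the same decision from the same decoded values.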
[0379] As discussed above, the a posteriori prediction mode (2821) may reflect a more accurate prediction mode that is closer to the true, best prediction mode for the transform node (e.g., RAHT node (2801, 2901)) compared to a prediction mode (2321, 2521) associated with the transform node (e.g., RAHT node (2801, 2901)).
[0380] As discussed above, an average predictor may be derived from the prediction mode (2321, 2521). The average predictor may be obtained as a linear combination of an intra predictor and an inter predictor depending on a parameter as discussed above.
[0381] For example, when the a posteriori prediction mode is activated, already-coded neighboring transform (e.g., RAHT, Haar) nodes and their associated a posteriori prediction modes may be used to construct an average predictor for a transform (e.g., RAHT, Haar) node (e.g., at block 2820). The average predictor (e.g., $\mathrm{Pred}_{\mathrm{average}}$) may be, for example, representing average prediction mode 2811. The average predictor $\mathrm{Pred}_{\mathrm{average}}$ may be obtained as a linear combination of an intra predictor $\mathrm{Pred}_{\mathrm{intra}}$ derived from an intra prediction mode and an inter predictor $\mathrm{Pred}_{\mathrm{inter}}$ derived from an inter prediction mode, depending on a parameter $p$:
$\mathrm{Pred}_{\mathrm{average}} = p \cdot \mathrm{Pred}_{\mathrm{intra}} + (1 - p) \cdot \mathrm{Pred}_{\mathrm{inter}}$
[0382] The parameter p may be determined, for example, based on the neighboring prediction modes 2352. At least a part of the neighboring prediction modes 2352 may include one or more a posteriori prediction modes 2821 of one or more neighboring transform (e.g., RAHT, Haar) nodes. The one or more a posteriori prediction modes 2821 may indicate the one or more respective (e.g., best) selected prediction modes. The selected prediction modes may be, for example, a best prediction mode being selected from a set of prediction modes (e.g., intra or inter or NULL prediction mode). By doing so, the parameter p of the average predictor for a transform (e.g., RAHT, Haar) node may be determined more accurately based on neighboring prediction modes 2352, for example, because one or more of neighboring prediction modes 2352 may be obtained as one or more respective a posteriori prediction modes, which may be more accurate indications of which of intra predictors or inter predictors may be locally better. The quality of the average predictor may thus be improved, as well as compression performance. The parameter p may have a positive correlation with the proportion of neighboring prediction modes 2352, which may be obtained from a posteriori prediction modes 2821 if activated, being intra modes.
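One simple way to obtain a parameter p that is positively correlated with the proportion of intra modes among the neighbors is to use that proportion directly. This is only one hypothetical choice, not a rule stated in the disclosure. The sketch below makes that assumption, falls back to p = 0.5 when no neighbor is available (also an assumption), and then forms the average predictor as the linear combination of an intra predictor and an inter predictor. All names are illustrative.

```python
from typing import List, Sequence


def intra_proportion(neighbor_modes: Sequence[str]) -> float:
    """Hypothetical choice of p: fraction of neighbor modes that are intra."""
    if not neighbor_modes:
        return 0.5  # assumed neutral fallback when there are no neighbors
    intra_count = sum(1 for mode in neighbor_modes if mode == "intra")
    return intra_count / len(neighbor_modes)


def average_predictor(pred_intra: Sequence[float],
                      pred_inter: Sequence[float],
                      p: float) -> List[float]:
    """Pred_average = p * Pred_intra + (1 - p) * Pred_inter, element-wise."""
    return [p * a + (1.0 - p) * b for a, b in zip(pred_intra, pred_inter)]


if __name__ == "__main__":
    # Neighbor modes may be a posteriori modes when that mode is activated.
    neighbors = ["intra", "intra", "inter", "intra"]
    p = intra_proportion(neighbors)
    print(p, average_predictor([8.0, 4.0], [4.0, 0.0], p))
```

If the neighbor modes were default inferred values rather than a posteriori modes, the proportion in this sketch would carry little information about which predictor is locally better, which is the motivation given in the text for using a posteriori modes.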
[0383] When the a posteriori prediction mode is activated, neighboring prediction modes 2352 may be more likely to reflect the true, best prediction mode and improve context selection of entropy coding of the indication of the prediction mode 2321, 2521 associated with the transform node (e.g., RAHT node 2801, 2901) and / or of sets of residual transformed coefficients (e.g., residual RAHT coefficients 2841, 2531), and / or improve determining the set of transformed coefficients (e.g., RAHT coefficients 2840, 2910). The selection of a context may be performed more accurately based on neighboring prediction modes 2352 because one or more of neighboring prediction modes 2352 may be obtained as one or more respective a posteriori prediction modes, which may be more accurate indications of which of intra predictors or inter predictors may be locally better. The quality of context selection may thus be improved, as well as compression performance. The set of prediction modes may include at least one intra mode and / or at least one inter mode and / or a null prediction mode.
[0384] FIG. 30A illustrates an example of the top-down traversal of a transform process (e.g., a RAHT process). More specifically, FIG. 30A shows the top-down traversal of the RAHT process 3000A when a maximum depth for prediction mode coding is activated and when the a posteriori prediction mode is activated, according to some embodiments. Although FIG. 30A shows the top-down traversal of a RAHT process, the concepts illustrated in FIG. 30A may be applied to a top-down traversal of a transform process (e.g., RAHT process, Haar process). The RAHT process 3000A may be an example of a transform process. One or more steps of the example process 3000A may be performed by an encoder or a decoder (e.g., encoder 114 or decoder 120 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. The process 3000A may include some of the same operations (shown as having the same labeled blocks) as those described in FIG. 27A. The encoder (or decoder) may associate an average prediction mode with a transform node, for example, by comparing a transform depth (depth in the transform tree) of the transform node with the threshold of a maximum depth for prediction mode coding. For example, the encoder (or decoder) may associate an average prediction mode with the RAHT node (2801, 2901) by comparing its RAHT depth (2702) (e.g., depth in the RAHT tree) with the maximum depth for prediction mode coding (2710).
[0385] For example, when the transform (e.g., RAHT) depth (2702) of the transform node (e.g., RAHT node (2801, 2901)) is greater than the maximum depth for prediction mode coding (2710), an a posteriori prediction mode is activated, and the prediction mode (2321, 2521) is replaced by the a posteriori prediction mode (3010) determined in block 2820 rather than inferred to be a predetermined / default prediction mode.
[0386] FIG. 30B illustrates an example of the top-down traversal of a transform process (e.g., a RAHT process). More specifically, FIG. 30B shows the top-down traversal of the RAHT process when a maximum depth for prediction mode coding, an upper depth for average mode, and a lower depth for average mode are used, and when the a posteriori prediction mode is activated, according to some embodiments. Although FIG. 30B shows the top-down traversal of a RAHT process, the concepts illustrated in FIG. 30B may be applied to a top-down traversal of a transform process (e.g., RAHT process, Haar process). The RAHT process 3000B may be an example of a transform process.
[0387] One or more steps of the example process 3000B may be performed by an encoder or a decoder (e.g., encoder 114 or decoder 120 of FIG. 1) or an example computer system 3200 as described herein with respect to FIG. 32. The process 3000B may include some of the same operations (shown as having the same labeled blocks) as those described in FIG. 27B.
[0388] For example, when the RAHT depth (2702) of the RAHT node (2801, 2901) is between the upper depth for average prediction mode (2720) inclusive and the maximum depth for prediction mode coding (2710) inclusive, or when the RAHT depth (2702) of the RAHT node (2801, 2901) is between the maximum depth for prediction mode coding (2710) exclusive and the lower depth for average prediction mode (2730) inclusive, or when the RAHT depth (2702) of the RAHT node (2801, 2901) is greater than the maximum depth for prediction mode coding (2710), an a posteriori prediction mode is activated, and the prediction mode (2321, 2521) is replaced by the a posteriori prediction mode (3010) determined in block 2820 rather than inferred to be a predetermined / default prediction mode.
[0389] By doing so (FIG. 30A and FIG. 30B), the parameter p of the average predictor for a transform (e.g., RAHT) node can be determined more accurately based on neighboring prediction modes (2352) because one or more of neighboring prediction modes (2352) may be obtained as one or more respective a posteriori prediction modes, which are more accurate indications of which of the intra predictor or the inter predictor is locally better. The quality of the average predictor is thus improved, as well as compression performance, compared to implementations using only the neighboring prediction modes (2352) directly used by neighboring transform (e.g., RAHT, Haar) nodes. For example, the parameter p may have a positive correlation with the proportion of neighboring prediction modes (2352), which may be obtained from a posteriori prediction modes (2821) if activated, being intra modes.
[0390] When the a posteriori prediction mode is activated, neighboring prediction modes (2352) are more likely to reflect the true, best prediction mode, which improves context selection of entropy coding of the indication of the prediction mode (2321, 2521) associated with the transform node (e.g., RAHT node (2801, 2901)) and / or of sets of residual transform coefficients (e.g., residual RAHT coefficients (2331, 2531)), and / or improves determination of the set of transform coefficients (e.g., RAHT coefficients (2311, 2511)). The selection of a context is performed more accurately based on neighboring prediction modes (2352) because one or more of neighboring prediction modes (2352) may be obtained as one or more respective a posteriori prediction modes, which are more accurate indications of which of intra predictors or inter predictors are locally better. The quality of context selection is thus improved, as well as compression performance.
[0391] In some embodiments, the set of prediction modes may include at least one intra mode and / or at least one inter mode and / or a null prediction mode.
[0392] FIG. 31 illustrates a flowchart 3100 of an example method for associating an a posteriori prediction mode with a RAHT node of a RAHT tree, according to some embodiments. Although FIG. 31 shows an example method for a RAHT node, the example method may be generally applied for a transform node (e.g., RAHT node, Haar node). Accordingly, the RAHT node may be an example of a transform node. The example method may be generally used for a transform (e.g., RAHT, Haar). One or more steps of the example method (e.g., flowchart 3100) may be performed and / or implemented by a coder such as an encoder or a decoder (e.g., encoder 114 or the decoder 120 of FIG. 1).
[0393] The coder decodes, based on a prediction mode associated with the transform node (e.g., RAHT or Haar node), transform (e.g., RAHT, Haar) coefficients representative of attributes of the transform node of a transform tree for a point cloud of / representing content. For example, at block 3102, the coder (encoder or decoder) decodes, based on a prediction mode associated with the RAHT node, RAHT coefficients representative of attributes of the RAHT node of a RAHT tree. Examples of decoding the RAHT coefficients are further described herein with respect to FIGS. 28 and 29.
[0394] In some examples, determination of the prediction mode may be based on prediction modes of neighboring transform nodes of the transform node, as described herein with respect to FIGS. 28 and 29. In some examples, one or more of the prediction modes of the neighboring transform nodes may include one or more respective a posteriori prediction modes, which may be more accurate as explained above.
[0395] The coder determines one a posteriori prediction mode based on the decoded transform (e.g., RAHT, Haar) coefficients. For example, at block 3104, the coder determines one a posteriori prediction mode based on the decoded RAHT coefficients. Examples of determining one a posteriori prediction mode are further described herein with respect to FIGS. 28 and 29, such as with respect to blocks 2810 and 2820 of FIGS. 28 and 29.
[0396] The coder associates the a posteriori prediction mode with the transform (e.g., RAHT, Haar) node. For example, at block 3106, the coder associates the a posteriori prediction mode with the RAHT node. Examples of associating the one a posteriori prediction mode are further described herein with respect to FIGS. 28 and 29, such as with respect to block 2830 of FIGS. 28 and 29.
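The three steps of flowchart 3100 (decode at block 3102, determine at block 3104, associate at block 3106) may be sketched end to end as follows. This is an illustrative outline under several assumptions, not the disclosed implementation: predictions are given per mode in the coefficient domain, the inverse transform is supplied by the caller as a hypothetical callable, the distance is SSE in the attribute domain, and the `TransformNode` class and all function names are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class TransformNode:
    prediction_mode: str                     # mode used for decoding
    a_posteriori_mode: Optional[str] = None  # filled in after decoding


def process_node(node: TransformNode,
                 residual_coeffs: Sequence[float],
                 predicted_coeffs_by_mode: Dict[str, Sequence[float]],
                 inverse_transform: Callable[[Sequence[float]], List[float]],
                 ) -> List[float]:
    # Block 3102: add the prediction for the node's mode to the residuals,
    # then inverse transform to obtain decoded attribute information.
    prediction = predicted_coeffs_by_mode[node.prediction_mode]
    decoded_coeffs = [r + p for r, p in zip(residual_coeffs, prediction)]
    decoded_attrs = inverse_transform(decoded_coeffs)

    # Block 3104: distance between decoded attributes and each mode's prediction.
    def distance(mode: str) -> float:
        predicted_attrs = inverse_transform(predicted_coeffs_by_mode[mode])
        return sum((d - p) ** 2 for d, p in zip(decoded_attrs, predicted_attrs))

    best_mode = min(predicted_coeffs_by_mode, key=distance)

    # Block 3106: associate the a posteriori mode with the node.
    node.a_posteriori_mode = best_mode
    return decoded_attrs


if __name__ == "__main__":
    node = TransformNode(prediction_mode="intra")
    attrs = process_node(node, [2.0, -2.0],
                         {"intra": [10.0, 10.0], "inter": [12.0, 8.0]},
                         inverse_transform=lambda c: list(c))  # toy identity
    print(attrs, node.a_posteriori_mode)
```

In the toy run, the node is decoded with the intra mode, but the decoded values coincide with the inter prediction, so the mode stored for later use by neighboring nodes is inter. Only information available to both encoder and decoder enters the decision.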
[0397] Embodiments of the present disclosure may be implemented in hardware using analog and / or digital circuits, in software, through the execution of instructions by one or more general purpose or special-purpose processors, or as a combination of hardware and software. Consequently, embodiments of the disclosure may be implemented in the environment of a computer system or other processing system. An example of such a computer system 3200 is shown in FIG. 32. Blocks depicted in the figures above, such as the blocks in FIGS. 1, 6, 10-18, 23-26, 28-29, and 31, may execute on one or more computer systems 3200. Furthermore, each of the steps of the flowcharts depicted in this disclosure may be implemented on one or more computer systems 3200. When more than one computer system 3200 is used to implement embodiments of the present disclosure, the computer systems 3200 may be interconnected by one or more networks to form a cluster of computer systems that may act as a single pool of seamless resources. The interconnected computer systems 3200 may form a "cloud" of computers.
[0398] Computer system 3200 includes one or more processors, such as processor 3204. Processor 3204 may be, for example, a special purpose processor, general purpose processor, microprocessor, or digital signal processor. Processor 3204 may be connected to a communication infrastructure 3202 (for example, a bus or network). Computer system 3200 may also include a main memory 3206, such as random-access memory (RAM), and may also include a secondary memory 3208.
[0399] Secondary memory 3208 may include, for example, a hard disk drive 3210 and / or a removable storage drive 3212, representing a magnetic tape drive, an optical disk drive, or the like. Removable storage drive 3212 may read from and / or write to a removable storage unit 3216 in a well-known manner. Removable storage unit 3216 represents a magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 3212. As will be appreciated by persons skilled in the relevant art(s), removable storage unit 3216 includes a computer usable storage medium having stored therein computer software and / or data.
[0400] In alternative implementations, secondary memory 3208 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 3200. Such means may include, for example, a removable storage unit 3218 and an interface 3214. Examples of such means may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a thumb drive and USB port, and other removable storage units 3218 and interfaces 3214 which allow software and data to be transferred from removable storage unit 3218 to computer system 3200.
[0401] Computer system 3200 may also include a communications interface 3220. Communications interface 3220 allows software and data to be transferred between computer system 3200 and external devices. Examples of communications interface 3220 may include a modem, a network interface (such as an Ethernet card), a communications port, etc. Software and data transferred via communications interface 3220 are in the form of signals which may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 3220. These signals are provided to communications interface 3220 via a communications path 3222. Communications path 3222 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and other communications channels.
[0402] Computer system 3200 may also include one or more sensor(s) 3224. Sensor(s) 3224 may measure or detect one or more physical quantities and convert the measured or detected physical quantities into an electrical signal in digital and / or analog form. For example, sensor(s) 3224 may include an eye tracking sensor to track the eye movement of a user. Based on the eye movement of a user, a display of a point cloud may be updated. In another example, sensor(s) 3224 may include a head tracking sensor to track the head movement of a user. Based on the head movement of a user, a display of a point cloud may be updated. In yet another example, sensor(s) 3224 may include a camera sensor for taking photographs and / or a 3D scanning device, like a laser scanning, structured light scanning, and / or modulated light scanning device. 3D scanning devices may determine geometry information by moving one or more laser heads, structured light, and / or modulated light cameras relative to the object or scene being scanned. The geometry information may be used to construct a point cloud.
[0403] As used herein, the terms "computer program medium" and "computer readable medium" are used to refer to tangible storage media, such as removable storage units 3216 and 3218 or a hard disk installed in hard disk drive 3210. These computer program products are means for providing software to computer system 3200. Computer programs (also called computer control logic) may be stored in main memory 3206 and / or secondary memory 3208. Computer programs may also be received via communications interface 3220. Such computer programs, when executed, enable the computer system 3200 to implement the present disclosure as discussed herein. In particular, the computer programs, when executed, enable processor 3204 to implement the processes of the present disclosure, such as any of the methods described herein. Accordingly, such computer programs represent controllers of the computer system 3200.
[0404] In another embodiment, features of the disclosure may be implemented in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) and gate arrays. Implementation of a hardware state machine to perform the functions described herein will also be apparent to persons skilled in the relevant art(s).
Claims
1. A method comprising: decoding, based on a prediction mode associated with a Region Adaptive Hierarchical Transform (RAHT) node of a RAHT tree for a point cloud, RAHT coefficients representative of attributes of the RAHT node; determining one a posteriori prediction mode based on the decoded RAHT coefficients; and associating the a posteriori prediction mode with the RAHT node.
2. A method comprising: decoding, based on a prediction mode associated with a transform node of a transform tree for a point cloud, transform coefficients representative of attributes of the transform node; determining one a posteriori prediction mode based on the decoded transform coefficients; and associating the a posteriori prediction mode with the transform node.
3. The method of claim 2, wherein: the transform node, the transform tree, and the transform coefficients are a RAHT node, a RAHT tree, and RAHT coefficients, respectively; or the transform node, the transform tree, and the transform coefficients are a Haar node, a Haar tree, and Haar coefficients, respectively.
4. The method of any one of claims 1-2, wherein the a posteriori prediction mode is determined and associated with the transform node based on the a posteriori prediction mode being activated.
5. The method of claim 4, wherein the a posteriori prediction mode is activated based on at least one prediction mode of a set of prediction modes being omitted from being considered in selecting the prediction mode used in decoding the transform coefficients.
6. The method of any one of claims 4-5, wherein the a posteriori prediction mode is activated based on the prediction mode being inferred.
7. The method of claim 6, wherein the inference of the prediction mode is based on prediction modes associated with already-coded neighboring transform nodes of the transform node.
8. The method of claim 7, wherein the already-coded neighboring transform nodes of the transform node comprise nodes that share a part of face, a part of an edge, or a vertex with the transform node.
9. The method of any one of claims 7-8, wherein the already-coded neighboring transform nodes of the transform node comprise the parent node of the transform node.
10. The method of any one of claims 7-9, wherein the already-coded neighboring transform nodes of the transform node comprise nodes that are siblings to the parent node of the transform node.
11. The method of any one of claims 4-10, wherein the a posteriori prediction mode is activated based on an average prediction mode being activated.
12. The method of claim 11, wherein, based on the average prediction mode being activated, the prediction mode is inferred to be a predetermined prediction mode that is an intra prediction mode or an inter prediction mode.
13. The method of any one of claims 11-12, wherein the average prediction mode is activated based on the depth of the transform node being higher than or equal to a maximum depth for prediction mode coding.
14. The method of any one of claims 11-13, wherein the average prediction mode is activated based on the depth of the transform node being between an upper depth for average mode inclusive and a maximum depth for prediction mode coding inclusive.
15. The method of any one of claims 11-14, wherein the average prediction mode is activated based on the depth of the transform node being between a maximum depth for prediction mode coding exclusive and a lower depth for average mode inclusive.
16. The method of any one of claims 1-15, wherein the a posteriori prediction mode is determined as one of a set of prediction modes.
17. The method of claim 16, wherein determining the a posteriori prediction mode comprises: calculating distances for respective prediction modes of the set of prediction modes, each distance being calculated between decoded attribute information of the transform node derived from the decoded transform coefficients and a prediction of the decoded attribute information associated with the transform node derived from a respective prediction mode of the set of prediction modes; and determining the a posteriori prediction mode as being the prediction mode with the minimum distance.
18. The method of claim 17, wherein the decoded attribute information of the transform node comprises decoded attributes associated with the transform node derived from the decoded transform coefficients and the prediction of the decoded attribute information of the transform node comprises predicted decoded attributes associated with the RAHT node and derived based on the prediction mode of the set of prediction modes.
19. The method of claim 18, wherein each distance is calculated between the decoded attributes of the transform node and the predicted decoded attributes.
20. The method of any one of claims 17-19, wherein the prediction of the decoded attribute information of the transform node is derived based on decoded attribute information of child transform nodes of the transform node.
21. The method of claim 20, wherein the distance is a sum of errors over the child transform nodes of the transform node, each error being defined between decoded attribute information of a respective child transform node of the child transform nodes and the decoded attribute information associated with the transform node.
22. The method of claim 21, wherein the sum of errors is a sum of absolute differences or a sum of squared differences.
23. The method of any one of claims 20-22, wherein the decoded attribute information of a child transform node of the transform node is the mean attribute of the child transform node.
24. The method of any one of claims 20-23, wherein each error is calculated between the mean attribute of the transform node and a prediction derived based on the mean attribute of a child transform node of the transform node.
25. The method of any one of claims 5-24, wherein the set of prediction modes comprises: at least one intra prediction mode, at least one inter prediction mode, or a null prediction mode.
26. The method of claim 25, wherein the set of prediction modes comprises at least one intra prediction mode and one inter prediction mode.
27. A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors of an apparatus, cause the apparatus to perform the method of any one of claims 1-26.
28. An encoder comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the encoder to perform the method of any one of claims 1-26.
29. A non-transitory computer-readable recording medium storing a bitstream generated by the method for encoding a video according to any one of claims 1-26.
30. A decoder comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the decoder to perform the method of any one of claims 1-26.
31. A non-transitory computer-readable medium storing a bitstream, which, when decoded by a decoder, causes the decoder to perform the method according to any one of claims 1-26.
32. A bitstream generated according to any one of claims 1-26.
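The minimum-distance selection recited in claims 17 to 22 can be illustrated with a minimal Python sketch. It is not the applicant's implementation and forms no part of the claims. It assumes one scalar attribute per child transform node (the child's decoded mean attribute, as in claim 23) and assumes that each candidate prediction mode has already produced one predicted value per child. The names `sad`, `ssd`, `a_posteriori_mode`, the mode labels, and the tie-breaking rule (the first listed mode wins) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence


def sad(decoded: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of absolute differences over the child nodes."""
    return sum(abs(d - p) for d, p in zip(decoded, predicted))


def ssd(decoded: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared differences over the child nodes."""
    return sum((d - p) ** 2 for d, p in zip(decoded, predicted))


def a_posteriori_mode(
    decoded_child_means: Sequence[float],
    predictions_by_mode: Dict[str, Sequence[float]],
    distance: Callable[[Sequence[float], Sequence[float]], float] = sad,
) -> Optional[str]:
    """Return the mode whose prediction is closest to the decoded attributes.

    Assumption: ties are resolved in favour of the mode listed first, so an
    encoder and a decoder iterating in the same order reach the same result.
    """
    best_mode: Optional[str] = None
    best_distance: Optional[float] = None
    for mode, predicted in predictions_by_mode.items():
        if len(predicted) != len(decoded_child_means):
            raise ValueError(f"mode {mode!r}: one prediction per child expected")
        d = distance(decoded_child_means, predicted)
        # Strict comparison keeps the earliest mode on a tie.
        if best_distance is None or d < best_distance:
            best_mode, best_distance = mode, d
    return best_mode


# Hypothetical decoded mean attributes of three child nodes.
decoded = [10.0, 12.0, 9.0]
# Hypothetical per-child predictions produced by each candidate mode.
candidates = {
    "intra": [11.0, 12.0, 8.0],  # SAD = 2.0
    "inter": [10.0, 12.5, 9.0],  # SAD = 0.5
    "null": [0.0, 0.0, 0.0],     # SAD = 31.0
}
print(a_posteriori_mode(decoded, candidates))  # prints "inter"
```

Because the selection uses only decoded values, the same function could in principle be run identically at the encoder and at the decoder without signalling the resulting mode. Passing `distance=ssd` switches the error measure to the sum of squared differences.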