Performance improvements in Geometry Point Cloud Compression (G-PCC) planar mode using inter prediction

By deriving motion information from reference blocks and using context-adaptive coding, the coding performance of planar modes in point cloud compression is enhanced, addressing inefficiencies in existing G-PCC standards.

JP7874108B2 · Active · Publication Date: 2026-06-15 · QUALCOMM INC

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: QUALCOMM INC
Filing Date: 2022-04-15
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Existing point cloud compression technologies, such as the Geometry Point Cloud Compression (G-PCC) standard, face challenges in efficiently leveraging geometric information from reference blocks to improve coding performance, particularly in planar modes, leading to suboptimal compression efficiency.

Method used

The proposed solution involves deriving motion information from reference blocks to enhance the correlation between current and reference nodes, using context-adaptive coding to determine whether a node is coded in planar mode based on the planar information of the reference block, thereby improving coding performance.

Benefits of technology

This approach enhances coding efficiency by leveraging geometric information from reference blocks, leading to improved compression performance in planar modes.

Patent Text Reader

Abstract

An exemplary device for processing a point cloud includes a memory configured to store at least a portion of the point cloud and one or more processors implemented in a circuit, where the one or more processors are configured to obtain planarity information of a reference block of the point cloud, determine a context based on the planarity information of the reference block, context-adaptively code a syntax element indicating whether a current node is coded using the planar mode based on the context, and code the current node using the planar mode based on the current node being coded using the planar mode.

Description

Technical Field

[0001]

[0001] This application claims priority to U.S. Patent Application No. 17/659,219, filed Apr. 14, 2022, and U.S. Provisional Application No. 63/176,098, filed Apr. 16, 2021, the entire contents of each of which are incorporated herein by reference. U.S. Patent Application No. 17/659,219, filed Apr. 14, 2022, claims the benefit of U.S. Provisional Application No. 63/176,098, filed Apr. 16, 2021.

[0002]

[0002] This disclosure relates to point cloud encoding and decoding.

Brief Description of the Drawings

[0003] [Figure 1] A block diagram showing an exemplary encoding and decoding system that can implement the techniques of this disclosure.

[0004] [Figure 2] A block diagram showing an exemplary geometry point cloud compression (G-PCC) encoder.

[0005] [Figure 3] A block diagram showing an exemplary G-PCC decoder.

[0006] [Figure 4] A flowchart showing an exemplary motion estimation technique for InterEM.

[0007] [Figure 5] A flowchart showing an exemplary technique for estimating local node motion vectors.

[0008] [Figure 6] A conceptual diagram showing an exemplary range-finding system that can be used with one or more techniques of this disclosure.

[0009] [Figure 7] A conceptual diagram showing an exemplary vehicle-based scenario in which one or more techniques of this disclosure can be used.

[0010] [Figure 8] A conceptual diagram illustrating an exemplary extended reality system in which one or more of the techniques of this disclosure may be used.

[0011] [Figure 9] A conceptual diagram illustrating an exemplary mobile device system in which one or more of the techniques of this disclosure may be used.

[0012] [Figure 10A] A conceptual diagram illustrating the range update process in binary arithmetic coding. [Figure 10B] A conceptual diagram illustrating the range update process in binary arithmetic coding.

[0013] [Figure 11] A conceptual diagram illustrating the output process in binary arithmetic coding.

[0014] [Figure 12] A block diagram showing the context-adaptive binary arithmetic coder in the G-PCC encoder.

[0015] [Figure 13] A block diagram showing the context-adaptive binary arithmetic coder in the G-PCC decoder.

[0016] [Figure 14] A flowchart illustrating an exemplary technique for predicting points in a point cloud according to one or more aspects of this disclosure.

Summary of the Invention

[0004]

[0017] In general, this disclosure describes techniques for coding nodes of a point cloud using inter prediction, for example under the Geometry Point Cloud Compression (G-PCC) standard currently under development. However, the exemplary techniques are not limited to the G-PCC standard. A node's reference block may be derived by motion compensation using estimated motion information (rotation and translation). A good estimate of the motion information can yield a high correlation between the current node and the reference node in terms of geometric structure, such as occupancy and planar information. Leveraging this geometric information of the reference node can therefore improve the coding performance of the current node. This disclosure includes several techniques for leveraging information from the reference block when coding the planar information of the current node. In general, this information can be used to determine a node's eligibility for the planar coding mode and to select contexts when coding planar flags and planar indices.

[0005]

[0018] According to one or more techniques of the present disclosure, a G-PCC coder may include a memory configured to store at least a portion of a point cloud and one or more processors implemented in circuitry, the one or more processors being configured to: obtain planar information of a reference block of the point cloud; determine a context based on the planar information of the reference block; context-adaptively code, based on the context, a syntax element indicating whether a current node is coded using the planar mode; and code the current node using the planar mode based on the current node being coded using the planar mode.

[0006]

[0019] In one example, a method for processing a point cloud includes: obtaining planar information of a reference block of the point cloud; determining a context based on the planar information of the reference block; context-adaptively coding, based on the context, a syntax element indicating whether a current node is coded using the planar mode; and coding the current node using the planar mode based on the current node being coded using the planar mode.

[0007]

[0020] In another example, a computer-readable storage medium stores instructions that, when executed by one or more processors, cause the one or more processors to: obtain planar information of a reference block of a point cloud; determine a context based on the planar information of the reference block; context-adaptively code, based on the context, a syntax element indicating whether a current node is coded using the planar mode; and code the current node using the planar mode based on the current node being coded using the planar mode.
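The context-selection step summarized above can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the two-context mapping in `planar_flag_context`, the `AdaptiveBit` probability model, and its adaptation rate are assumptions chosen for illustration, and the arithmetic coder itself is omitted. The sketch shows only how a context chosen from the reference block's planar information lets each probability estimate adapt separately.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdaptiveBit:
    """Toy adaptive probability model for one context (not the G-PCC CABAC engine)."""
    p_one: float = 0.5      # current estimate that the next bin equals 1
    rate: float = 1 / 16    # adaptation speed (assumed value)

    def update(self, bin_value: int) -> None:
        # Move the estimate a small step toward the observed bin.
        self.p_one += self.rate * (bin_value - self.p_one)


def planar_flag_context(ref_is_planar: bool) -> int:
    """Hypothetical mapping: context 1 if the reference block is planar, else 0."""
    return 1 if ref_is_planar else 0


def code_planar_flags(nodes: List[Tuple[bool, bool]]):
    """nodes holds (reference block is planar, current node is planar) pairs.

    Returns the context index used per node and the adapted models.
    A real coder would pass each model's probability to an arithmetic coder.
    """
    contexts = [AdaptiveBit(), AdaptiveBit()]
    used = []
    for ref_is_planar, cur_is_planar in nodes:
        ctx = planar_flag_context(ref_is_planar)
        used.append(ctx)
        contexts[ctx].update(int(cur_is_planar))
    return used, contexts
```

When the reference and current nodes are well correlated, the two models drift toward opposite extremes, which is what makes the planar flag cheaper to code.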

[0008]

[0021] Details of one or more examples are set forth in the accompanying drawings and the following description. Other features, objects, and advantages will become apparent from the description, drawings, and claims.

Modes for Carrying Out the Invention

[0009]

[0022] Figure 1 is a block diagram showing an exemplary encoding and decoding system 100 capable of implementing the techniques of the present disclosure. The techniques of the present disclosure generally concern coding (encoding and / or decoding) point cloud data, i.e., supporting point cloud compression. Generally, point cloud data includes any data for processing a point cloud. Coding may be effective in compressing and / or decompressing point cloud data.

[0010]

[0023] As shown in Figure 1, system 100 includes a source device 102 and a destination device 116. The source device 102 provides encoded point cloud data to be decoded by the destination device 116. In detail, in the example of Figure 1, the source device 102 provides the point cloud data to the destination device 116 via a computer-readable medium 110. The source device 102 and destination device 116 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, ground or marine vehicles, spacecraft, aircraft, robots, LIDAR devices, satellites, and the like. In some cases, the source device 102 and destination device 116 may be equipped for wireless communication.

[0011]

[0024] In the example in Figure 1, the source device 102 includes a data source 104, a memory 106, a G-PCC encoder 200, and an output interface 108. The destination device 116 includes an input interface 122, a G-PCC decoder 300, a memory 120, and a data consumer 118. According to this disclosure, the G-PCC encoder 200 of the source device 102 and the G-PCC decoder 300 of the destination device 116 may be configured to apply the techniques of this disclosure relating to utilizing information from a reference block in coding the planar information of the current node (e.g., the current block). Thus, the source device 102 represents an example of an encoding device, while the destination device 116 represents an example of a decoding device. In other examples, the source device 102 and the destination device 116 may include other components or configurations. For example, the source device 102 may receive data (e.g., point cloud data) from an internal or external source. Similarly, the destination device 116 may interface with an external data consumer rather than containing a data consumer within the same device.

[0012]

[0025] The system 100 shown in Figure 1 is merely an example. Generally, other digital encoding and/or decoding devices may implement the techniques of the present disclosure related to utilizing the information of reference blocks in the coding of planar information of current nodes. The source device 102 and the destination device 116 are merely examples of such devices, in which the source device 102 generates coded data for transmission to the destination device 116. The present disclosure refers to a "coding" device as a device that performs coding (encoding and/or decoding) of data. Thus, the G-PCC encoder 200 and the G-PCC decoder 300 represent examples of coding devices, in particular an encoder and a decoder, respectively. In some examples, the source device 102 and the destination device 116 may operate substantially symmetrically such that each of the source device 102 and the destination device 116 includes an encoding component and a decoding component. Thus, the system 100 may support unidirectional or bidirectional transmission between the source device 102 and the destination device 116 for, for example, streaming, playback, broadcasting, telephony, navigation, and other applications.

[0013]

[0026] Generally, data source 104 represents the source of data (i.e., raw, unencoded point cloud data) and may provide a continuous series of “frames” of data to the G-PCC encoder 200, which then encodes the data for each frame. The data source 104 of source device 102 may include point cloud capture devices, such as any of various cameras or sensors, e.g., a 3D scanner or light detection and ranging (LIDAR) device, one or more video cameras, an archive containing previously captured data, and / or a data feed interface for receiving data from a data content provider. Alternatively or additionally, point cloud data may be computer-generated from scanner, camera, sensor, or other data. For example, data source 104 may generate computer graphics-based data as source data, or it may produce a combination of live data, archived data, and computer-generated data. In each case, the G-PCC encoder 200 encodes the captured data, pre-captured data, or computer-generated data. The G-PCC encoder 200 can rearrange the frames from the reception order (sometimes called the “display order”) to the coding order for coding. The G-PCC encoder 200 can generate one or more bitstreams containing encoded data. The source device 102 can then output the encoded data onto a computer-readable medium 110 via the output interface 108 for reception and / or retrieval by, for example, the input interface 122 of the destination device 116.

[0014]

[0027] Memory 106 of source device 102 and memory 120 of destination device 116 may represent general-purpose memory. In some examples, memory 106 and memory 120 may store raw data, for example, raw data from data source 104 and raw decoded data from G-PCC decoder 300. Additionally or alternatively, memory 106 and memory 120 may store, for example, software instructions executable by G-PCC encoder 200 and G-PCC decoder 300, respectively. Although memory 106 and memory 120 are shown separately from G-PCC encoder 200 and G-PCC decoder 300 in this example, it should be understood that G-PCC encoder 200 and G-PCC decoder 300 may also include internal memory for functionally similar or equivalent purposes. Furthermore, memory 106 and memory 120 may store encoded data, for example, output from G-PCC encoder 200 and input to G-PCC decoder 300. In some examples, portions of memory 106 and memory 120 may be allocated as one or more buffers to store, for example, raw, decoded, and / or encoded data. For example, memory 106 and memory 120 may store data representing a point cloud.

[0015]

[0028] Computer-readable medium 110 can represent any type of medium or device capable of transporting encoded data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to directly transmit encoded data to destination device 116 in real time via, for example, a radio frequency network or a computer-based network. Output interface 108 can modulate a transmission signal including the encoded data, and input interface 122 can demodulate the received transmission signal according to a communication standard such as a wireless communication protocol. The communication medium can comprise any wireless or wired communication medium, such as the radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium can include a router, a switch, a base station, or any other device that may be useful for facilitating communication from source device 102 to destination device 116.

[0016]

[0029] In some examples, source device 102 can output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 can access encoded data from storage device 112 via input interface 122. Storage device 112 can include any of a variety of distributed or locally accessible data storage media, such as a hard drive, a Blu-ray Disc, a DVD, a CD-ROM, a flash memory, a volatile or non-volatile memory, or any other suitable digital storage media for storing encoded data.

[0017]

[0030] In some examples, source device 102 may output encoded data to a file server 114 or another intermediate storage device capable of storing the encoded data generated by source device 102. Destination device 116 may access the stored data from file server 114 via streaming or download. File server 114 can be any type of server device capable of storing encoded data and transmitting that encoded data to destination device 116. File server 114 may represent a web server (for example, a website), a File Transfer Protocol (FTP) server, a Content Delivery Network device, or a Network Attached Storage (NAS) device. Destination device 116 may access the encoded data from file server 114 through any standard data connection, including an Internet connection. This may include wireless channels (e.g., Wi-Fi® connection), wired connections (e.g., Digital Subscriber Line (DSL), cable modem, etc.), or a combination of both, which are suitable for accessing the encoded data stored in file server 114. The file server 114 and the input interface 122 may be configured to operate according to a streaming transmission protocol, a download transmission protocol, or a combination thereof.

[0018]

[0031] The output interface 108 and input interface 122 may represent a wireless transmitter/receiver, a modem, a wired networking component (e.g., an Ethernet® card), a wireless communication component operating according to any of the various IEEE 802.11 standards, or other physical components. In examples where the output interface 108 and input interface 122 include wireless components, the output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to cellular communication standards such as 4G, 4G-LTE® (Long-Term Evolution), LTE Advanced, or 5G. In some examples where the output interface 108 includes a wireless transmitter, the output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to other wireless standards such as the IEEE 802.11 specification, the IEEE 802.15 specification (e.g., ZigBee®), or the Bluetooth® standard. In some examples, the source device 102 and/or destination device 116 may include respective system-on-chip (SoC) devices. For example, the source device 102 may include an SoC device for performing the functions attributed to the G-PCC encoder 200 and/or the output interface 108, and the destination device 116 may include an SoC device for performing the functions attributed to the G-PCC decoder 300 and/or the input interface 122.

[0019]

[0032] The techniques of this disclosure can be applied to encoding and decoding that support any of a variety of applications, such as communication between autonomous vehicles, communication between scanners, cameras, sensors and processing devices such as local or remote servers, geographical mapping, or other applications.

[0020]

[0033] The input interface 122 of the destination device 116 receives an encoded bitstream from a computer-readable medium 110 (e.g., a communication medium, a storage device 112, a file server 114, etc.). The encoded bitstream may contain signaling information defined by the G-PCC encoder 200, which is also used by the G-PCC decoder 300, such as syntax elements having values that describe the characteristics and / or processing of encoded units (e.g., slices, pictures, picture groups, sequences, etc.). The data consumer 118 uses the decoded data. For example, the data consumer 118 may use the decoded data to determine the location of a physical object. In some examples, the data consumer 118 may have a display for presenting an image based on a point cloud.

[0021]

[0034] The G-PCC encoder 200 and the G-PCC decoder 300 can each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute those instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the G-PCC encoder 200 and the G-PCC decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a respective device. A device including the G-PCC encoder 200 and/or G-PCC decoder 300 may comprise one or more integrated circuits, microprocessors, and/or other types of devices.

[0022]

[0035] The G-PCC encoder 200 and G-PCC decoder 300 may operate according to a coding standard such as the Video Point Cloud Compression (V-PCC) standard or the Geometry Point Cloud Compression (G-PCC) standard. This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data. The encoded bitstream generally contains a set of values for syntax elements that represent coding decisions (e.g., coding modes).

[0023]

[0036] This disclosure may generally refer to “signaling” certain information, such as syntax elements. The term “signaling” may generally refer to the communication of values for syntax elements and/or other data used to decode encoded data. That is, the G-PCC encoder 200 may signal values for syntax elements in a bitstream. Generally, signaling refers to generating a value in the bitstream. As described above, the source device 102 may transport the bitstream to the destination device 116 either substantially in real time or not in real time, as can happen when the source device 102 stores the syntax elements in the storage device 112 for later retrieval by the destination device 116.

[0024]

[0037] ISO/IEC MPEG (JTC1/SC29/WG11) is studying the potential need to standardize point cloud coding technology with a compression capability that significantly exceeds that of current approaches, and aims to create such a standard. The group is working on this exploration activity in a collaborative effort known as the 3D Graphics Team (3DG) to evaluate compression technology designs proposed by experts in the field.

[0025]

[0038] Point cloud compression activities fall into two distinct categories. The first is "video point cloud compression" (V-PCC), which segments a 3D object and projects the segments onto multiple 2D planes (represented as "patches" in a 2D frame), which are then further coded by legacy 2D video codecs such as the High Efficiency Video Coding (HEVC) (ITU-T H.265) codec. The second is "geometry-based point cloud compression" (G-PCC), which directly compresses 3D geometry, i.e., the positions of a set of points in 3D space, and the associated attribute values (for each point associated with the 3D geometry). G-PCC addresses point cloud compression in both Category 1 (static point clouds) and Category 3 (dynamically collected point clouds). The latest draft of the G-PCC standard is available in G-PCC DIS, ISO / IEC JTC1 / SC29 / WG11 w19088, Brussels, Belgium, January 2020, and the codec description is available in G-PCC Codec Description v6, ISO / IEC JTC1 / SC29 / WG11 w19091, Brussels, Belgium, January 2020.

[0026]

[0039] A point cloud contains a set of points in 3D space, and these points may have associated attributes. These attributes may be color information such as R, G, B, or Y, Cb, Cr, or reflectance information, or other attributes. Point clouds can be captured by various cameras or sensors, such as LIDAR sensors and 3D scanners, and can also be computer-generated. Point cloud data is used in a variety of applications, including, but not limited to, architecture (modeling), graphics (3D models for visualization and animation), and the automotive industry (LIDAR sensors used to aid navigation).

[0027]

[0040] The 3D space occupied by point cloud data can be enclosed by a virtual bounding box. The positions of points within the bounding box can be represented with a certain precision, and therefore the positions of one or more points can be quantized based on that precision. At the smallest level, the bounding box is divided into voxels, which are the smallest units of space, each represented by a unit cube. A voxel within the bounding box can be associated with zero, one, or more than one point. The bounding box can be divided into multiple cubic or cuboid regions, sometimes called tiles. Each tile can be coded into one or more slices. The partitioning of the bounding box into slices and tiles can be based on the number of points in each partition, or on other considerations (e.g., certain regions can be coded as tiles). Slice regions can be further partitioned using partitioning decisions similar to those in video codecs.

[0028]

[0041] Figure 2 provides an overview of the G-PCC encoder 200. Figure 3 provides an overview of the G-PCC decoder 300. The illustrated modules are logical and do not necessarily correspond one-to-one to the code implemented in the reference implementation of the G-PCC codec, i.e., the TMC13 test model software studied by ISO/IEC MPEG (JTC1/SC29/WG11).

[0029]

[0042] In both the G-PCC encoder 200 and the G-PCC decoder 300, point cloud positions are coded first. Attribute coding depends on the decoded geometry. In Figures 2 and 3, the gray-shaded modules are options typically used for Category 1 data. The diagonally hatched modules are options typically used for Category 3 data. All other modules are common to Category 1 and Category 3.

[0030]

[0043] For Category 3 data, the compressed geometry is typically represented as an octree extending from the root to the leaf level of individual voxels. For Category 1 data, the compressed geometry is typically represented by a pruned octree (i.e., an octree extending from the root to the leaf level of blocks larger than voxels) and a model approximating the surface within each leaf of the pruned octree. Thus, both Category 1 and Category 3 data share an octree coding mechanism, although Category 1 data can further approximate voxels within each leaf with a surface model. The surface model used is a triangulation with 1 to 10 triangles per block, resulting in a triangle soup. Therefore, the Category 1 geometry codec is known as the Trisoup geometry codec, and the Category 3 geometry codec is known as the octree geometry codec.

[0031]

[0044] For each node in the octree, occupancy is signaled (if not inferred) for one or more of its child nodes (up to eight nodes). Multiple neighborhoods are specified, including (a) nodes that share a face with the current octree node, and (b) nodes that share a face, edge, or vertex with the current octree node. Within each neighborhood, the occupancy of a node and/or its children may be used to predict the occupancy of the current node or its children. For points sparsely populated in certain nodes of the octree, the codec also supports a direct coding mode in which the 3D positions of the points are coded directly. A flag may be signaled to indicate that the direct mode is used. At the lowest level, the number of points associated with an octree node or leaf node may also be coded.
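As an illustration of the per-node occupancy described above, the following Python sketch packs the occupancy of a node's eight children into one byte. The function names, the cubic node layout, and the bit ordering (x as the most significant of the three child-index bits) are assumptions made for this example and are not taken from the G-PCC specification.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int, int]


def child_index(point: Point, origin: Point, half: int) -> int:
    """Index (0..7) of the child octant containing the point.

    Assumed bit layout: bit 2 = upper half in x, bit 1 = y, bit 0 = z.
    """
    x, y, z = point
    ox, oy, oz = origin
    return (
        (int(x >= ox + half) << 2)
        | (int(y >= oy + half) << 1)
        | int(z >= oz + half)
    )


def occupancy_code(points: Iterable[Point], origin: Point, size: int) -> int:
    """8-bit occupancy: bit i is set when child octant i holds at least one point."""
    half = size // 2
    code = 0
    for p in points:
        code |= 1 << child_index(p, origin, half)
    return code
```

For a node of size 4 at the origin holding the points (0, 0, 0) and (3, 3, 3), only the first and last octants are occupied, so the code is `0b10000001`.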

[0032]

[0045] Once the geometry is coded, the attributes corresponding to the geometry points are coded. When multiple attribute points correspond to a single reconstructed/decoded geometry point, an attribute value representative of the reconstructed point can be derived.

[0033]

[0046] G-PCC offers three attribute coding methods: Region Adaptive Hierarchical Transform (RAHT) coding, interpolation-based hierarchical nearest-neighbor prediction (the predicting transform), and interpolation-based hierarchical nearest-neighbor prediction with an update/lifting step (the lifting transform). RAHT and lifting are typically used for Category 1 data, while the predicting transform is typically used for Category 3 data. However, either method can be used for any data, and, as with the geometry codecs in G-PCC, the attribute coding method used to code a point cloud is specified in the bitstream.

[0034]

[0047] Attribute coding can be performed in levels of detail (LODs), where each level of detail provides a finer representation of the point cloud attributes. Each level of detail can be specified based on a distance metric from neighboring nodes or based on a sampling distance.

[0035]

[0048] In the G-PCC encoder 200, the residuals obtained as the output of the coding method for attributes are quantized. The residuals can be obtained by subtracting the attribute values from the predictions derived based on points in the neighborhood of the current point and on the attribute values of previously coded points. The quantized residuals can be coded using context-adaptive arithmetic coding.
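The residual and quantization steps in this paragraph can be sketched as follows. This is a minimal illustration only: the uniform quantizer, the rounding rule, the sign convention (attribute minus prediction), and the function names are all assumptions, and the context-adaptive arithmetic coding of the resulting levels is left out.

```python
def quantize_residual(attribute: int, prediction: int, step: int) -> int:
    """Quantize (attribute - prediction) with a uniform step size.

    Rounds to the nearest level, with ties rounded away from zero (assumed rule).
    """
    residual = attribute - prediction
    sign = -1 if residual < 0 else 1
    return sign * ((abs(residual) + step // 2) // step)


def reconstruct(prediction: int, level: int, step: int) -> int:
    """Decoder-side reconstruction from the prediction and the quantized level."""
    return prediction + level * step
```

With an attribute of 130, a prediction of 120, and a step of 4, the residual 10 maps to level 3 and the decoder reconstructs 132, an error of 2.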

[0036]

[0049] In the example shown in Figure 2, the G-PCC encoder 200 may include a coordinate transformation unit 202, a color transformation unit 204, a voxelization unit 206, an attribute transfer unit 208, an octree analysis unit 210, a surface approximation analysis unit 212, an arithmetic coding unit 214, a geometry reconstruction unit 216, a RAHT unit 218, an LOD generation unit 220, a lifting unit 222, a coefficient quantization unit 224, and an arithmetic coding unit 226.

[0037]

[0050] As shown in the example in Figure 2, the G-PCC encoder 200 can obtain a set of point locations and a set of attributes in a point cloud. The G-PCC encoder 200 can obtain a set of point locations and a set of attributes in a point cloud from the data source 104 (Figure 1). Locations may include the coordinates of the points in the point cloud. Attributes may include information about the points in the point cloud, such as the color associated with the points in the point cloud. The G-PCC encoder 200 can generate a geometry bitstream 203 containing an encoded representation of the point locations in the point cloud. The G-PCC encoder 200 can also generate an attribute bitstream 205 containing an encoded representation of the set of attributes.

[0038]

[0051] The coordinate transformation unit 202 may apply a transformation to the coordinates of the points in order to transform the coordinates from an initial domain to a transform domain. In this disclosure, the resulting coordinates are referred to as transformed coordinates. The color transformation unit 204 may apply a transformation to convert the color information of the attributes to a different domain. For example, the color transformation unit 204 may transform color information from the RGB color space to the YCbCr color space.
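As a concrete instance of an RGB-to-YCbCr conversion, the sketch below uses the full-range BT.601 (JFIF) coefficients for 8-bit samples. The text does not say which conversion matrix the color transformation unit 204 uses, so the coefficients, the 8-bit range, and the function name are assumptions for illustration.

```python
from typing import Tuple


def rgb_to_ycbcr(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Convert 8-bit RGB to 8-bit YCbCr (full-range BT.601 / JFIF, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each component to the 8-bit range.
    yy, cbb, crr = (min(255, max(0, round(v))) for v in (y, cb, cr))
    return yy, cbb, crr
```

Gray inputs (equal R, G, and B) map to neutral chroma, so white becomes (255, 128, 128) and black becomes (0, 128, 128).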

[0039]

[0052] Furthermore, in the example in Figure 2, the voxelization unit 206 may voxelize the transformed coordinates. Voxelization of the transformed coordinates may involve quantization and the removal of some points from the point cloud. In other words, multiple points in the point cloud may be contained within a single "voxel," which can then be treated as a single point in some respects. Furthermore, the octree analysis unit 210 may generate an octree based on the voxelized transformed coordinates. Furthermore, in the example in Figure 2, the surface approximation analysis unit 212 may analyze the points to potentially determine the surface representation of the set of points. The arithmetic coding unit 214 may entropy code the syntax elements representing the octree and / or surface information determined by the surface approximation analysis unit 212. The G-PCC encoder 200 may output these syntax elements in the geometry bitstream 203. The geometry bitstream 203 may also include other syntax elements, including syntax elements that are not arithmetically coded.
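The quantization and point removal involved in voxelization can be sketched in a few lines of Python. The function name, the `scale` parameter (voxel edge length), and the use of floor quantization are assumptions for this example; the handling of attributes for merged points is not shown.

```python
from typing import Iterable, List, Tuple


def voxelize(points: Iterable[Tuple[float, float, float]],
             scale: float) -> List[Tuple[int, int, int]]:
    """Quantize positions to integer voxel coordinates and merge duplicates.

    Points that fall into the same voxel are represented once, in the order
    in which the voxel is first seen.
    """
    seen = set()
    voxels = []
    for x, y, z in points:
        v = (int(x // scale), int(y // scale), int(z // scale))  # floor quantization
        if v not in seen:
            seen.add(v)
            voxels.append(v)
    return voxels
```

With a voxel size of 0.5, the points (0.1, 0.2, 0.3) and (0.4, 0.4, 0.4) share voxel (0, 0, 0), so three input points can yield only two voxels.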

[0040]

[0053] The geometry reconstruction unit 216 may reconstruct the transformed coordinates of points in the point cloud based on the octree, data indicating the surfaces determined by the surface approximation analysis unit 212, and/or other information. The number of transformed coordinates reconstructed by the geometry reconstruction unit 216 may differ from the original number of points in the point cloud due to voxelization and surface approximation. The resulting points are referred to as reconstructed points in this disclosure. The attribute transfer unit 208 may transfer the attributes of the original points in the point cloud to the reconstructed points in the point cloud.

[0041]

[0054] Furthermore, the RAHT unit 218 can apply RAHT coding to the attributes of the reconstructed points. In some examples, under RAHT, the attributes of a 2x2x2 block of point positions are taken and transformed along one direction to obtain four low (L) frequency nodes and four high (H) frequency nodes. The four low frequency nodes (L) are then transformed along a second direction to obtain two low (LL) frequency nodes and two high (LH) frequency nodes. The two low frequency nodes (LL) are then transformed along a third direction to obtain one low (LLL) frequency node and one high (LLH) frequency node. The low frequency node LLL corresponds to the DC coefficient, and the high frequency nodes H, LH, and LLH correspond to the AC coefficients. The transformation in each direction may be a 1-D transformation with two coefficient weights. The low-frequency coefficients can be taken as coefficients of a 2x2x2 block for the next higher level of the RAHT transform, while the AC coefficients are encoded without modification, and such transforms continue up to the top root node. The tree traversal for encoding runs from top to bottom to calculate the weights that should be used for those coefficients, and the transform order is from bottom to top. Those coefficients can then be quantized and coded.
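The three-stage decomposition described above can be sketched as follows. This is a minimal illustration, assuming a fully occupied 2x2x2 block in which every position carries the same weight, so that each 1-D transform reduces to an orthonormal two-point butterfly. The function names and the ordering of the returned AC coefficients are hypothetical and are not taken from the G-PCC specification.

```python
import math

def butterfly(a, b):
    """Orthonormal two-point transform for equal weights: returns (low, high)."""
    s = math.sqrt(2.0)
    return (a + b) / s, (b - a) / s

def raht_block_2x2x2(block):
    """block[x][y][z] holds one scalar attribute per position of a fully occupied block.

    Returns (dc, ac), where ac lists the 4 H, 2 LH and 1 LLH coefficients in that order.
    """
    # First direction (x): 8 values -> 4 L nodes + 4 H nodes.
    l_nodes, h_nodes = {}, []
    for y in range(2):
        for z in range(2):
            low, high = butterfly(block[0][y][z], block[1][y][z])
            l_nodes[(y, z)] = low
            h_nodes.append(high)
    # Second direction (y): 4 L nodes -> 2 LL nodes + 2 LH nodes.
    ll_nodes, lh_nodes = [], []
    for z in range(2):
        low, high = butterfly(l_nodes[(0, z)], l_nodes[(1, z)])
        ll_nodes.append(low)
        lh_nodes.append(high)
    # Third direction (z): 2 LL nodes -> 1 LLL (DC) + 1 LLH.
    lll, llh = butterfly(ll_nodes[0], ll_nodes[1])
    return lll, h_nodes + lh_nodes + [llh]
```

For a block with constant attributes, all seven AC coefficients vanish and the energy is concentrated in the DC coefficient. With unequal weights (for example, partially occupied blocks), the butterfly coefficients would depend on the weights, which this sketch does not model.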

[0042]

[0055] Alternatively or additionally, the LOD generation unit 220 and the lifting unit 222 may apply LOD processing and lifting, respectively, to the attributes of the reconstructed points. LOD generation is used to divide the attributes into different refinement levels. Each refinement level provides a refinement of the attributes of the point cloud. The first refinement level provides a coarse approximation and contains a small number of points; subsequent refinement levels typically contain more points. The refinement levels may be constructed using distance-based metrics, or one or more other classification criteria (e.g., subsampling from a particular order) may also be used. Thus, all reconstructed points may be included in a refinement level. Each level of detail is created by taking the union of all points up to a particular refinement level: for example, LOD1 is obtained based on refinement level RL1, LOD2 is obtained based on RL1 and RL2, and LODN is obtained by the union of RL1, RL2, ... RLN. In some cases, LOD generation may be followed by a prediction mechanism (e.g., a predictive transformation), where the attributes associated with each point in the LOD are predicted from a weighted average of preceding points, and the residuals are quantized and entropy-coded. A lifting mechanism is built on top of the predictive transformation mechanism, where an update operator is used to update the coefficients, and adaptive quantization of the coefficients is performed.
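The relationship between refinement levels and levels of detail can be illustrated with a short sketch. It assumes that each refinement level is given as a list of point indices; the function name `build_lods` is hypothetical.

```python
def build_lods(refinement_levels):
    """Return [LOD1, LOD2, ...], where LODn is the union of RL1..RLn.

    refinement_levels: list of lists of point indices (RL1, RL2, ...).
    """
    lods = []
    accumulated = []
    for level in refinement_levels:
        # Each LOD adds the points of one more refinement level.
        accumulated = accumulated + list(level)
        lods.append(accumulated)
    return lods
```

For example, with RL1 = [0, 4], RL2 = [2, 6] and RL3 = [1, 3, 5, 7], the last level of detail contains all eight points.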

[0043]

[0056] The RAHT unit 218 and the lifting unit 222 can generate coefficients based on attributes. The coefficient quantization unit 224 can quantize the coefficients generated by the RAHT unit 218 or the lifting unit 222. The arithmetic coding unit 226 can apply arithmetic coding to the syntax elements representing the quantization coefficients. The G-PCC encoder 200 can output these syntax elements in the attribute bitstream 205. The attribute bitstream 205 may also include other syntax elements, including syntax elements that are not arithmetically coded.

[0044]

[0057] In the example shown in Figure 3, the G-PCC decoder 300 may include a geometry arithmetic decoding unit 302, an attribute arithmetic decoding unit 304, an octree synthesis unit 306, an inverse quantization unit 308, a surface approximation synthesis unit 310, a geometry reconstruction unit 312, a RAHT unit 314, an LoD generation unit 316, an inverse lifting unit 318, an inverse transform coordinate unit 320, and an inverse transform color unit 322.

[0045]

[0058] The G-PCC decoder 300 can obtain a geometry bitstream 203 and an attribute bitstream 205. The geometry arithmetic decoding unit 302 of the decoder 300 can apply arithmetic decoding (for example, context-adaptive binary arithmetic coding (CABAC) or other types of arithmetic decoding) to syntax elements in the geometry bitstream 203. Similarly, the attribute arithmetic decoding unit 304 can apply arithmetic decoding to syntax elements in the attribute bitstream 205.

[0046]

[0059] The octree synthesis unit 306 can synthesize octrees based on syntax elements parsed from the geometry bitstream 203. Starting with the root node of the octree, the occupancy of each of the eight child nodes at each octree level is signaled in the bitstream. When the signaling indicates that a child node at a particular octree level is occupied, the occupancy of this child node's children is signaled. All nodes at a given octree level are signaled before proceeding to the next octree level. At the final level of the octree, each node corresponds to a voxel position, and when a leaf node is occupied, one or more points may be designated as occupying the voxel position. In some cases, some branches of the octree may terminate before the final level due to quantization. In such cases, a leaf node is considered an occupied node that has no child nodes. In cases where surface approximation is used in the geometry bitstream 203, the surface approximation synthesis unit 310 can determine the surface model based on the syntax elements parsed from the geometry bitstream 203 and on the octree.
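The level-by-level occupancy signaling can be sketched as a breadth-first traversal. The following is an illustration only, under these assumptions: each occupied non-leaf node carries one 8-bit occupancy mask, bit i of the mask marks child i as occupied, and child index i maps to position bits as (x << 2) | (y << 1) | z. Entropy coding and early-terminating branches are not modeled, and `decode_octree` is a hypothetical name.

```python
from collections import deque

def decode_octree(occupancy_bytes, depth):
    """Return the sorted voxel positions described by breadth-first occupancy masks."""
    stream = iter(occupancy_bytes)
    queue = deque([(0, 0, 0, 0)])  # (x, y, z, level); the root is at level 0
    leaves = []
    while queue:
        x, y, z, level = queue.popleft()
        if level == depth:
            leaves.append((x, y, z))  # final level: the node is a voxel position
            continue
        mask = next(stream)  # occupancy of the eight children of this node
        for i in range(8):
            if mask & (1 << i):
                queue.append((2 * x + ((i >> 2) & 1),
                              2 * y + ((i >> 1) & 1),
                              2 * z + (i & 1),
                              level + 1))
    return sorted(leaves)
```

Because the queue is first-in first-out, every node of one level is consumed before any node of the next level, matching the signaling order described above.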

[0047]

[0060] Furthermore, the geometry reconstruction unit 312 may perform reconstruction to determine the coordinates of points in the point cloud. For each position in the leaf nodes of the octree, the geometry reconstruction unit 312 may reconstruct the node position by using the binary representation of the leaf nodes in the octree. For each leaf node, the number of points at the node is signaled, which indicates the number of duplicate points at the same voxel position. When geometry quantization is used, the point positions are scaled to determine the reconstructed point position values.
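A minimal sketch of this reconstruction step follows. It assumes that each leaf is given as a voxel position with a signaled point count and that geometry quantization is undone with a single uniform scale factor; both the input layout and the name `reconstruct_points` are hypothetical.

```python
def reconstruct_points(leaves, scale=1):
    """Expand leaves into points, duplicating by count and scaling positions.

    leaves: list of ((x, y, z), count) pairs taken from the octree leaf nodes.
    scale:  assumed uniform scale factor that undoes geometry quantization.
    """
    points = []
    for (x, y, z), count in leaves:
        # A count above 1 means duplicate points at the same voxel position.
        points.extend([(x * scale, y * scale, z * scale)] * count)
    return points
```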

[0048]

[0061] The inverse transform coordinate unit 320 can apply an inverse transform to the reconstructed coordinates in order to convert the reconstructed coordinates (positions) of points in the point cloud back from the transform domain to the initial domain. The position of a point in the point cloud may be in the floating-point domain, while the point position in the G-PCC codec is coded in the integer domain. The inverse transform can be used to convert the position back to the original domain.

[0049]

[0062] Furthermore, in the example in Figure 3, the inverse quantization unit 308 can inverse quantize the attribute values. The attribute values can be obtained based on syntax elements obtained from the attribute bitstream 205 (which includes, for example, syntax elements decoded by the attribute arithmetic decoding unit 304).

[0050]

[0063] Depending on how the attribute values are encoded, the RAHT unit 314 may perform RAHT coding to determine the color values for the points in the point cloud based on the inverse quantized attribute values. RAHT decoding is performed from the top to the bottom of the tree. At each level, the low-frequency and high-frequency coefficients derived from the inverse quantization process are used to derive component values. At the leaf nodes, the derived values correspond to the attribute values of those coefficients. The weight derivation process for the points is similar to the process used in the G-PCC encoder 200. Alternatively, the LOD generation unit 316 and the inverse lifting unit 318 may determine the color values for the points in the point cloud using level-of-detail based techniques. The LOD generation unit 316 decodes each LOD, which gives a progressively finer representation of the points' attributes. Using predictive transformations, the LOD generation unit 316 derives point predictions from the weighted sum of points that were in the previous LOD or previously reconstructed in the same LOD. The LOD generation unit 316 may add predictions to the residuals (obtained after inverse quantization) to obtain the reconstructed values of the attributes. When a lifting scheme is used, the LOD generation unit 316 may also include an update operator to update the coefficients used to derive the attribute values. In this case, the LOD generation unit 316 may also apply inverse adaptive quantization.

[0051]

[0064] Furthermore, in the example in Figure 3, the inverse transform color unit 322 may apply an inverse color transform to the color values. The inverse color transform may be the reverse of the color transform applied by the color transformation unit 204 of the encoder 200. For example, the color transformation unit 204 may transform color information from the RGB color space to the YCbCr color space. Therefore, the inverse transform color unit 322 may transform color information from the YCbCr color space to the RGB color space.
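The forward and inverse color transforms can be sketched as a matched pair of functions. The text does not state which conversion matrix is used; the sketch below assumes the full-range BT.601 luma coefficients (0.299, 0.587, 0.114) purely for illustration, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward transform (encoder side), assuming full-range BT.601 coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform (decoder side) that undoes rgb_to_ycbcr."""
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b
```

Applying `ycbcr_to_rgb` to the output of `rgb_to_ycbcr` returns the original color up to floating-point rounding; an integer implementation would add rounding and clipping.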

[0052]

[0065] The various units in Figures 2 and 3 are shown to aid in understanding the operations performed by the encoder 200 and decoder 300. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. A fixed-function circuit refers to a circuit that provides a specific function and is pre-configured with respect to the operations it may perform. A programmable circuit refers to a circuit that can be programmed to perform various tasks and to provide flexible functionality in the operations it may perform. For example, a programmable circuit may execute software or firmware that operates the programmable circuit in a manner defined by software or firmware instructions. A fixed-function circuit may execute software instructions (e.g., to receive or output parameters), but the type of operation performed by a fixed-function circuit is generally immutable. In some examples, one or more of the units may be separate circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

[0053]

[0066] The planar coding mode was first proposed in Sebastien Lasserre, David Flynn, "[GPCC] Planar mode in octree-based geometry coding", ISO / IEC JTC1 / SC29 / WG11 MPEG / m48906, Gothenburg, Sweden, July 2019, and was adopted at the 128th MPEG Conference in Geneva, Switzerland (Sebastien Lasserre, Jonathan Taquet, "[GPCC] CE13.22 report on planar coding mode", ISO / IEC JTC1 / SC29 / WG11 MPEG / m50008, Geneva, Switzerland, October 2019). The angular coding mode was first proposed in Sebastien Lasserre, Jonathan Taquet, "[GPCC][CE 13.22 related] An improvement of the planar coding mode", ISO / IEC JTC1 / SC29 / WG11 MPEG / m50642, Geneva, Switzerland, October 2019 (hereafter referred to as "m50642"), and was adopted at the 129th MPEG Conference in Brussels, Belgium (Sebastien Lasserre, Jonathan Taquet, "[GPCC] CE 13.22 report on angular mode", ISO / IEC JTC1 / SC29 / WG11 MPEG / m51594, Brussels, Belgium, January 2020, hereafter referred to as "m51594"). The angular coding mode improves the coding efficiency of the planar mode by using the sensor characteristics of a typical LIDAR sensor. The angular coding mode is optionally used in conjunction with the planar mode and improves the coding of the vertical (z) planar position syntax element by employing knowledge of the positions and angles of the sensing laser beams in a typical LIDAR sensor. Furthermore, the angular coding mode can optionally be used to improve the coding of the vertical z position bits in IDCM. In a separate contribution (Geert Van der Auwera, Bappaditya Ray, Louis Kerofsky, Adarsh K. Ramasubramonian, Marta Karczewicz, "[GPCC][New Proposal] Angular mode simplifications and HLS refinements", ISO / IEC JTC1 / SC29 / WG11 MPEG / m53693, remote conference (formerly Alpbach conference), April 2020), the context derivation of the angular coding mode was simplified and the HLS coding of sensor data parameters was made more efficient. The description of the angular mode in the following sections is based on the original MPEG contribution documents [m50642, m51594] and the GPCC DIS text (G-PCC DIS, ISO / IEC JTC1 / SC29 / WG11 w19617, remote conference, November 2020, hereinafter referred to as "GPCC DIS").

[0054]

[0067] The azimuthal coding mode, first proposed in (Sebastien Lasserre, Jonathan Taquet, "[GPCC] [CE13.22 related] The azimuthal coding mode", ISO / IEC JTC1 / SC29 / WG11 MPEG / m51596, Brussels, Belgium, January 2020, hereafter referred to as "m51596"), was adopted at the 130th MPEG Remote Conference (Sebastien Lasserre, Jonathan Taquet, "[GPCC] [CE 13.22] Report on azimuthal coding mode", ISO / IEC JTC1 / SC29 / WG11 MPEG / m52958, Remote Conference (formerly Alpbach Conference), April 2020, hereafter referred to as "m52958"). The azimuthal coding mode is similar to the angular mode, extending the angular mode to the coding of the (x) and (y) planar position syntax elements of the planar mode and improving the coding of x or y position bits in IDCM. In a separate contribution to the 131st MPEG Remote Conference (Geert Van der Auwera, Bappaditya Ray, Adarsh K. Ramasubramonian, Marta Karczewicz, "[GPCC][New Proposal] Planar and azimuthal coding mode simplifications," ISO / IEC JTC1 / SC29 / WG11 MPEG / m54694, Remote Conference, July 2020, hereafter referred to as "m54694"), the number of contexts used in the azimuthal mode was significantly reduced.

[0055]

[0068] Note: In the following sections, "angular mode" may also refer to the azimuthal mode.

[0056]

[0069] The specifications related to the planar coding mode are summarized in the GPCC DIS as follows:

8.2.3.1 Node Eligibility for Planar Coding Mode

XXX Split and Rearrange [XXX, This process lacks planar rate updates after decoding is_planar_flag]

The explicit coding of the occupied plane is conditional on the probability of XXX. For k=0..2, the array PlanarRate[k] is an estimate of the probability that the node occupancy forms a single plane perpendicular to the k-th axis. The variable LocalDensity is an estimate of the average number of occupied children in a node. The variable NumNodesUntilPlanarUpdate counts the number of nodes that will be parsed before updating PlanarRate and LocalDensity. [XXX Entropy state continues] When parsing the geometry_octree syntax structure, PlanarRate and LocalDensity are initialized as follows:

[0057]

number

[0058] At the start of parsing each geometry_octree_node syntax structure, NumNodesUntilPlanarUpdate is decremented. If NumNodesUntilPlanarUpdate is less than 0, PlanarRate and LocalDensity are updated as follows:

[0059] The number of occupied sibling nodes is determined and used to update the LocalDensity estimate.

[0060]

number

[0061] The number of nodes until the next update is as follows:

[0062]

number

[0063] The occupancy information of the parent node is used to determine the existence of a single occupied plane along each axis and to update the corresponding plane probability estimate PlanarRate[k].

[0064]

number

[0065] At the start of parsing the geometry_octree_node syntax structure, for each axis, it is determined whether the current node is eligible to signal planar information. The output of this process is an array PlanarEligible with elements PlanarEligible[k] for k=0..2. First, PlanarRate is used to determine the order of the three planes, planeOrder[k], from most likely to least likely, according to Table 18. Next, PlanarEligible is set as follows:

[0066]

number

[0067] [Table 1]
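The ordering-and-threshold idea behind PlanarEligible can be sketched as follows. This is not the derivation from the GPCC DIS, whose equation and ordering table are not reproduced here. The sketch assumes that planeOrder is obtained by sorting the axes by decreasing PlanarRate (ties keep the lower axis index first), that PlanarRate and the thresholds geom_planar_th[i] share a common scale, and it ignores LocalDensity entirely. The function name is hypothetical.

```python
def planar_eligibility(planar_rate, geom_planar_th):
    """Return PlanarEligible[k] for k = 0..2 under the assumptions stated above.

    planar_rate:    PlanarRate[k], one estimate per axis.
    geom_planar_th: threshold for the i-th most likely planar direction.
    """
    # planeOrder: axes sorted from most likely to least likely planar.
    plane_order = sorted(range(3), key=lambda k: planar_rate[k], reverse=True)
    eligible = [False] * 3
    for i, axis in enumerate(plane_order):
        # The i-th most likely direction is compared with the i-th threshold.
        eligible[axis] = planar_rate[axis] >= geom_planar_th[i]
    return eligible
```

With rates [100, 40, 70] and thresholds [60, 60, 80], the axes are ranked 0, 2, 1, so axes 0 and 2 pass their thresholds while axis 1 does not.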

[0068] Syntax elements can be signaled within the bitstream: an is_planar_flag[axisIdx] equal to 1 indicates that the positions of the current node's children form a single plane perpendicular to the axisIdx-th axis. An is_planar_flag[axisIdx] equal to 0 indicates that, when present, the positions of the current node's children occupy both planes perpendicular to the axisIdx-th axis. The context index (ctxIdx) for coding is_planar_flag is specified in Table 37 of the GPCC DIS, where it is set equal to axisIdx.

8.2.3.2 Buffers that track the nearest node along the axis

The arrays PlanarPrevPos, PlanarPlane, and IsPlanarNode record information about previously decoded geometry tree nodes for use in determining the ctxIdx for the syntax element plane_position. The arrays are not used by the decoding process when geometry_planar_enabled_flag is equal to 0 or planar_buffer_disabled_flag is equal to 1. In this process, the variable axisIdx is used to represent one of three coded axes, and the variable axisPos represents the position of the node along the axisIdx-th axis. The value of axisPos is within the range of 0..0x3fff. The array IsPlanarNode, with the value IsPlanarNode[axisIdx][axisPos], indicates whether the most recently decoded node with the axisIdx-th position component equal to axisPos is planar in a plane perpendicular to the axisIdx-th axis. The array PlanarPrevPos, with the value PlanarPrevPos[axisIdx][axisPos], stores the maximum position component of the most recently decoded node whose axisIdx-th position component is equal to axisPos. The array PlanarPlane, with the value PlanarPlane[axisIdx][axisPos], indicates the value of plane_position[axisIdx] for the most recently decoded node whose axisIdx-th position component is equal to axisPos. At the start of each geometry tree level, each element of the arrays PlanarPrevPos and IsPlanarNode is initialized to 0. After decoding each geometry_planar_mode_data syntax structure with the XXX parameters childIdx and axisIdx, the arrays PlanarPrevPos, PlanarPlane, and IsPlanarNode are updated as follows.

[0069] The variable axisPos, which represents the position along axisIdx, is derived as follows:

[0070]

number

[0071] The array entry corresponding to the node is updated as follows:

[0072]

number

[0073] 8.2.3.3 Determining ctxIdx for the syntax element plane_position The inputs to this process are as follows:

[0074] - The variable axisIdx, which identifies the axis that is perpendicular (normal) to the plane, and
- The current node's position (sN, tN, vN) within the geometry tree level.
The output of this process is the variable ctxIdx. The variable `neighOccupied` indicates whether there are any nodes adjacent to the current node along axisIdx. It is derived as follows: XXX

[0075]

number

[0076] When planar_buffer_disabled_flag is equal to 1, the value of ctxIdx is set equal to adjPlaneCtxInc, and no further processing is performed by this process. Otherwise, the remainder of this section applies. The variable axisPos indicates the 14 least significant position bits of the current node along axisIdx.

[0077]

number

[0078] The variable `dist` represents the distance between the current node and the most recently decoded node position that has the same value of `axisPos` along the axisIdx-th axis. It is derived as follows:

[0079]

number

[0080] The context index ctxIdx is derived as follows:

[0081]

number

[0082] 8.2.3.4 Determination of planePosIdxAzimuthalS and planePosIdxAzimuthalT for coding horizontal plane positions

[Ed. Correct how this interacts with ctxIdx above. NB: ctxIdx is not plane-independent]

The determination of planePosIdxAngularS for the arithmetic coding of plane_position[0] and planePosIdxAngularT for the arithmetic coding of plane_position[1] is obtained as follows: When geometry_angular_enabled_flag is equal to 0, the values of both planePosIdxAzimuthalS and planePosIdxAzimuthalT are set equal to planePosIdx. Otherwise, the following applies:

[0083]

number

[0084] The contextAngular decision for the arithmetic coding of plane_position[2] is carried out as described in XREF.

8.2.3.5 Determination of planePosIdxAngular for coding vertical plane position

[Ed. Correct how this interacts with ctxIdx above. NB: ctxIdx is not planar independent.]

The determination of planePosIdxAngular for arithmetic coding of plane_position[2] is obtained as follows: When geometry_angular_enabled_flag is equal to 0, the value of planePosIdxAngular is set equal to planePosIdx. Otherwise, the following applies:

[0085]

number

[0086] The contextAngular decision for the arithmetic coding of plane_position[2] is carried out as described in Section 8.2.4.4.

Angular and azimuthal modes in GPCC DIS

Angular mode syntax

Syntax elements that carry the LIDAR laser sensor information required for the angular coding mode to provide any coding efficiency benefit are italicized in Table 2. The semantics of these syntax elements are specified in GPCC DIS as follows:

A geometry_planar_enabled_flag equal to 1 indicates that planar coding mode is activated. A geometry_planar_enabled_flag equal to 0 indicates that planar coding mode is not activated. When not present, geometry_planar_enabled_flag is presumed to be 0.

geom_planar_th[i] specifies, for i in the range of 0 to 2, the activation threshold value for planar coding mode along the i-th most likely direction for planar coding mode to be efficient. geom_planar_th[i] is an integer in the range of 0 to 127.

`geom_idcm_rate_minus1` specifies the rate at which a node may be eligible for direct coding. When not present, `geom_idcm_rate_minus1` is presumed to be 31.

The array IdcmEnableMask is derived as follows:

[0087]

number

[0088] A geometry_angular_enabled_flag equal to 1 indicates that angular coding mode is activated. A geometry_angular_enabled_flag equal to 0 indicates that angular coding mode is not activated.

A geom_slice_angular_origin_present_flag equal to 1 specifies that a slice-relative angular origin is present in the geometry slice header. A geom_slice_angular_origin_present_flag equal to 0 specifies that an angular origin is not present in the geometry slice header. When not present, geom_slice_angular_origin_present_flag is assumed to be 0.

geom_angular_origin_bits_minus1 + 1 is the length in bits of the syntax element geom_angular_origin_xyz[k].

geom_angular_origin_xyz[k] specifies the k-th component of the (x, y, z) coordinates of the origin used in the processing of the angular coding mode. When not present, the value of geom_angular_origin_xyz[k] for k = 0..2 is assumed to be 0.

geom_angular_azimuth_scale_log2 and geom_angular_radius_scale_log2 specify factors used to scale positions coded using a spherical coordinate system during conversion to Cartesian coordinates.

geom_angular_azimuth_step_minus1 + 1 specifies the unit change of the azimuth angle. The differential prediction residuals used in the angular prediction tree coding can be partially represented as multiples of geom_angular_azimuth_step_minus1 + 1. The value of geom_angular_azimuth_step_minus1 shall be less than (1<<geom_angular_azimuth_scale_log2).

number_lasers_minus1 + 1 specifies the number of lasers used for the angular coding mode.

laser_angle_init and laser_angle_diff[i], for i=0..number_lasers_minus1, specify the tangent of the elevation angle of the i-th laser with respect to the horizontal plane defined by the first and second coded axes.

The array LaserAngle[i], for i=0..number_lasers_minus1, is derived as follows:

[0089]

number

[0090] A requirement for bitstream conformance is that the value of LaserAngle[i], for i=1..number_lasers_minus1, be greater than or equal to LaserAngle[i-1]. laser_correction_init and laser_correction_diff[i], for i=1..number_lasers_minus1, specify the correction of the i-th laser position relative to GeomAngularOrigin[2] along the second internal axis. laser_phi_per_turn_init_minus1 and laser_phi_per_turn_diff[i], for i=1..number_lasers_minus1, specify the number of samples generated by the i-th laser of a rotating sensing system located at the origin used in the angular coding mode processing. The arrays LaserCorrection[i] and LaserPhiPerTurn[i], for i=1..number_lasers_minus1, are derived as follows:

[0091]

number

[0092] A requirement for bitstream conformance is that the value of LaserPhiPerTurn[i], for i = 0..number_lasers_minus1, is not zero. The arrays DeltaPhi[i] and InvDeltaPhi[i], for i=0..number_lasers_minus1, are derived as follows:

[0093]

number

[0094] A planar_buffer_disabled_flag equal to 1 indicates that the buffer that tracks the nearest nodes is not used in the process of coding the planar mode flag and the plane position in planar mode. A planar_buffer_disabled_flag equal to 0 indicates that the buffer that tracks the nearest nodes is used. When not present, planar_buffer_disabled_flag is presumed to be !geometry_planar_enabled_flag.

[0095] [Table 2-1] [Table 2-2] [Table 2-3] [Table 2-4]

[0096] The data syntax for planar mode and direct mode is included in Tables 3 and 4, respectively.

[0097] [Table 3-1] [Table 3-2]

[0098] [Table 4]

[0099] 8.2.4.1 Derivation process for angular eligibility of nodes

XXX Input / Output

If geometry_angular_enabled_flag is equal to 0, angular_eligible is set equal to 0. Otherwise, the following applies: The variable deltaAngle, which specifies the minimum angular distance between lasers, is derived as follows:

[0100]

number

[0101] Finally, angular_eligible is derived as follows: [Ed and sNchild need to be checked]

[0102]

number

[0103] 8.2.4.2 Derivation process of the laser index associated with a node

XXX Input / Output

If angular_eligible is equal to 0, the index laserIndex is set to the preset value UNKOWN_LASER. Otherwise, if angular_eligible is equal to 1, the following applies as a continuation of the process described in 8.2.5.1: First, the reciprocal rInv of the radial distance from the LIDAR to the current node is determined as follows:

[0104]

number

[0105] Next, the angle theta32 is determined as follows.

[0106]

number

[0107] [Ed XXX:laserIndex[Parent] is meaningless and another state array needs to be added.] Finally, the angular eligibility and the associated laser are determined based on the parent node as follows:

[0108]

number

[0109] 8.2.4.3 Derivation process of contextAzimuthalS and contextAzimuthalT for planar coding mode

XXX Input / Output

The following applies as a continuation of the process described in 8.2.5.2: First, two angles are derived from the node position relative to the angular origin.

[0110]

number

[0111] Secondly, the azimuth predictor is obtained from the array phiBuffer.

[0112]

number

[0113] The two azimuth contexts are initialized as follows:

[0114]

number

[0115] Next, if the predictor predPhi is not equal to 0x80000000, the following is applied to improve the two azimuth contexts.

[0116]

number

[0117] 8.2.4.4 Derivation process of contextAngular for planar coding mode

XXX Input / Output

If the laser index laserIndex is equal to UNKOWN_LASER, contextAngular is set to the preset value UNKOWN_CONTEXT. Otherwise, the following applies as a continuation of the process described in 8.2.5.2. First, two angular differences with respect to the lower and upper planes, thetaLaserDeltaBot and thetaLaserDeltaTop, are determined.

[0118]

number

[0119] Next, the angle context is inferred from the difference between the two angles.

[0120]

number

[0121]

[0070] Motion prediction in GPCC. There are two types of motion involved in the G-PCC InterEM software (Exploratory model for inter-prediction in G-PCC, ISO / IEC JTC1 / SC29 WG11 N18096, Macau, China, October 2018): a global motion matrix and a local node motion vector. The global motion parameters are defined as a rotation matrix and a translation vector that are applied to all points in the prediction (reference) frame (except for points to which the local motion mode is applied). The local node motion vector of an octree node is a motion vector that is applied only to points within the node in the prediction (reference) frame. Details of the motion estimation algorithm in InterEM are described below.
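Applying the global motion parameters to the prediction (reference) frame can be sketched as follows. The sketch assumes the rotation matrix is a 3x3 nested list, the translation is a 3-vector, and points handled by the local motion mode are marked by an optional boolean mask; these data layouts and the name `apply_global_motion` are hypothetical and are not taken from InterEM.

```python
def apply_global_motion(points, rotation, translation, local_mask=None):
    """Apply p' = R * p + t to every reference point not flagged for local motion.

    points:      list of (x, y, z) tuples from the prediction (reference) frame.
    rotation:    3x3 rotation matrix as a list of rows.
    translation: translation vector (tx, ty, tz).
    local_mask:  optional list of booleans; True means the point is left for
                 local node motion compensation and is not moved here.
    """
    moved_points = []
    for idx, p in enumerate(points):
        if local_mask is not None and local_mask[idx]:
            moved_points.append(tuple(p))
            continue
        moved_points.append(tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        ))
    return moved_points
```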

[0122]

[0071] Figure 4 is a flowchart illustrating an exemplary motion estimation technique for InterEM. Given an input prediction (reference) frame and a current frame, global motion is first estimated at the global scale. After the global motion is applied to the prediction frame, local motion is estimated at the node level, a finer scale in the octree. Finally, the estimated local node motion is applied in motion compensation.

[0123]

[0072] Details of the above technique are described below.

[0124]

[0073] Method for estimating global motion matrices and translation vectors. Figure 5 is a flowchart illustrating an exemplary technique for estimating local node motion vectors. As shown in Figure 5, the motion vectors are estimated in a recursive manner. The cost function used to select the best-suited motion vectors may be based on the rate-distortion cost.

[0125]

[0074] If the current node is not split into eight children, the motion vector that produces the lowest cost between the current node and the prediction node is determined. If the current node is split into eight children, the motion estimation algorithm is applied to each child, and the total cost under the split condition is obtained by adding up the estimated cost values of the child nodes. The decision whether to split is made by comparing the cost of splitting with the cost of not splitting. If the node is split, each sub-node is assigned its own motion vector (or may be further split into its children); if it is not split, the current node is assigned a motion vector.

[0126]

[0075] Two parameters that affect the performance of motion vector estimation are the block size (BlockSize) and the minimum prediction unit size (MinPUSize). BlockSize defines the upper limit of the node size to which motion vector estimation is applied, and MinPUSize defines the lower limit.
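The recursive split decision can be illustrated with a toy sketch. It is not the InterEM algorithm: the cost is an assumed rate-distortion style measure (squared distance from each current point to the nearest shifted reference point, plus `lam` times an assumed bit count), the candidate vectors are supplied by the caller, the reference point list is assumed to be non-empty, and one flag bit is charged whenever a split is possible. The top-level call would use a node of size BlockSize. All names are hypothetical.

```python
def in_node(p, origin, size):
    """True when point p lies inside the cubic node [origin, origin + size)."""
    return all(o <= c < o + size for c, o in zip(p, origin))

def mv_cost(cur_pts, ref_pts, mv, lam, mv_bits):
    """Toy cost: nearest-point squared error plus lam * rate (ref_pts non-empty)."""
    shifted = [tuple(c + d for c, d in zip(q, mv)) for q in ref_pts]
    distortion = sum(
        min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in shifted)
        for p in cur_pts
    )
    return distortion + lam * mv_bits

def estimate_node(cur, ref, origin, size, min_pu_size, candidates, lam=0.1, mv_bits=6):
    """Return (cost, tree); tree is ('empty',), ('mv', vector) or ('split', [8 subtrees])."""
    cur_in = [p for p in cur if in_node(p, origin, size)]
    if not cur_in:
        return 0.0, ("empty",)
    # Option 1: keep the node whole and pick the cheapest candidate vector.
    best_cost, best_mv = min(
        (mv_cost(cur_in, ref, mv, lam, mv_bits), mv) for mv in candidates
    )
    half = size // 2
    if half < min_pu_size:
        return best_cost, ("mv", best_mv)  # children would fall below MinPUSize
    # Option 2: split into eight children and add up their costs.
    split_cost, subtrees = lam, []  # lam pays for one split flag bit
    for i in range(8):
        child_origin = tuple(
            o + half * ((i >> s) & 1) for o, s in zip(origin, (2, 1, 0))
        )
        cost, tree = estimate_node(cur, ref, child_origin, half,
                                   min_pu_size, candidates, lam, mv_bits)
        split_cost += cost
        subtrees.append(tree)
    no_split_cost = best_cost + lam  # the flag is also paid when not splitting
    if split_cost < no_split_cost:
        return split_cost, ("split", subtrees)
    return no_split_cost, ("mv", best_mv)
```

When two regions of a node move differently, the split option tends to win because each child can match its own vector exactly; when the whole node moves uniformly, the single vector is cheaper because the rate of the extra vectors is saved.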

[0127]

[0076] The techniques described above may present one or more drawbacks. The reference block of a node is derived by motion compensation using estimated motion information (rotation and translation). Good estimation of motion information leads to a high correlation between the current node and the reference node regarding geometric structure, such as occupancy and planar information. Therefore, utilizing this geometric information of the reference node improves the coding performance of the current node. According to one or more techniques of this disclosure, a G-PCC coder (e.g., G-PCC encoder 200 or G-PCC decoder 300) may utilize information from the reference block in coding the planar information of the current node. As an example, a G-PCC coder may utilize information from the reference block in selecting the context when coding the node's eligibility for planar coding mode, planar flags, and planar indices. One or more techniques disclosed herein may be applied independently or in combination.

[0128]

[0077] Planar eligibility of a node using inter prediction. In one example, PlanarRate may be updated by a factor (R) that depends on the planar information of the reference block. In this example, PlanarRate in section 8.2.3.1 in GPCC DIS may be specified as follows (additions are shown in bold italics): The occupancy information of the parent node is used to determine the existence of a single occupied plane along each axis and to update the corresponding plane probability estimate PlanarRate[k].

[0129]

number

[0130] Here, R[k] is a scaling factor that depends on whether the reference block is in planar mode in the k-th direction. For example, if the reference block is planar in the k-th direction, R[k] may be set higher than 1. Otherwise, R[k] may be set lower than 1.
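One possible reading of this scaling can be sketched as follows. The base update of PlanarRate is not reproduced in the text, so the sketch substitutes a hypothetical exponential moving average on a 0..1 scale; the values 1.25 and 0.75 for R[k], the clamp to 1.0, and the function name are likewise assumptions chosen only to satisfy "higher than 1" and "lower than 1".

```python
def update_planar_rate(planar_rate, parent_is_planar, ref_is_planar,
                       r_planar=1.25, r_non_planar=0.75, max_rate=1.0):
    """Return updated PlanarRate[k], k = 0..2, scaled by a reference-dependent R[k].

    planar_rate:      current estimates on an assumed 0..1 scale.
    parent_is_planar: per-axis observation from the parent node occupancy.
    ref_is_planar:    per-axis planarity of the reference block.
    """
    updated = []
    for k in range(3):
        # Hypothetical base update: exponential moving average of observations.
        base = 0.9 * planar_rate[k] + 0.1 * (1.0 if parent_is_planar[k] else 0.0)
        # R[k] > 1 when the reference block is planar along axis k, else R[k] < 1.
        r_k = r_planar if ref_is_planar[k] else r_non_planar
        updated.append(min(max_rate, base * r_k))
    return updated
```

The effect is that an axis along which the reference block is planar reaches the eligibility threshold sooner, while an axis along which it is not planar is pushed away from it.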

[0131]

[0078] In another example, a node may not be allowed to be planar eligible if the reference block is not planar. Next, PlanarEligible is set as follows:

[0132]

number

[0133]

[0079] Alternatively, the PlanarRateRef may be determined separately based on the planarity of the reference block. Furthermore, the eligibility of the current node for planar mode may be determined based on comparing the PlanarRateRef to a threshold. For example:

[0080] Next, PlanarEligible is set as follows.

[0134]

[Equation image not reproduced]

[0135]

[0081] In some cases, the planar index position is derived based on the planar index position of the reference node. In some cases, this determination may be based on threshold comparison.

[0136]

[0082] Planar Copy Mode (PCM). The G-PCC coder may signal a flag (PCM flag) in the bitstream to indicate whether the current node and the reference node share the same planar mode in all directions (e.g., k=0..2). When this flag is 1, the decoder may not need to decode the planar flag for each direction, and the decoder will only use the corresponding value in the reference node (e.g., the planar flag may be copied from the reference node). When the flag is 0, the planar flag may be signaled in each direction.

[0137]

[0083] At a further level, the PCM flag may also indicate whether the current node and the reference node share a planar position index. In this example, if the PCM flag is 1, the decoder does not need to decode the planar position index in the k-th direction and can instead use the planar position index of the reference block in the same k-th direction.

[0138]

[0084] In some examples, the PCM flag may be signaled conditionally. For example, if the reference block occupancy is 0, the PCM flag may not be signaled and may be implicitly set to 0. In some examples, a flag in the slice header or SPS header may be defined to activate or deactivate the PCM mode.
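
The decoder-side behavior of the PCM flag described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the codec's actual parsing process: the function and parameter names are hypothetical, bits are read as plain values from an iterator (a real G-PCC decoder would entropy-decode them), and planar flags are read for all three directions regardless of eligibility.

```python
from typing import Iterator, List, Tuple


def decode_planar_flags(bits: Iterator[int],
                        ref_planar: List[int],
                        ref_occupied: bool,
                        pcm_enabled: bool = True) -> Tuple[int, List[int]]:
    """Return (pcm_flag, planar flags for directions k = 0..2).

    All names are hypothetical; `bits` stands in for the entropy decoder.
    """
    # The PCM flag is read only when PCM is enabled (e.g., by a header flag)
    # and the reference block is occupied; otherwise it is inferred to be 0.
    if pcm_enabled and ref_occupied:
        pcm_flag = next(bits)
    else:
        pcm_flag = 0

    if pcm_flag == 1:
        # Copy the planar flags from the reference node; nothing else is read.
        return pcm_flag, list(ref_planar)

    # Otherwise one planar flag is read per direction.
    return pcm_flag, [next(bits) for _ in range(3)]
```

With the flag equal to 1, a single bit replaces three planar flags, which is where the potential saving comes from when the current node and the reference node agree.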

[0139]

[0085] Context selection in signaling planar flags using inter prediction. There may be three contexts for signaling the planar flag (is_planar_flag), with the context index selected simply by the direction index (axisIdx). According to one or more techniques of this disclosure, the set of contexts for coding the planar flag can be extended using the planar mode of the reference node. An example of such an extension is described below. The inputs to this process are as follows:

[0140] - A variable childIdx that identifies the child of the current node,
- A variable axisIdx that identifies the axis perpendicular to the plane, and
- The position (sN, tN, vN) of the current node within the geometry tree level.
The output of this process is the variable ctxIdx. The value of ctxIdx is set equal to (2*axisIdx+PlanarModeRef[axisIdx]), and no further processing is performed.

[0141]

[0086] In this example, PlanarModeRef[axisIdx] indicates whether the reference node is planar in the axisIdx direction.
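
The context index derivation ctxIdx = 2*axisIdx + PlanarModeRef[axisIdx] can be illustrated with a short sketch. The function name and the input validation are assumptions added for illustration; only the formula itself comes from the text.

```python
from typing import Sequence


def planar_flag_ctx_idx(axis_idx: int, planar_mode_ref: Sequence[int]) -> int:
    """Context index for is_planar_flag, extended by the reference node.

    planar_mode_ref[k] is 1 if the reference node is planar in direction k,
    otherwise 0. The function name is hypothetical.
    """
    if axis_idx not in (0, 1, 2):
        raise ValueError("axis_idx must be 0, 1, or 2")
    # Two contexts per axis: one when the reference node is not planar in
    # this direction, one when it is. This doubles the three original contexts.
    return 2 * axis_idx + (1 if planar_mode_ref[axis_idx] else 0)
```

For example, with axisIdx equal to 2 and a reference node that is planar in that direction, the sketch selects context 5, the last of the six contexts.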

[0142]

[0087] CtxIdx determination for the syntax element plane_position using inter prediction. In GPCC DIS, the context index used to code plane_position is a function of axisIdx, a plane position prediction based on neighbor occupancy, and a buffer lookup that finds the nearest already coded node with the same buffer row index, which provides isPlanar, planePosition, and a distance measure, as follows (Section 8.2.3.3): The context index ctxIdx is derived as follows:

[0143]

[Equation image not reproduced]

[0144]

[0088] In one example of the present disclosure, the occupancy and planar mode of the reference block may be used as additional parameters for determining the context index of the planar position coding.

[0145]

[0089] PlanarModeRef and RefPlane are defined as the planar mode and planar position in the reference block.

[0146]

[0090] If PlanarModeRef[axisIdx] is 0, RefPlane[axisIdx] is set equal to -1.

[0147]

[0091] In one example, RefPlane[axisIdx] may be used to replace prevPlane.

[0148]

[Equation image not reproduced]

[0149]

[0092] In another example, ctxIdx may be updated as follows:

[0150]

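ctxIdx = (12 × axisIdx + 4 × adjPlaneCtxInc + 2 × distCtxInc + prevPlane + 3) + (RefPlane[axisIdx] + 1) × N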

[0151] Here, N is the number of contexts supported using only axisIdx, adjPlaneCtxInc, distCtxInc, and prevPlane. In the current draft GPCC DIS, N is 36.

[0152]

[0093] In another example, the occupancy and planar position of a reference block may be used in place of the neighbor occupancy for deriving a context.

[0153]

[0094] In another example, the context index for the planar flag can be derived using only the direction index (axisIdx) and the planar mode of the reference block, as follows:

[0154]

[Equation image not reproduced]

[0155]

[0095] A process for deriving contextAngular for planar coding mode using inter prediction. Let M be the number of contexts supported for contextAngular to code planar positions. After contextAngular has been derived as in Section 3.3.5, it may be updated with the planar mode and planar position of the reference block.

[0156]

[0096] For example, the contextAngular in Section 3.3.5 may be updated as follows:

[0157]

[0097] In some cases, if PlanarModeRef[axisIdx] is 0, RefPlane[axisIdx] is set equal to -1. The contextAngular value can be assigned as follows: Next, the angle context is inferred from the difference between the two angles.

[0158]

[Equation image not reproduced]

[0159]

[0098] In another example, if PlanarModeRef[axisIdx] is 0, RefPlane[axisIdx] is set equal to 0.

[0160]

[0099] The contextAngular value can be assigned as follows: Next, the angle context is inferred from the difference between the two angles.

[0161]

[Equation image not reproduced]

[0162]

[0100] A process for deriving contextAzimuthalS and contextAzimuthalT combined with inter prediction for planar coding mode.

[0163]

[0101] After contextAzimuthalS and contextAzimuthalT are derived as in 8.2.4.3, they may be updated using the planar mode of the inter reference block.

[0164]

[0102] In some cases, if PlanarModeRef[axisIdx] is 0, RefPlane[axisIdx] is set equal to -1. The following modifications may be made.

[0165]

[Equation image not reproduced]

[0166]

[0103] In some cases, if PlanarModeRef[axisIdx] is 0, RefPlane[axisIdx] is set equal to 0. The following modifications may be made.

[0167]

[Equation image not reproduced]

[0168]

[0104] Occupancy context coding using inter prediction.

[0169]

[0105] In the reference software for InterEM, the context for the occupancy bits is derived as follows:

[0170]

[Equation image not reproduced]

[0171]

[0106] In the above calculation, the context associated with the inter prediction (ctxInter) is the sum of !!mappedPred, bitPred, and bitPredStrong. According to one or more techniques of this disclosure, ctxIdxMapIdx may be modified as follows:

[0172]

[Equation image not reproduced]

[0173]

[0107] Motion-based threshold. A threshold may be defined based on a motion vector / motion parameter, and this threshold may be used in one or more decisions disclosed herein.

[0174]

[0108] For example, the threshold can be determined based on a magnitude or parameter associated with a rotation (e.g., an angle of rotation) or a translation (e.g., a magnitude of translation). If x is the angle associated with the rotation and y is the magnitude of the translation, the threshold can be derived as a function of x and y (e.g., a linear combination a*x + b*y, where a and b are fixed values).
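
The linear-combination example, threshold = a*x + b*y, can be sketched as follows. The function name and the default weights are placeholders chosen for illustration; the text states only that a and b are fixed values and does not give them.

```python
def motion_threshold(rotation_angle: float,
                     translation_magnitude: float,
                     a: float = 1.0,
                     b: float = 1.0) -> float:
    """Threshold derived from motion parameters as a*x + b*y.

    x is the rotation angle and y the translation magnitude. The default
    weights are illustrative placeholders, not values from the text.
    """
    return a * rotation_angle + b * translation_magnitude
```

With zero rotation and zero translation the sketch returns 0, which corresponds to the zero-motion case in which a dedicated threshold may be used.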

[0175]

[0109] In other alternative forms, the thresholds associated with each axis may be derived separately; for example, the translation associated with an axis may be used to derive the threshold associated with that axis.

[0176]

[0110] Based on thresholds, one or more decisions may be made. For example, if a point is associated with zero motion, the threshold associated with zero motion may be used to determine the planar eligibility of the node (as described above with respect to PCM), and if the motion associated with the point is greater, a different threshold may be used to determine the planar eligibility.

[0177]

[0111] In another example, one or more decisions disclosed in this document may be disabled when a threshold (or motion parameter) exceeds a certain fixed value.

[0178]

[0112] Different thresholds may be used for different decisions. Thresholds may also be signaled in the bitstream.

[0179]

[0113] Examples in various aspects of this disclosure may be used individually or in any combination.

[0180]

[0114] Figure 6 is a conceptual diagram showing an exemplary range measuring system 700 that may be used with one or more techniques of the present disclosure. In the example of Figure 6, the range measuring system 700 includes an illuminator 702 and a sensor 704. The illuminator 702 may emit light 706. In some examples, the illuminator 702 may emit light 706 as one or more laser beams. Light 706 may be at one or more wavelengths, such as infrared wavelengths or visible light wavelengths. In other examples, light 706 is not coherent laser light. When light 706 encounters an object, such as object 708, light 706 yields reflected light 710. The reflected light 710 may include backscattered and / or reflected light. The reflected light 710 may pass through a lens 711 that directs the reflected light 710 to yield an image 712 of object 708 onto sensor 704. Sensor 704 generates a signal 714 based on the image 712. Image 712 may comprise a set of points (represented, for example, by the dots in Image 712 in Figure 6).

[0181]

[0115] In some examples, the illuminator 702 and sensor 704 may be mounted on a spinning structure (e.g., a spinning LIDAR sensor) so that the illuminator 702 and sensor 704 capture a 360-degree field of view of the environment. In other examples, the range measuring system 700 may include one or more optical components (e.g., mirrors, collimators, diffraction gratings, etc.) that enable the illuminator 702 and sensor 704 to detect the ranges of objects within a specific angular range (e.g., up to 360 degrees). Although the example in Figure 6 shows only a single illuminator 702 and sensor 704, the range measuring system 700 may include multiple sets of illuminators and sensors.

[0182]

[0116] In some examples, the illuminator 702 generates a structured light pattern. In such examples, the range measuring system 700 may include a plurality of sensors 704 on which respective images of the structured light pattern are formed. The range measuring system 700 may use the parallax between the images of the structured light pattern to determine the distance to an object 708 from which the structured light pattern is backscattered. A structured light-based range measuring system may have a high level of accuracy (e.g., sub-millimeter range accuracy) when the object 708 is relatively close to the sensors 704 (e.g., 0.2 meters to 2 meters). This high level of accuracy may be useful in facial recognition applications such as unlocking mobile devices (e.g., mobile phones, tablet computers, etc.) and for security applications.

[0183]

[0117] In some examples, the range measurement system 700 is a time-of-flight (ToF) based system. In some examples where the range measurement system 700 is a ToF based system, the illuminator 702 generates pulses of light. In other words, the illuminator 702 may modulate the amplitude of the emitted light 706. In such examples, the sensor 704 detects the return light 710 from the pulses of light 706 generated by the illuminator 702. The range measurement system 700 can then determine the distance to the object 708 from which the light 706 is backscattered, based on the delay between when the light 706 is emitted and detected and the known speed of light in the air. In some examples, instead of modulating the amplitude of the emitted light 706 (or in addition to it), the illuminator 702 may modulate the phase of the emitted light 706. In such an example, the sensor 704 may detect the phase of the reflected light 710 from the object 708 and, using the speed of light and based on the time difference between when the illuminator 702 produced the light 706 at a specific phase and when the sensor 704 detected the reflected light 710 at a specific phase, determine the distance to a point on the object 708.
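
The pulse-based time-of-flight calculation described above can be sketched as follows. Because the measured delay covers the path to the object and back, the distance is half the speed of light multiplied by the delay. The function name is hypothetical, and the default refractive index of air (about 1.0003) is an approximate value assumed for illustration.

```python
SPEED_OF_LIGHT_VACUUM = 299_792_458.0  # metres per second (exact SI value)


def tof_distance(delay_s: float, refractive_index: float = 1.0003) -> float:
    """Distance in metres to a reflecting object from a round-trip delay.

    The default refractive index is an approximate figure for air and is
    an assumption of this sketch.
    """
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    speed_in_medium = SPEED_OF_LIGHT_VACUUM / refractive_index
    # The light travels to the object and back, hence the division by two.
    return speed_in_medium * delay_s / 2.0
```

A delay of 20 nanoseconds corresponds to a distance of roughly 3 metres.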

[0184]

[0118] In other examples, the point cloud may be generated without using the illuminator 702. For example, in some examples, the sensor 704 of the range measuring system 700 may include two or more optical cameras. In such examples, the range measuring system 700 may use optical cameras to capture a stereo image of the environment including an object 708. The range measuring system 700 may include a point cloud generator 716 that can calculate the parallax between locations in the stereo image. The range measuring system 700 may then use the parallax to determine the distance to the locations shown in the stereo image. From these distances, the point cloud generator 716 may generate a point cloud.
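
The step from parallax to distance can be sketched with the standard relation for a rectified stereo camera pair, Z = f × B / d. The text does not specify a camera model, so the rectified-pair model, the function name, and the parameters (focal length in pixels, baseline in metres) are assumptions of this illustration.

```python
def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Depth in metres of a location seen by both cameras.

    Assumes a rectified stereo pair: disparity_px is the horizontal shift
    (parallax) of the location between the two images, focal_length_px is
    the focal length in pixels, and baseline_m is the camera separation.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px
```

Under this model, a larger parallax means a nearer location, so nearby objects are resolved with finer depth steps than distant ones.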

[0185]

[0119] The sensor 704 may also detect other attributes of the object 708, such as color and reflectance information. In the example of Figure 6, the point cloud generator 716 may generate a point cloud based on the signal 714 generated by the sensor 704. The range measurement system 700 and / or the point cloud generator 716 may form part of the data source 104 (Figure 1). Thus, the point cloud generated by the range measurement system 700 may be encoded and / or decoded according to any of the techniques of this disclosure.

[0186]

[0120] Figure 7 is a conceptual diagram illustrating an exemplary vehicle-based scenario in which one or more techniques of the present disclosure may be used. In the example of Figure 7, the vehicle 800 includes a range measuring system 802. The range measuring system 802 may be implemented in the manner described with respect to Figure 6. Although not shown in the example of Figure 7, the vehicle 800 may also include a data source, such as a data source 104 (Figure 1), and a G-PCC encoder, such as a G-PCC encoder 200 (Figure 1). In the example of Figure 7, the range measuring system 802 emits a laser beam 804 that is reflected from a pedestrian 806 or other object in the road. The data source of the vehicle 800 may generate a point cloud based on the signal generated by the range measuring system 802. The G-PCC encoder of the vehicle 800 may encode the point cloud to generate a bitstream 808, such as a geometry bitstream (Figure 2) and an attribute bitstream (Figure 2). The bitstream 808 may contain far fewer bits than the unencoded point cloud obtained by the G-PCC encoder.

[0187]

[0121] The output interface of the vehicle 800 (for example, output interface 108 (Figure 1)) can transmit the bitstream 808 to one or more other devices. The bitstream 808 may contain far fewer bits than the unencoded point cloud acquired by the G-PCC encoder. Therefore, the vehicle 800 may be able to transmit the bitstream 808 to other devices more quickly than the unencoded point cloud data. Furthermore, the bitstream 808 may require less data storage capacity.

[0188]

[0122] In the example of Figure 7, vehicle 800 may transmit bitstream 808 to another vehicle 810. Vehicle 810 may include a G-PCC decoder, such as G-PCC decoder 300 (Figure 1). The G-PCC decoder of vehicle 810 may decode bitstream 808 to reconstruct the point cloud. Vehicle 810 may use the reconstructed point cloud for various purposes. For example, vehicle 810 may determine, based on the reconstructed point cloud, that pedestrian 806 is on the road ahead of vehicle 800 and therefore begin to decelerate, even before the driver of vehicle 810 is aware that pedestrian 806 is on the road. Thus, in some examples, vehicle 810 may perform autonomous navigation operations based on the reconstructed point cloud.

[0189]

[0123] As an addition or alternative, the vehicle 800 may transmit the bitstream 808 to the server system 812. The server system 812 may use the bitstream 808 for various purposes. For example, the server system 812 may store the bitstream 808 for subsequent reconstruction of the point cloud. In this example, the server system 812 may use the point cloud along with other data (e.g., vehicle telemetry data generated by the vehicle 800) to train an autonomous driving system. In another example, the server system 812 may store the bitstream 808 for subsequent reconstruction for forensic accident investigation.

[0190]

[0124] Figure 8 is a conceptual diagram showing an exemplary extended reality system in which one or more of the techniques of the present disclosure may be used. Extended reality (XR) is a term used to cover a range of techniques including augmented reality (AR), mixed reality (MR), and virtual reality (VR). In the example of Figure 8, user 900 is located at a first location 902. User 900 wears an XR headset 904. As an alternative to the XR headset 904, user 900 may use a mobile device (e.g., a mobile phone, tablet computer, etc.). The XR headset 904 includes depth-sensing sensors, such as a range-measuring system, that detect the position of a point on an object 906 at location 902. The data source for the XR headset 904 may use signals generated by the depth-sensing sensors to generate a point cloud representation of the object 906 at location 902. The XR headset 904 may include a G-PCC encoder (e.g., the G-PCC encoder 200 in Figure 1) configured to encode a point cloud to generate a bitstream 908.

[0191]

[0125] The XR headset 904 may transmit a bitstream 908 to an XR headset 910 worn by user 912 at a second location 914 (for example, via a network such as the Internet). The XR headset 910 may decode the bitstream 908 to reconstruct the point cloud. The XR headset 910 may use the point cloud to generate an XR visualization (for example, an AR, MR, or VR visualization) representing an object 906 at location 902. Thus, in some examples, user 912 may have a 3D immersive experience of location 902, such as when the XR headset 910 generates a VR visualization. In some examples, the XR headset 910 may determine the position of a virtual object based on the reconstructed point cloud. For example, based on the reconstructed point cloud, the XR headset 910 may determine that the environment (for example, location 902) includes a flat surface, and then determine that a virtual object (for example, a cartoon character) should be placed on that flat surface. The XR headset 910 can generate an XR visualization in which the virtual object is located at the determined position. For example, the XR headset 910 could show a cartoon character sitting on a flat surface.

[0192]

[0126] Figure 9 is a conceptual diagram showing an exemplary mobile device system in which one or more techniques of the present disclosure may be used. In the example of Figure 9, a mobile device 1000 (e.g., a wireless communication device), such as a mobile phone or tablet computer, includes a range measuring system, such as a LIDAR system, which detects the location of points on an object 1002 in the environment of the mobile device 1000. The data source of the mobile device 1000 may use signals generated by a depth sensing sensor to generate a point cloud representation of the object 1002. The mobile device 1000 may include a G-PCC encoder (e.g., the G-PCC encoder 200 in Figure 1) configured to encode the point cloud to generate a bitstream 1004. In the example of Figure 9, the mobile device 1000 may transmit the bitstream to a remote device 1006, such as a server system or another mobile device. The remote device 1006 may decode the bitstream 1004 to reconstruct the point cloud. The remote device 1006 may use the point cloud for a variety of purposes. For example, the remote device 1006 may use a point cloud to generate a map of the environment of the mobile device 1000. For example, the remote device 1006 may generate a map of the interior of a building based on a reconstructed point cloud. In another example, the remote device 1006 may generate an image (e.g., computer graphics) based on a point cloud. For example, the remote device 1006 may use the points of the point cloud as the vertices of a polygon and the color attributes of the points as the basis for shading the polygon. In some examples, the remote device 1006 may use a reconstructed point cloud for facial recognition or other security applications.

[0193]

[0127] Figures 10A and 10B show an example of this process at bin n. In example 201 of Figure 10A, the range at bin n includes RangeMPS and RangeLPS, which are given by the probability of the LPS (p_σ) for a given context state (σ). Example 201 shows the update of the range at bin n+1 when the value of bin n is equal to the MPS. In this example, the low remains the same, but the value of the range at bin n+1 is reduced to the value of RangeMPS at bin n. Example 203 in Figure 10B shows the update of the range at bin n+1 when the value of bin n is not equal to the MPS (i.e., equal to the LPS). In this example, the low is moved to the lower range value of RangeLPS at bin n. Furthermore, the value of the range at bin n+1 is reduced to the value of RangeLPS at bin n.
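
The two interval updates can be sketched as follows. This is a simplified illustration rather than a codec implementation: the function name is hypothetical, RangeLPS is passed in directly instead of being derived from the context state σ, and the LPS sub-interval is assumed to sit above the MPS sub-interval, as in H.264/HEVC-style binary arithmetic coders.

```python
from typing import Tuple


def update_interval(low: int, rng: int, range_lps: int,
                    bin_is_mps: bool) -> Tuple[int, int]:
    """Return the (low, range) pair for bin n+1 after coding bin n.

    Assumes the MPS sub-interval occupies the bottom of the current
    interval and the LPS sub-interval the top (an assumption here).
    """
    if not 0 < range_lps < rng:
        raise ValueError("range_lps must lie strictly inside the range")
    range_mps = rng - range_lps
    if bin_is_mps:
        # MPS coded: low is unchanged, range shrinks to RangeMPS.
        return low, range_mps
    # LPS coded: low moves to the bottom of the LPS sub-interval,
    # and range shrinks to RangeLPS.
    return low + range_mps, range_lps
```

Starting from low = 0 and range = 400 with RangeLPS = 100, an MPS bin gives (0, 300) and an LPS bin gives (300, 100).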

[0194]

[0128] In some examples, the range may be represented by 9 bits and the low by 10 bits. A renormalization process maintains the range and low values with sufficient precision. Renormalization is performed whenever the range is less than 256, so after renormalization the range is always equal to or greater than 256. Depending on the range value and the low value, the BAC outputs either "0" or "1" to the bitstream, or updates an internal variable (called BO: bits-outstanding) to hold for future outputs. Figure 11 shows an example of BAC output depending on the range. For example, when the range and low are above a certain threshold (e.g., 512), "1" is output to the bitstream. When the range and low are below a certain threshold (e.g., 512), "0" is output to the bitstream. When the range and low are between certain thresholds, nothing is output to the bitstream. Instead, the BO value is incremented and the next bin is encoded.
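
One way to realize this renormalization is sketched below. It is modelled on the encoder renormalization used in H.264/HEVC-style CABAC rather than taken from this document, so the exact comparison values (256 and 512 applied to the low value) and the function name are assumptions.

```python
from typing import List, Tuple


def renormalize(low: int, rng: int,
                bits_outstanding: int = 0) -> Tuple[int, int, int, List[int]]:
    """Renormalize (low, range) and return the bits emitted on the way.

    Returns (low, range, bits_outstanding, output_bits).
    """
    if rng <= 0:
        raise ValueError("range must be positive")
    out: List[int] = []

    def put_bit(b: int) -> None:
        # Emit the resolved bit, then flush the outstanding bits, which
        # take the opposite value.
        nonlocal bits_outstanding
        out.append(b)
        while bits_outstanding > 0:
            out.append(1 - b)
            bits_outstanding -= 1

    while rng < 256:
        if low < 256:
            put_bit(0)             # interval lies entirely in the lower half
        elif low >= 512:
            low -= 512
            put_bit(1)             # interval lies entirely in the upper half
        else:
            low -= 256
            bits_outstanding += 1  # undecided: defer the output decision
        rng <<= 1
        low <<= 1
    return low, rng, bits_outstanding, out
```

For instance, starting from low = 300 and range = 100, the first pass defers one bit (BO becomes 1) and the second pass emits "0" followed by the deferred "1".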

[0195]

[0129] As described above, arithmetic coding methods can be used to provide high compression efficiency. This can be achieved by first converting non-binary syntax elements into binary representations (e.g., 0, 1) using a process called binarization. The resulting transformed entries are called bins or bin strings. These bins or bin strings are then fed into an arithmetic coding process. FIG. 11 shows an exemplary context-adaptive binary arithmetic coding (CABAC) encoding stage. The exemplary CABAC encoding stage can be implemented in a G-PCC encoder, such as by arithmetic coding unit 214 and / or arithmetic coding unit 226 of the G-PCC encoder 200 in FIG. 2.

[0196]

[0130] In some examples of G-PCC, context-adaptive binary arithmetic coding (CABAC) can be used, with bins generated through a binarization process. For each coded bin value, an appropriate context model is selected. These context models are used to encode each bin value into output bits based on bin probability values. The CABAC engine bypasses context modeling and context-based bin encoding when the bin is equally likely to be 0 or 1. This is the bypass coding stage described below. Otherwise, an appropriate context model is selected and the bin value is encoded based on the probability modeled for that bin value. The context is adapted as the encoder encodes more bins. Finally, the context-coded bin values or raw bit stream are sent to or otherwise provided to the decoder.

[0197]

[0131] FIG. 12 is a block diagram of an exemplary arithmetic coding unit 214 configured to perform CABAC according to the techniques of the present disclosure. A syntax element 1180 is input to the arithmetic coding unit 214. If the syntax element is already a binary value syntax element (e.g., a flag, or other syntax element having only values of 0 and 1), the binarization step may be skipped. If the syntax element is a non-binary value syntax element (e.g., a syntax element that can have a value other than 1 or 0), the non-binary value syntax element is binarized by a binarizer 1200. The binarizer 1200 performs a mapping of the non-binary value syntax element to a sequence of binary decisions. These binary decisions are often referred to as "bins". For example, for transform coefficient levels, the level value can be broken down into successive bins, each bin indicating whether the absolute value of the coefficient level is greater than a certain value. For example, bin 0 (which may sometimes be called a significance flag) indicates whether the absolute value of the transform coefficient level is greater than 0. Bin 1 indicates whether the absolute value of the transform coefficient level is greater than 1, and so on. For each non-binary value syntax element, a unique mapping can be created.
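
The "greater than k" binarization described above can be sketched as follows. The function name is hypothetical, and two details are assumptions of this sketch: bins stop after the first 0, and their number is capped by max_bins. A real codec would code any remaining magnitude and the sign separately.

```python
from typing import List


def binarize_level(level: int, max_bins: int = 4) -> List[int]:
    """Map a coefficient level to 'greater than k' bins.

    Bin k is 1 if abs(level) > k, else 0. Emission stops after the first 0
    or once max_bins bins have been produced (both are assumptions).
    """
    magnitude = abs(level)
    bins: List[int] = []
    for k in range(max_bins):
        bit = 1 if magnitude > k else 0
        bins.append(bit)
        if bit == 0:
            break  # later bins would all be 0 and carry no information
    return bins
```

A level of -2 gives the bins 1, 1, 0: greater than 0, greater than 1, not greater than 2.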

[0198]

[0132] Each bin generated by the binarizer 1200 is supplied to the binary arithmetic coding side of the arithmetic coding unit 214. That is, for a given set of non-binary value syntax elements, each bin type (e.g., bin 0) is coded before the next bin type (e.g., bin 1). The coding can be performed in either normal mode or bypass mode. In bypass mode, a bypass coding engine 1260 performs arithmetic coding using a fixed probability model, for example, using Golomb-Rice or exponential Golomb coding. Bypass mode is generally used for more predictable syntax elements.

[0199]

[0133] Coding in normal mode involves performing CABAC. Normal mode CABAC is for coding bin values when the probability of a bin value is predictable given the values of previously coded bins. The probability that a bin is LPS is determined by the context modeler 1220. The context modeler 1220 outputs the bin value and a probability state for the context (for example, a probability state σ containing the value of LPS and the probability that LPS occurs). The context can be an initial context for a set of bins, or it can be determined based on the coded values of previously coded bins. Identification information for the context can be represented and / or determined based on the value of the variable ctxInc (a context increment, such as the value of ctxInc representing the increment to be applied to the previous context). As described above, the context modeler 1220 may update its state based on whether the received bin was MPS or LPS. After the context and probability state σ have been determined by the context modeler 1220, the normal coding engine 1240 performs BAC on the bin value.

[0200]

[0134] Figure 13 is a block diagram of an exemplary arithmetic decoding unit 302 that may be configured to perform CABAC according to the technique of the present disclosure. The arithmetic decoding unit 302 in Figure 13 performs CABAC in the reverse manner to that of the arithmetic coding unit 214 described in Figure 12. Coded bits from bitstream 2180 are input to the arithmetic decoding unit 302. The coded bits are fed to either the context modeler 2200 or the bypass decoding engine 2220, based on whether they were entropy coded using normal mode or bypass mode. If the coded bits were coded in bypass mode, the bypass decoding engine will use Golomb-Rice or exponential Golomb decoding to extract, for example, the bins of binary value syntax elements or non-binary syntax elements.

[0201]

[0135] If the coded bits were coded in normal mode, the context modeler 2200 may determine a probability model for the coded bits, and the normal decoding engine 2240 may decode the coded bits to generate the bins of non-binary value syntax elements (or, in the case of binary values, the syntax elements themselves). After the context and probability state σ have been determined by the context modeler 2200, the normal decoding engine 2240 performs BAC to decode the bin values. In other words, the normal decoding engine 2240 may determine the probability state of the context and decode the bin values based on the previously coded bins and the current range. After decoding the bins, the context modeler 2200 may update the probability state of the context based on the window size and the decoded bin values.

[0202]

[0136] Figure 14 is a flowchart illustrating an exemplary technique for predicting points in a point cloud according to one or more embodiments of the present disclosure. The technique of Figure 14 may be implemented by a G-PCC coder, such as the G-PCC encoder 200 of Figure 2. However, other devices, such as the G-PCC decoder 300 of Figure 3, may implement the technique of Figure 14.

[0203]

[0137] The G-PCC encoder 200 can obtain planar information of the reference blocks in the point cloud (1402). For example, the arithmetic coding unit 214 of the G-PCC encoder 200 can determine whether the reference block is coded using planar mode in a particular direction (for example, PlanarModeRef[axisIdx] can indicate whether the reference block / node is planar in the axisIdx direction).

[0204]

[0138] The G-PCC encoder 200 can determine the context based on the planar information of the reference block (1404). For example, the arithmetic coding unit 214 can determine the context index (ctxIdx) based on the planar information of the reference block. As an example, the arithmetic coding unit 214 can determine ctxIdx as (2*axisIdx+PlanarModeRef[axisIdx]).

[0205]

[0139] The G-PCC encoder 200 may, based on the context, context-adaptively code a syntax element that indicates whether the current node is coded using planar mode (1406). For example, the arithmetic coding unit 214 may perform context-adaptive binary arithmetic coding (CABAC) of the is_planar_flag syntax element for the current node based on ctxIdx. As described above, an is_planar_flag syntax element equal to 1 may indicate that the positions of the children of the current node form a single plane perpendicular to the axisIdx-th axis. An is_planar_flag[axisIdx] equal to 0 may, when present, indicate that the positions of the children of the current node occupy both planes perpendicular to the axisIdx-th axis.

[0206]

[0140] The G-PCC encoder 200 may encode the current node using planar mode, based on the fact that the current node is encoded using planar mode (1408). For example, the G-PCC encoder 200 may encode the children of the current node as forming a single plane.

[0207]

[0141] In some examples, if the current node is coded using planar mode, the arithmetic coding unit 214 may determine a second context based on the reference plane and, based on the second context, context-adaptively code a syntax element indicating the plane for the current node. The syntax element indicating the plane for the current node may be the plane_position syntax element. In some examples, to determine the second context based on the reference plane, the arithmetic coding unit 214 may determine the context index according to the following formula, i.e., ctxIdx = (12 × axisIdx + 4 × adjPlaneCtxInc + 2 × distCtxInc + prevPlane + 3) + (RefPlane[axisIdx] + 1) × N, where ctxIdx is the context index, axisIdx is the axis index, adjPlaneCtxInc is the adjusted plane context increment, distCtxInc is the distance context increment, prevPlane is the previous plane, and RefPlane[axisIdx] is the reference plane.
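
The formula for the plane_position context index can be sketched as follows. The formula and the default N = 36 come from this document; the function name and the restriction of RefPlane[axisIdx] to the values -1, 0, and 1 (with -1 meaning the reference block is not planar in that direction) are assumptions of this illustration.

```python
def plane_position_ctx_idx(axis_idx: int,
                           adj_plane_ctx_inc: int,
                           dist_ctx_inc: int,
                           prev_plane: int,
                           ref_plane: int,
                           n: int = 36) -> int:
    """Context index for plane_position extended by the reference plane.

    ref_plane is RefPlane[axisIdx]: assumed to be -1 when the reference
    block is not planar in this direction, else its plane position (0 or 1).
    """
    if ref_plane not in (-1, 0, 1):
        raise ValueError("ref_plane must be -1, 0, or 1")
    base = (12 * axis_idx + 4 * adj_plane_ctx_inc
            + 2 * dist_ctx_inc + prev_plane + 3)
    # Each reference-plane value selects its own bank of n contexts.
    return base + (ref_plane + 1) * n
```

When the reference block is not planar (RefPlane equal to -1), the offset term vanishes and the index reduces to the base value, so the remaining two RefPlane values each select a further bank of N contexts.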

[0208]

[0142] In some examples, the arithmetic coding unit 214 may determine an angular context for the current node based on a reference plane, and determine a plane for the current node based on the angular context. To code the current node using planar mode, the G-PCC encoder 200 may code the current node based on the plane.

[0209]

[0143] In some examples, the arithmetic coding unit 214 may determine the azimuthal context for the current node based on a reference plane, and determine the plane for the current node based on the azimuthal context. To code the current node using planar mode, the G-PCC encoder 200 may code the current node based on the plane. In some examples, to determine the azimuthal context, the arithmetic coding unit 214 may determine the azimuthal context according to the following formula, namely contextAzimuthal = contextAnglePhi + 8 × (RefPlane[axisIdx] + 1), where contextAzimuthal is the azimuthal context, contextAnglePhi is an intermediate value used to derive the azimuthal context, and RefPlane[axisIdx] is the reference plane. Multiple azimuthal contexts may be derived based on contextAnglePhi, including contextAzimuthalS and contextAzimuthalT.
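
The azimuthal context formula can be sketched in the same way. The function name is hypothetical. The sketch also assumes that contextAnglePhi takes values from 0 to 7, which the multiplier 8 suggests but the text does not state, and that RefPlane[axisIdx] is -1 when the reference block is not planar in that direction.

```python
def azimuthal_context(context_angle_phi: int, ref_plane: int) -> int:
    """contextAzimuthal = contextAnglePhi + 8 * (RefPlane[axisIdx] + 1).

    Assumes context_angle_phi is in 0..7 and ref_plane is -1, 0, or 1;
    both ranges are assumptions of this sketch.
    """
    if not 0 <= context_angle_phi <= 7:
        raise ValueError("context_angle_phi is assumed to be in 0..7")
    if ref_plane not in (-1, 0, 1):
        raise ValueError("ref_plane must be -1, 0, or 1")
    return context_angle_phi + 8 * (ref_plane + 1)
```

Under these assumptions the three RefPlane values map to three non-overlapping banks of eight contexts: 0 to 7, 8 to 15, and 16 to 23.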

[0210]

[0144] In some examples, the current node may be selectively coded using Planar Copy Mode (PCM). For example, the G-PCC encoder 200 may decide whether to copy planar information for the current node from a reference node. The G-PCC encoder 200 may signal whether the current node is coded using PCM. For example, the arithmetic coding unit 214 may code a syntax element, such as a binary flag (e.g., PCM_flag), indicating whether the current node is coded using Planar Copy Mode. If the current node is coded using Planar Copy Mode, the G-PCC decoder 300 may copy the planar information for the current node from a reference node. For example, the G-PCC decoder 300 may use the planar position of the reference node as the planar position of the current node. Similarly, if the current node is not coded using Planar Copy Mode, the G-PCC encoder 200 may encode planar information for the current node in the bitstream (and the G-PCC decoder 300 may decode the planar information from the bitstream). In this way, PCM can improve coding efficiency.

[0145] The following numbered clauses may represent one or more aspects of this disclosure.

[0146] Clause 1A. A method for coding point cloud data, the method comprising: obtaining planar information of a reference block of the point cloud data; and coding a current block of the point cloud data based on the obtained planar information.

[0147] Clause 2A. The method according to Clause 1A, wherein coding the current block of the point cloud data comprises determining a planar rate for a current direction of the current block based at least in part on whether the reference block is coded using planar mode in the current direction.

[0148] Clause 3A. The method according to Clause 1A, wherein coding the current block of the point cloud data comprises determining whether the current block is planar-eligible based on whether the reference block is planar.

[0149] Clause 4A. The method according to Clause 1A, further comprising coding, in the coded bitstream, a syntax element having a value indicating whether the current block and the reference block share the same planar mode in all directions.

[0150] Clause 5A. The method according to Clause 1A, further comprising coding, in the coded bitstream, a syntax element having a value indicating whether the current block and the reference block share the same plane position index.

[0151] Clause 5A. The method according to Clause 1A, wherein coding the current block of the point cloud data comprises determining a context for context-adaptive coding of a planar flag of the current block based on whether the reference block is planar.

[0152] Clause 6A. The method according to Clause 1A, wherein coding the current block of the point cloud data comprises determining an angular context for the current block based on the planar mode of the plane position of the reference block.

[0153] Clause 7A. A device for processing point cloud data, the device comprising one or more means for implementing the method according to any of Clauses 1A to 6A.

[0154] Clause 8A. The device according to Clause 7A, wherein the one or more means comprise one or more processors implemented in circuitry.

[0155] Clause 9A. The device described in either Clause 7A or 8A, further comprising memory for storing data representing a point cloud.

[0156] Clause 10A. A device according to any one of Clauses 7A to 9A, wherein the device comprises a decoder.

[0157] Clause 11A. A device according to any of Clauses 7A to 10A, wherein the device comprises an encoder.

[0158] Clause 12A. A device as described in any of Clauses 7A to 11A, further comprising a device for generating a point cloud.

[0159] Clause 13A. A device according to any one of Clauses 7A to 12A, further comprising a display for presenting an image based on a point cloud.

[0160] Clause 14A. A computer-readable storage medium storing instructions that, when executed, cause one or more processors to perform the method according to any of Clauses 1A to 6A.
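One of the clauses above determines the context for context-adaptive coding of the planar flag from whether the reference block is planar. The following Python sketch illustrates that idea under stated assumptions. The context-index mapping (two contexts per axis), the names planar_flag_context and AdaptiveBitModel, and the toy probability update rule are all hypothetical; they stand in for whatever context derivation and arithmetic-coder state an actual codec uses.

```python
NUM_AXES = 3  # assumed: one planar flag per axis (x, y, z)


def planar_flag_context(axis_idx: int, ref_is_planar: bool) -> int:
    """Hypothetical mapping: two contexts per axis, selected by whether
    the reference block is planar along that axis."""
    if not 0 <= axis_idx < NUM_AXES:
        raise ValueError("axis_idx must be 0, 1, or 2")
    return 2 * axis_idx + int(ref_is_planar)


class AdaptiveBitModel:
    """Toy probability estimator standing in for one coder context."""

    def __init__(self, p_one: float = 0.5, rate: float = 0.125) -> None:
        self.p_one = p_one  # estimated probability that the flag is 1
        self.rate = rate    # adaptation speed

    def update(self, bit: int) -> None:
        # Move the estimate a fraction of the way toward the coded bit.
        self.p_one += self.rate * (bit - self.p_one)


if __name__ == "__main__":
    models = [AdaptiveBitModel() for _ in range(2 * NUM_AXES)]
    # Simulated data in which the current flag always equals the
    # reference planarity (perfect correlation, for illustration only).
    for _ in range(20):
        for ref_is_planar in (False, True):
            ctx = planar_flag_context(0, ref_is_planar)
            models[ctx].update(int(ref_is_planar))
    print([round(m.p_one, 3) for m in models[:2]])
```

When the planarity of the current node correlates with that of the reference block, the two contexts for an axis converge to different probability estimates, which is what allows an arithmetic coder to spend fewer bits on the flag.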

[0161] Depending on the example, some of the actions or events among the techniques described herein may be performed in different sequences, added, merged, or completely excluded (for example, not all described actions or events are necessary for the practice of the technique). Furthermore, in some examples, the actions or events may not be performed sequentially, but rather simultaneously, for example, through multithreading, interrupt handling, or across multiple processors.

[0162] In one or more examples, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted via a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, including any medium that facilitates the transfer of a computer program from one place to another, for example, according to a communication protocol. Thus, computer-readable media may generally correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as signals or carrier waves. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described herein. A computer program product may include a computer-readable medium.

[0163] By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM®, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, the terms "disk" and "disc" include compact disc (CD), laser disc (registered trademark), optical disc, digital versatile disc (DVD), floppy disk (registered trademark), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0164] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the terms "processor" and "processing circuitry" as used herein may refer to any of the above-described structures or any other structure suitable for implementing the techniques described herein. Furthermore, in some examples, the functions described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. The techniques may also be fully implemented in one or more circuits or logic elements.

[0165] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but these components, modules, or units do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

[0166] Various examples have been described. These and other examples fall within the scope of the following claims. The inventions recited in the claims of this application as originally filed are listed below.

[C1] A device for processing a point cloud, the device comprising: a memory configured to store at least a portion of the point cloud; and one or more processors implemented in circuitry, the one or more processors being configured to: obtain planar information of a reference block of the point cloud; determine a context based on the planar information of the reference block; context-adaptively code, based on the context, a syntax element indicating whether a current node is coded using planar mode; and code the current node using the planar mode based on the current node being coded using the planar mode.

[C2] The device according to C1, wherein the syntax element indicating whether the current node is coded using the planar mode comprises an is_planar_flag syntax element.

[C3] The device according to C1, wherein the context is a first context, and wherein the one or more processors are further configured to, in response to determining that the current node is coded using the planar mode: determine a second context based on a reference plane; and context-adaptively code, based on the second context, a syntax element indicating a plane for the current node, wherein, to code the current node using the planar mode, the one or more processors are configured to code the current node based on the plane.

[C4] The device according to C3, wherein the syntax element indicating the plane for the current node comprises a plane_position syntax element.

[C5] To determine the second context based on the reference plane, the one or more processors are configured to determine a context index according to the following formula:

[Math]

Claims

1. A device for processing a point cloud, the device comprising: a memory configured to store at least a portion of the point cloud; and one or more processors implemented in circuitry, the one or more processors being configured to: obtain planar information of a reference node of the point cloud; determine a first context based on the planar information of the reference node; context-adaptively code, based on the first context, a syntax element indicating whether a current node of the point cloud is coded using planar mode; determine a context for the current node based on a reference plane; determine a plane for the current node based on the context for the current node; and code, based on the plane, the current node that is coded using the planar mode.

2. The device according to claim 1, wherein the syntax element indicating whether the current node is coded using the planar mode comprises the is_planar_flag syntax element.

3. The device according to claim 1, wherein the one or more processors are further configured to, in response to determining that the current node is coded using the planar mode: determine, based on the reference plane, a second context as the context for the current node; and context-adaptively code, based on the second context, a syntax element indicating the plane for the current node.

4. The device according to claim 3, wherein the syntax element indicating the plane for the current node comprises a plane_position syntax element.

5. The device according to claim 3, wherein, to determine the second context based on the reference plane, the one or more processors are configured to determine a context index according to the following formula: [Math 1] where ctxIdx is the context index, axisIdx is the axis index, adjPlaneCtxInc is the adjusted plane context increment, distCtxInc is the distance context increment, prevPlane is the previous plane, RefPlane[axisIdx] is the reference plane, and N is the number of contexts supported using only axisIdx, adjPlaneCtxInc, distCtxInc, and prevPlane.

6. The device according to claim 1, wherein the one or more processors are configured to: determine, based on the reference plane, an angular context as the context for the current node; and determine the plane for the current node based on the angular context.

7. The device according to claim 1, wherein the one or more processors are configured to: determine, based on the reference plane, an azimuthal context as the context for the current node; and determine the plane for the current node based on the azimuthal context.

8. The device according to claim 7, wherein, to determine the azimuthal context, the one or more processors are configured to determine the azimuthal context according to the following formula: [Math 2] contextAzimuthal = contextAnglePhi + 8 × (RefPlane[axisIdx] + 1), where contextAzimuthal is the azimuthal context, contextAnglePhi is an intermediate value used to derive the azimuthal context, and RefPlane[axisIdx] is the reference plane.

9. The device according to claim 1, wherein the one or more processors are further configured to: code a syntax element indicating whether the current node is coded using a planar copy mode; and, when the current node is coded using the planar copy mode, copy planar information of the current node from the reference node.

10. The device according to claim 9, wherein, to copy the planar information, the one or more processors are configured to use a planar position of the reference node as a planar position of the current node.

11. The device according to claim 9, wherein the syntax element indicating whether the current node is coded using the planar copy mode comprises a binary flag.

12. The device according to claim 1, further comprising a spinning LiDAR sensor, wherein the one or more processors are configured to generate the point cloud based on data generated by the spinning LiDAR sensor, and wherein, optionally, the device is a vehicle including the spinning LiDAR sensor.

13. The device according to claim 1, wherein the device is a wireless communication device.

14. A method for coding point cloud data, the method comprising: obtaining planar information of a reference node of a point cloud; determining a first context based on the planar information of the reference node; context-adaptively coding, based on the first context, a syntax element indicating whether a current node of the point cloud is coded using planar mode; determining a context for the current node based on a reference plane; determining a plane for the current node based on the context for the current node; and coding, based on the plane, the current node that is coded using the planar mode.

15. A computer-readable storage medium storing instructions that, when executed, cause one or more processors to: obtain planar information of a reference node of a point cloud; determine a first context based on the planar information of the reference node; context-adaptively code, based on the first context, a syntax element indicating whether a current node of the point cloud is coded using planar mode; determine a context for the current node based on a reference plane; determine a plane for the current node based on the context for the current node; and code, based on the plane, the current node that is coded using the planar mode.