Point cloud decoding device, point cloud decoding method, and program

The point cloud decoding device and method enhance encoding efficiency by adapting processing for Non-Spinning LiDAR data based on sensor type or operation mode, addressing impaired compression performance in existing methods.

JP7874075B2 · Active · Publication Date: 2026-06-15 · KDDI CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: KDDI CORP
Filing Date: 2023-07-07
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Existing methods for encoding Non-spinning LiDAR data do not consider data characteristics and biases, leading to impaired compression performance.

Method used

A point cloud decoding device and method that switches processing based on sensor type or operation mode, performing intra prediction using the values of parent and ancestor nodes for Non-Spinning LiDAR data or elevation angle component prediction, and activating residual decoding in the Angular mode.

Benefits of technology

Improves compression performance by adapting processing to sensor type or operation mode, enhancing encoding efficiency for Non-Spinning LiDAR data.

Abstract

To improve the compression performance of coding.

SOLUTION: In a point cloud decoding device 200, a tree synthesis unit 2020 switches the processing to be performed according to a sensor type or an operation mode in the Angular mode of Predictive coding. In the Angular mode, when the sensor type indicates Non-Spinning LiDAR data or the operation mode indicates the prediction of the elevation angle component and the activation of residual decoding, intra prediction is performed for the coordinate values r, θ, and φ after polar coordinate transformation of the processing target node, based on the values of the parent node and ancestor nodes of the processing target node.

SELECTED DRAWING: Figure 2

Description

Technical Field

[0001] The present invention relates to a point cloud decoding device, a point cloud decoding method, and a program.

Background Art

[0002] In Non-Patent Document 1, in Predictive coding, a technique of performing intra prediction in an Angular mode that takes into account data characteristics and biases for Spinning LiDAR data is disclosed.

[0003] Also, in Non-Patent Document 2, in Predictive coding, a technique of performing inter prediction in an Angular mode that takes into account data characteristics and biases for Spinning LiDAR data is disclosed.

Prior Art Documents

Non-Patent Documents

[0004]

Non-Patent Document 1

Non-Patent Document 2

Summary of the Invention

Problems to be Solved by the Invention

[0005] However, in the method of Non-Patent Document 1, for Non-spinning LiDAR data, intra prediction is performed without considering data characteristics and biases, resulting in a problem that compression performance is impaired.

[0006] Therefore, the present invention has been made in view of the above problems, and an object thereof is to provide a point cloud decoding device, a point cloud decoding method, and a program that can improve the compression performance of encoding.

Means for Solving the Problems

[0007] A first feature of the present invention is a point cloud decoding device including a tree synthesis unit. The tree synthesis unit switches the processing to be performed according to the sensor type or the operation mode in the Angular mode of Predictive coding. In the Angular mode, when the sensor type indicates Non-Spinning LiDAR data or the operation mode indicates the prediction of the elevation angle component and the activation of residual decoding, for the coordinate values r, θ, φ after the polar coordinate transformation of the processing target node, intra prediction is performed based on the values of the parent node and ancestor nodes of the processing target node.

[0008] A second feature of the present invention is a point cloud decoding method including a step of switching the processing to be performed according to the sensor type or the operation mode in the Angular mode of Predictive coding, and a step of performing intra prediction for the coordinate values r, θ, φ after the polar coordinate transformation of the processing target node based on the values of the parent node and ancestor nodes of the processing target node when in the Angular mode and when the sensor type indicates Non-Spinning LiDAR data or the operation mode indicates the prediction of the elevation angle component and the activation of residual decoding.

[0009] A third feature of the present invention is a program that causes a computer to function as a point cloud decoding device, wherein the point cloud decoding device includes a tree synthesis unit, and the tree synthesis unit switches the processing to be performed according to the sensor type or the operation mode in the Angular mode of Predictive coding, and in the Angular mode, when the sensor type indicates Non-Spinning LiDAR data or the operation mode indicates the prediction of the elevation angle component and the activation of residual decoding, performs intra prediction for the coordinate values r, θ, φ after the polar coordinate transformation of the processing target node based on the values of the parent node and ancestor nodes of the processing target node.

Effects of the Invention

[0010] According to the present invention, it is possible to provide a point cloud decoding device, a point cloud decoding method, and a program that can improve the compression performance of encoding.

Brief Description of the Drawings

[0011]
[Figure 1] Figure 1 shows an example of the configuration of a point cloud processing system 10 according to one embodiment.
[Figure 2] Figure 2 shows an example of the functional blocks of a point cloud decoding device 200 according to one embodiment.
[Figure 3] Figure 3 shows an example of the configuration of encoded data (bitstream) received by the geometric information decoding unit 2010 of a point cloud decoding device 200 according to one embodiment.
[Figure 4] Figure 4 shows an example of the syntax configuration of GPS2011.
[Figure 5] Figure 5 is a flowchart showing an example of processing in the tree synthesis unit 2020 of the point cloud decoding device 200 according to one embodiment.
[Figure 6] Figure 6 is a flowchart showing an example of the slice data decoding process in step S505.
[Figure 7] Figure 7 illustrates an example of a coordinate transformation process, specifically a transformation to a polar coordinate system.
[Figure 8A] Figure 8A illustrates an example of processing when the flag controlling the sensor type indicates Spinning LiDAR data.
[Figure 8B] Figure 8B illustrates an example of processing when the flag controlling the sensor type indicates Non-Spinning LiDAR data.
[Figure 9] Figure 9 is a diagram illustrating an example of the prediction method in step S604.
[Figure 10] Figure 10 is a flowchart showing an example of the coordinate prediction process in step S604.
[Figure 11] Figure 11 illustrates an example of the process for configuring the intra predictor in step S1003, when the mode is Angular and the sensor type is Non-Spinning LiDAR, or when elevation angle component prediction and residual decoding are enabled.
[Figure 12] Figure 12 illustrates an example of the process of selecting a predictor from a reference frame in step S1004, when the mode is Angular and the sensor type is Non-Spinning LiDAR, or when elevation angle component prediction and residual decoding are enabled.
[Figure 13] Figure 13 illustrates an example of the process of selecting a predictor from a reference frame in step S1004, when the mode is Angular and the sensor type is Non-Spinning LiDAR, or when elevation angle component prediction and residual decoding are enabled.
[Figure 14] Figure 14 shows an example of the configuration of encoded data (bitstream) received by the attribute information decoding unit 2060 of the point cloud decoding device 200 according to one embodiment.
[Figure 15] Figure 15 shows an example of the syntax configuration of the APS2611 shown in Figure 14.
[Figure 16] Figure 16 is a flowchart showing an example of the processing in the RAHT unit 2080.
[Figure 17] Figure 17 is a flowchart showing an example of the process in step S28004.
[Figure 18] Figure 18 is a flowchart showing an example of the process in step S28104.
[Figure 19] Figure 19 is a flowchart showing an example of the intra prediction process in step S28112.
[Figure 20] Figure 20 shows the relationship between the node to be decoded and its neighboring nodes in the higher hierarchy.
[Figure 21] Figure 21 shows the relationship between the node to be decoded and its neighboring nodes in the subnode hierarchy.
[Figure 22] Figure 22 is a flowchart showing an example of the intra prediction process in step S28112.
[Figure 23] Figure 23 is a flowchart showing an example of the processing in the RAHT unit 2080.
[Figure 24] Figure 24 shows an example of the inter prediction process in step S28111.
[Figure 25] Figure 25 is a diagram showing an example of the functional blocks of the point cloud encoding device 100 according to this embodiment.

Modes for Carrying Out the Invention

[0012] Embodiments of the present invention will be described below with reference to the drawings. Note that the components in the following embodiments can be replaced with existing components as appropriate, and various variations are possible, including combinations with other existing components. Therefore, the description of the following embodiments does not limit the content of the invention as described in the claims.

[0013] (First Embodiment) The point cloud processing system 10 according to the first embodiment of the present invention will be described below with reference to Figures 1 to 25. Figure 1 is a diagram showing the point cloud processing system 10 according to this embodiment.

[0014] As shown in Figure 1, the point cloud processing system 10 includes a point cloud encoding device 100 and a point cloud decoding device 200.

[0015] The point cloud encoding device 100 is configured to generate encoded data (bitstream) by encoding the input point cloud signal. The point cloud decoding device 200 is configured to generate an output point cloud signal by decoding the bitstream.

[0016] The input and output point cloud signals consist of positional and attribute information for each point within the point cloud. Attribute information includes, for example, the color and reflectance of each point.

[0017] Here, such a bitstream may be transmitted from the point cloud encoding device 100 to the point cloud decoding device 200 via a transmission line. Alternatively, the bitstream may be stored in a storage medium and then provided from the point cloud encoding device 100 to the point cloud decoding device 200.

[0018] (Point cloud decoding device 200) The point cloud decoding device 200 according to this embodiment will be described below with reference to Figure 2. Figure 2 is a diagram showing an example of the functional blocks of the point cloud decoding device 200 according to this embodiment.

[0019] As shown in Figure 2, the point cloud decoding device 200 includes a geometric information decoding unit 2010, a tree synthesis unit 2020, an approximate surface synthesis unit 2030, a geometric information reconstruction unit 2040, an inverse coordinate transformation unit 2050, an attribute information decoding unit 2060, an inverse quantization unit 2070, an RAHT unit 2080, an LoD calculation unit 2090, an inverse lifting unit 2100, an inverse color transformation unit 2110, and a frame buffer 2120.

[0020] The geometric information decoding unit 2010 is configured to take as input the part of the bitstream output from the point cloud encoding device 100 that relates to geometric information (the geometric information bitstream) and to decode its syntax.

[0021] The decoding process is, for example, a context-adaptive binary arithmetic decoding process. Here, for example, the syntax includes control data (flags and parameters) to control the decoding process of the location information.

[0022] The tree synthesis unit 2020 is configured to generate tree information indicating which regions within the decoding target space contain points, by taking as input the control data decoded by the geometric information decoding unit 2010 and the occupancy code, which indicates which node in the tree (described later) contains the point cloud.

[0023] The system may also be configured to perform the decoding of the occupancy code within the tree synthesis unit 2020.

[0024] This process divides the space to be decoded into rectangular prisms, determines whether a point exists within each rectangular prism by referring to the occupancy code, divides the rectangular prism containing a point into multiple rectangular prisms, and recursively repeats the process of referring to the occupancy code, thereby generating tree information.
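
The recursive subdivision described in [0024] can be illustrated with a minimal sketch for the Octree case. The function name decode_octree, the breadth-first ordering of the occupancy codes, and the mapping of bits to child octants are assumptions made for illustration only and are not taken from this document.

```python
def decode_octree(occupancy_codes, depth):
    """Return the occupied leaf cells as integer (x, y, z) origins.

    occupancy_codes: 8-bit integers, one per occupied node, in
        breadth-first order (an assumption of this sketch).
    depth: number of subdivision levels; the root cube has side 2**depth.
    """
    codes = iter(occupancy_codes)
    nodes = [(0, 0, 0)]  # origins of the occupied nodes at the current level
    for level in range(depth):
        half = 1 << (depth - level - 1)  # side length of each child cube
        children = []
        for (x, y, z) in nodes:
            code = next(codes)
            for i in range(8):
                if code & (1 << i):
                    # Bit i marks child octant i as occupied (assumed layout).
                    dx, dy, dz = (i >> 2) & 1, (i >> 1) & 1, i & 1
                    children.append((x + dx * half, y + dy * half, z + dz * half))
        nodes = children
    return nodes
```

Only nodes whose bit is set are subdivided further, so empty regions consume no additional occupancy codes.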

[0025] Here, inter prediction, as described later, may be used when decoding such occupancy code.

[0026] In this embodiment, a method called "Octree" can be used, which recursively performs octree partitioning by always treating the cuboid as a cube, and a method called "QtBt" can be used, which performs quadtree partitioning and binary partitioning in addition to octree partitioning. Whether or not to use "QtBt" is transmitted as control data from the point cloud encoding device 100.

[0027] Alternatively, if the control data specifies that Predictive geometry coding is to be used, the tree synthesis unit 2020 is configured to decode the coordinates of each point based on an arbitrary tree configuration determined by the point cloud encoding device 100.

[0028] The approximate surface synthesis unit 2030 is configured to generate approximate surface information using tree information generated by the tree synthesis unit 2020, and to decode the point cloud based on this approximate surface information.

[0029] Approximate surface information is used, for example, when decoding 3D point cloud data of an object, in cases where the points are densely distributed on the object's surface. Instead of decoding each individual point, the region where the points exist is approximated and represented by small planes.

[0030] Specifically, the approximate surface synthesis unit 2030 can generate approximate surface information and decode point clouds using a method called "Trisoup," for example. A specific example of the "Trisoup" process will be described later. Furthermore, this process can be omitted when decoding sparse point clouds acquired by LiDAR or the like.

[0031] The geometric information reconstruction unit 2040 is configured to reconstruct the geometric information (position information in the coordinate system assumed by the decoding process) of each point in the point cloud data to be decoded, based on the tree information generated by the tree synthesis unit 2020 and the approximate surface information generated by the approximate surface synthesis unit 2030.

[0032] The inverse coordinate transformation unit 2050 is configured to take the geometric information reconstructed by the geometric information reconstruction unit 2040 as input, transform it from the coordinate system assumed by the decoding process to the coordinate system of the output point cloud signal, and output position information.

[0033] The frame buffer 2120 is configured to take the geometric information reconstructed by the geometric information reconstruction unit 2040 as input and store it as a reference frame. The stored reference frame is read from the frame buffer 2120 and used as a reference frame when the tree synthesis unit 2020 performs inter prediction between frames at different times.

[0034] Here, the choice of which time reference frame to use for each frame may be determined, for example, based on control data transmitted as a bitstream from the point cloud encoding device 100.

[0035] The attribute information decoding unit 2060 is configured to take as input the part of the bitstream output from the point cloud encoding device 100 that relates to attribute information (the attribute information bitstream) and to decode its syntax.

[0036] The decoding process is, for example, a context-adaptive binary arithmetic decoding process. Here, for example, the syntax includes control data (flags and parameters) to control the decoding process of attribute information.

[0037] Furthermore, the attribute information decoding unit 2060 is configured to decode quantized residual information from the decoded syntax.

[0038] The inverse quantization unit 2070 is configured to perform inverse quantization processing based on the quantized residual information decoded by the attribute information decoding unit 2060 and the quantization parameter, which is one of the control data decoded by the attribute information decoding unit 2060, in order to generate inverse quantized residual information.

[0039] The inversely quantized residual information is output to either the RAHT unit 2080 or the LoD calculation unit 2090, depending on the characteristics of the point cloud to be decoded. Which unit it is output to is specified by the control data decoded by the attribute information decoding unit 2060.

[0040] The RAHT unit 2080 is configured to take as input the inversely quantized residual information generated by the inverse quantization unit 2070 and the geometric information generated by the geometric information reconstruction unit 2040, and to decode the attribute information of each point using a type of Haar transform called RAHT (Region Adaptive Hierarchical Transform); in the decoding process, the inverse Haar transform is applied. As a specific example of the RAHT processing, the method described in Non-Patent Document 1 can be used.
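
As a rough illustration of the "type of Haar transform" mentioned in [0040], the following sketch shows the weighted two-point Haar butterfly commonly described for RAHT, together with its inverse. It is not taken from this document or from Non-Patent Document 1; the function names are hypothetical, the weights w1 and w2 are assumed to be the point counts of the two merged nodes, and floating point is used for readability where a practical codec would typically use fixed-point arithmetic.

```python
import math

def raht_forward(a1, a2, w1, w2):
    """Weighted Haar butterfly: returns (low-pass, high-pass) coefficients."""
    s = math.sqrt(w1 + w2)
    low = (math.sqrt(w1) * a1 + math.sqrt(w2) * a2) / s
    high = (-math.sqrt(w2) * a1 + math.sqrt(w1) * a2) / s
    return low, high

def raht_inverse(low, high, w1, w2):
    """Inverse butterfly (transpose of the orthonormal forward matrix)."""
    s = math.sqrt(w1 + w2)
    a1 = (math.sqrt(w1) * low - math.sqrt(w2) * high) / s
    a2 = (math.sqrt(w2) * low + math.sqrt(w1) * high) / s
    return a1, a2
```

Because the forward matrix is orthonormal, applying raht_inverse to the output of raht_forward with the same weights recovers the original pair of attribute values.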

[0041] The LoD calculation unit 2090 is configured to take geometric information generated by the geometric information reconstruction unit 2040 as input and generate LoD (Level of Detail).

[0042] LoD (Level of Detail) is information used to define reference relationships (referring points and referenced points) for implementing predictive coding, which involves predicting the attribute information of one point from the attribute information of another point and then encoding or decoding the prediction residual.

[0043] In other words, LoD is information that defines a hierarchical structure in which each point included in geometric information is classified into multiple levels, and the attributes of points belonging to lower levels are encoded or decoded using the attribute information of points belonging to higher levels.

[0044] As for a specific method for determining the LoD, for example, the method described in Non-Patent Document 1 above may be used.

[0045] The inverse lifting unit 2100 is configured to decode the attribute information of each point based on the hierarchical structure defined by the LoD, using the LoD generated by the LoD calculation unit 2090 and the inversely quantized residual information generated by the inverse quantization unit 2070. As a specific processing method for inverse lifting, for example, the method described in Non-Patent Document 1 above can be used.

[0046] The inverse color transformation unit 2110 is configured to perform inverse color transformation on the attribute information output from the RAHT unit 2080 or the inverse lifting unit 2100 if the attribute information to be decoded is color information and color transformation has been performed on the point cloud encoding device 100 side. Whether or not such inverse color transformation processing is performed is determined by the control data decoded by the attribute information decoding unit 2060.

[0047] The point cloud decoding device 200 is configured to decode and output attribute information for each point in the point cloud through the above processing.

[0048] (Geometric information decoding unit 2010) The control data decoded by the geometric information decoding unit 2010 will be explained below using Figures 3 and 4.

[0049] Figure 3 shows an example of the configuration of encoded data (bitstream) received by the geometric information decoding unit 2010.

[0050] Firstly, the bitstream may include GPS2011. GPS2011, also known as the geometry parameter set, is a set of control data related to the decoding of geometric information. Specific examples will be discussed later. Each GPS2011 includes at least GPS ID information to identify each individual GPS2011 if multiple GPS2011s exist.

[0051] Secondly, the bitstream may include GSH2012A / 2012B. GSH2012A / 2012B, also called geometry slice headers or geometry data unit headers, are sets of control data corresponding to slices, which will be described later. The term "slice" is used hereafter, but it may also be read as "data unit." Specific examples will be given later. Each GSH2012A / 2012B includes at least GPS ID information to specify the GPS2011 corresponding to each GSH2012A / 2012B.

[0052] Thirdly, the bitstream may include slice data 2013A / 2013B after GSH2012A / 2012B. Slice data 2013A / 2013B contains data that encodes geometric information. An example of slice data 2013A / 2013B is data encoded using occupancy code or Predictive coding, which will be discussed later.

[0053] As described above, the bitstream is configured so that each slice data 2013A / 2013B corresponds to one GSH2012A / 2012B and one GPS2011.

[0054] As described above, GSH2012A / 2012B allows you to specify which GPS2011 to refer to using GPS ID information, so a common GPS2011 can be used for multiple slice data 2013A / 2013B.

[0055] In other words, GPS2011 does not necessarily need to be transmitted for each slice. For example, as shown in Figure 3, the bitstream can be configured so that GPS2011 is not encoded immediately before GSH2012B and slice data 2013B.
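
The referencing scheme in [0053] to [0055] can be sketched as a lookup keyed by GPS ID. The class names, the slice header field gsh_gps_id, and the function resolve_parameter_sets are hypothetical; only gps_geom_parameter_set_id appears among the syntax names described in this document.

```python
from dataclasses import dataclass

@dataclass
class GeometryParameterSet:
    gps_geom_parameter_set_id: int  # GPS ID information

@dataclass
class GeometrySlice:
    gsh_gps_id: int   # hypothetical name for the GPS ID carried in the GSH
    payload: bytes    # slice data

def resolve_parameter_sets(units):
    """Pair each slice with the GPS that its header refers to.

    units: parameter sets and slices in bitstream order.
    """
    known = {}      # GPS ID -> most recently received GPS with that ID
    resolved = []
    for unit in units:
        if isinstance(unit, GeometryParameterSet):
            known[unit.gps_geom_parameter_set_id] = unit
        else:
            resolved.append((unit, known[unit.gsh_gps_id]))
    return resolved
```

With the stream [GPS, slice A, slice B], both slices resolve to the same parameter set object, which is why a GPS does not need to precede every slice.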

[0056] Note that the configuration in Figure 3 is merely an example. As long as each slice data 2013A / 2013B corresponds to GSH2012A / 2012B and GPS2011, other elements besides those mentioned above may be added as components of the bitstream.

[0057] For example, as shown in Figure 3, the bitstream may include a sequence parameter set (SPS) 2001. Similarly, it may be formatted to a different configuration than that shown in Figure 3 during transmission. Furthermore, it may be combined with the bitstream decoded by the attribute information decoding unit 2060 described later and transmitted as a single bitstream.

[0058] Figure 4 shows an example of the GPS2011 syntax configuration.

[0059] Please note that the syntax names described below are merely examples. If the functionality of the syntax described below is the same, the syntax names may differ.

[0060] GPS2011 may include GPS ID information (gps_geom_parameter_set_id) to identify each GPS2011.

[0061] In Figure 4, the Descriptor column indicates how each syntax is encoded. ue(v) means it is an unsigned zero-order exponential Golomb code, and u(1) means it is a 1-bit flag.
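
For reference, the two descriptors in [0061] can be parsed as in the following sketch. The class BitReader and its method names are hypothetical, and most-significant-bit-first ordering within each byte is an assumption of this sketch.

```python
class BitReader:
    """Reads bits from a byte string, most significant bit first (assumed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer; u(1) reads a 1-bit flag."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned zero-order exponential Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        # Codeword layout: leading_zeros zeros, a one, then leading_zeros suffix bits.
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)
```

For example, the bit patterns 1, 010, 011, and 00100 decode to 0, 1, 2, and 3 respectively, so small values such as parameter set IDs cost only a few bits.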

[0062] GPS2011 may include a flag (geom_tree_type) for controlling the tree type in the tree synthesis unit 2020.

[0063] For example, if the value of geom_tree_type is "1", it may be defined to use predictive geometry coding, and if the value of geom_tree_type is "0", it may be defined to use Octree.

[0064] GPS2011 may include a flag (interprediction_enabled_flag) that controls whether or not inter prediction is performed in the tree synthesis unit 2020.

[0065] For example, it may be defined that inter prediction is not performed if the value of interprediction_enabled_flag is "0", and that inter prediction is performed if the value of interprediction_enabled_flag is "1".

[0066] Note that interprediction_enabled_flag may be included in SPS2001 instead of GPS2011.

[0067] GPS2011 may include a flag (geom_tree_type) for controlling the tree type in the tree synthesis unit 2020. For example, a value of "1" for geom_tree_type may be defined to mean that Predictive coding is used, while a value of "0" may be defined to mean that Predictive coding is not used.

[0068] Note that geom_tree_type may be included in SPS2001 instead of GPS2011.

[0069] GPS2011 may include a flag (geom_angular_enabled) to control whether or not to process in Angular mode in the tree synthesis unit 2020.

[0070] For example, if the value of geom_angular_enabled is "1", it may be defined that predictive coding will be performed in Angular mode, and if the value of geom_angular_enabled is "0", it may be defined that predictive coding will not be performed in Angular mode.

[0071] Note that geom_angular_enabled may be included in SPS2001 instead of GPS2011.

[0072] GPS2011 may include a flag (sensor_type) for controlling the sensor type in Angular mode in the tree synthesis unit 2020.

[0073] For example, it may be defined that if the value of sensor_type is "1", the data is processed as Spinning LiDAR data, and if the value of sensor_type is "0", the data is processed as Non-Spinning LiDAR data.

[0074] Note that sensor_type may be included in SPS2001 instead of GPS2011. Alternatively, instead of sensor_type, a flag used to control the switching of operation modes in Angular mode, as described later, may be used.

[0075] GPS2011 may include a flag (global_motion_enabled_flag) that controls whether or not to perform global motion compensation for inter prediction in the tree synthesis unit 2020.

[0076] For example, it may be defined that global motion compensation is disabled if the value of global_motion_enabled_flag is "0", and that global motion compensation is enabled if the value of global_motion_enabled_flag is "1".

[0077] When performing global motion compensation, each slice data may include a global motion vector.

[0078] Note that global_motion_enabled_flag may be included in SPS2001 instead of GPS2011.

[0079] (Tree synthesis unit 2020) The processing of the tree synthesis unit 2020 will be explained below using Figures 5 to 13. Figure 5 is a flowchart showing an example of the processing in the tree synthesis unit 2020. The following explanation will describe an example of synthesizing a tree using "Predictive geometry coding".

[0080] Note that terms such as "Predictive geometry," "Predictive coding," and "Predictive tree" are sometimes used instead of "Predictive geometry coding."

[0081] As shown in Figure 5, in step S501, the tree synthesis unit 2020 determines whether to use inter prediction based on the value of interprediction_enabled_flag.

[0082] If the tree synthesis unit 2020 determines that inter prediction should be used, it proceeds to step S502; otherwise, it proceeds to step S505.

[0083] In step S502, the tree synthesis unit 2020 obtains a reference frame from the frame buffer 2120. The frame buffer 2120 may already store one previously decoded frame, and decoded frames may be added to the frame buffer 2120 one frame at a time or after decoding of a predetermined number of frames is completed. After obtaining the reference frame, the tree synthesis unit 2020 proceeds to step S503.

[0084] In step S503, the tree synthesis unit 2020 determines whether to perform global motion compensation based on the global_motion_enabled_flag.

[0085] If the tree synthesis unit 2020 determines that global motion compensation should be performed, it proceeds to step S504; if it determines that global motion compensation should not be performed, it proceeds to step S505.

[0086] In step S504, the tree synthesis unit 2020 performs global motion compensation on the reference frame acquired in step S502.

[0087] Here, global motion compensation is a process that corrects the global positional shift for each frame, and applies rotation and translation based on the global motion vector decoded by the geometric information decoding unit 2010 to all or a specified range of points in the reference frame.
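
The operation in [0087] amounts to a rigid transform of the reference-frame points. The following sketch assumes that the global motion vector has already been converted into a 3x3 rotation matrix and a translation vector; how these are parameterized in the bitstream is not stated in this section, and the function name is hypothetical.

```python
def apply_global_motion(points, rotation, translation):
    """Rotate and then translate each (x, y, z) point of the reference frame.

    rotation: 3x3 matrix as nested sequences (assumed representation).
    translation: sequence of three offsets.
    """
    compensated = []
    for (x, y, z) in points:
        compensated.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return compensated
```

To compensate only a specified range of points, the caller can pass the selected subset and leave the remaining points unchanged.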

[0088] After performing global motion compensation, the tree synthesis unit 2020 proceeds to step S505.

[0089] In step S505, the tree synthesis unit 2020 decodes the slice data. The specific processing in step S505 will be described later. After decoding the slice data, the tree synthesis unit 2020 proceeds to step S506.

[0090] In step S506, the tree synthesis unit 2020 terminates its processing. Note that the processing in steps S503 and S504, i.e., the determination and execution of global motion compensation, may be performed during the slice data decoding process in step S505.
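
The control flow of steps S501 to S506 can be summarized in the following sketch. The function and parameter names are hypothetical, the parameter set is modeled as a plain dictionary, and taking the most recently stored frame as the reference is an assumption; [0034] notes that the choice of reference frame may instead be signaled as control data.

```python
def synthesize_tree(gps, frame_buffer, slice_data, compensate, decode_slice):
    """Sketch of the Figure 5 flow.

    gps: dict holding interprediction_enabled_flag and global_motion_enabled_flag.
    frame_buffer: list of previously decoded frames.
    compensate: callable applying global motion compensation to a frame.
    decode_slice: callable (slice_data, reference_or_None) -> decoded points.
    """
    reference = None
    if gps["interprediction_enabled_flag"]:        # S501
        reference = frame_buffer[-1]               # S502 (assumed: latest frame)
        if gps["global_motion_enabled_flag"]:      # S503
            reference = compensate(reference)      # S504
    return decode_slice(slice_data, reference)     # S505, then S506 (end)
```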

[0091] Figure 6 is a flowchart showing an example of the slice data decoding process in step S505.

[0092] As shown in Figure 6, in step S601, the tree synthesis unit 2020 constructs a prediction tree corresponding to the slice data.

[0093] The slice data may contain a list of the number of child nodes of each node in the prediction tree, sorted in depth-first order. One way to construct the prediction tree is to start from the root node and add the number of child nodes specified in the above list to each node, in depth-first order.
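
The construction method in [0093] can be sketched with a stack of nodes that are still waiting for children. The function name and the parent-index output format are choices made for this illustration.

```python
def build_prediction_tree(child_counts):
    """Build a prediction tree from per-node child counts in depth-first order.

    Returns a list where entry i is the index of node i's parent,
    or -1 for the root. Nodes are numbered in depth-first order.
    """
    parent = [-1] * len(child_counts)
    stack = []  # entries: [node index, number of children still to attach]
    for node, count in enumerate(child_counts):
        # Discard nodes that have already received all of their children.
        while stack and stack[-1][1] == 0:
            stack.pop()
        if stack:
            parent[node] = stack[-1][0]
            stack[-1][1] -= 1
        stack.append([node, count])
    return parent
```

For the list [2, 1, 0, 0], the root receives nodes 1 and 3 as children and node 1 receives node 2, giving the parent list [-1, 0, 1, 0].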

[0094] After completing the construction of the prediction tree, the tree synthesis unit 2020 proceeds to step S602.

[0095] In step S602, the tree synthesis unit 2020 determines whether processing of all nodes in the prediction tree has been completed.

[0096] If the tree synthesis unit 2020 determines that processing of all nodes in the prediction tree is complete, it proceeds to step S607; if it determines that processing of all nodes in the prediction tree is not complete, it proceeds to step S603.

[0097] In step S603, the tree synthesis unit 2020 selects the node to be processed from the prediction tree.

[0098] The tree synthesis unit 2020 may select the node that follows the previously processed node in depth-first order as the node to be processed.

[0099] After the tree synthesis unit 2020 has finished selecting the nodes to be processed, it proceeds to step S604.

[0100] In step S604, the tree synthesis unit 2020 predicts the coordinates of the points corresponding to the nodes to be processed. The specific method for predicting these coordinates will be described later.

[0101] After completing the prediction, the tree synthesis unit 2020 proceeds to step S605.

[0102] In step S605, the tree synthesis unit 2020 decodes the predicted residuals of the coordinates of the points corresponding to the nodes to be processed. The slice data may include the predicted residuals of the coordinates of the points corresponding to each node.

[0103] After the tree synthesis unit 2020 has finished decoding the predicted residuals of the target node, it proceeds to step S606.

[0104] In step S606, the tree synthesis unit 2020 reconstructs the coordinates of the points corresponding to the nodes to be processed. The tree synthesis unit 2020 may determine the coordinates of the points by summing the coordinates predicted in step S604 and the residuals decoded in step S605.

[0105] After the coordinate reconstruction is complete, the tree synthesis unit 2020 returns to step S602.

[0106] In step S607, the tree synthesis unit 2020 terminates the process in step S505.

[0107] Here, the order of steps S604 and S605 may be reversed.
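
The loop of steps S602 to S606 can be sketched as follows. The predictor used here, which simply copies the reconstructed coordinates of the parent node, is a placeholder assumption; the actual prediction method of step S604 is described later in the document. The function name and argument layout are hypothetical.

```python
def decode_slice_nodes(parent, residuals):
    """Reconstruct one coordinate triple per node of the prediction tree.

    parent[i]: index of node i's parent, -1 for the root (depth-first order).
    residuals[i]: decoded prediction residual of node i as a 3-tuple.
    """
    coords = [None] * len(parent)
    for node in range(len(parent)):          # S602 / S603: next node in order
        if parent[node] < 0:
            predicted = (0, 0, 0)            # root: no parent to predict from
        else:
            predicted = coords[parent[node]]  # S604 (placeholder predictor)
        residual = residuals[node]           # S605
        # S606: reconstructed value = prediction + residual
        coords[node] = tuple(p + r for p, r in zip(predicted, residual))
    return coords
```

Because nodes are visited in depth-first order, every parent is reconstructed before its children, so the prediction in S604 only ever depends on already decoded values.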

[0108] If Angular mode is used, the tree synthesis unit 2020 may take into consideration that the coordinate values handled in steps S604 to S606 are the values after coordinate transformation, and step S606 may include the processing of the inverse transformation.

[0109] Figure 7 illustrates an example of a coordinate transformation process, specifically a transformation to a polar coordinate system. As shown in Figure 7, the coordinate values of each point are expressed as radius r, elevation angle θ, and azimuth angle φ.
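A transformation of this kind, and its inverse as used in step S606, might look like the sketch below. The angle conventions are assumptions, since Figure 7 is not reproduced here: r is taken as the three-dimensional distance from the origin, θ as the elevation measured from the x-y plane, and φ as the azimuth measured from the x axis. The function names are hypothetical.

```python
import math

def to_polar(x: float, y: float, z: float):
    """Cartesian -> (r, theta, phi), with theta measured from the x-y plane (assumed)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.asin(z / r) if r > 0.0 else 0.0   # elevation angle
    phi = math.atan2(y, x)                          # azimuth angle
    return r, theta, phi

def to_cartesian(r: float, theta: float, phi: float):
    """Inverse transformation of to_polar."""
    return (r * math.cos(theta) * math.cos(phi),
            r * math.cos(theta) * math.sin(phi),
            r * math.sin(theta))
```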

[0110] Furthermore, the elevation angle θ and azimuth angle φ among the coordinate values after polar coordinate transformation may be subjected to further appropriate transformations. For example, the two values of elevation angle θ and azimuth angle φ may be treated as two-dimensional Cartesian coordinates, and a further two-dimensional polar coordinate transformation may be applied.

[0111] Figure 8A illustrates an example of processing when the flag controlling the sensor type indicates Spinning LiDAR data, and Figure 8B illustrates an example of processing when the flag controlling the sensor type indicates Non-spinning LiDAR data.

[0112] As shown in Figure 8A, when the sensor type is Spinning LiDAR, the tree synthesis unit 2020 processes only the radius r and azimuth angle φ of the coordinate values in steps S604 and S605. In step S606, it reconstructs the radius r and azimuth angle φ and sets the elevation angle θ to a predefined fixed value.

[0113] For example, the tree synthesis unit 2020 may set the elevation angle θ to the elevation angle of one of the multiple lasers in the LiDAR, and the laser ID indicating which laser it is may be included in the slice data.

[0114] On the other hand, if the sensor type is Non-Spinning LiDAR, the tree synthesis unit 2020 may perform prediction, residual decoding, and coordinate reconstruction processing for all of the coordinate values r, θ, and φ in steps S604 to S606.

[0115] Furthermore, the tree synthesis unit 2020 may switch between the two operations described above based on the value of a flag that controls whether or not to enable elevation angle component prediction and residual decoding, rather than a flag that controls the sensor type.
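The switch between the two operations can be illustrated as follows. This is a sketch under assumptions: the function name, the `non_spinning` argument (standing in for either flag), and the `laser_elevations` table indexed by laser ID are hypothetical.

```python
from typing import Sequence, Tuple

Polar = Tuple[float, float, float]  # (r, theta, phi)

def reconstruct_polar(pred: Polar, res: Polar, non_spinning: bool,
                      laser_elevations: Sequence[float] = (),
                      laser_id: int = 0) -> Polar:
    """Reconstruct (r, theta, phi) for one node according to the sensor type."""
    r = pred[0] + res[0]
    phi = pred[2] + res[2]
    if non_spinning:
        # Figure 8B: theta is predicted and carries a residual like r and phi.
        theta = pred[1] + res[1]
    else:
        # Figure 8A: theta is a fixed value, here looked up by laser ID.
        theta = laser_elevations[laser_id]
    return (r, theta, phi)
```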

[0116] Figure 9 is a diagram illustrating an example of the prediction method in step S604.

[0117] In step S604, the tree synthesis unit 2020 may perform a prediction using only one of the consecutive N points, as shown in Figure 9. The tree synthesis unit 2020 may then share the predicted value of that one point as the predicted value for the remaining N-1 points.

[0118] Figure 10 is a flowchart showing an example of the coordinate prediction process in step S604.

[0119] As shown in Figure 10, in step S1001, the tree synthesis unit 2020 decodes the predictor flag.

[0120] Here, the slice data may include flags indicating the predictor to be used for each node. For example, the slice data may include flags similar to those described in Non-Patent Documents 1 and 2, such as a flag indicating whether it is an inter-predictor or an intra-predictor, or an index for the inter-predictor. The slice data may also include other flags described later.

[0121] After decoding the predictor flag, the tree synthesis unit 2020 proceeds to step S1002.

[0122] In step S1002, the tree synthesis unit 2020 determines whether to use the inter predictor based on the predictor flag decoded in step S1001.

[0123] If the tree synthesis unit 2020 determines that the inter predictor should be used, it proceeds to step S1004; otherwise, it proceeds to step S1003.

[0124] In step S1003, the tree synthesis unit 2020 performs intra-prediction of the coordinates of the nodes to be processed.

[0125] When performing intra-prediction, the tree synthesis unit 2020 configures a predictor based on the coordinates of the parent node or ancestor node (for example, the parent node's parent node) of the node to be processed, and predicts the coordinates of the node to be processed.

[0126] If the tree synthesis unit 2020 is in Angular mode and the sensor type is Spinning LiDAR, or if elevation angle component prediction and residual decoding are disabled, the method for configuring the intra predictor may be the method described in Non-Patent Documents 1 and 2.

[0127] If the tree synthesis unit 2020 is in Angular mode and the sensor type is Non-Spinning LiDAR, or if elevation angle component prediction and residual decoding are enabled, the intra-predictor configuration method described later may be used.

[0128] After completing intra-prediction, the tree synthesis unit 2020 proceeds to step S1005.

[0129] In step S1004, the tree synthesis unit 2020 performs inter prediction of the coordinates of the node to be processed.

[0130] When performing inter prediction, the tree synthesis unit 2020 selects a node corresponding to the node to be processed from the reference frame as a predictor, and uses the coordinates of the selected predictor as the predicted value of the coordinates of the node to be processed.

[0131] If the tree synthesis unit 2020 is in Angular mode and the sensor type is Spinning LiDAR, or if elevation angle component prediction and residual decoding are disabled, a predictor may be selected from the reference frame using the same method as in Non-Patent Document 2.

[0132] Furthermore, when the tree synthesis unit 2020 is in Angular mode and the sensor type is Non-Spinning LiDAR, or when elevation angle component prediction and residual decoding are enabled, it selects a predictor from the reference frame, as described later.

[0133] After the inter-prediction is complete, the tree synthesis unit 2020 proceeds to step S1005.

[0134] In step S1005, the tree synthesis unit 2020 terminates the process in step S604.

[0135] Figure 11 illustrates an example of the process for configuring the intra predictor in step S1003 described above, when the mode is Angular and the sensor type is Non-Spinning LiDAR, or when elevation angle component prediction and residual decoding are enabled. The intra predictor predicts the coordinate values q = (r, θ, φ) of the node to be processed.
• The intra predictor may predict q = 0, that is, make no prediction.
• The intra predictor may use the coordinate value q_{k-1} of the parent node and predict q = q_{k-1}.
• The intra predictor may use the coordinate values q_{k-1} of the parent node and q_{k-2} of the grandparent node and predict q = q_{k-1} + (q_{k-1} - q_{k-2}).
• In the prediction tree, if Δq is the average absolute distance between adjacent nodes in the most recently processed branch, the intra predictor may use the coordinate value q_{k-1} of the parent node and predict q = q_{k-1} + Δq.
• In the prediction tree, the intra predictor may select, from among the nodes belonging to the branch processed immediately before, the node most similar to the parent node of the node to be processed, and let Δq' be the absolute distance between the coordinate values of that similar node and its child node. Then, using the coordinate value q_{k-1} of the parent node, the intra predictor may predict q = q_{k-1} + Δq'.

[0136] The tree synthesis unit 2020 may decode a flag for each node indicating which of these intra predictors to use. The flag may be included in the slice data.
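The first four intra predictors can be sketched as below. Several points are assumptions rather than statements of the embodiment: the linear predictor is read as q_{k-1} + (q_{k-1} - q_{k-2}), Δq is computed component by component, and the names `mean_abs_step` and `intra_candidates` and the dictionary keys are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]  # q = (r, theta, phi)

def mean_abs_step(branch: Sequence[Vec]) -> Vec:
    """Component-wise mean absolute difference between adjacent nodes of a branch."""
    steps = [tuple(abs(b - a) for a, b in zip(p, q))
             for p, q in zip(branch, branch[1:])]
    if not steps:                       # fewer than two nodes: no step available
        return (0.0, 0.0, 0.0)
    return tuple(sum(s[i] for s in steps) / len(steps) for i in range(3))

def intra_candidates(q_parent: Vec, q_grand: Vec, delta_q: Vec) -> Dict[str, Vec]:
    """Candidate predictions for the node to be processed."""
    return {
        "none": (0, 0, 0),
        "parent": q_parent,
        # q_{k-1} + (q_{k-1} - q_{k-2}), assumed reading of the linear predictor
        "linear": tuple(2 * p - g for p, g in zip(q_parent, q_grand)),
        "parent_plus_delta": tuple(p + d for p, d in zip(q_parent, delta_q)),
    }
```

A decoder following this sketch would pick one entry according to the per-node flag and add the decoded residual to it.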

[0137] Figures 12 and 13 illustrate an example of the process of selecting a predictor from a reference frame in step S1004 described above, when the mode is Angular and the sensor type is Non-Spinning LiDAR, or when elevation angle component prediction and residual decoding are enabled.

[0138] In the example shown in Figure 12, the tree synthesis unit 2020 searches for the node in the reference frame whose coordinate values θ and φ are closest to the parent node of the node to be processed, and selects a predictor from its child and grandchild nodes.

[0139] In the example shown in Figure 13, the tree synthesis unit 2020 assumes that each node has a unique node ID within the frame and selects a predictor from the reference frame based on the node ID of the node to be processed.

[0140] When the tree synthesis unit 2020 is in Angular mode and the sensor type is Non-Spinning LiDAR, or when elevation angle component prediction and residual decoding are enabled, the prediction tree may be configured so that N branches are connected in series from the root, and each node may hold a branch ID indicating the branch to which it belongs.

[0141] The tree synthesis unit 2020 may limit the range in which it searches for a predictor from the reference frame based on the branch ID of the parent node of the node to be processed.
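The search of Figure 12, combined with a branch ID restriction, might be sketched as follows. This is illustrative only: the `RefNode` class and function names are hypothetical, closeness is assumed to be the squared Euclidean distance in (θ, φ), and limiting the search to nodes with the same branch ID is just one possible way of narrowing the range.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RefNode:
    r: float
    theta: float
    phi: float
    branch_id: int = 0
    children: List["RefNode"] = field(default_factory=list)

def closest_reference(ref_nodes: List[RefNode], theta: float, phi: float,
                      branch_id: Optional[int] = None) -> Optional[RefNode]:
    """Reference-frame node whose (theta, phi) is closest to the given values."""
    pool = [n for n in ref_nodes if branch_id is None or n.branch_id == branch_id]
    if not pool:
        return None
    return min(pool, key=lambda n: (n.theta - theta) ** 2 + (n.phi - phi) ** 2)

def predictor_candidates(anchor: RefNode) -> List[RefNode]:
    """Children and grandchildren of the matched node, as predictor candidates."""
    out = list(anchor.children)
    for child in anchor.children:
        out.extend(child.children)
    return out
```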

[0142] (Attribute information decoding unit 2060) The control data decoded by the attribute information decoding unit 2060 will be explained below using Figures 14 to 15.

[0143] Figure 14 shows an example of the configuration of encoded data (bitstream) received by the attribute information decoding unit 2060.

[0144] Figure 15 shows an example of the syntax configuration of the APS2611.

[0145] Please note that the syntax names described below are merely examples. If the functionality of the syntax described below is the same, the syntax names may differ.

[0146] The APS2611 may include APS ID information (aps_geom_parameter_set_id) to identify each APS2611.

[0147] The Descriptor column in Figure 15 indicates how each syntax is encoded. ue(v) means it is an unsigned zero-order exponential Golomb code, and u(1) means it is a 1-bit flag.

[0148] The APS2611 may include a flag (attr_coding_type) that controls whether the inverse quantization unit 2070 outputs the inverse-quantized residual information to the RAHT unit 2080 or to the LoD calculation unit 2090.

[0149] For example, it may be defined that if the value of attr_coding_type is "1", the output is sent to the LoD calculation unit 2090, and if the value of attr_coding_type is "0", the output is sent to the RAHT unit 2080.

[0150] The APS2611 may include a flag (raht_prediction_enabled) in the RAHT unit 2080 to control whether or not to perform attribute information prediction.

[0151] For example, if the value of raht_prediction_enabled is "1", attribute information prediction may be performed, and if the value of raht_prediction_enabled is "0", attribute information prediction may not be performed.

[0152] The APS2611 may include a flag (raht_subnode_prediction_enable_flag) in the RAHT unit 2080 that controls whether or not to use subnodes for predicting attribute information.

[0153] For example, if the value of raht_subnode_prediction_enable_flag is "1", it may be defined that subnodes are used for predicting attribute information, and if the value of raht_subnode_prediction_enable_flag is "0", it may be defined that subnodes are not used for predicting attribute information.

[0154] The APS2611 may include weight parameters (raht_prediction_weights) for performing intra prediction of attribute information in the RAHT unit 2080.

[0155] For example, the values of raht_prediction_weights may be defined according to the type of neighboring node used for intra-prediction.

[0156] The APS2611 may include a flag (raht_inter_prediction_enabled) that controls whether or not the RAHT unit 2080 performs inter prediction of attribute information.

[0157] For example, if the value of raht_inter_prediction_enabled is "1", inter prediction of attribute information may be performed, and if the value of raht_inter_prediction_enabled is "0", inter prediction of attribute information may not be performed.

[0158] The APS2611 may include a value (raht_inter_prediction_depth_minus1) that indicates the depth down to which the RAHT unit 2080 enables inter prediction of attribute information.

[0159] For example, if the value of raht_inter_prediction_depth_minus1 is "N-1", inter prediction may be enabled for the top N levels of the Octree structure.

[0160] (RAHT unit 2080) An example of the processing of the RAHT unit 2080 will be explained using Figures 16 to 24.

[0161] Figure 16 is a flowchart showing an example of the processing in the RAHT unit 2080.

[0162] As shown in Figure 16, in step S28001, the RAHT unit 2080 recursively divides each node into eight using an Octree until the nodes reach a predetermined size. After this division is complete, the operation proceeds to step S28002.

[0163] In step S28002, the RAHT unit 2080 counts the total number of points belonging to the lower hierarchy of each node divided by Octree.

[0164] Specifically, the RAHT unit 2080 sequentially scans the nodes in a certain hierarchy and records the number of points belonging to each node. Next, for each node in the hierarchy one level up, the RAHT unit 2080 calculates the number of points belonging to that node by summing the numbers of points recorded for its child nodes.

[0165] The RAHT unit 2080 repeats the above scan sequentially from the lowest layer to the highest layer. The total number of points acquired is used as the weight for the inverse RAHT transform in step S28005 described below. After this calculation is completed, the operation proceeds to step S28003.
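The bottom-up counting of step S28002 can be sketched as follows. The sketch assumes integer point coordinates in the range [0, 2^depth) and identifies a node at each level by its coordinates shifted right; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, int, int]

def count_points_per_level(points: Sequence[Key], depth: int) -> List[Dict[Key, int]]:
    """Return, for each level 0 (root) to depth (leaves), the point count of each node."""
    levels: List[Dict[Key, int]] = [dict() for _ in range(depth + 1)]
    levels[depth] = Counter(points)                 # scan the lowest layer
    for d in range(depth - 1, -1, -1):              # move up one layer at a time
        parent: Counter = Counter()
        for (x, y, z), w in levels[d + 1].items():
            parent[(x >> 1, y >> 1, z >> 1)] += w   # sum the counts of the children
        levels[d] = parent
    return levels
```

The counts stored for each node are what the inverse RAHT later uses as weights.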

[0166] In step S28003, the RAHT unit 2080 decodes the DC coefficient of the node belonging to the highest level of the Octree. Alternatively, the RAHT unit 2080 may predict the DC coefficient using intra prediction, decode the prediction residual of the DC coefficient, and sum the two to calculate the DC coefficient.

[0167] After decoding the DC coefficient, the RAHT unit 2080 calculates the attribute value A_root of the root node by the following formula, using the total number of points w_root belonging to the root node obtained in step S28002 and the decoded DC coefficient DC_root.

[0168]

Equation
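The formula itself is not reproduced in this text. As a hedged sketch, if the commonly used RAHT normalization is assumed, in which two coefficients c_1 and c_2 with weights w_1 and w_2 are merged into a DC coefficient by an orthonormal butterfly and every leaf starts with weight 1, the relation would take the following form. The leaf attribute values a_i are notation introduced only for this illustration.

```latex
% Assumed merge rule (standard RAHT butterfly, not quoted from this document):
\[
  DC = \frac{\sqrt{w_1}\,c_1 + \sqrt{w_2}\,c_2}{\sqrt{w_1 + w_2}}
\]
% By induction from the leaves (weight 1, coefficient a_i):
\[
  DC_{root} = \frac{1}{\sqrt{w_{root}}} \sum_{i=1}^{w_{root}} a_i
  \quad\Longrightarrow\quad
  A_{root} = \frac{DC_{root}}{\sqrt{w_{root}}}
           = \frac{1}{w_{root}} \sum_{i=1}^{w_{root}} a_i
\]
```

Under this assumption A_root is the mean attribute value of all points under the root node.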

[0169] In step S28004, the RAHT unit 2080 determines whether the decoding of attribute information for all nodes included in the hierarchy has been completed.

[0170] If the process is not complete, the operation proceeds to step S28005; otherwise, the operation proceeds to step S28007.

[0171] In step S28005, the RAHT unit 2080 decodes the AC coefficients. This will be described in detail later. After this decoding is complete, the operation proceeds to step S28006.

[0172] In step S28006, the RAHT unit 2080 calculates attribute values using the inverse transform of RAHT, based on the total number of points belonging to the lower hierarchy of each aggregated node, the decoded AC coefficient, and the DC coefficient calculated from the higher hierarchy nodes using a method described later.

[0173] Here, the inverse transform of RAHT is performed in units of the 2x2x2 = 8 nodes produced by the Octree division.

[0174] Specifically, the attribute values A_1, A_2, ..., A_k of a node that holds k subnodes can be calculated by the following equation (1), using the DC coefficient DC of the node, the AC coefficients AC_1, AC_2, ..., AC_{k-1}, and the total numbers of points w = w_1, w_2, ..., w_k belonging to the lower hierarchy of each subnode.

[0175]

Equation (1)

[0176] This conversion process is intended to be performed repeatedly, starting from the higher-level nodes and progressing to the lower-level nodes.

[0177]

Equation
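Since the equations are not reproduced in this text, the following sketch shows only the two-node building block of a weighted transform of this kind, using the widely used RAHT butterfly as an assumption. The function names are hypothetical, and the k-node form of equation (1) is not reconstructed here.

```python
import math
from typing import Tuple

def raht_forward(c1: float, w1: float, c2: float, w2: float) -> Tuple[float, float]:
    """Forward two-node butterfly: returns (DC, AC) for coefficients c1, c2."""
    s = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / s, math.sqrt(w2) / s
    return a * c1 + b * c2, -b * c1 + a * c2

def raht_inverse(dc: float, ac: float, w1: float, w2: float) -> Tuple[float, float]:
    """Inverse butterfly: the transform is orthonormal, so the inverse is the transpose."""
    s = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / s, math.sqrt(w2) / s
    return a * dc - b * ac, b * dc + a * ac
```

Applying `raht_inverse` from the higher-level nodes toward the lower-level nodes, with the point counts as weights, mirrors the repeated conversion described above.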

[0178] In step S28007, the RAHT unit 2080 determines whether processing of all layers has been completed. If not, this operation moves the processing target layer down one level and proceeds to step S28004. If completed, this operation proceeds to step S28008 and terminates processing.

[0179] Figure 17 is a flowchart showing an example of the process in step S28005.

[0180] As shown in Figure 17, in step S28101, the RAHT unit 2080 determines whether to predict the AC coefficient. When making this determination, the RAHT unit 2080 may refer to raht_prediction_enabled and use its value.

[0181] The RAHT unit 2080 may decode a flag indicating whether or not to predict the AC coefficient at the currently processed node and use the value of such flag.

[0182] Such a flag may be decoded coefficient by coefficient, node by node, or hierarchy by hierarchy. The flag may be decoded only if the value of raht_prediction_enabled is "1", indicating that prediction is enabled. The flag may be included in the slice data.

[0183] If the AC coefficient is not predicted as a result of the determination, this operation proceeds to step S28102. If the AC coefficient is predicted, this operation proceeds to steps S28103 and S28104.

[0184] In step S28102, the RAHT unit 2080 decodes the AC coefficients. After this decoding is complete, the operation proceeds to step S28106 and terminates.

[0185] In step S28103, the RAHT unit 2080 decodes the residual of the AC coefficient. After this decoding is complete, the operation proceeds to step S28105.

[0186] In step S28104, the RAHT unit 2080 predicts the AC coefficient. Inter prediction or intra prediction may be used to predict the AC coefficient.

[0187] The RAHT unit 2080 may first predict attribute values and then calculate predicted AC coefficients using RAHT. This will be explained in detail later. After the AC coefficient prediction is complete, the operation proceeds to step S28105.

[0188] In step S28105, the RAHT unit 2080 adds the residual of the decoded AC coefficient to the predicted AC coefficient and reconstructs the AC coefficient. After this reconstruction is complete, the operation proceeds to step S28106 and terminates.
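The branch of Figure 17 can be summarized by the sketch below. The function name and arguments are hypothetical: `decoded` stands for the values obtained from the bitstream (AC coefficients in step S28102, or residuals in step S28103), and `predictor` stands for step S28104.

```python
from typing import Callable, List, Optional, Sequence

def reconstruct_ac(decoded: Sequence[float], predict_enabled: bool,
                   predictor: Optional[Callable[[], Sequence[float]]] = None) -> List[float]:
    """Return the AC coefficients of one node."""
    if not predict_enabled:
        return list(decoded)                 # S28102: decoded values are the coefficients
    if predictor is None:
        raise ValueError("a predictor is required when prediction is enabled")
    predicted = predictor()                  # S28104: predicted AC coefficients
    return [p + r for p, r in zip(predicted, decoded)]  # S28105: prediction + residual
```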

[0189] Figure 18 is a flowchart showing an example of the process in step S28104.

[0190] As shown in Figure 18, in step S28107, the RAHT unit 2080 determines whether inter-prediction is enabled. The RAHT unit 2080 may refer to raht_inter_prediction_enabled and use its value for the determination. If the determination shows that inter-prediction is enabled, the operation proceeds to step S28109; if inter-prediction is disabled, the operation proceeds to step S28112.

[0191] In step S28109, the RAHT unit 2080 determines whether the depth of the hierarchy containing the node to be processed is less than or equal to a threshold. The RAHT unit 2080 may refer to raht_inter_prediction_depth_minus1 as the threshold and use that value.

[0192] If the depth is less than or equal to the threshold, the operation proceeds to step S28110; if the depth is greater than the threshold, the operation proceeds to step S28112.

[0193] In step S28110, the RAHT unit 2080 determines whether or not to inter-predict the AC coefficient of the node to be processed.

[0194] The RAHT unit 2080 may also check whether inter-prediction is feasible, perform inter-prediction if feasible, and refrain from performing inter-prediction if it is not feasible. Details will be described later.

[0195] For this determination, the RAHT unit 2080 may decode a flag indicating whether or not to inter-predict the AC coefficient of the node to be processed, and use the value of that flag.

[0196] At encoding time, such a flag may be set by calculating the costs of intra prediction and inter prediction and signaling whichever has the smaller cost. The cost may be calculated from one of, or a combination of, the magnitude of the residual, the distortion of the reconstructed value, and the estimated code size.

[0197] Such flags may be decoded coefficient by coefficient, node by node, for each group of multiple nodes, or hierarchy by hierarchy.

[0198] Such a flag may be decoded, and the determination made, only if it is determined that inter prediction is feasible.

[0199] Such flags may be included in the slice data.

[0200] Such flags may be encoded using arithmetic coding, in which case the context referenced in the arithmetic coding may be one context per frame or one context per hierarchy.

[0201] In step S28111, the RAHT unit 2080 performs inter-prediction of the AC coefficients of the processing target node. The details will be described later.

[0202] In step S28112, the RAHT unit 2080 performs intra-prediction of the AC coefficient of the node to be processed. The details will be described later.

[0203] In step S28113, the process in step S28104 is terminated. Note that the conditional branch in step S28109 may be omitted.
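The decisions of steps S28107 to S28110 can be sketched as a single selection function. This is a sketch under assumptions: the depth is counted from 0 at the root, so that a threshold of N-1 covers the top N levels, and the function and argument names are hypothetical.

```python
def select_ac_prediction(inter_enabled: bool, depth: int, depth_minus1: int,
                         inter_feasible: bool, inter_flag: bool = True) -> str:
    """Return "inter" or "intra" for the node to be processed."""
    if not inter_enabled:            # S28107: raht_inter_prediction_enabled
        return "intra"
    if depth > depth_minus1:         # S28109: raht_inter_prediction_depth_minus1
        return "intra"
    if not inter_feasible:           # S28110: for example, no usable reference node
        return "intra"
    if not inter_flag:               # S28110: decoded per-node (or per-layer) flag
        return "intra"
    return "inter"                   # S28111
```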

[0204] In the inter prediction process in step S28111, processing equivalent to the intra prediction in step S28112 may also be performed, and the results of the inter prediction and intra prediction may be combined for prediction. Details will be described later.

[0205] Figure 19 is a flowchart showing an example of the intra-prediction process in step S28112.

[0206] As shown in Figure 19, in step S28201, the RAHT unit 2080 determines whether or not to perform intra prediction using adjacent nodes in the subnode hierarchy. The RAHT unit 2080 may refer to raht_subnode_prediction_enable_flag and use its value for this determination.

[0207] The RAHT unit 2080 performs intra-prediction using only adjacent nodes in the higher hierarchy if adjacent nodes in the subnode hierarchy are not used.

[0208] Here, the adjacent nodes in the higher hierarchy are 7 of the 19 nodes formed by the parent node of the node to be decoded, the 6 nodes that share a face with the parent node, and the 12 nodes that share an edge with it: the 3 nodes adjacent to the node to be decoded by a face, the 3 nodes adjacent to it by an edge, and the parent node itself.

[0209] Figure 20 shows the relationship between the node to be decoded and its adjacent nodes in the higher hierarchy.

[0210] When the RAHT unit 2080 uses adjacent nodes in the subnode hierarchy, it performs intra-prediction using adjacent nodes in the higher hierarchy and adjacent nodes in the subnode hierarchy.

[0211] Here, the adjacent nodes in the subnode hierarchy are those subnodes of the adjacent nodes in the higher hierarchy that have already been decoded and that are adjacent to the node to be decoded by a face or an edge.

[0212] Figure 21 shows the relationship between the node to be decoded and the adjacent nodes in the subnode hierarchy.

[0213] As a result of the determination, if intra prediction is performed without using the adjacent node of the sub-node hierarchy, this operation proceeds to step S28202, and if intra prediction is performed using the adjacent node of the sub-node hierarchy, this operation proceeds to step S28204.

[0214] In step S28202, the RAHT unit 2080 acquires the attribute value of the adjacent node of the upper hierarchy. After acquiring the attribute value of the adjacent node of the upper hierarchy, this operation proceeds to step S28203.

[0215] In step S28203, the RAHT unit 2080 predicts the attribute value of the node to be decoded.

[0216] The RAHT unit 2080 may predict the attribute value attr by the following formula, using the acquired attribute values attr_i of the k adjacent nodes in the higher hierarchy and the weights w_i corresponding to the type of each adjacent node i. Here, the values of raht_prediction_weights may be used as the weights w_i.

[0217]

Equation

[0218] After the prediction of such an attribute value is completed, this operation proceeds to step S28207.
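The prediction formula of step S28203 is not reproduced in this text. One plausible reading, shown here only as an assumption, is a weighted average normalized by the sum of the weights; the function name is hypothetical.

```python
from typing import Sequence, Tuple

def predict_attribute(neighbors: Sequence[Tuple[float, float]]) -> float:
    """Weighted average of (attr_i, w_i) pairs; the normalization is an assumption."""
    total_w = sum(w for _, w in neighbors)
    if total_w == 0:
        raise ValueError("no neighbor with a non-zero weight")
    return sum(a * w for a, w in neighbors) / total_w
```

For example, `predict_attribute([(100.0, 2.0), (40.0, 1.0)])` gives 80.0, pulling the prediction toward the neighbor with the larger weight.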

[0219] In step S28204, the RAHT unit 2080 obtains the attribute values of the adjacent nodes in the higher hierarchy.

[0220] Here, the targets for obtaining attribute values are the adjacent nodes in the higher hierarchy whose subnodes have not been decoded, or the adjacent nodes in the higher hierarchy whose subnodes have been decoded, but which do not have any subnodes adjacent to the node to be decoded by a face or edge.

[0221] Once the acquisition of these attribute values is complete, this operation proceeds to step S28205.

[0222] In step S28205, the RAHT unit 2080 obtains the attribute values of adjacent nodes in the subnode hierarchy. After obtaining the attribute values of adjacent nodes in the subnode hierarchy, the operation proceeds to step S28206.

[0223] In step S28206, the RAHT unit 2080 predicts the attribute value of the node to be decoded.

[0224] The RAHT unit 2080 may predict the attribute value attr by the following formula, using the acquired attribute values attr_i of the k adjacent nodes in the higher hierarchy and in the subnode hierarchy and the weights w_i corresponding to the type of each adjacent node i.

[0225]

Equation

[0226] After the prediction of these attribute values is complete, this operation proceeds to step S28207.
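The selection of samples in steps S28204 and S28205 can be sketched as follows. The data layout is hypothetical: each higher-hierarchy neighbor is a dictionary whose `subnodes` list holds only those subnodes that are already decoded and adjacent to the node to be decoded by a face or an edge. The normalized weighted average is an assumed form of the formula, which is not reproduced in this text.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (attribute value, weight)

def gather_samples(upper_neighbors: List[Dict]) -> List[Sample]:
    """Use a neighbor's qualifying subnodes if it has any, else the neighbor itself."""
    samples: List[Sample] = []
    for n in upper_neighbors:
        if n["subnodes"]:
            samples.extend(n["subnodes"])              # S28205: subnode hierarchy
        else:
            samples.append((n["attr"], n["weight"]))   # S28204: higher hierarchy
    return samples

def weighted_prediction(samples: List[Sample]) -> float:
    """Normalized weighted average (assumed form of the prediction formula)."""
    total = sum(w for _, w in samples)
    return sum(a * w for a, w in samples) / total
```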

[0227] In step S28207, the RAHT unit 2080 converts the predicted attribute values into AC coefficients. The AC coefficients are generated by performing RAHT on the predicted attribute values. For example, the RAHT unit 2080 may use the method described in Non-Patent Document 1 as such a conversion method.

[0228] The above describes an example in which the RAHT unit 2080 directly converts the attribute values predicted in step S28206 into AC coefficients in step S28207. However, the RAHT unit 2080 may also smooth the predicted attribute values before converting them into AC coefficients.

[0229] For example, as shown in Figure 22, the RAHT unit 2080 may, after predicting the attribute values, determine in step S1301 whether or not to perform smoothing.

[0230] The RAHT unit 2080 may refer to raht_smoothing_enable_flag and use its value in making such a determination.

[0231] If smoothing is performed, this operation proceeds to step S1302. If smoothing is not performed, this operation proceeds to step S28207.

[0232] In step S1302, the RAHT unit 2080 may perform attribute value smoothing.

[0233] For example, the RAHT unit 2080 may obtain the smoothed attribute value Attr_smoothing of the node to be decoded by calculating, as follows, a weighted average of the attribute values Attr_i predicted for the subnodes i within the same parent node as the node to be decoded, using the weights α_i.

[0234]

Equation

[0235] Furthermore, the RAHT unit 2080 may use a hard-coded value as the weight α_i, or may refer to raht_smoothing_weighted_average_weights and use its value.

[0236] Furthermore, the RAHT unit 2080 may, for example, obtain the smoothed attribute value Attr_smoothing of the node to be decoded by performing clipping with the threshold Thr, as follows, using the predicted value Attr_0 of the node to be decoded, the attribute values Attr_i predicted for the subnodes i within the same parent node other than the node to be decoded, and the weights β_i.

[0237]

Equation

[0238] The function Clip3, which performs clipping, is defined as follows.

[0239]

Equation

[0240] Here, as the target subnodes i, the RAHT unit 2080 may use the nodes adjacent to the node to be decoded by a face, the nodes adjacent to it by a face together with the nodes adjacent to it by an edge, or all subnodes within the same parent node.

[0241] Also, the RAHT unit 2080 may use a hard-coded value as the weight β_i, or refer to raht_smoothing_clipping_weights and use the value therein.

[0242] Also, the RAHT unit 2080 may use a hard-coded value as the threshold Thr, or refer to raht_smoothing_clipping_threshold and use the value therein.
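The two smoothing variants can be sketched as follows. The equations are not reproduced in this text, so each form is an assumption: `clip3` follows the definition commonly used in video coding standards, `smooth_weighted` is a normalized weighted average, and `smooth_clipped` is one hypothetical way of combining Attr_0, Attr_i, β_i and Thr, namely limiting the correction toward the weighted mean of the sibling subnodes to ±Thr.

```python
from typing import Sequence

def clip3(lo: float, hi: float, v: float) -> float:
    """Clamp v to the range [lo, hi] (conventional Clip3 definition)."""
    return lo if v < lo else hi if v > hi else v

def smooth_weighted(attrs: Sequence[float], alphas: Sequence[float]) -> float:
    """Weighted average of the predicted values Attr_i with weights alpha_i."""
    return sum(a * w for a, w in zip(attrs, alphas)) / sum(alphas)

def smooth_clipped(attr0: float, others: Sequence[float],
                   betas: Sequence[float], thr: float) -> float:
    """Hypothetical clipped smoothing: move attr0 toward the sibling mean by at most thr."""
    mean = sum(a * b for a, b in zip(others, betas)) / sum(betas)
    return attr0 + clip3(-thr, thr, mean - attr0)
```

With `thr` set to 0 the clipped variant leaves the prediction unchanged, so the threshold bounds how far smoothing can alter a value.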

[0243] In the above, an example in which the RAHT unit 2080 decodes the AC coefficients of both the color difference signal and the luminance signal has been described. However, the RAHT unit 2080 may skip the decoding of the AC coefficients of the color difference signal only for the bottom layer of the Octree.

[0244] For example, as shown in Figure 23, in step S1401, the RAHT unit 2080 may determine whether to skip the decoding of the AC coefficients of the color difference signal only for the bottom layer of the Octree.

[0245] If skipping, this operation proceeds to step S1402. If not skipping, this operation proceeds to step S28004.

[0246] In step S1402, the RAHT unit 2080 determines whether the node to be decoded is the bottom layer of the Octree.

[0247] If it is the bottom layer, this operation proceeds to step S1403. If it is not the bottom layer, this operation proceeds to step S28004.

[0248]

[0249] In step S1403, the RAHT unit 2080 decodes the AC coefficients other than those of the color difference signal in the same manner as in step S28005, sets the AC coefficients of the color difference signal to 0, and then calculates the attribute values in the subsequent step S28006.

[0250] After decoding of AC coefficients other than the color difference signal is complete, this operation proceeds to step S28006.

[0251] Figure 24 shows an example of the inter prediction process in step S28111.

[0252] The RAHT unit 2080 predicts the AC coefficient of the node to be processed using information from the reference node, which is the corresponding node in the reference frame. Here, the information from the reference node may be its attribute values or its AC coefficients. The reference frame may be another, already decoded frame, and its information may be held in the frame buffer 2120.

[0253] The RAHT unit 2080 may apply the same Octree structure to the reference frame as it does to the frame being processed. In this case, nodes may be set at positions where there are no points. Such nodes are called empty nodes. If the reference node is an empty node, the RAHT unit 2080 may determine that inter prediction is not feasible in step S28110.

[0254] The RAHT unit 2080 may apply an Octree to the reference frame independently of the frame to be processed, and set an Octree structure different from that of the frame to be processed. In this case, a node does not necessarily exist at the same position as in the frame to be processed. If a reference node is not found at the position corresponding to the node to be processed, the RAHT unit 2080 may determine that inter prediction is not feasible in step S28110.

[0255] If the reference node is an empty node, or if the reference node is not found, the RAHT unit 2080 may estimate and interpolate the information of the reference node using information from nearby nodes in the reference frame.

[0256] For example, the RAHT unit 2080 may estimate and interpolate the average of the attribute values or AC coefficients of adjacent nodes, nearest neighbor nodes, or k-nearest neighbor nodes with respect to the reference node position, as the attribute values or AC coefficients of the reference node, respectively.
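The interpolation from k-nearest neighbor nodes can be sketched as follows. This is an illustration under assumptions: reference nodes are held in a dictionary keyed by integer node position, distance is squared Euclidean distance, and the function name and the default k are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int, int]

def interpolate_reference(position: Pos, ref_nodes: Dict[Pos, float],
                          k: int = 3) -> Optional[float]:
    """Average the values of the k reference nodes nearest to `position`."""
    if not ref_nodes:
        return None                      # nothing to interpolate from
    def dist2(p: Pos) -> int:
        return sum((a - b) ** 2 for a, b in zip(p, position))
    nearest = sorted(ref_nodes, key=dist2)[:k]
    return sum(ref_nodes[p] for p in nearest) / len(nearest)
```

Setting k to 1 reduces the sketch to the nearest neighbor case.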

[0257] The RAHT unit 2080 may predict the AC coefficient of the node to be processed, for example, from the attribute values of the reference node.

[0258] Specifically, the RAHT unit 2080 receives the value of the decoded attribute value of the reference node, Attr inter Using the predicted value of the attribute value of the node to be processed, Attr pred To obtain the predicted value of the attribute value of the node to be processed, Attr pred By applying RAHT to this, the predicted AC coefficient of the target node can be obtained. pred You could also say that you are looking for this.

[0259] Attr pred =Attr inter AC pred =RAHT(Attr pred ) The RAHT unit 2080 may, for example, directly predict the AC coefficient of the node to be processed from the AC coefficient of the reference node.

[0260] Specifically, the RAHT unit 2080 uses RAHT in the reference frame to determine the value of the AC coefficient of the reference node AC. inter The AC coefficient of the target node is calculated and that value is used as the predicted AC value. pred That is also acceptable.

[0261] AC pred =AC inter For the AC coefficient of the reference node, the RAHT unit 2080 may record the AC coefficients of each node in the reference frame in the frame buffer 2120 and obtain them by referring to the values in the frame buffer 2120. In such a case, when the AC coefficient of the reference node does not exist in the frame buffer 2120, the RAHT unit 2080 may make the inter prediction infeasible in step S28110.

[0262] Note that the RAHT unit 2080 may multiply Attr inter and AC inter by α times respectively with the scaling factor α.

[0263] Attr pred = α Attr inter Or AC pred = α AC inter The coefficient α may take any real number. The coefficient α may be decoded for each node or for each layer. The coefficient α may be included in the slice data.

[0264] For example, the coefficient α may be defined using the depth of the layer as follows, and α' may be decoded instead of the coefficient α.

[0265] $$\alpha = 1 + \alpha' \cdot 2^{-\mathrm{depth}}$$
The RAHT unit 2080 may perform a similar operation for the inter prediction of the DC coefficient in step S28003.

[0266] $$\mathrm{DC}_{\mathrm{pred}} = \alpha \, \mathrm{DC}_{\mathrm{inter}}$$
Here, $\mathrm{DC}_{\mathrm{inter}}$ is the DC coefficient of the reference node, and $\mathrm{DC}_{\mathrm{pred}}$ is the predicted value of the DC coefficient of the root node.
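As a small illustration of the depth-dependent scaling, the following Python sketch derives α from a decoded α' and applies it to an inter-predicted value. The function names are hypothetical, and `alpha_prime` is assumed to have already been decoded from the bitstream.

```python
def scaling_factor(alpha_prime: float, depth: int) -> float:
    """alpha = 1 + alpha' * 2^(-depth): the correction shrinks as the layer gets deeper."""
    return 1.0 + alpha_prime * 2.0 ** (-depth)

def scale_prediction(value_inter: float, alpha: float) -> float:
    """Applies the same scaling to Attr_inter, AC_inter, or DC_inter."""
    return alpha * value_inter
```

With `alpha_prime = 0`, the factor is 1 at every depth, so the inter-predicted value is used unchanged.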

[0267] Also, the RAHT unit 2080 may calculate the predicted value of the attribute value or the AC coefficient by combining the inter prediction and the intra prediction.

[0268] An example in which the RAHT unit 2080 obtains a predicted value of an attribute value is shown below.

[0269] $$\mathrm{Attr}_{\mathrm{pred}} = W_{\mathrm{inter}} \cdot \mathrm{Attr}_{\mathrm{inter}} + W_{\mathrm{intra}} \cdot \mathrm{Attr}_{\mathrm{intra}}$$
Here, $\mathrm{Attr}_{\mathrm{inter}}$ and $\mathrm{Attr}_{\mathrm{intra}}$ are the inter-prediction and the intra-prediction of the attribute value, respectively. Also, $W_{\mathrm{inter}}$ and $W_{\mathrm{intra}}$ are the weights of the inter-prediction and the intra-prediction, respectively. $W_{\mathrm{inter}}$ and $W_{\mathrm{intra}}$ may be determined according to the depth of the layer to be processed such that the intra-prediction is emphasized more for deeper layers. For example,
$$W_{\mathrm{inter}} = 1 - \frac{\mathrm{depth}}{N}, \qquad W_{\mathrm{intra}} = \frac{\mathrm{depth}}{N}$$
where N is the maximum depth of the layers for which the inter-prediction is effective. The combination of inter-prediction and intra-prediction may be effective only in specific layers. For example, the combination of inter-prediction and intra-prediction may be effective only when M < depth < N. M is an arbitrary real number less than N and may be decoded as header information such as the APS.
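The depth-dependent weighting can be sketched in Python as follows. The function names are hypothetical, and the behavior outside the range M < depth < N is deliberately left to the caller because it is not specified here.

```python
from typing import Tuple

def blend_weights(depth: int, n: int) -> Tuple[float, float]:
    """Returns (W_inter, W_intra) with W_inter = 1 - depth/N and W_intra = depth/N."""
    w_intra = depth / n
    return 1.0 - w_intra, w_intra

def combined_prediction_enabled(depth: int, m: float, n: int) -> bool:
    """The combination is assumed to apply only when M < depth < N."""
    return m < depth < n

def predict_attribute(attr_inter: float, attr_intra: float, depth: int, n: int) -> float:
    """Attr_pred = W_inter * Attr_inter + W_intra * Attr_intra."""
    w_inter, w_intra = blend_weights(depth, n)
    return w_inter * attr_inter + w_intra * attr_intra
```

For instance, with N = 8 and depth = 2, the weights are 0.75 and 0.25, so the prediction stays close to the inter-predicted value in shallow layers and moves toward the intra-predicted value as the depth approaches N.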

[0270] (Point cloud encoding device 100)
Hereinafter, the point cloud encoding device 100 according to the present embodiment will be described with reference to FIG. 25. FIG. 25 is a diagram showing an example of the functional blocks of the point cloud encoding device 100 according to the present embodiment.

[0271] As shown in FIG. 25, the point cloud encoding device 100 includes a coordinate transformation unit 1010, a geometric information quantization unit 1020, a tree analysis unit 1030, an approximate surface analysis unit 1040, a geometric information encoding unit 1050, a geometric information reconstruction unit 1060, a color conversion unit 1070, an attribute transfer unit 1080, a RAHT unit 1090, a LoD calculation unit 1100, a lifting unit 1110, an attribute information quantization unit 1120, an attribute information encoding unit 1130, and a frame buffer 1140.

[0272] The coordinate transformation unit 1010 is configured to perform a transformation process from the 3D coordinate system of the input point cloud to any different coordinate system. The coordinate transformation may be performed, for example, by rotating the input point cloud to transform the x, y, and z coordinates of the input point cloud into arbitrary s, t, and u coordinates. Alternatively, as one variation of the transformation, the coordinate system of the input point cloud may be used as is.
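As a minimal sketch of such a transformation, the following Python code maps (x, y, z) to (s, t, u) with a 3x3 matrix. A rotation about the z axis is chosen here purely as an illustrative assumption, and the identity matrix corresponds to using the input coordinate system as is. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float, float]

IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def rotation_about_z(angle_rad: float) -> Matrix:
    """Rotation matrix about the z axis (one example of an arbitrary rotation)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(point: Point, matrix: Matrix) -> Point:
    """Maps (x, y, z) to (s, t, u) by a matrix-vector product."""
    s, t, u = (sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))
    return (s, t, u)
```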

[0273] The geometric information quantization unit 1020 is configured to quantize the position information of the input point cloud after coordinate transformation and to remove points with overlapping coordinates. When the quantization step size is 1, the position information of the input point cloud and the position information after quantization coincide. In other words, when the quantization step size is 1, it is equivalent to not performing quantization.
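The quantization and duplicate removal can be sketched in Python as follows. Rounding to the nearest integer is an assumption (the rounding rule is not stated here), and `quantize_positions` is a hypothetical name.

```python
import math
from typing import Iterable, List, Tuple

def quantize_positions(points: Iterable[Tuple[float, float, float]],
                       step: float) -> List[Tuple[int, int, int]]:
    """Quantizes each coordinate by `step` and drops points whose quantized coordinates coincide.

    The first occurrence of each quantized position is kept, in input order.
    """
    seen = set()
    result = []
    for point in points:
        # Round half up; an assumed rounding rule for this sketch.
        q = tuple(math.floor(c / step + 0.5) for c in point)
        if q not in seen:
            seen.add(q)
            result.append(q)
    return result
```

With `step = 1` and integer input coordinates, the output positions equal the input positions, which matches the statement that a step size of 1 is equivalent to not performing quantization.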

[0274] The tree analysis unit 1030 is configured to take the positional information of the quantized point cloud as input and generate an occupancy code that indicates which node in the encoding target space a point is located at, based on the tree structure described later.

[0275] The tree analysis unit 1030 is configured to generate a tree structure in this process by recursively dividing the space to be encoded into rectangular parallelepipeds.

[0276] Here, if a point exists within a given rectangular parallelepiped, that rectangular parallelepiped is recursively divided into multiple rectangular parallelepipeds until a predetermined size is reached, which generates the tree structure. Each of these rectangular parallelepipeds is called a node. Each rectangular parallelepiped generated by dividing a node is called a child node, and the occupancy code expresses, as 0 or 1, whether or not a point is contained within each child node.

[0277] As described above, the tree analysis unit 1030 is configured to generate occupancy code by recursively dividing the nodes until they reach a predetermined size.
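The recursive division described above can be sketched in Python for the octree case. The sketch assumes a cubic root node whose side is a power of two, integer point coordinates, breadth-first node ordering, and a child index in which x selects bit 2, y bit 1, and z bit 0. These conventions and the name `occupancy_codes` are illustrative assumptions, not taken from the specification.

```python
from collections import deque
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]

def occupancy_codes(points: Iterable[Point], size_log2: int) -> List[int]:
    """Returns one 8-bit occupancy code per occupied node, in breadth-first order.

    Bit i of a code is 1 if child i of that node contains at least one point.
    Division stops when a node reaches size 1 (the assumed predetermined size).
    """
    codes = []
    queue = deque([((0, 0, 0), size_log2, list(set(points)))])
    while queue:
        origin, log2, pts = queue.popleft()
        if log2 == 0:
            continue  # minimum node size reached; no further division
        half = 1 << (log2 - 1)
        children = {}
        for p in pts:
            idx = 0
            for axis in range(3):
                if p[axis] - origin[axis] >= half:
                    idx |= 1 << (2 - axis)
            children.setdefault(idx, []).append(p)
        code = 0
        for idx in children:
            code |= 1 << idx
        codes.append(code)
        for idx in sorted(children):
            child_origin = tuple(
                origin[a] + (half if (idx >> (2 - a)) & 1 else 0) for a in range(3)
            )
            queue.append((child_origin, log2 - 1, children[idx]))
    return codes
```

Only occupied child nodes are divided further, so empty regions of the space contribute no additional codes.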

[0278] In this embodiment, either a method called "Octree," which recursively performs octree partitioning with the rectangular parallelepiped always being a cube, or a method called "QtBt," which performs quadtree partitioning and binary tree partitioning in addition to octree partitioning, can be used.

[0279] Whether or not to use "QtBt" is transmitted to the point cloud decoding device 200 as control data.

[0280] Alternatively, it may be specified to use Predictive coding with an arbitrary tree structure. In this case, the tree analysis unit 1030 determines the tree structure, and the determined tree structure is transmitted to the point cloud decoding device 200 as control data.

[0281] For example, the control data for the tree structure may be configured so that it can be decoded using the procedure described in FIGS. 5 to 24.

[0282] The approximate surface analysis unit 1040 is configured to generate approximate surface information using the tree information generated by the tree analysis unit 1030.

[0283] Approximate surface information is used, for example, when decoding 3D point cloud data of an object in cases where the points are densely distributed on the object's surface. Instead of decoding each individual point, the region where the points exist is approximated and represented by small planes.

[0284] Specifically, the approximate surface analysis unit 1040 may be configured to generate approximate surface information using a method called "Trisoup," for example. This process can be omitted when decoding sparse point clouds acquired by LiDAR or the like.

[0285] The geometric information encoding unit 1050 is configured to encode the syntax of the occupancy code generated by the tree analysis unit 1030 and the approximate surface information generated by the approximate surface analysis unit 1040, and generate a bitstream (geometric information bitstream). Here, the bitstream may include, for example, the syntax described in Figure 4.

[0286] The encoding process is, for example, context-adaptive binary arithmetic encoding. Here, for example, the syntax includes control data (flags and parameters) to control the decoding process of the position information.

[0287] The geometric information reconstruction unit 1060 is configured to reconstruct the geometric information of each point in the point cloud data to be encoded (the coordinate system assumed by the encoding process, i.e., the position information after the coordinate transformation in the coordinate transformation unit 1010) based on the tree information generated by the tree analysis unit 1030 and the approximate surface information generated by the approximate surface analysis unit 1040.

[0288] The frame buffer 1140 is configured to take the geometric information reconstructed by the geometric information reconstruction unit 1060 as input and store it as a reference frame.

[0289] The stored reference frames are read from the frame buffer 1140 and used as reference frames when the tree analysis unit 1030 performs inter prediction.

[0290] The color conversion unit 1070 is configured to perform color conversion if the input attribute information is color information. Color conversion is not always necessary; whether or not the color conversion process is performed is encoded as part of the control data and transmitted to the point cloud decoding device 200.

[0291] The attribute transfer unit 1080 is configured to correct attribute values so as to minimize distortion of attribute information, based on the position information of the input point cloud, the position information of the point cloud after reconstruction by the geometric information reconstruction unit 1060, and the attribute information after color conversion by the color conversion unit 1070. For example, the method described in Non-Patent Document 2 can be applied as a specific correction method.

[0292] The RAHT unit 1090 is configured to take the attribute information after attribute transfer by the attribute transfer unit 1080 and the geometric information generated by the geometric information reconstruction unit 1060 as input, and to generate residual information for each point using a type of Haar transform called RAHT (Region Adaptive Hierarchical Transform). As for the specific processing of RAHT, for example, the method described in Non-Patent Document 2 above can be used.

[0293] The LoD calculation unit 1100 is configured to take geometric information generated by the geometric information reconstruction unit 1060 as input and generate LoD (Level of Detail).

[0294] LoD is information used to define reference relationships (referring points and referenced points) for implementing predictive coding, in which the attribute information of one point is predicted from the attribute information of another point and the prediction residual is encoded or decoded.

[0295] In other words, LoD is information that defines a hierarchical structure in which each point included in geometric information is classified into multiple levels, and the attributes of points belonging to lower levels are encoded or decoded using the attribute information of points belonging to higher levels.

[0296] As for a specific method for determining the LoD, for example, the method described in Non-Patent Document 2 above may be used.

[0297] The lifting unit 1110 is configured to generate residual information through a lifting process using the LoD generated by the LoD calculation unit 1100 and the attribute information after attribute transfer in the attribute transfer unit 1080.

[0298] As for the specific lifting process, for example, the method described in Non-Patent Document 2 above may be used.

[0299] The attribute information quantization unit 1120 is configured to quantize the residual information output from the RAHT unit 1090 or the lifting unit 1110. Here, when the quantization step size is 1, it is equivalent to not performing quantization.
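A minimal Python sketch of residual quantization with a step size follows. Symmetric rounding around zero is an assumption, and the names are hypothetical.

```python
import math

def quantize_residual(residual: float, step: float) -> int:
    """Quantizes a residual to an integer level, rounding the magnitude half up."""
    level = math.floor(abs(residual) / step + 0.5)
    return int(math.copysign(level, residual))

def dequantize_residual(level: int, step: float) -> float:
    """Reconstructs the residual from its quantized level."""
    return level * step
```

For integer residuals and `step = 1`, `dequantize_residual(quantize_residual(r, 1), 1)` returns `r`, which corresponds to not performing quantization.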

[0300] The attribute information encoding unit 1130 is configured to encode the quantized residual information output from the attribute information quantization unit 1120 as syntax, and to generate a bitstream related to attribute information (attribute information bitstream).

[0301] The encoding process is, for example, context-adaptive binary arithmetic encoding. Here, for example, the syntax includes control data (flags and parameters) to control the decoding process of attribute information.

[0302] The point cloud encoding device 100 is configured to perform encoding processing on the positional information and attribute information of each point in the point cloud as input, and to output a geometric information bitstream and an attribute information bitstream.

[0303] Furthermore, the point cloud encoding device 100 and the point cloud decoding device 200 described above may be implemented as programs that cause a computer to execute each function (each process).

[0304] In the above embodiments, the present invention was described using the application to a point cloud encoding device 100 and a point cloud decoding device 200 as an example. However, the present invention is not limited to such examples and can be similarly applied to a point cloud encoding / decoding system equipped with the functions of the point cloud encoding device 100 and the point cloud decoding device 200.

[Industrial applicability]

[0305] Furthermore, according to this embodiment, for example, it is possible to achieve an overall improvement in service quality in video communication, thereby contributing to Goal 9 of the United Nations-led Sustainable Development Goals (SDGs), "Build resilient infrastructure, promote sustainable industrialization and foster innovation."

[Explanation of symbols]

[0306]
10 … Point cloud processing system
100 … Point cloud encoding device
1010 … Coordinate transformation unit
1020 … Geometric information quantization unit
1030 … Tree analysis unit
1040 … Approximate surface analysis unit
1050 … Geometric information encoding unit
1060 … Geometric information reconstruction unit
1070 … Color conversion unit
1080 … Attribute transfer unit
1090 … RAHT unit
1100 … LoD calculation unit
1110 … Lifting unit
1120 … Attribute information quantization unit
1130 … Attribute information encoding unit
1140 … Frame buffer
200 … Point cloud decoding device
2010 … Geometric information decoding unit
2020 … Tree synthesis unit
2030 … Approximate surface synthesis unit
2040 … Geometric information reconstruction unit
2050 … Inverse coordinate transformation unit
2060 … Attribute information decoding unit
2070 … Inverse quantization unit
2080 … RAHT unit
2090 … LoD calculation unit
2100 … Inverse lifting unit
2110 … Inverse color conversion unit
2120 … Frame buffer

Claims

1. A point cloud decoding device comprising a tree synthesis unit, wherein the tree synthesis unit switches the processing to be performed according to a sensor type or an operation mode in the Angular mode of Predictive coding, and, in the case of the Angular mode and when the sensor type indicates Non-Spinning LiDAR data or when the operation mode indicates the activation of elevation angle component prediction and residual decoding, performs intra-prediction for the coordinate values r, θ, and φ of the target node after polar coordinate transformation, based on the values of the parent node and ancestor node of the target node.

2. The point cloud decoding device according to claim 1, characterized in that the tree synthesis unit performs intra-prediction using a value obtained by adding the difference between the value of the parent node and the value of the ancestor node to the value of the parent node as the predicted value.

3. The point cloud decoding device according to claim 1, characterized in that the tree synthesis unit performs intra-prediction using a value obtained by adding the average value of the differences between nodes in the branch processed immediately before to the value of the parent node as the predicted value.

4. The point cloud decoding device according to claim 1, characterized in that the tree synthesis unit performs intra-prediction using a predicted value which is the value obtained by adding the difference between the value of the parent node and the child node of the node that is most similar to the value of the parent node in the branch processed immediately before, to the value of the parent node.

5. The point cloud decoding device according to any one of claims 1 to 4, wherein the tree synthesis unit has a plurality of intra-prediction methods for the coordinate values r, θ, and φ of the node to be processed after polar coordinate transformation, and switches the intra-prediction method based on a flag.

6. A point cloud decoding method comprising: a step of switching the processing to be performed according to a sensor type or an operation mode in the Angular mode of Predictive coding; and a step of performing, in the case of the Angular mode and when the sensor type indicates Non-Spinning LiDAR data or when the operation mode indicates the activation of elevation angle component prediction and residual decoding, intra-prediction for the coordinate values r, θ, and φ of the target node after polar coordinate transformation, based on the values of the parent node and ancestor node of the target node.

7. A program that causes a computer to function as a point cloud decoding device, wherein the point cloud decoding device comprises a tree synthesis unit, and the tree synthesis unit switches the processing to be performed according to a sensor type or an operation mode in the Angular mode of Predictive coding, and, in the case of the Angular mode and when the sensor type indicates Non-Spinning LiDAR data or when the operation mode indicates the activation of elevation angle component prediction and residual decoding, performs intra-prediction for the coordinate values r, θ, and φ of the target node after polar coordinate transformation, based on the values of the parent node and ancestor node of the target node.