Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
By encoding and decoding three-dimensional data with connection information based on regional relationships and user location, the method efficiently selects and displays relevant data, addressing the challenge of timely and relevant data display in three-dimensional maps.
Patent Information
- Authority / Receiving Office: JP
- Patent Type: Patents
- Current Assignee / Owner: PANASONIC INTELLECTUAL PROPERTY CORP OF AMERICA
- Filing Date: 2025-04-23
- Publication Date: 2026-06-26
AI Technical Summary
Existing methods for encoding and decoding three-dimensional data, such as point clouds, struggle with efficiently selecting and decoding a desired set of three-dimensional points from a large volume of encoded data, particularly in applications like three-dimensional maps where timely and relevant data display is crucial.
The disclosure describes a method and device that generate and use connection information based on visibility and on relationships between regions. The connection information includes tile information and association information, and it is used to encode and decode three-dimensional points so that regions relevant to a user's current location or desired viewing direction are prioritized, allowing the relevant data to be selected and decoded efficiently.
This enables the appropriate selection and decoding of desired three-dimensional points, reducing processing time and improving the relevance of displayed data, especially in three-dimensional map applications.
Abstract
Description
Technical Field
[0001] The present disclosure relates to a three-dimensional data encoding method, a three-dimensional data decoding method, a three-dimensional data encoding apparatus, and a three-dimensional data decoding apparatus.
Background Art
[0002] Devices and services that use three-dimensional data are expected to become widespread in a wide range of fields, such as computer vision for the autonomous operation of automobiles or robots, map information, monitoring, infrastructure inspection, and video distribution. Three-dimensional data is acquired by various methods, such as a distance sensor (for example, a range finder), a stereo camera, or a combination of a plurality of monocular cameras.
[0003] One method of representing three-dimensional data is the point cloud, which represents the shape of a three-dimensional structure by a group of points in a three-dimensional space. A point cloud stores the positions and colors of the points. Although point clouds are expected to become the mainstream way of representing three-dimensional data, they have a very large data volume. Therefore, when storing or transmitting three-dimensional data, compressing the data volume by encoding is essential, as it is for two-dimensional moving images (for example, MPEG-4 AVC or HEVC standardized by MPEG).
[0004] Point cloud compression is partially supported by a publicly available library (Point Cloud Library) that performs point cloud-related processing.
[0005] A technique for searching for and displaying facilities located around a vehicle using three-dimensional map data is also known (see, for example, Patent Document 1).
Prior Art Documents
Patent Documents
[0007] In encoded three-dimensional data (multiple three-dimensional points), it is desirable to be able to appropriately select and decode a desired set of encoded three-dimensional points from among the encoded points.
[0008] The purpose of this disclosure is to provide a three-dimensional data encoding method, a three-dimensional data decoding method, a three-dimensional data encoding device, or a three-dimensional data decoding device that can appropriately select and decode a desired plurality of encoded three-dimensional points from among a plurality of encoded three-dimensional points.
Means for Solving the Problem
[0009] A three-dimensional data encoding method according to one aspect of this disclosure is an encoding method, performed by an encoding device, for encoding three-dimensional data containing a plurality of three-dimensional points located in a plurality of regions. The method generates a bitstream that includes encoded data of the plurality of three-dimensional points. The bitstream includes additional information, and the additional information includes connection information corresponding to each of the plurality of regions. Each piece of connection information includes one or more pieces of other-region identification information that identify one or more other regions determined to be related to the region corresponding to that connection information. The determination is made based on visibility relative to a viewpoint within the region corresponding to the connection information.
[0010] A three-dimensional data decoding method according to one aspect of this disclosure is a decoding method, performed by a decoding device, for decoding three-dimensional data containing a plurality of three-dimensional points located in a plurality of regions. The method obtains a bitstream that includes encoded data of the plurality of three-dimensional points. The bitstream includes additional information, and the additional information includes connection information corresponding to each of the plurality of regions. Each piece of connection information includes one or more pieces of other-region identification information that identify one or more other regions determined to be related to the region corresponding to that connection information. The decoding method further decodes the encoded data based on the connection information. The determination is made based on visibility relative to a viewpoint within the region corresponding to the connection information.
Effects of the Invention
[0011] The present disclosure can provide a three-dimensional data encoding method, a three-dimensional data decoding method, a three-dimensional data encoding device, or a three-dimensional data decoding device that can appropriately select and decode a desired plurality of encoded three-dimensional points from a plurality of encoded three-dimensional points.
Brief Description of the Drawings
[0012] [Figure 1] Figure 1 is a diagram showing the configuration of a three-dimensional data encoding / decoding system according to Embodiment 1. [Figure 2] Figure 2 is a diagram showing a configuration example of point cloud data according to Embodiment 1. [Figure 3] Figure 3 is a diagram showing a configuration example of a data file in which point cloud data information according to Embodiment 1 is described. [Figure 4] Figure 4 is a diagram showing the types of point cloud data according to Embodiment 1. [Figure 5] Figure 5 is a diagram showing the configuration of the first encoding unit according to Embodiment 1. [Figure 6] Figure 6 is a block diagram of the first encoding unit according to Embodiment 1. [Figure 7] Figure 7 is a diagram showing the configuration of the first decoding unit according to Embodiment 1. [Figure 8] Figure 8 is a block diagram of the first decoding unit according to Embodiment 1. [Figure 9] Figure 9 is a block diagram of a three-dimensional data encoding device according to Embodiment 1. [Figure 10] Figure 10 is a diagram showing an example of position information according to Embodiment 1. [Figure 11] Figure 11 is a diagram showing an example of an octree representation of position information according to Embodiment 1. [Figure 12] Figure 12 is a block diagram of a three-dimensional data decoding device according to Embodiment 1. [Figure 13] Figure 13 is a block diagram of an attribute information encoding unit according to Embodiment 1. [Figure 14] Figure 14 is a block diagram of an attribute information decoding unit according to Embodiment 1. [Figure 15] Figure 15 is a block diagram showing the configuration of the attribute information encoding unit according to Embodiment 1. [Figure 16] Figure 16 is a block diagram of the attribute information encoding unit according to Embodiment 1. [Figure 17] Figure 17 is a block diagram showing the configuration of the attribute information decoding unit according to Embodiment 1. 
[Figure 18] Figure 18 is a block diagram of the attribute information decoding unit according to Embodiment 1. [Figure 19] Figure 19 shows the configuration of the second encoding unit according to Embodiment 1. [Figure 20] Figure 20 is a block diagram of the second encoding unit according to Embodiment 1. [Figure 21] Figure 21 is a diagram showing the configuration of the second decoding unit according to Embodiment 1. [Figure 22] Figure 22 is a block diagram of the second decoding unit according to Embodiment 1. [Figure 23] Figure 23 is a diagram showing the protocol stack related to PCC encoded data according to Embodiment 1. [Figure 24] Figure 24 shows the configuration of the encoding unit and multiplexing unit according to Embodiment 2. [Figure 25] Figure 25 shows an example of the structure of encoded data according to Embodiment 2. [Figure 26] Figure 26 shows an example of the configuration of encoded data and NAL unit according to Embodiment 2. [Figure 27] Figure 27 shows an example of the semantics of pcc_nal_unit_type according to Embodiment 2. [Figure 28] Figure 28 is a block diagram of the first encoding unit according to Embodiment 3. [Figure 29] Figure 29 is a block diagram of the first decoding unit according to Embodiment 3. [Figure 30] Figure 30 is a block diagram of the division unit according to Embodiment 3. [Figure 31] Figure 31 shows an example of slice and tile division according to Embodiment 3. [Figure 32] Figure 32 shows an example of a slice and tile division pattern according to Embodiment 3. [Figure 33] Figure 33 is a diagram showing an example of a dependency relationship according to Embodiment 3. [Figure 34] Figure 34 shows an example of the data decoding order according to Embodiment 3. [Figure 35] Figure 35 is a flowchart of the encoding process according to Embodiment 3. [Figure 36] Figure 36 is a block diagram of the combining unit according to Embodiment 3. 
[Figure 37] Figure 37 shows an example of the configuration of encoded data and a NAL unit according to Embodiment 3. [Figure 38] Figure 38 is a flowchart of the encoding process according to Embodiment 3. [Figure 39] Figure 39 is a flowchart of the decoding process according to Embodiment 3. [Figure 40] Figure 40 shows an example of the syntax for tile addition information according to Embodiment 4. [Figure 41] Figure 41 is a block diagram of the coding and decoding system according to Embodiment 4. [Figure 42] Figure 42 shows an example of the syntax for slice addition information according to Embodiment 4. [Figure 43] Figure 43 is a flowchart of the encoding process according to Embodiment 4. [Figure 44] Figure 44 is a flowchart of the decoding process according to Embodiment 4. [Figure 45] Figure 45 shows an example of a division method according to Embodiment 5. [Figure 46] Figure 46 shows an example of point cloud data division according to Embodiment 5. [Figure 47] Figure 47 shows an example of the syntax for tile addition information according to Embodiment 5. [Figure 48]Figure 48 shows an example of index information according to Embodiment 5. [Figure 49] Figure 49 shows an example of a dependency relationship according to Embodiment 5. [Figure 50] Figure 50 shows an example of transmission data according to Embodiment 5. [Figure 51] Figure 51 shows an example of the configuration of the NAL unit according to Embodiment 5. [Figure 52] Figure 52 is a diagram showing an example of a dependency relationship according to Embodiment 5. [Figure 53] Figure 53 shows an example of the data decoding order according to Embodiment 5. [Figure 54] Figure 54 is a diagram showing an example of a dependency relationship according to Embodiment 5. [Figure 55] Figure 55 shows an example of the data decoding order according to Embodiment 5. [Figure 56] Figure 56 is a flowchart of the encoding process according to Embodiment 5. 
[Figure 57] Figure 57 is a flowchart of the decoding process according to Embodiment 5. [Figure 58] Figure 58 is a flowchart of the encoding process according to Embodiment 5. [Figure 59] Figure 59 is a flowchart of the encoding process according to Embodiment 5. [Figure 60] Figure 60 shows examples of transmitted and received data according to Embodiment 5. [Figure 61] Figure 61 is a flowchart of the decoding process according to Embodiment 5. [Figure 62] Figure 62 shows examples of transmitted and received data according to Embodiment 5. [Figure 63] Figure 63 is a flowchart of the decoding process according to Embodiment 5. [Figure 64] Figure 64 is a flowchart of the encoding process according to Embodiment 5. [Figure 65] Figure 65 shows an example of index information according to Embodiment 5. [Figure 66] Figure 66 is a diagram showing an example of a dependency relationship according to Embodiment 5. [Figure 67] Figure 67 shows an example of transmission data according to Embodiment 5. [Figure 68] Figure 68 shows examples of transmitted and received data according to Embodiment 5. [Figure 69] Figure 69 is a flowchart of the decoding process according to Embodiment 5. [Figure 70] Figure 70 shows an example of GPS syntax according to Embodiment 6. [Figure 71] Figure 71 is a flowchart of the three-dimensional data decoding process according to Embodiment 6. [Figure 72] Figure 72 shows an example of an application according to Embodiment 6. [Figure 73] Figure 73 shows examples of tile division and slice division according to Embodiment 6. [Figure 74] Figure 74 is a flowchart of the processing in the system according to Embodiment 6. [Figure 75] Figure 75 is a flowchart of the processing in the system according to Embodiment 6. [Figure 76] Figure 76 is a block diagram of a three-dimensional data encoding device according to Embodiment 7. [Figure 77] Figure 77 is a block diagram of a three-dimensional data decoding device according to Embodiment 7. 
[Figure 78] Figure 78 is a block diagram of a three-dimensional data encoding device according to Embodiment 7. [Figure 79] Figure 79 is a block diagram showing the configuration of a three-dimensional data decoding device according to Embodiment 7. [Figure 80] Figure 80 shows an example of point cloud data according to Embodiment 7. [Figure 81] Figure 81 shows an example of a point-by-point normal vector according to Embodiment 7. [Figure 82]Figure 82 shows an example of the syntax of a normal vector according to Embodiment 7. [Figure 83] Figure 83 is a flowchart of the three-dimensional data encoding process according to Embodiment 7. [Figure 84] Figure 84 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 85] Figure 85 shows an example of the bitstream configuration according to Embodiment 7. [Figure 86] Figure 86 shows an example of point cloud information according to Embodiment 7. [Figure 87] Figure 87 is a flowchart of the three-dimensional data encoding process according to Embodiment 7. [Figure 88] Figure 88 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 89] Figure 89 shows an example of normal vector division according to Embodiment 7. [Figure 90] Figure 90 shows an example of normal vector division according to Embodiment 7. [Figure 91] Figure 91 shows an example of point cloud data according to Embodiment 7. [Figure 92] Figure 92 shows an example of a normal vector according to Embodiment 7. [Figure 93] Figure 93 is a diagram showing an example of normal vector information according to Embodiment 7. [Figure 94] Figure 94 shows an example of a cube according to Embodiment 7. [Figure 95] Figure 95 shows an example of a cube face according to Embodiment 7. [Figure 96] Figure 96 shows an example of a cube face according to Embodiment 7. [Figure 97] Figure 97 shows an example of a cube face according to Embodiment 7. 
[Figure 98] Figure 98 shows an example of the visibility of a slice according to Embodiment 7. [Figure 99]Figure 99 shows an example of the bitstream configuration according to Embodiment 7. [Figure 100] Figure 100 shows an example of the syntax of a slice header for position information according to Embodiment 7. [Figure 101] Figure 101 shows an example of the syntax of a slice header for position information according to Embodiment 7. [Figure 102] Figure 102 is a flowchart of the three-dimensional data encoding process according to Embodiment 7. [Figure 103] Figure 103 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 104] Figure 104 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 105] Figure 105 shows an example of the bitstream configuration according to Embodiment 7. [Figure 106] Figure 106 shows an example of the slice information syntax according to Embodiment 7. [Figure 107] Figure 107 shows an example of the slice information syntax according to Embodiment 7. [Figure 108] Figure 108 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 109] Figure 109 shows an example of partial decoding processing according to Embodiment 7. [Figure 110] Figure 110 shows an example of the configuration of a three-dimensional data decoding device according to Embodiment 7. [Figure 111] Figure 111 shows an example of processing by the random access control unit according to Embodiment 7. [Figure 112] Figure 112 shows an example of processing by the random access control unit according to Embodiment 7. [Figure 113] Figure 113 shows an example of the relationship between distance and resolution according to Embodiment 7. [Figure 114] Figure 114 shows an example of a brick and normal vector according to Embodiment 7. [Figure 115] Figure 115 shows an example of the level according to Embodiment 7. 
[Figure 116] Figure 116 shows an example of an octree structure according to Embodiment 7. [Figure 117] Figure 117 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 118] Figure 118 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 119] Figure 119 shows an example of a brick to be decoded according to Embodiment 7. [Figure 120] Figure 120 shows an example of the level to be decoded according to Embodiment 7. [Figure 121] Figure 121 shows an example of the syntax of a slice header for position information according to Embodiment 7. [Figure 122] Figure 122 is a flowchart of the three-dimensional data encoding process according to Embodiment 7. [Figure 123] Figure 123 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 124] Figure 124 shows an example of point cloud data according to Embodiment 7. [Figure 125] Figure 125 shows an example of point cloud data according to Embodiment 7. [Figure 126] Figure 126 is a diagram showing an example of the system configuration according to Embodiment 7. [Figure 127] Figure 127 is a diagram showing an example of the system configuration according to Embodiment 7. [Figure 128] Figure 128 is a diagram showing an example of the system configuration according to Embodiment 7. [Figure 129] Figure 129 is a diagram showing an example of the system configuration according to Embodiment 7. [Figure 130] Figure 130 shows an example of the bitstream configuration according to Embodiment 7. [Figure 131] Figure 131 shows an example of the configuration of a three-dimensional data encoding device according to Embodiment 7. [Figure 132] Figure 132 shows an example of the configuration of a three-dimensional data decoding device according to Embodiment 7. [Figure 133] Figure 133 shows the basic structure of ISOBMFF according to Embodiment 7. 
[Figure 134] Figure 134 is a protocol stack diagram showing the case where the NAL unit common to the PCC codec according to Embodiment 7 is stored in ISOBMFF. [Figure 135] Figure 135 shows an example of converting a bitstream to a file format according to Embodiment 7. [Figure 136] Figure 136 shows an example of the slice information syntax according to Embodiment 7. [Figure 137] Figure 137 shows an example of the syntax for a PCC random access table according to Embodiment 7. [Figure 138] Figure 138 shows an example of the syntax for a PCC random access table according to Embodiment 7. [Figure 139] Figure 139 shows an example of the syntax for a PCC random access table according to Embodiment 7. [Figure 140] Figure 140 is a flowchart of the three-dimensional data encoding process according to Embodiment 7. [Figure 141] Figure 141 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 142] Figure 142 is a flowchart of the three-dimensional data encoding process according to Embodiment 7. [Figure 143] Figure 143 is a flowchart of the three-dimensional data decoding process according to Embodiment 7. [Figure 144] Figure 144 is a block diagram of a three-dimensional data creation device according to Embodiment 5. [Figure 145] Figure 145 is a flowchart of the three-dimensional data creation method according to Embodiment 5. [Figure 146] Figure 146 is a diagram showing the configuration of the system according to Embodiment 8. [Figure 147] Figure 147 is a block diagram of the client device according to Embodiment 8. [Figure 148] Figure 148 is a block diagram of the server according to Embodiment 8. [Figure 149] Figure 149 is a flowchart of the three-dimensional data creation process by the client device according to Embodiment 8. [Figure 150] Figure 150 is a flowchart of the sensor information transmission process by the client device according to Embodiment 8. 
[Figure 151] Figure 151 is a flowchart of the three-dimensional data creation process performed by the server according to Embodiment 8. [Figure 152] Figure 152 is a flowchart of the three-dimensional map transmission process by the server according to Embodiment 8. [Figure 153] Figure 153 shows a modified configuration of the system according to Embodiment 8. [Figure 154] Figure 154 is a diagram showing the configuration of the server and client device according to Embodiment 8. [Figure 155] Figure 155 is a diagram showing the configuration of the server and client device according to Embodiment 8. [Figure 156] Figure 156 is a flowchart of the processing performed by the client device according to Embodiment 8. [Figure 157] Figure 157 is a diagram showing the configuration of the sensor information collection system according to Embodiment 8. [Figure 158] Figure 158 shows an example of a system according to Embodiment 8. [Figure 159] Figure 159 shows a modified example of the system according to Embodiment 8. [Figure 160] Figure 160 is a flowchart showing an example of application processing according to Embodiment 8. [Figure 161] Figure 161 shows the sensor ranges of various sensors according to Embodiment 8. [Figure 162]Figure 162 is a diagram showing an example configuration of an automated driving system according to Embodiment 8. [Figure 163] Figure 163 is a diagram showing an example of the bitstream configuration according to Embodiment 8. [Figure 164] Figure 164 is a flowchart of the point cloud selection process according to Embodiment 8. [Figure 165] Figure 165 shows an example of the point cloud selection process screen according to Embodiment 8. [Figure 166] Figure 166 is a diagram showing an example of the point cloud selection process screen according to Embodiment 8. [Figure 167] Figure 167 is a diagram showing an example of the point cloud selection process screen according to Embodiment 8. 
[Figure 168] Figure 168 is a diagram illustrating a first example of a connectivity determination method according to Embodiment 9. [Figure 169] Figure 169 is a diagram illustrating a second example of the connectivity determination method according to Embodiment 9. [Figure 170] Figure 170 is a block diagram illustrating an example of how a three-dimensional data decoding device determines connectivity and / or the strength of connectivity. [Figure 171] Figure 171 is a block diagram illustrating another example of how a three-dimensional data decoding device determines connectivity and / or the strength of connectivity. [Figure 172] Figure 172 is a block diagram showing the configuration of a three-dimensional data encoding device according to Embodiment 9. [Figure 173] Figure 173 is a block diagram showing the configuration of a three-dimensional data decoding device. [Figure 174] Figure 174 is a diagram illustrating the signaling of additional information, including connection information. [Figure 175] Figure 175 is a diagram illustrating a first example of another method for determining connectivity according to Embodiment 9. [Figure 176] Figure 176 is a diagram illustrating another example of the first example of another connectivity determination method according to Embodiment 9. [Figure 177]Figure 177 shows an example of the connection information syntax in the example shown in Figure 176. [Figure 178] Figure 178 is a diagram illustrating a second example of another method for determining connectivity according to Embodiment 9. [Figure 179] Figure 179 shows an example of the connection information syntax in the example shown in Figure 178. [Figure 180] Figure 180 is a diagram illustrating a third example of another method for determining connectivity according to Embodiment 9. [Figure 181] Figure 181 shows the first example of a container type. [Figure 182] Figure 182 shows a second example of the container type. 
[Figure 183] Figure 183 shows a third example of the container type. [Figure 184] Figure 184 shows a fourth example of the container type. [Figure 185] Figure 185 shows an example of tile arrangement. [Figure 186] Figure 186 shows an example of the sequence in which a three-dimensional data encoding device encodes each tile in the tile arrangement example shown in Figure 185. [Figure 187] Figure 187 shows an example of connection information syntax that includes information indicating the encoded tile number. [Figure 188] Figure 188 shows an example of connection information syntax that includes information indicating the container type. [Figure 189] Figure 189 is a diagram illustrating a first example of the three-dimensional data decoding process of a three-dimensional data decoding device according to Embodiment 9. [Figure 190] Figure 190 is a diagram illustrating a second example of the three-dimensional data decoding process of the three-dimensional data decoding device according to Embodiment 9. [Figure 191] Figure 191 is a diagram illustrating a third example of the three-dimensional data decoding process of the three-dimensional data decoding device according to Embodiment 9. [Figure 192]Figure 192 is a diagram illustrating a fourth example of the three-dimensional data decoding process of the three-dimensional data decoding device according to Embodiment 9. [Figure 193] Figure 193 is a diagram illustrating a fourth example of the three-dimensional data decoding process of the three-dimensional data decoding device according to Embodiment 9. [Figure 194] Figure 194 is a diagram illustrating a fourth example of the three-dimensional data decoding process of the three-dimensional data decoding device according to Embodiment 9. [Figure 195] Figure 195 is a diagram illustrating a fifth example of the three-dimensional data decoding process of the three-dimensional data decoding device according to Embodiment 9. 
[Figure 196] Figure 196 shows an example of a bitstream configuration. [Figure 197] Figure 197 shows an example of the configuration of a three-dimensional data encoding device. [Figure 198] Figure 198 shows an example of the configuration of a three-dimensional data decoding device. [Figure 199] Figure 199 shows the basic structure of ISOBMFF. [Figure 200] Figure 200 is a protocol stack diagram showing the case where the NAL unit common to the PCC codec is stored in ISOBMFF. [Figure 201] Figure 201 shows an example of converting a bitstream to a file format. [Figure 202] Figure 202 shows an example of the syntax for slice information. [Figure 203] Figure 203 shows an example of the syntax for a PCC random access table. [Figure 204] Figure 204 shows an example of the syntax for a PCC random access table. [Figure 205] Figure 205 shows an example of the syntax for a PCC random access table. [Figure 206] Figure 206 is a flowchart showing the encoding process of the three-dimensional data encoding device according to Embodiment 9. [Figure 207] Figure 207 is a flowchart showing the decoding process of the three-dimensional data decoding device according to Embodiment 9.
Modes for Carrying Out the Invention
[0013] A three-dimensional data encoding method according to one aspect of the present disclosure encodes, region by region, a plurality of three-dimensional points each located in one of a plurality of regions. The method generates connection information based on the relationship between a predetermined region among the plurality of regions and a plurality of other regions. The connection information includes (i) tile information indicating a value uniquely assigned to each of the plurality of regions and (ii) association information indicating, based on the tile information, that a relationship exists between the predetermined region and the other regions. The method then generates a bitstream including the generated connection information and the encoded plurality of three-dimensional points.
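The structure described in [0013] can be pictured with a minimal sketch. The names `ConnectionInfo`, `build_bitstream`, and `read_connection_info` are hypothetical, and the JSON header with a 4-byte length prefix is an illustrative container chosen for brevity, not the bitstream syntax of the disclosure. The per-region payloads are assumed to have been encoded already.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConnectionInfo:
    tile_id: int  # tile information: value uniquely assigned to the region
    related_tile_ids: List[int] = field(default_factory=list)  # association information


def build_bitstream(encoded_tiles: Dict[int, bytes],
                    relations: Dict[int, List[int]]) -> bytes:
    """Place connection information ahead of the per-region payloads (toy layout)."""
    connection = [
        ConnectionInfo(tile_id, sorted(relations.get(tile_id, [])))
        for tile_id in sorted(encoded_tiles)
    ]
    header = json.dumps([vars(c) for c in connection]).encode("utf-8")
    payload = b"".join(encoded_tiles[t] for t in sorted(encoded_tiles))
    # 4-byte big-endian header length, then the header, then the payloads.
    return len(header).to_bytes(4, "big") + header + payload


def read_connection_info(bitstream: bytes) -> List[ConnectionInfo]:
    """Recover the connection information without touching the payloads."""
    header_len = int.from_bytes(bitstream[:4], "big")
    entries = json.loads(bitstream[4:4 + header_len])
    return [ConnectionInfo(**entry) for entry in entries]
```

Keeping the connection information separate from the payloads means a reader can inspect the relationships between regions before deciding which encoded points to decode.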
[0014] For example, a three-dimensional data decoding device sequentially decodes encoded three-dimensional points in the order in which the data appears in a bitstream. If the three-dimensional data represents a three-dimensional map, a user viewing the map often wants to see the area centered on their current location. In such a case, the three-dimensional data decoding device decodes the encoded three-dimensional points in bitstream order and displays an image of them on a display device in the order they were decoded. As a result, it may take time before the location the user wants to see appears on the display device. Also, depending on the user's current location, some three-dimensional points may not need to be decoded at all. Therefore, based on the relationships between the multiple regions, the three-dimensional data encoding device generates connection information that includes tile information indicating a value uniquely assigned to each of the multiple regions, and association information indicating, based on the tile information, that there is a relationship between a predetermined region and other regions. With this, for example, when the three-dimensional data decoding device receives information indicating the user's current location from a device owned by the user, it can determine the predetermined region from that information and, based on the connection information, decode the encoded three-dimensional points sequentially, starting from the determined predetermined region. The three-dimensional data decoding device can thus decode encoded three-dimensional points in order, starting from those located in the region the user is most likely to want. In other words, the three-dimensional data encoding method according to this disclosure allows a desired set of encoded three-dimensional points to be appropriately selected and decoded from among a set of encoded three-dimensional points.
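The decoder-side behavior described above can be sketched as follows, under stated assumptions: regions are 2D axis-aligned boxes for brevity, the connection information has already been parsed into a mapping from each tile value to its related tile values, and a breadth-first walk is used as one plausible way to order decoding outward from the user's region. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def find_start_tile(position: Tuple[float, float], tile_boxes: Dict[int, Box]) -> int:
    """Return the tile whose box contains the user's current position."""
    x, y = position
    for tile_id, (min_x, min_y, max_x, max_y) in sorted(tile_boxes.items()):
        if min_x <= x <= max_x and min_y <= y <= max_y:
            return tile_id
    raise ValueError("position is outside every tile")


def decode_order(start_tile: int, connection: Dict[int, List[int]]) -> List[int]:
    """Breadth-first order over related tiles, starting from the user's tile."""
    order, seen, queue = [], {start_tile}, deque([start_tile])
    while queue:
        tile = queue.popleft()
        order.append(tile)
        for other in connection.get(tile, []):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return order


tile_boxes = {0: (0, 0, 10, 10), 1: (10, 0, 20, 10), 2: (20, 0, 30, 10), 3: (100, 100, 110, 110)}
connection = {0: [1], 1: [0, 2], 2: [1], 3: []}
start = find_start_tile((12.0, 5.0), tile_boxes)
print(decode_order(start, connection))  # [1, 0, 2]
```

In this example tile 3 has no relationship to the user's tile, so it never enters the decoding order, which corresponds to the points that do not need to be decoded.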
[0015] Furthermore, for example, the relationship described above is the positional relationship between a predetermined region among the plurality of regions and the plurality of other regions other than the predetermined region.
[0016] Furthermore, for example, in generating the connection information, the connection information is generated so as to include the association information indicating that encoded three-dimensional points located in another region, among the plurality of other regions, that is in contact with or overlaps the predetermined region are to be decoded.
[0017] For example, if three-dimensional data represents a three-dimensional map, other regions that are in contact with or overlap with a given region are likely to be a continuation of the map of that given region. Therefore, a three-dimensional data decoding device can further appropriately select and decode a desired set of three-dimensional points from among a set of encoded three-dimensional points.
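One way to test whether two regions are in contact or overlap is sketched below, assuming each region is an axis-aligned bounding box given as a (minimum corner, maximum corner) pair. The box representation and the function name `touches_or_overlaps` are assumptions made for illustration; this passage does not state how regions are shaped.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (minimum corner, maximum corner), assumed representation


def touches_or_overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share at least one point (contact counts)."""
    (amin, amax), (bmin, bmax) = a, b
    # The boxes intersect only if their intervals intersect on every axis.
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


unit = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
neighbor = ((1.0, 0.0, 0.0), (2.0, 1.0, 1.0))   # shares a face with unit
distant = ((2.0, 0.0, 0.0), (3.0, 1.0, 1.0))    # separated from unit
```

Using `<=` rather than `<` makes regions that merely share a face, edge, or corner count as being in contact.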
[0018] Furthermore, for example, in generating the connection information, the connection information is generated, based on orientation information indicating an orientation from the predetermined region, so as to include the association information indicating that encoded three-dimensional points located in another region lying in that orientation as viewed from the predetermined region are to be decoded.
[0019] For example, if three-dimensional data represents a three-dimensional map, other areas that do not touch or overlap the predetermined area corresponding to the user's current location may also be areas that the user wants to check quickly. For instance, a user who is moving is likely to want to check the map of the area located in their direction of movement. This therefore allows the three-dimensional data decoding device to decide, according to orientation, whether or not to decode the encoded three-dimensional points, and thus to select and decode the desired encoded three-dimensional points more appropriately from among the multiple encoded three-dimensional points.
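The orientation-based selection can be sketched as an angle test between the user's heading and the direction from the predetermined region to a candidate region. The use of region center points, the 45-degree default, and the name `in_direction` are all assumptions for illustration, not details taken from the disclosure.

```python
import math
from typing import Sequence


def in_direction(origin: Sequence[float], target: Sequence[float],
                 heading: Sequence[float], max_angle_deg: float = 45.0) -> bool:
    """True if `target` lies within `max_angle_deg` of `heading` as seen from `origin`."""
    direction = [t - o for t, o in zip(target, origin)]
    d_norm = math.sqrt(sum(c * c for c in direction))
    h_norm = math.sqrt(sum(c * c for c in heading))
    if d_norm == 0.0 or h_norm == 0.0:
        return False  # no meaningful direction to compare
    cos_angle = sum(d * h for d, h in zip(direction, heading)) / (d_norm * h_norm)
    return cos_angle >= math.cos(math.radians(max_angle_deg))
```

For a user at the origin moving along the positive x axis, a region centered at (10, 5, 0) passes the test, while regions centered at (0, 10, 0) or (-10, 0, 0) do not.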
[0020] Furthermore, for example, in generating the connection information, the connection information is generated so as to include the association information indicating that, among the plurality of other regions, the closer a region is to the predetermined region, the earlier the encoded three-dimensional points located in that region are decoded.
[0021] Regions closer to a given region are more likely to be related to that region. Therefore, according to this, a three-dimensional data decoding device can further appropriately select and decode a desired set of three-dimensional points from among a set of encoded three-dimensional points.
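A minimal sketch of a distance-based decoding order follows, assuming each region is summarized by a center point and that closeness is measured as the Euclidean distance between centers. Both assumptions, and the name `order_by_distance`, are illustrative only.

```python
import math
from typing import Dict, List, Tuple


def order_by_distance(centers: Dict[int, Tuple[float, float, float]],
                      start_tile: int) -> List[int]:
    """Return the other tile IDs, nearest to `start_tile` first."""
    origin = centers[start_tile]
    others = [tile for tile in centers if tile != start_tile]
    return sorted(others, key=lambda tile: math.dist(centers[tile], origin))


# Hypothetical region centers keyed by tile ID.
centers = {0: (0.0, 0.0, 0.0), 1: (5.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (0.0, 3.0, 0.0)}
```

With tile 0 as the predetermined region, the resulting order is tile 2, then tile 3, then tile 1.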
[0022] Furthermore, for example, in generating the connection information, based on the relationship, it is determined which of a plurality of predetermined groups each of the plurality of other regions belongs to, and the connection information is generated which includes group information indicating the determined predetermined group.
[0023] For example, when three-dimensional data represents a three-dimensional map, depending on the user's current location, different regions within a given area may have similar levels of priority. In such cases, if there is information grouping regions with similar levels of priority desired by the user, the three-dimensional data decoding device can determine whether or not to decode the encoded three-dimensional points for each group, or determine the order in which to decode them. Therefore, this reduces the amount of processing required for the three-dimensional data decoding device to determine whether or not to decode the encoded three-dimensional points, or to determine the order in which to decode them.
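The grouping idea can be sketched by bucketing regions according to their distance from the predetermined region, so that the decoder makes one decision per group rather than one per region. Grouping by distance thresholds is only one possible criterion, and the names `assign_groups` and `tiles_in_groups` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def assign_groups(distances: Dict[int, float], thresholds: List[float]) -> Dict[int, int]:
    """Map each tile ID to a group index; `thresholds` must be sorted ascending."""
    return {tile: bisect_right(thresholds, d) for tile, d in distances.items()}


def tiles_in_groups(groups: Dict[int, int], max_group: int) -> List[int]:
    """Return the tiles whose group index is at most `max_group`."""
    return sorted(tile for tile, group in groups.items() if group <= max_group)


# Hypothetical distances from the predetermined region to four other tiles.
groups = assign_groups({1: 0.0, 2: 5.0, 3: 20.0, 4: 7.5}, thresholds=[1.0, 10.0])
```

Here tile 1 falls in group 0, tiles 2 and 4 in group 1, and tile 3 in group 2, so a decoder that decides to decode only groups 0 and 1 skips tile 3 without examining it individually.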
[0024] Furthermore, a three-dimensional data decoding method according to one aspect of the present disclosure acquires a bitstream containing a plurality of three-dimensional points, each of which is located in one of a plurality of regions, and which are encoded for each region; acquires connection information generated based on the relationship between a predetermined region among the plurality of regions and a plurality of other regions among the plurality of regions other than the predetermined region, which includes (i) tile information indicating a value uniquely assigned to each of the plurality of regions, and (ii) association information indicating that there is a relationship between the predetermined region and the other regions based on the tile information; and selectively decodes the encoded plurality of three-dimensional points for each region based on the acquired connection information.
[0025] For example, a three-dimensional data decoding device typically decodes multiple encoded three-dimensional points sequentially, in the order in which the data appears in the bitstream. If the three-dimensional data represents a three-dimensional map, a user viewing the map often wants to see the area centered on their current location. In that case, the three-dimensional data decoding device decodes the encoded three-dimensional points in bitstream order and displays an image of them on a display device in the order they were decoded, so it may take time before the location the user wants to see appears on the display device. Also, depending on the user's current location, some three-dimensional points may not need to be decoded at all. Therefore, the three-dimensional data encoding device generates connection information based on the relationships between the multiple regions, the connection information including tile information indicating a value uniquely assigned to each of the multiple regions, and association information indicating, based on the tile information, that there is a relationship between a predetermined region and other regions. With this, for example, when the three-dimensional data decoding device receives information indicating the user's current location from a device owned by the user, it can determine the predetermined region from that information and, based on the connection information, decode the encoded three-dimensional points sequentially starting from the determined predetermined region. The three-dimensional data decoding device can thus decode encoded three-dimensional points in order, starting from those located in the regions most likely to be desired by the user.
In other words, the three-dimensional data decoding method according to this disclosure can appropriately select and decode a desired set of encoded three-dimensional points from among multiple encoded three-dimensional points.
[0026] Furthermore, for example, the relationship described above is the positional relationship between a predetermined region among the plurality of regions and the plurality of other regions other than the predetermined region.
[0027] Furthermore, for example, in acquiring the connection information, the connection information acquired includes the association information indicating that encoded three-dimensional points located in another region, among the plurality of other regions, that is in contact with or overlaps the predetermined region are to be decoded.
[0028] For example, if three-dimensional data represents a three-dimensional map, other regions that touch or overlap with a given region are likely to be continuations of the map of that given region. Therefore, this allows for the more appropriate selection and decoding of desired encoded three-dimensional points from among multiple encoded three-dimensional points.
[0029] Furthermore, for example, in acquiring the connection information, the connection information acquired includes the association information that is generated based on orientation information indicating an orientation from the predetermined region, and that indicates that encoded three-dimensional points located in another region lying in that orientation as viewed from the predetermined region are to be decoded.
[0030] For example, if three-dimensional data represents a three-dimensional map, other areas that do not touch or overlap the predetermined area corresponding to the user's current location may also be areas that the user wants to check quickly. For instance, a user who is moving is likely to want to check the map of the area located in their direction of movement. This therefore makes it possible to decide, according to orientation, whether or not to decode the encoded three-dimensional points, and thus to select and decode the desired encoded three-dimensional points more appropriately from among the multiple encoded three-dimensional points.
[0031] Furthermore, for example, in acquiring the connection information, the connection information acquired includes the association information indicating that, among the plurality of other regions, the closer a region is to the predetermined region, the earlier the encoded three-dimensional points located in that region are decoded.
[0032] Regions closer to a given region are more likely to be related to that region. Therefore, this allows for the more appropriate selection and decoding of desired encoded three-dimensional points from among multiple encoded three-dimensional points.
[0033] Furthermore, for example, in acquiring the connection information, the connection information acquired includes group information indicating, based on the relationship, which of a plurality of predetermined groups each of the plurality of other regions belongs to.
[0034] For example, when three-dimensional data represents a three-dimensional map, depending on the user's current location, different regions within a given area may have similar levels of priority. In such cases, if information exists that groups regions with similar levels of priority desired by the user, it becomes possible to determine whether or not to decode the encoded three-dimensional points for each group, or to determine the order in which they will be decoded. This reduces the amount of processing required to determine whether or not to decode the encoded three-dimensional points, or to determine the order in which they will be decoded.
[0035] Furthermore, for example, the connection information is included in the bitstream, and the acquisition of the connection information involves acquiring the connection information included in the bitstream.
[0036] According to this, the connection information contained in the bitstream can be used to determine whether or not to decode the encoded three-dimensional points, or to determine the order in which to decode them.
[0037] Furthermore, for example, in obtaining the connection information, the connection information is obtained by generating the connection information based on a plurality of encoded three-dimensional points included in the bitstream.
[0038] According to this, even if the bitstream does not contain connection information, it is possible to determine whether or not to decode the encoded three-dimensional points, or to determine the order in which to decode them.
[0039] Furthermore, a three-dimensional data encoding device according to one aspect of the present disclosure comprises a processor and a memory, wherein the processor, using the memory, encodes a plurality of three-dimensional points, each located in one of a plurality of regions, for each region; generates connection information based on the relationship between a predetermined region among the plurality of regions and a plurality of other regions among the plurality of regions other than the predetermined region, the connection information including (i) tile information indicating a value uniquely assigned to each of the plurality of regions, and (ii) association information indicating that there is a relationship between the predetermined region and the other regions based on the tile information; and generates a bitstream including the generated connection information and the encoded plurality of three-dimensional points.
[0040] For example, a three-dimensional data decoding device typically decodes multiple encoded three-dimensional points sequentially, in the order in which the data appears in the bitstream. If the three-dimensional data represents a three-dimensional map, a user viewing the map often wants to see the area centered on their current location. In that case, the three-dimensional data decoding device decodes the encoded three-dimensional points in bitstream order and displays an image of them on a display device in the order they were decoded, so it may take time before the location the user wants to see appears on the display device. Also, depending on the user's current location, some three-dimensional points may not need to be decoded at all. Therefore, the three-dimensional data encoding device generates connection information based on the relationships between the multiple regions, the connection information including tile information indicating a value uniquely assigned to each of the multiple regions, and association information indicating, based on the tile information, that there is a relationship between a predetermined region and other regions. With this, for example, when the three-dimensional data decoding device receives information indicating the user's current location from a device owned by the user, it can determine the predetermined region from that information and, based on the connection information, decode the encoded three-dimensional points sequentially starting from the determined predetermined region. The three-dimensional data decoding device can thus decode encoded three-dimensional points in order, starting from those located in the regions most likely to be desired by the user.
In other words, the three-dimensional data encoding device according to this disclosure allows a desired set of encoded three-dimensional points to be appropriately selected and decoded from among multiple encoded three-dimensional points.
[0041] Furthermore, a three-dimensional data decoding device according to one aspect of the present disclosure comprises a processor and a memory, wherein the processor uses the memory to acquire a bitstream containing a plurality of three-dimensional points encoded for each region, and acquires connection information generated based on the relationship between a predetermined region among the plurality of regions and a plurality of other regions among the plurality of regions other than the predetermined region, which includes (i) tile information indicating a value uniquely assigned to each of the plurality of regions, and (ii) association information indicating that there is a relationship between the predetermined region and the other regions based on the tile information, and selectively decodes the plurality of encoded three-dimensional points for each region based on the acquired connection information.
[0042] For example, a three-dimensional data decoding device typically decodes multiple encoded three-dimensional points sequentially, in the order in which the data appears in the bitstream. If the three-dimensional data represents a three-dimensional map, a user viewing the map often wants to see the area centered on their current location. In that case, the three-dimensional data decoding device decodes the encoded three-dimensional points in bitstream order and displays an image of them on a display device in the order they were decoded, so it may take time before the location the user wants to see appears on the display device. Also, depending on the user's current location, some three-dimensional points may not need to be decoded at all. Therefore, the three-dimensional data encoding device generates connection information based on the relationships between the multiple regions, the connection information including tile information indicating a value uniquely assigned to each of the multiple regions, and association information indicating, based on the tile information, that there is a relationship between a predetermined region and other regions. With this, for example, when the three-dimensional data decoding device receives information indicating the user's current location from a device owned by the user, it can determine the predetermined region from that information and, based on the connection information, decode the encoded three-dimensional points sequentially starting from the determined predetermined region. The three-dimensional data decoding device can thus decode encoded three-dimensional points in order, starting from those located in the regions most likely to be desired by the user.
In other words, the three-dimensional data decoding device according to this disclosure can appropriately select and decode a desired set of encoded three-dimensional points from among multiple encoded three-dimensional points.
[0043] These general or specific aspects may be implemented as a system, method, integrated circuit, computer program, or recording medium such as a computer-readable CD-ROM, or as any combination of a system, method, integrated circuit, computer program, and recording medium.
[0044] The embodiments will be described in detail below with reference to the drawings. Note that the embodiments described below are all specific examples of this disclosure. The numerical values, shapes, materials, components, arrangement and connection configurations of components, steps, and the order of steps shown in the following embodiments are examples and are not intended to limit this disclosure. Furthermore, among the components in the following embodiments, those not described in the independent claim representing the highest-level concept will be described as optional components.
[0045] (Embodiment 1) When using encoded point cloud data in actual devices or services, it is desirable to send and receive necessary information depending on the application in order to reduce network bandwidth. However, until now, such functionality has not existed in the encoded structure of three-dimensional data, nor has there been an encoding method for that purpose.
[0046] This embodiment describes a three-dimensional data encoding method and a three-dimensional data encoding device that provide, in the encoded data of a three-dimensional point cloud, a function for sending and receiving the information required by the application; a three-dimensional data decoding method and a three-dimensional data decoding device for decoding the encoded data; a three-dimensional data multiplexing method for multiplexing the encoded data; and a three-dimensional data transmission method for transmitting the encoded data.
[0047] In particular, while two encoding methods (encoding schemes) for point cloud data are currently being considered, the structure of the encoded data and the method for storing the encoded data in a system format have not been defined. As a result, there is a problem in that MUX processing (multiplexing), transmission, or storage cannot be performed in the encoding unit.
[0048] Furthermore, there has been no existing method to support formats like PCC (Point Cloud Compression) that use a mixture of two codecs, a first encoding method and a second encoding method.
[0049] This embodiment describes the structure of PCC encoded data in which two codecs, a first encoding method and a second encoding method, coexist, and a method for storing the encoded data in a system format.
[0050] First, the configuration of the three-dimensional data (point cloud data) encoding and decoding system according to this embodiment will be described. Figure 1 is a diagram showing an example of the configuration of the three-dimensional data encoding and decoding system according to this embodiment. As shown in Figure 1, the three-dimensional data encoding and decoding system includes a three-dimensional data encoding system 4601, a three-dimensional data decoding system 4602, a sensor terminal 4603, and an external connection unit 4604.
[0051] The three-dimensional data encoding system 4601 generates encoded data or multiplexed data by encoding point cloud data, which is three-dimensional data. The three-dimensional data encoding system 4601 may be a three-dimensional data encoding device implemented by a single device, or it may be a system implemented by multiple devices. Furthermore, the three-dimensional data encoding device may include some of the multiple processing units included in the three-dimensional data encoding system 4601.
[0052] The three-dimensional data encoding system 4601 includes a point cloud data generation system 4611, a presentation unit 4612, an encoding unit 4613, a multiplexing unit 4614, an input / output unit 4615, and a control unit 4616. The point cloud data generation system 4611 includes a sensor information acquisition unit 4617 and a point cloud data generation unit 4618.
[0053] The sensor information acquisition unit 4617 acquires sensor information from the sensor terminal 4603 and outputs the sensor information to the point cloud data generation unit 4618. The point cloud data generation unit 4618 generates point cloud data from the sensor information and outputs the point cloud data to the encoding unit 4613.
[0054] The presentation unit 4612 presents sensor information or point cloud data to the user. For example, the presentation unit 4612 displays information or images based on sensor information or point cloud data.
[0055] The encoding unit 4613 encodes (compresses) the point cloud data and outputs the resulting encoded data, control information obtained during the encoding process, and other additional information to the multiplexing unit 4614. The additional information includes, for example, sensor information.
[0056] The multiplexing unit 4614 generates multiplexed data by multiplexing the encoded data input from the encoding unit 4613, control information, and additional information. The format of the multiplexed data is, for example, a file format for storage or a packet format for transmission.
[0057] The input / output unit 4615 (for example, the communication unit or interface) outputs the multiplexed data to the outside. Alternatively, the multiplexed data is stored in a storage unit such as internal memory. The control unit 4616 (or application execution unit) controls each processing unit. In other words, the control unit 4616 performs control such as encoding and multiplexing.
[0058] The sensor information may also be input to the encoding unit 4613 or the multiplexing unit 4614. Furthermore, the input / output unit 4615 may output the point cloud data or encoded data directly to the outside.
[0059] The transmission signal (multiplexed data) output from the three-dimensional data encoding system 4601 is input to the three-dimensional data decoding system 4602 via the external connection unit 4604.
[0060] The three-dimensional data decoding system 4602 generates point cloud data, which is three-dimensional data, by decoding encoded data or multiplexed data. The three-dimensional data decoding system 4602 may be a three-dimensional data decoding device implemented by a single device, or it may be a system implemented by multiple devices. Furthermore, the three-dimensional data decoding device may include some of the multiple processing units included in the three-dimensional data decoding system 4602.
[0061] The three-dimensional data decoding system 4602 includes a sensor information acquisition unit 4621, an input / output unit 4622, a demultiplexing unit 4623, a decoding unit 4624, a presentation unit 4625, a user interface 4626, and a control unit 4627.
[0062] The sensor information acquisition unit 4621 acquires sensor information from the sensor terminal 4603.
[0063] The input / output unit 4622 acquires the transmission signal, decodes the multiplexed data (file format or packet) from the transmission signal, and outputs the multiplexed data to the demultiplexing unit 4623.
[0064] The demultiplexing unit 4623 acquires encoded data, control information, and additional information from the multiplexed data, and outputs the encoded data, control information, and additional information to the decoding unit 4624.
[0065] The decoding unit 4624 reconstructs the point cloud data by decoding the encoded data.
[0066] The presentation unit 4625 presents point cloud data to the user. For example, the presentation unit 4625 displays information or images based on the point cloud data. The user interface 4626 acquires instructions based on user operations. The control unit 4627 (or application execution unit) controls each processing unit. In other words, the control unit 4627 performs control such as demultiplexing, decoding, and presentation.
[0067] The input / output unit 4622 may acquire point cloud data or encoded data directly from an external source. The presentation unit 4625 may acquire additional information such as sensor information and present information based on that additional information. The presentation unit 4625 may also make presentations based on user instructions acquired through the user interface 4626.
[0068] The sensor terminal 4603 generates sensor information, which is information obtained from the sensor. The sensor terminal 4603 is a terminal equipped with a sensor or camera, and may be, for example, a mobile object such as an automobile, an aerial object such as an airplane, a mobile terminal, or a camera.
[0069] The sensor information that can be acquired by the sensor terminal 4603 includes, for example, (1) the distance between the sensor terminal 4603 and the object, or the reflectivity of the object, obtained from a LiDAR, millimeter-wave radar, or infrared sensor, and (2) the distance between the camera and the object, or the reflectivity of the object, obtained from multiple monocular camera images or stereo camera images. The sensor information may also include the sensor's attitude, orientation, gyroscope (angular velocity), position (GPS information or altitude), speed, or acceleration. The sensor information may also include temperature, atmospheric pressure, humidity, or magnetism.
[0070] The external connection unit 4604 is implemented by an integrated circuit (LSI or IC), an external storage unit, communication with a cloud server via the internet, or broadcasting, etc.
[0071] Next, we will explain point cloud data. Figure 2 shows the structure of point cloud data. Figure 3 shows an example of the structure of a data file containing information about point cloud data.
[0072] Point cloud data contains data for multiple points. Each point's data includes location information (three-dimensional coordinates) and attribute information related to that location. A collection of these points is called a point cloud. For example, a point cloud represents the three-dimensional shape of an object.
[0073] Position information, such as three-dimensional coordinates, is sometimes referred to as geometry. Furthermore, the data for each point may include attribute information of multiple attribute types. Attribute types include, for example, color or reflectance.
[0074] One piece of location information may be associated with one piece of attribute information, or multiple pieces of attribute information of different attribute types may be associated with one piece of location information. Furthermore, multiple pieces of attribute information of the same attribute type may be associated with one piece of location information.
[0075] The example data file structure shown in Figure 3 represents a case where location information and attribute information correspond one-to-one, and it shows the location information and attribute information of the N points that make up the point cloud data.
[0076] Location information includes, for example, information for the three axes: x, y, and z. Attribute information includes, for example, RGB color information. A typical data file is a ply file.
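As an illustration of the one-to-one case, the following sketch serializes N points, each with x, y, z coordinates and an RGB color, as an ASCII PLY file. The function name `to_ascii_ply` and the choice of `float` and `uchar` property types are assumptions for this example; the figures of the disclosure may use a different layout.

```python
from typing import List, Tuple

# One point: x, y, z location information plus red, green, blue attribute information.
Point = Tuple[float, float, float, int, int, int]


def to_ascii_ply(points: List[Point]) -> str:
    """Serialize points as an ASCII PLY document with one vertex per point."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    body = [f"{x} {y} {z} {r} {g} {b}" for x, y, z, r, g, b in points]
    return "\n".join(header + body) + "\n"


ply_text = to_ascii_ply([(0.0, 0.0, 0.0, 255, 0, 0), (1.0, 2.0, 3.0, 0, 255, 0)])
```

Each line after `end_header` carries the location information and attribute information of one point, mirroring the one-to-one correspondence described above.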
[0077] Next, we will explain the types of point cloud data. Figure 4 is a diagram illustrating the types of point cloud data. As shown in Figure 4, point cloud data includes static objects and dynamic objects.
[0078] A static object is three-dimensional point cloud data for any given time (a specific moment). A dynamic object is three-dimensional point cloud data that changes over time. Hereafter, three-dimensional point cloud data for a given time will be referred to as a PCC frame, or simply a frame.
[0079] The object can be a point cloud with a somewhat limited area, like regular video data, or it can be a large-scale point cloud with no area limitations, like map information.
[0080] Furthermore, point cloud data of various densities may exist, including both sparse and dense point cloud data.
[0081] The details of each processing unit are described below. Sensor information is acquired by various methods, such as distance sensors like LiDAR or rangefinders, stereo cameras, or combinations of multiple monocular cameras. The point cloud data generation unit 4618 generates point cloud data based on the sensor information obtained by the sensor information acquisition unit 4617. The point cloud data generation unit 4618 generates position information as point cloud data and adds attribute information to the position information.
[0082] The point cloud data generation unit 4618 may process the point cloud data when generating position information or adding attribute information. For example, the point cloud data generation unit 4618 may reduce the amount of data by deleting points with overlapping positions. The point cloud data generation unit 4618 may also transform the position information (such as position shifting, rotation, or normalization) or render the attribute information.
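Two of the processing steps mentioned above, deleting points with overlapping positions and shifting positions, can be sketched as follows. Points are assumed to be tuples whose first three entries are the x, y, z position and whose remaining entries are attributes; keeping the first point at each position and shifting the minimum corner to the origin are illustrative choices, and the function names are hypothetical.

```python
from typing import List, Tuple


def deduplicate(points: List[Tuple]) -> List[Tuple]:
    """Keep only the first point encountered at each (x, y, z) position."""
    seen = set()
    unique = []
    for point in points:
        position = point[:3]
        if position not in seen:
            seen.add(position)
            unique.append(point)
    return unique


def shift_to_origin(points: List[Tuple]) -> List[Tuple]:
    """Translate positions so the minimum corner of the cloud is at (0, 0, 0)."""
    if not points:
        return []
    mins = [min(point[axis] for point in points) for axis in range(3)]
    return [
        tuple(point[axis] - mins[axis] for axis in range(3)) + tuple(point[3:])
        for point in points
    ]


raw = [(1, 2, 3, "red"), (1, 2, 3, "blue"), (4, 2, 5, "green")]
cleaned = shift_to_origin(deduplicate(raw))
```

In this example the second point is dropped because it shares a position with the first, and the remaining positions are shifted by (1, 2, 3).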
[0083] In Figure 1, the point cloud data generation system 4611 is included in the three-dimensional data encoding system 4601, but it may also be provided independently outside of the three-dimensional data encoding system 4601.
[0084] The encoding unit 4613 generates encoded data by encoding the point cloud data based on a predetermined encoding method. There are two main types of encoding methods. The first is an encoding method using positional information, which will be referred to as the first encoding method hereafter. The second is an encoding method using a video codec, which will be referred to as the second encoding method hereafter.
[0085] The decoding unit 4624 decodes the point cloud data by decoding the encoded data based on a predetermined encoding method.
[0086] The multiplexing unit 4614 generates multiplexed data by multiplexing the encoded data using an existing multiplexing method. The generated multiplexed data is transmitted or stored. In addition to PCC encoded data, the multiplexing unit 4614 multiplexes other media such as video, audio, subtitles, applications, files, or reference time information. Furthermore, the multiplexing unit 4614 may also multiplex attribute information related to sensor information or point cloud data.
[0087] Multiplexing methods or file formats include ISOBMFF, ISOBMFF-based transmission methods such as MPEG-DASH, MMT, MPEG-2 TS Systems, and RMP.
[0088] The demultiplexing unit 4623 extracts PCC encoded data, other media, and time information from the multiplexed data.
[0089] The input / output unit 4615 transmits the multiplexed data using a method appropriate to the transmission medium or storage medium, such as broadcasting or communication. The input / output unit 4615 may communicate with other devices via the internet or with storage units such as cloud servers.
[0090] Communication protocols such as HTTP, FTP, TCP, or UDP can be used. Either a pull-type or push-type communication method may be employed.
[0091] Either wired or wireless transmission may be used. Wired transmission methods include Ethernet®, USB, RS-232C, HDMI®, or coaxial cable. Wireless transmission methods include wireless LAN, Wi-Fi®, Bluetooth®, or millimeter wave.
[0092] Furthermore, broadcasting formats such as DVB-T2, DVB-S2, DVB-C2, ATSC3.0, or ISDB-S3 may be used.
[0093] Figure 5 shows the configuration of a first encoding unit 4630, which is an example of an encoding unit 4613 that performs encoding using the first encoding method. Figure 6 is a block diagram of the first encoding unit 4630. The first encoding unit 4630 generates encoded data (encoded stream) by encoding point cloud data using the first encoding method. This first encoding unit 4630 includes a location information encoding unit 4631, an attribute information encoding unit 4632, an additional information encoding unit 4633, and a multiplexing unit 4634.
[0094] The first encoding unit 4630 is characterized by performing encoding while being aware of the three-dimensional structure. Furthermore, the first encoding unit 4630 is characterized by the attribute information encoding unit 4632 performing encoding using information obtained from the location information encoding unit 4631. The first encoding method is also called GPCC (Geometry-based PCC).
[0095] The point cloud data is PCC point cloud data such as a PLY file, or PCC point cloud data generated from sensor information, and includes position information (also referred to as location information), attribute information, and other additional information (metadata). The position information is input to the location information encoding unit 4631, the attribute information is input to the attribute information encoding unit 4632, and the additional information is input to the additional information encoding unit 4633.
[0096] The location information encoding unit 4631 generates encoded location information (Compressed Geometry), which is encoded data, by encoding location information. For example, the location information encoding unit 4631 encodes location information using an N-ary tree structure such as an octree. Specifically, in an octree, the target space is divided into eight nodes (subspaces), and 8-bit information (an occupancy code) is generated to indicate whether or not each node contains points. Each node that contains points is further divided into eight nodes, and 8-bit information is generated to indicate whether or not each of these eight nodes contains points. This process is repeated until a predetermined layer is reached or until the number of points contained in a node falls below a threshold.
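The octree subdivision described above can be sketched as follows. This is a minimal illustration, not the encoder of this disclosure: the bit layout of the occupancy code (bit i set when child i is occupied, with x as the most significant axis), the breadth-first output order, and the function names are assumptions, and subdivision simply stops at unit-size voxels rather than at a point-count threshold.

```python
from collections import deque
from typing import List, Tuple

Point = Tuple[int, int, int]

def child_index(p: Point, origin: Point, half: int) -> int:
    """Return 0..7: one bit per axis, set when the point lies in the upper half."""
    x, y, z = p
    ox, oy, oz = origin
    return (int(x >= ox + half) << 2) | (int(y >= oy + half) << 1) | int(z >= oz + half)

def occupancy_codes(points: List[Point], size: int) -> List[int]:
    """Breadth-first 8-bit occupancy codes for a cube of edge `size` (a power of two)."""
    codes: List[int] = []
    queue = deque([((0, 0, 0), size, list(points))])
    while queue:
        origin, edge, pts = queue.popleft()
        if edge == 1:
            continue  # unit voxel: a leaf, nothing further to signal
        half = edge // 2
        buckets: List[List[Point]] = [[] for _ in range(8)]
        for p in pts:
            buckets[child_index(p, origin, half)].append(p)
        code = 0
        for i, bucket in enumerate(buckets):
            if bucket:
                code |= 1 << i  # mark child i as occupied
                child_origin = (origin[0] + half * ((i >> 2) & 1),
                                origin[1] + half * ((i >> 1) & 1),
                                origin[2] + half * (i & 1))
                queue.append((child_origin, half, bucket))
        codes.append(code)
    return codes

codes = occupancy_codes([(0, 0, 0), (3, 3, 3)], size=4)
```

Only occupied children are pushed back onto the queue, so empty regions of space cost a single zero bit in their parent's code.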
[0097] The attribute information encoding unit 4632 generates encoded attribute information (Compressed Attribute), which is encoded data, by encoding the attribute information using the configuration information generated by the location information encoding unit 4631. For example, the attribute information encoding unit 4632 determines the reference point (reference node) to be referenced in encoding the target point (target node) to be processed, based on the octree structure generated by the location information encoding unit 4631. For example, the attribute information encoding unit 4632 references a surrounding node or adjacent node whose parent node in the octree is the same as that of the target node. Note that the method for determining the reference relationship is not limited to this.
[0098] Furthermore, the attribute information encoding process may include at least one of the following: quantization, prediction, and arithmetic encoding. In this case, "reference" means using a reference node to calculate the predicted value of the attribute information, or using the state of a reference node (for example, occupancy information indicating whether or not the reference node contains points) to determine the encoding parameters. For example, the encoding parameters may be quantization parameters in the quantization process, or contexts in arithmetic encoding.
[0099] The additional information encoding unit 4633 generates encoded additional information (Compressed MetaData), which is encoded data, by encoding the encodable data among the additional information.
[0100] The multiplexing unit 4634 generates a compressed stream, which is encoded data, by multiplexing encoded position information, encoded attribute information, encoded additional information, and other additional information. The generated compressed stream is output to a processing unit of the system layer (not shown).
[0101] Next, a first decoding unit 4640, which is an example of a decoding unit 4624 that performs decoding for the first encoding method, will be described. Figure 7 is a diagram showing the configuration of the first decoding unit 4640. Figure 8 is a block diagram of the first decoding unit 4640. The first decoding unit 4640 generates point cloud data by decoding, in accordance with the first encoding method, encoded data (an encoded stream) that was encoded with the first encoding method. This first decoding unit 4640 includes a demultiplexing unit 4641, a location information decoding unit 4642, an attribute information decoding unit 4643, and an additional information decoding unit 4644.
[0102] A compressed stream, which is encoded data, is input to the first decoding unit 4640 from a processing unit of the system layer (not shown).
[0103] The demultiplexing unit 4641 separates encoded location information (Compressed Geometry), encoded attribute information (Compressed Attribute), encoded additional information (Compressed MetaData), and other additional information from the encoded data.
[0104] The location information decoding unit 4642 generates location information by decoding the encoded location information. For example, the location information decoding unit 4642 reconstructs the location information of a point cloud represented by three-dimensional coordinates from encoded location information represented by an N-tree structure such as an octree.
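As a rough sketch of this reconstruction, the function below rebuilds integer voxel coordinates from a breadth-first list of 8-bit occupancy codes. The bit layout (bit i set when child i is occupied, x as the most significant axis), the breadth-first ordering, and the name `decode_occupancy` are assumptions made for illustration; the actual bitstream syntax is not specified in this passage.

```python
from collections import deque
from typing import List, Tuple

Point = Tuple[int, int, int]

def decode_occupancy(codes: List[int], size: int) -> List[Point]:
    """Rebuild occupied unit voxels from breadth-first occupancy codes.

    `size` is the edge length of the root cube and must be a power of two.
    """
    points: List[Point] = []
    code_iter = iter(codes)
    queue = deque([((0, 0, 0), size)])
    while queue:
        origin, edge = queue.popleft()
        if edge == 1:
            points.append(origin)  # leaf voxel: its origin is the point position
            continue
        code = next(code_iter)     # one occupancy code per non-leaf node
        half = edge // 2
        for i in range(8):
            if code & (1 << i):    # child i is occupied, so descend into it
                queue.append(((origin[0] + half * ((i >> 2) & 1),
                               origin[1] + half * ((i >> 1) & 1),
                               origin[2] + half * (i & 1)), half))
    return points

points = decode_occupancy([129, 1, 128], size=4)
```

The decoder visits nodes in the same order in which the codes were written, which is what allows it to consume the codes sequentially without any explicit node addresses.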
[0105] The attribute information decoding unit 4643 decodes the encoded attribute information based on the configuration information generated by the location information decoding unit 4642. For example, the attribute information decoding unit 4643 determines the reference point (reference node) to be referenced in the decoding of the target point (target node) to be processed, based on the octree structure obtained by the location information decoding unit 4642. For example, the attribute information decoding unit 4643 references a surrounding node or adjacent node whose parent node in the octree is the same as that of the target node. Note that the method for determining the reference relationship is not limited to this.
[0106] Furthermore, the attribute information decoding process may include at least one of the following: inverse quantization, prediction, and arithmetic decoding. In this case, "reference" means using a reference node to calculate the predicted value of the attribute information, or using the state of the reference node (for example, occupancy information indicating whether or not the reference node contains a point cloud) to determine the decoding parameters. For example, decoding parameters may be quantization parameters in the inverse quantization process, or context in arithmetic decoding.
[0107] The additional information decoding unit 4644 generates additional information by decoding the encoded additional information. The first decoding unit 4640 uses the additional information necessary for decoding location information and attribute information during decoding and outputs the additional information necessary for the application to the outside.
[0108] Next, an example of the configuration of the location information coding unit will be described. Figure 9 is a block diagram of the location information coding unit 2700 according to this embodiment. The location information coding unit 2700 comprises an octree generation unit 2701, a geometric information calculation unit 2702, a coding table selection unit 2703, and an entropy coding unit 2704.
[0109] The octree generation unit 2701 generates an octree from the input position information and generates occupancy codes for each node in the octree. The geometric information calculation unit 2702 obtains information indicating whether the adjacent nodes of the target node are occupied nodes or not. For example, the geometric information calculation unit 2702 calculates the occupancy information of adjacent nodes (information indicating whether the adjacent node is an occupied node or not) from the occupancy code of the parent node to which the target node belongs. Alternatively, the geometric information calculation unit 2702 may store the encoded nodes in a list and search for adjacent nodes from that list. The geometric information calculation unit 2702 may also switch adjacent nodes depending on the position of the target node within the parent node.
[0110] The coding table selection unit 2703 selects a coding table to be used for entropy coding of the target node using the occupancy information of adjacent nodes calculated by the geometric information calculation unit 2702. For example, the coding table selection unit 2703 may generate a bit sequence using the occupancy information of adjacent nodes and select a coding table with an index number generated from that bit sequence.
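One simple way to turn neighbor occupancy into a table index is to pack the occupancy flags into a bit sequence and read it as an integer. The sketch below assumes six face-adjacent neighbors given in a fixed order, which yields 64 possible tables; the number and order of neighbors and the function name are illustrative assumptions.

```python
from typing import Sequence

def select_coding_table(neighbor_occupied: Sequence[bool]) -> int:
    """Pack neighbor occupancy flags into an index, first flag as the most significant bit."""
    index = 0
    for occupied in neighbor_occupied:
        index = (index << 1) | int(bool(occupied))
    return index

# Hypothetical order of the six neighbors: -x, +x, -y, +y, -z, +z
table_index = select_coding_table([True, False, True, False, False, True])
```

Because the decoder can derive the same flags from already decoded nodes, it arrives at the same index without the index having to be transmitted.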
[0111] The entropy coding unit 2704 generates coded location information and metadata by performing entropy coding on the occupancy code of the target node using the coding table of the selected index number. The entropy coding unit 2704 may also add information indicating the selected coding table to the coded location information.
[0112] The following describes the octree representation and the scanning order of location information. Location information (location data) is converted into an octree structure (octreeization) and then encoded. An octree structure consists of nodes and leaves. Each node has eight child nodes or leaves, and each leaf has voxel (VXL) information. Figure 10 shows an example of the structure of location information containing multiple voxels. Figure 11 shows an example of the location information shown in Figure 10 converted into an octree structure. Here, among the leaves shown in Figure 11, leaves 1, 2, and 3 represent the voxels VXL1, VXL2, and VXL3 shown in Figure 10, respectively, each of which is a VXL containing points (hereinafter referred to as a valid VXL).
[0113] Specifically, node 1 corresponds to the overall space encompassing the location information in Figure 10. The overall space corresponding to node 1 is divided into eight nodes, and of these eight nodes, each node containing a valid VXL is further divided into eight nodes or leaves; this process is repeated for each level of the tree structure. Here, each node corresponds to a subspace and holds, as node information, information (an occupancy code) indicating at which positions after the division the next nodes or leaves exist. In addition, the lowest-level block is set as a leaf, and leaf information such as the number of points contained within the leaf is held.
[0114] Next, an example of the configuration of the location information decoding unit will be described. Figure 12 is a block diagram of the location information decoding unit 2710 according to this embodiment. The location information decoding unit 2710 comprises an octree generation unit 2711, a geometric information calculation unit 2712, a coding table selection unit 2713, and an entropy decoding unit 2714.
[0115] The octree generation unit 2711 generates an octree of a given space (node) using the header information or metadata of the bitstream. For example, the octree generation unit 2711 generates a large space (root node) using the sizes along the x, y, and z axes of a given space attached to the header information, and then generates an octree by dividing that space into two along each of the x, y, and z axes to generate eight small spaces A (nodes A0 to A7). Nodes A0 to A7 are then set in order as the target node.
[0116] The geometric information calculation unit 2712 obtains occupancy information indicating whether an adjacent node to the target node is an occupied node. For example, the geometric information calculation unit 2712 calculates the occupancy information of an adjacent node from the occupancy code of the parent node to which the target node belongs. Alternatively, the geometric information calculation unit 2712 may store the decoded nodes in a list and search for adjacent nodes from that list. The geometric information calculation unit 2712 may also switch adjacent nodes depending on the position of the target node within its parent node.
[0117] The coding table selection unit 2713 selects a coding table (decoding table) to be used for entropy decoding of the target node using the occupancy information of adjacent nodes calculated by the geometric information calculation unit 2712. For example, the coding table selection unit 2713 may generate a bit sequence using the occupancy information of adjacent nodes and select a coding table with an index number generated from that bit sequence.
[0118] The entropy decoding unit 2714 generates location information by entropy decoding the occupancy code of the target node using the selected coding table. Alternatively, the entropy decoding unit 2714 may decode and obtain the information of the selected coding table from the bitstream, and then entropy decode the occupancy code of the target node using the coding table indicated by that information.
[0119] The configuration of the attribute information encoding unit and the attribute information decoding unit will be described below. Figure 13 is a block diagram showing an example configuration of the attribute information encoding unit A100. The attribute information encoding unit may include multiple encoding units that perform different encoding methods. For example, the attribute information encoding unit may switch between the following two methods depending on the use case.
[0120] The attribute information encoding unit A100 includes the LoD attribute information encoding unit A101 and the transformation attribute information encoding unit A102. The LoD attribute information encoding unit A101 uses the positional information of the three-dimensional points to classify each three-dimensional point into multiple layers, predicts the attribute information of the three-dimensional points belonging to each layer, and encodes the prediction residual. Here, each classified layer is called an LoD (Level of Detail).
[0121] The transformation attribute information encoding unit A102 encodes attribute information using RAHT (Region Adaptive Hierarchical Transform). Specifically, the transformation attribute information encoding unit A102 applies RAHT or a Haar transform to each piece of attribute information based on the position information of the three-dimensional points to generate high-frequency and low-frequency components for each layer, and encodes these values using quantization, entropy coding, and the like.
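As a rough illustration of splitting attribute values into low-frequency and high-frequency components, the sketch below applies a single orthonormal Haar step to one pair of values and then inverts it. It does not implement RAHT, in which the combination is weighted by the number of points each node represents; the plain Haar step shown here corresponds to the equal-weight case. The function names are hypothetical.

```python
import math
from typing import Tuple

SQRT2 = math.sqrt(2.0)

def haar_pair(a: float, b: float) -> Tuple[float, float]:
    """Return (low, high): the scaled sum and scaled difference of two attribute values."""
    return (a + b) / SQRT2, (a - b) / SQRT2

def inverse_haar_pair(low: float, high: float) -> Tuple[float, float]:
    """Recover the original pair from its low- and high-frequency components."""
    return (low + high) / SQRT2, (low - high) / SQRT2

low, high = haar_pair(100.0, 96.0)      # neighboring points with similar attributes
a, b = inverse_haar_pair(low, high)     # round trip back to the original pair
```

When neighboring attribute values are similar, the high-frequency component is small, which is what makes the subsequent quantization and entropy coding effective.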
[0122] Figure 14 is a block diagram showing an example configuration of the attribute information decoding unit A110. The attribute information decoding unit may include multiple decoding units that perform different decoding methods. For example, the attribute information decoding unit may decode by switching between the following two methods based on the information contained in the header and metadata.
[0123] The attribute information decoding unit A110 includes the LoD attribute information decoding unit A111 and the transformation attribute information decoding unit A112. The LoD attribute information decoding unit A111 classifies each three-dimensional point into multiple layers using the positional information of the three-dimensional points, and decodes the attribute values while predicting the attribute information of the three-dimensional points belonging to each layer.
[0124] The transformation attribute information decoding unit A112 decodes attribute information using RAHT (Region Adaptive Hierarchical Transform). Specifically, the transformation attribute information decoding unit A112 decodes attribute values by applying inverse RAHT or an inverse Haar transform to the high-frequency and low-frequency components of each attribute value based on the position information of the three-dimensional points.
[0125] Figure 15 is a block diagram showing the configuration of an attribute information encoding unit 3140, which is an example of an LoD attribute information encoding unit A101.
[0126] The attribute information coding unit 3140 includes an LoD generation unit 3141, a surrounding search unit 3142, a prediction unit 3143, a prediction residual calculation unit 3144, a quantization unit 3145, an arithmetic coding unit 3146, an inverse quantization unit 3147, a decoded value generation unit 3148, and a memory 3149.
[0127] The LoD generation unit 3141 generates LoDs using the positional information of three-dimensional points.
[0128] The surrounding search unit 3142 uses the LoD generation result from the LoD generation unit 3141 and distance information indicating the distance between each three-dimensional point to search for neighboring three-dimensional points adjacent to each three-dimensional point.
[0129] The prediction unit 3143 generates predicted values for the attribute information of the target three-dimensional point to be encoded.
[0130] The prediction residual calculation unit 3144 calculates (generates) the prediction residual, that is, the difference between the attribute information and the predicted value of the attribute information generated by the prediction unit 3143.
[0131] The quantization unit 3145 quantizes the prediction residual of the attribute information calculated by the prediction residual calculation unit 3144.
[0132] The arithmetic coding unit 3146 arithmetically codes the prediction residual after quantization by the quantization unit 3145. The arithmetic coding unit 3146 outputs the bitstream containing the arithmetically coded prediction residual to, for example, a three-dimensional data decoding device.
[0133] The prediction residual may be binarized, for example, by the quantization unit 3145 before being arithmetically coded by the arithmetic coding unit 3146.
[0134] Furthermore, for example, the arithmetic coding unit 3146 may initialize the coding table used for arithmetic coding before arithmetic coding. The arithmetic coding unit 3146 may initialize the coding table used for arithmetic coding for each layer. In addition, the arithmetic coding unit 3146 may output information indicating the position of the layer in which the coding table was initialized in the bitstream.
[0135] The inverse quantization unit 3147 inversely quantizes the prediction residual after it has been quantized by the quantization unit 3145.
[0136] The decoded value generation unit 3148 generates a decoded value by adding the predicted value of the attribute information generated by the prediction unit 3143 and the prediction residual after inverse quantization by the inverse quantization unit 3147.
[0137] Memory 3149 is a memory that stores the decoded values of the attribute information of each three-dimensional point decoded by the decoded value generation unit 3148. For example, when the prediction unit 3143 generates prediction values for three-dimensional points that have not yet been encoded, it uses the decoded values of the attribute information of each three-dimensional point stored in memory 3149 to generate the prediction values.
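The interplay of prediction, residual calculation, quantization, inverse quantization, and the decoded-value memory can be sketched as a closed loop. The sketch below is only an illustration under simplifying assumptions: attributes are scalar integers, the predictor is simply the previously decoded value (the actual predictor uses neighboring points found via the LoD structure), quantization is uniform with a hypothetical step size `step`, and arithmetic coding is omitted.

```python
import math
from typing import List, Tuple

def encode_attributes(values: List[int], step: int) -> Tuple[List[int], List[int]]:
    """Return (quantized residuals, decoded values as the encoder reconstructs them)."""
    memory: List[int] = []      # decoded values, playing the role of memory 3149
    quantized: List[int] = []
    for v in values:
        pred = memory[-1] if memory else 0        # hypothetical predictor
        residual = v - pred                        # prediction residual
        q = math.floor(residual / step + 0.5)      # uniform quantization (round half up)
        quantized.append(q)
        memory.append(pred + q * step)             # inverse quantize, add the prediction
    return quantized, memory

def decode_attributes(quantized: List[int], step: int) -> List[int]:
    """Mirror of the encoder loop: same predictor applied to the same decoded values."""
    memory: List[int] = []
    for q in quantized:
        pred = memory[-1] if memory else 0
        memory.append(pred + q * step)
    return memory

q, enc_recon = encode_attributes([10, 12, 19], step=4)
dec_recon = decode_attributes(q, step=4)
```

Predicting from decoded values rather than from the original values keeps the encoder and decoder in step: both sides see exactly the same reconstruction, so quantization error does not accumulate from point to point.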
[0138] Figure 16 is a block diagram of an attribute information encoding unit 6600, which is an example of a transformation attribute information encoding unit A102. The attribute information encoding unit 6600 comprises a sorting unit 6601, a Haar transform unit 6602, a quantization unit 6603, an inverse quantization unit 6604, an inverse Haar transform unit 6605, a memory 6606, and an arithmetic coding unit 6607.
[0139] The sorting unit 6601 generates Morton codes using the position information of three-dimensional points and sorts multiple three-dimensional points in Morton code order. The Haar transform unit 6602 generates coding coefficients by applying the Haar transform to the attribute information. The quantization unit 6603 quantizes the coding coefficients of the attribute information.
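A Morton code interleaves the bits of the x, y, and z coordinates so that sorting by the code groups spatially close points together. The sketch below assumes non-negative integer coordinates of at most `bits` bits and places x in the most significant position of each bit triple; the axis order and the function names are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]

def morton_code(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the lowest `bits` bits of x, y, z into a single integer."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

def sort_by_morton(points: List[Point]) -> List[Point]:
    """Return the points ordered by ascending Morton code."""
    return sorted(points, key=lambda p: morton_code(*p))

ordered = sort_by_morton([(1, 1, 1), (0, 0, 1), (2, 0, 0), (1, 0, 0)])
```

Points that share the same high-order bits in all three coordinates end up adjacent in the sorted list, which is convenient for pairing neighbors in a hierarchical transform.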
[0140] The inverse quantization unit 6604 inversely quantizes the coding coefficients after quantization. The inverse Haar transform unit 6605 applies the inverse Haar transform to the coding coefficients. The memory 6606 stores the attribute information values of multiple decoded three-dimensional points. For example, the attribute information of decoded three-dimensional points stored in the memory 6606 may be used for predicting unencoded three-dimensional points.
[0141] The arithmetic coding unit 6607 calculates ZeroCnt from the quantized coding coefficients and arithmetically codes ZeroCnt. The arithmetic coding unit 6607 also arithmetically codes the non-zero coding coefficients after quantization. The arithmetic coding unit 6607 may also binarize the coding coefficients before arithmetic coding. Furthermore, the arithmetic coding unit 6607 may generate and code various header information.
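This passage does not define ZeroCnt. The sketch below assumes that it is the number of consecutive zero-valued quantized coefficients preceding each non-zero coefficient, which is one common way to exploit long runs of zeros. Under that assumption, the coefficient sequence becomes a list of (ZeroCnt, value) pairs plus a trailing zero count; the arithmetic coding of these symbols is omitted and the function names are hypothetical.

```python
from typing import List, Tuple

def zero_run_encode(coeffs: List[int]) -> Tuple[List[Tuple[int, int]], int]:
    """Return ([(zero_cnt, nonzero_value), ...], trailing_zero_cnt)."""
    symbols: List[Tuple[int, int]] = []
    zero_cnt = 0
    for c in coeffs:
        if c == 0:
            zero_cnt += 1                  # extend the current run of zeros
        else:
            symbols.append((zero_cnt, c))  # emit the run length, then the value
            zero_cnt = 0
    return symbols, zero_cnt

def zero_run_decode(symbols: List[Tuple[int, int]], trailing: int) -> List[int]:
    """Expand (zero_cnt, value) pairs back into the coefficient sequence."""
    out: List[int] = []
    for zero_cnt, value in symbols:
        out.extend([0] * zero_cnt)
        out.append(value)
    out.extend([0] * trailing)
    return out

symbols, trailing = zero_run_encode([0, 0, 5, 0, -3, 0, 0, 0])
```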
[0142] Figure 17 is a block diagram showing the configuration of an attribute information decoding unit 3150, which is an example of an LoD attribute information decoding unit A111.
[0143] The attribute information decoding unit 3150 includes an LoD generation unit 3151, a surrounding search unit 3152, a prediction unit 3153, an arithmetic decoding unit 3154, an inverse quantization unit 3155, a decoded value generation unit 3156, and a memory 3157.
[0144] The LoD generation unit 3151 generates LoD using the position information of the three-dimensional points decoded by a position information decoding unit (not shown in FIG. 17).
[0145] The surrounding search unit 3152 searches for neighboring three-dimensional points adjacent to each three-dimensional point using the LoD generation result by the LoD generation unit 3151 and distance information indicating the distance between each three-dimensional point.
[0146] The prediction unit 3153 generates a predicted value of the attribute information of the target three-dimensional point to be decoded.
[0147] The arithmetic decoding unit 3154 arithmetically decodes the prediction residual in the bit stream obtained from the attribute information encoding unit 3140 shown in FIG. 15. The arithmetic decoding unit 3154 may initialize the decoding table used for arithmetic decoding. The arithmetic decoding unit 3154 initializes the decoding table used for arithmetic decoding for the layer on which the arithmetic encoding unit 3146 shown in FIG. 15 performed the encoding process. The arithmetic decoding unit 3154 may initialize the decoding table for each layer. Also, the arithmetic decoding unit 3154 may initialize the decoding table based on the information indicating the position of the layer for which the encoding table was initialized, included in the bit stream.
[0148] The inverse quantization unit 3155 inverse quantizes the prediction residual arithmetically decoded by the arithmetic decoding unit 3154.
[0149] The decoded value generation unit 3156 generates a decoded value by adding the predicted value generated by the prediction unit 3153 and the prediction residual after being inverse quantized by the inverse quantization unit 3155. The decoded value generation unit 3156 outputs the decoded attribute information data to another device.
[0150] The memory 3157 is a memory that stores the decoded values of the attribute information of each three-dimensional point decoded by the decoded value generation unit 3156. For example, when the prediction unit 3153 generates a predicted value of a three-dimensional point that has not yet been decoded, the prediction unit 3153 generates a predicted value using the decoded values of the attribute information of each three-dimensional point stored in the memory 3157.
[0151] FIG. 18 is a block diagram of an attribute information decoding unit 6610 which is an example of the transformation attribute information decoding unit A112. The attribute information decoding unit 6610 includes an arithmetic decoding unit 6611, an inverse quantization unit 6612, an inverse Haar transform unit 6613, and a memory 6614.
[0152] The arithmetic decoding unit 6611 arithmetically decodes ZeroCnt and the coding coefficients included in the bit stream. Note that the arithmetic decoding unit 6611 may decode various header information.
[0153] The inverse quantization unit 6612 inverse-quantizes the arithmetically decoded coding coefficients. The inverse Haar transform unit 6613 applies an inverse Haar transform to the coding coefficients after inverse quantization. The memory 6614 stores the values of the attribute information of a plurality of decoded three-dimensional points. For example, the decoded attribute information of the three-dimensional points stored in the memory 6614 may be used for prediction of undecoded three-dimensional points.
[0154] Next, a second encoding unit 4650 which is an example of an encoding unit 4613 that performs encoding using the second encoding method will be described. FIG. 19 is a diagram showing the configuration of the second encoding unit 4650. FIG. 20 is a block diagram of the second encoding unit 4650.
[0155] The second encoding unit 4650 generates encoded data (encoded stream) by encoding point cloud data using the second encoding method. This second encoding unit 4650 includes an additional information generation unit 4651, a position image generation unit 4652, an attribute image generation unit 4653, a video encoding unit 4654, an additional information encoding unit 4655, and a multiplexing unit 4656.
[0156] The second encoding unit 4650 has a feature of generating a position image and an attribute image by projecting a three-dimensional structure onto a two-dimensional image, and encoding the generated position image and attribute image using an existing video encoding method. The second encoding method is also called VPCC (Video based PCC).
[0157] The point cloud data is PCC point cloud data such as a PLY file, or PCC point cloud data generated from sensor information, and includes position information, attribute information, and other additional information (metadata).
[0158] The additional information generation unit 4651 generates map information for multiple two-dimensional images by projecting a three-dimensional structure onto a two-dimensional image.
[0159] The position image generation unit 4652 generates a position image (geometry image) based on position information and map information generated by the additional information generation unit 4651. This position image is, for example, a depth image in which the distance is indicated as a pixel value. This depth image may be an image of multiple point clouds viewed from one viewpoint (an image of multiple point clouds projected onto a single two-dimensional plane), or multiple images of multiple point clouds viewed from multiple viewpoints, or a single image formed by integrating these multiple images.
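A depth image of the kind described above can be sketched with a simple orthographic projection. The sketch below projects integer points along the z axis onto the xy plane and stores the smallest z (the nearest point) at each pixel. It is a deliberately simplified illustration: the choice of projection axis, the use of `None` for empty pixels, and the function name are assumptions, and the patch segmentation and map information used in practice are not modeled.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int, int]

def depth_image(points: List[Point], width: int, height: int) -> List[List[Optional[int]]]:
    """Project points along z; each pixel holds the nearest depth, or None when empty."""
    image: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if 0 <= x < width and 0 <= y < height:
            current = image[y][x]
            if current is None or z < current:
                image[y][x] = z  # keep the point closest to the projection plane
    return image

img = depth_image([(0, 0, 5), (0, 0, 2), (1, 1, 7)], width=2, height=2)
```

A single projection discards points hidden behind the nearest one, which is one reason the text mentions using multiple viewpoints or multiple images.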
[0160] The attribute image generation unit 4653 generates an attribute image based on attribute information and map information generated by the additional information generation unit 4651. This attribute image is, for example, an image in which attribute information (e.g., color (RGB)) is shown as pixel values. This image may be an image of multiple point clouds viewed from one viewpoint (an image of multiple point clouds projected onto a single two-dimensional plane), or multiple images of multiple point clouds viewed from multiple viewpoints, or a single image formed by integrating these multiple images.
[0161] The video encoding unit 4654 generates an encoded position image (Compressed Geometry Image) and an encoded attribute image (Compressed Attribute Image), which are encoded data, by encoding the position image and the attribute image using a video encoding scheme. Any known encoding scheme may be used as the video encoding scheme. For example, the video encoding scheme may be AVC or HEVC.
[0162] The additional information encoding unit 4655 generates encoded additional information (Compressed MetaData) by encoding additional information and map information included in the point cloud data.
[0163] The multiplexing unit 4656 generates a compressed stream, which is encoded data, by multiplexing the encoded position image, encoded attribute image, encoded additional information, and other additional information. The generated compressed stream is output to a processing unit of the system layer (not shown).
[0164] Next, a second decoding unit 4660, which is an example of a decoding unit 4624 that performs decoding for the second encoding method, will be described. Figure 21 is a diagram showing the configuration of the second decoding unit 4660. Figure 22 is a block diagram of the second decoding unit 4660. The second decoding unit 4660 generates point cloud data by decoding, in accordance with the second encoding method, encoded data (an encoded stream) that was encoded with the second encoding method. This second decoding unit 4660 includes a demultiplexing unit 4661, a video decoding unit 4662, an additional information decoding unit 4663, a position information generation unit 4664, and an attribute information generation unit 4665.
[0165] A compressed stream, which is encoded data, is input to the second decoding unit 4660 from a processing unit of the system layer (not shown).
[0166] The demultiplexing unit 4661 separates the encoded position image (Compressed Geometry Image), encoded attribute image (Compressed Attribute Image), encoded additional information (Compressed MetaData), and other additional information from the encoded data.
[0167] The video decoding unit 4662 generates a position image and an attribute image by decoding the encoded position image and the encoded attribute image using a video encoding scheme. Any known encoding scheme may be used as the video encoding scheme. For example, the video encoding scheme may be AVC or HEVC.
[0168] The additional information decoding unit 4663 generates additional information including map information and the like by decoding the encoded additional information.
[0169] The position information generation unit 4664 generates position information using the position image and map information. The attribute information generation unit 4665 generates attribute information using the attribute image and map information.
[0170] The second decoding unit 4660 uses the additional information necessary for decoding during decoding and outputs the additional information necessary for the application to the outside.
[0171] Hereinafter, the problems in the PCC encoding method will be described. FIG. 23 is a diagram showing a protocol stack related to PCC encoded data. FIG. 23 shows an example in which data of other media such as video (for example, HEVC) or audio is multiplexed with the PCC encoded data and transmitted or stored.
[0172] The multiplexing method and file format have functions for multiplexing various encoded data and transmitting or storing it. In order to transmit or store the encoded data, the encoded data must be converted into the format of the multiplexing method. For example, in HEVC, a technique of storing the encoded data in a data structure called a NAL unit and storing the NAL unit in ISOBMFF is defined.
[0173] On the other hand, a first encoding method (Codec1) and a second encoding method (Codec2) are currently being studied as encoding methods for point cloud data, but neither the structure of the encoded data nor the method of storing the encoded data in a system format has been defined. As a result, there is a problem that the output of the encoding unit cannot be multiplexed (MUX processing), transmitted, or stored as it is.
[0174] In the following, where no specific encoding method is named, the description applies to either the first encoding method or the second encoding method.
[0175] (Embodiment 2) This embodiment describes the types of encoded data (geometry, attribute, and metadata) generated by the first encoding unit 4630 or the second encoding unit 4650 described above, the method for generating metadata, and the multiplexing process in the multiplexing unit. Note that metadata may also be referred to as parameter sets or control information.
[0176] In this embodiment, we will explain using the dynamic object (three-dimensional point cloud data that changes over time) described in Figure 4 as an example, but the same method may be used for static objects (three-dimensional point cloud data at any given time).
[0177] Figure 24 shows the configuration of the encoding unit 4801 and the multiplexing unit 4802 included in the three-dimensional data encoding device according to this embodiment. The encoding unit 4801 corresponds, for example, to the first encoding unit 4630 or the second encoding unit 4650 described above. The multiplexing unit 4802 corresponds to the multiplexing unit 4634 or 4656 described above.
[0178] The encoding unit 4801 encodes point cloud data from multiple PCC (Point Cloud Compression) frames and generates encoded data (Multiple Compressed Data) containing multiple location information, attribute information, and additional information.
[0179] The multiplexing unit 4802 converts data of multiple data types (location information, attribute information, and additional information) into NAL units, thereby transforming the data into a data configuration that takes into account data access by the decoding device.
[0180] Figure 25 shows an example of the structure of encoded data generated by the encoding unit 4801. The arrows in the figure indicate dependencies related to the decoding of encoded data, with the data at the base of an arrow depending on the data at the tip of the arrow. In other words, the decoding device decodes the data at the tip of the arrow and uses that decoded data to decode the data at the base of the arrow. To put it another way, a dependency means that the data depended upon is referenced (used) in the processing (encoding, decoding, etc.) of the data that depends on it.
[0181] First, let's explain the process of generating encoded location data. The encoding unit 4801 generates compressed location data (Compressed Geometry Data) for each frame by encoding the location information of each frame. The encoded location data is represented by G(i), where i represents the frame number or the time of the frame.
[0182] Furthermore, the encoding unit 4801 generates a position parameter set (GPS(i)) corresponding to each frame. The position parameter set includes parameters that can be used to decode the encoded position data. Also, the encoded position data for each frame depends on the corresponding position parameter set.
[0183] Furthermore, encoded position data consisting of multiple frames is defined as a position sequence (Geometry Sequence). The encoding unit 4801 generates a position sequence parameter set (Geometry Sequence PS: also written as Position SPS) that stores parameters commonly used for decoding multiple frames within the position sequence. The position sequence depends on the Position SPS.
[0184] Next, the process for generating encoded attribute data will be explained. The encoding unit 4801 generates compressed attribute data for each frame by encoding the attribute information of each frame. The compressed attribute data is represented by A(i). Figure 25 shows an example where attribute X and attribute Y exist, with the compressed attribute data for attribute X represented by AX(i) and the compressed attribute data for attribute Y represented by AY(i).
[0185] Furthermore, the encoding unit 4801 generates an attribute parameter set (APS(i)) corresponding to each frame. The attribute parameter set for attribute X is represented by AXPS(i), and the attribute parameter set for attribute Y is represented by AYPS(i). The attribute parameter set includes parameters that can be used to decode the encoded attribute information. The encoded attribute data depends on the corresponding attribute parameter set.
[0186] Furthermore, encoded attribute data consisting of multiple frames is defined as an attribute sequence. The encoding unit 4801 generates an attribute sequence parameter set (Attribute Sequence PS, also written as attribute SPS) that stores parameters commonly used for decoding multiple frames within the attribute sequence. The attribute sequence depends on the attribute SPS.
[0187] Furthermore, in the first encoding method, the encoded attribute data depends on the encoded position data.
[0188] Figure 25 also shows an example where there are two types of attribute information (attribute X and attribute Y). When there are two types of attribute information, for example, two encoding units generate the respective data and metadata. Also, for example, an attribute sequence is defined for each type of attribute information, and an attribute SPS is generated for each type of attribute information.
[0189] Note that Figure 25 shows an example where there is one type of positional information and two types of attribute information, but the example is not limited to this; there may be one type of attribute information or three or more types. In this case as well, encoded data can be generated in the same way. Furthermore, in the case of point cloud data that does not have attribute information, attribute information is not required. In that case, the encoding unit 4801 does not need to generate a parameter set related to attribute information.
[0190] Next, the process of generating additional information (metadata) will be described. The encoding unit 4801 generates a PCC stream PS (also written as Stream PS), which is a parameter set for the entire PCC stream. The encoding unit 4801 stores in Stream PS parameters that can be used in common for decoding one or more location sequences and one or more attribute sequences. For example, Stream PS includes identification information indicating the codec of the point cloud data, and information indicating the algorithm used for encoding. The location sequences and attribute sequences depend on Stream PS.
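The dependencies described in paragraphs [0181] to [0190] can be sketched as a small lookup table. The following Python sketch is illustrative only: the unit labels (`StreamPS`, `GeomSPS`, `AttrXSPS`, `G(0)` and so on) and the functions `frame_dependencies` and `prerequisites` are hypothetical names, not syntax defined by this disclosure. It assumes the first encoding method, in which encoded attribute data depends on encoded position data, and it models the sequence-level dependencies by letting each frame's data depend directly on its sequence parameter set and on Stream PS.

```python
def frame_dependencies(i):
    """Map each unit of frame i to the units it depends on (hypothetical labels)."""
    return {
        "StreamPS": [],
        "GeomSPS": [],
        "AttrXSPS": [],
        f"GPS({i})": [],
        f"AXPS({i})": [],
        # Encoded position data depends on its parameter sets.
        f"G({i})": [f"GPS({i})", "GeomSPS", "StreamPS"],
        # Encoded attribute data also depends on the position data (first encoding method).
        f"AX({i})": [f"AXPS({i})", "AttrXSPS", "StreamPS", f"G({i})"],
    }


def prerequisites(unit, deps):
    """Return every unit that must be available before `unit` can be decoded."""
    seen = set()
    stack = list(deps[unit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen
```

For example, `prerequisites("AX(0)", frame_dependencies(0))` collects the attribute parameter sets as well as the position data and the position parameter sets on which the attribute data indirectly depends.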
[0191] Next, the access unit and the GOF will be described. This embodiment introduces the new concepts of an access unit (AU) and a GOF (Group of Frames).
[0192] An access unit is the basic unit for accessing data during decoding, and consists of one or more pieces of data and one or more pieces of metadata. For example, an access unit consists of the location information and one or more pieces of attribute information for the same time. A GOF is a random access unit and consists of one or more access units.
[0193] The encoding unit 4801 generates an access unit header (AU Header) as identification information indicating the beginning of an access unit. The encoding unit 4801 stores parameters related to the access unit in the access unit header. For example, the access unit header includes the structure or information of the encoded data contained in the access unit. The access unit header also includes parameters commonly used in the data contained in the access unit, such as parameters related to the decoding of the encoded data.
[0194] The encoding unit 4801 may generate an access unit delimiter that does not include parameters related to the access unit, instead of an access unit header. This access unit delimiter is used as identification information to indicate the beginning of the access unit. The decoding device identifies the beginning of the access unit by detecting the access unit header or the access unit delimiter.
[0195] Next, the generation of identification information at the beginning of the GOF will be explained. The encoding unit 4801 generates a GOF header as identification information indicating the beginning of the GOF. The encoding unit 4801 stores parameters related to the GOF in the GOF header. For example, the GOF header includes the structure or information of the encoded data included in the GOF. The GOF header also includes parameters commonly used in the data included in the GOF, such as parameters related to decoding the encoded data.
[0196] The encoding unit 4801 may generate a GOF delimiter that does not include parameters related to the GOF, instead of a GOF header. This GOF delimiter is used as identification information to indicate the beginning of the GOF. The decoding device identifies the beginning of the GOF by detecting either the GOF header or the GOF delimiter.
[0197] In PCC encoded data, for example, an access unit is defined as a PCC frame. The decoder accesses the PCC frame based on the identification information at the beginning of the access unit.
[0198] Furthermore, for example, a GOF (Group of Frames) is defined as a single random access unit. The decoding device accesses the random access unit based on the identification information at the beginning of the GOF. For example, if PCC frames are independent of each other and can be decoded individually, then a PCC frame may be defined as a random access unit.
[0199] Furthermore, two or more PCC frames may be assigned to a single access unit, and multiple random access units may be assigned to a single GOF.
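As an illustration of how a decoding device might use the identification information at the beginning of each access unit and GOF, the following Python sketch groups a flat sequence of units into GOFs and access units. The type names `"GOF_HEADER"` and `"AU_HEADER"`, the tuple representation of a unit, and the function `group_units` are assumptions made for this sketch. It also assumes that the stream contains only headers and data units, and that a header (rather than a delimiter) marks each boundary.

```python
def group_units(units):
    """Group (unit_type, payload) tuples into GOFs, each a list of access units."""
    gofs = []
    for unit_type, payload in units:
        if unit_type == "GOF_HEADER":
            gofs.append([])              # a GOF header starts a new random access unit
        elif unit_type == "AU_HEADER":
            if not gofs:
                gofs.append([])          # tolerate a stream that begins without a GOF header
            gofs[-1].append([])          # an AU header starts a new access unit
        else:
            if not gofs or not gofs[-1]:
                raise ValueError("data unit found before any access unit header")
            gofs[-1][-1].append((unit_type, payload))
    return gofs
```

Because each GOF is a random access unit, a decoder built along these lines could skip directly to `gofs[k]` without decoding the preceding GOFs.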
[0200] Furthermore, the encoding unit 4801 may define and generate parameter sets or metadata other than those described above. For example, the encoding unit 4801 may generate SEI (Supplemental Enhancement Information) that stores parameters that may not necessarily be used during decoding (optional parameters).
[0201] Next, we will explain the structure of the encoded data and how to store the encoded data in the NAL unit.
[0202] For example, a data format is defined for each type of encoded data. Figure 26 shows an example of encoded data and a NAL unit.
[0203] For example, as shown in Figure 26, encoded data includes a header and a payload. The encoded data may also include length information indicating the length (data volume) of the encoded data, header, or payload. Furthermore, the encoded data does not necessarily have to include a header.
[0204] The header includes, for example, identification information to identify the data. This identification information may indicate, for example, the data type or frame number.
[0205] The header also contains, for example, identification information indicating a reference relationship. This identification information is stored in the header when there is a dependency between data, and is used by the reference source to refer to the reference destination. For example, the header of the reference destination (the data referred to) contains identification information for identifying that data. The header of the reference source (the data that refers to it) contains identification information indicating the reference destination.
[0206] Furthermore, if the reference destination or the reference source can be identified or derived from other information, the identification information for identifying the data or the identification information indicating the reference relationship may be omitted.
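The header and payload structure described in paragraphs [0203] to [0206] can be sketched as follows. The byte layout is purely hypothetical: this disclosure does not fix field widths, so the sketch assumes a 16-bit data identifier, a 16-bit identifier of the reference destination (with `0xFFFF` meaning "no reference"), and a 32-bit payload length, all big-endian. The names `pack_unit`, `unpack_unit`, and `NO_REF` are likewise illustrative.

```python
import struct

# Hypothetical header layout: data_id (uint16), ref_id (uint16), payload length (uint32).
HEADER = struct.Struct(">HHI")
NO_REF = 0xFFFF  # assumed marker for data that references nothing


def pack_unit(data_id, payload, ref_id=NO_REF):
    """Prefix the payload with a header carrying identification and length information."""
    return HEADER.pack(data_id, ref_id, len(payload)) + payload


def unpack_unit(buf, offset=0):
    """Parse one unit at `offset`; return (data_id, ref_id, payload, next_offset)."""
    data_id, ref_id, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    payload = buf[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return data_id, ref_id, payload, start + length
```

For instance, attribute data packed with `ref_id=1` points back at position data packed with `data_id=1`, which is one way of expressing the reference relationship in the header.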
[0207] The multiplexing unit 4802 stores the encoded data in the payload of the NAL unit. The NAL unit header contains pcc_nal_unit_type, which is identification information for the encoded data. Figure 27 shows an example of the semantics of pcc_nal_unit_type.
[0208] As shown in Figure 27, when pcc_codec_type is Codec 1 (Codec1: first encoding method), values of pcc_nal_unit_type from 0 to 10 are assigned to the encoded position data (Geometry), encoded attribute X data (AttributeX), encoded attribute Y data (AttributeY), position PS (Geom.PS), attribute XPS (AttrX.PS), attribute YPS (AttrY.PS), position SPS (Geometry Sequence PS), attribute XSPS (AttributeX Sequence PS), attribute YSPS (AttributeY Sequence PS), AU header (AU Header), and GOF header (GOF Header) in Codec 1. Values 11 and above are reserved for Codec 1.
[0209] If pcc_codec_type is Codec 2 (Codec2: second encoding method), values of pcc_nal_unit_type from 0 to 2 are assigned to the codec's Data A (DataA), Metadata A (MetaDataA), and Metadata B (MetaDataB). Values 3 and above are reserved for Codec 2.
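The assignment of pcc_nal_unit_type values can be sketched as a pair of enumerations. This sketch assumes that the values are assigned in the order in which the items are listed above (Figure 27 itself is not reproduced here), and it represents pcc_codec_type with the hypothetical integers 1 and 2. The class and function names are illustrative.

```python
from enum import IntEnum


class Codec1NalUnitType(IntEnum):
    GEOMETRY = 0
    ATTRIBUTE_X = 1
    ATTRIBUTE_Y = 2
    GEOM_PS = 3
    ATTR_X_PS = 4
    ATTR_Y_PS = 5
    GEOM_SPS = 6
    ATTR_X_SPS = 7
    ATTR_Y_SPS = 8
    AU_HEADER = 9
    GOF_HEADER = 10


class Codec2NalUnitType(IntEnum):
    DATA_A = 0
    METADATA_A = 1
    METADATA_B = 2


def describe_nal_unit_type(codec_type, value):
    """Interpret pcc_nal_unit_type in the context of pcc_codec_type (1 or 2, assumed)."""
    table = {1: Codec1NalUnitType, 2: Codec2NalUnitType}[codec_type]
    try:
        return table(value).name
    except ValueError:
        return "RESERVED"  # values outside the assigned range are reserved
```

The point of the sketch is that the same numeric value has a different meaning for each codec, so a decoder needs pcc_codec_type before it can interpret pcc_nal_unit_type.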
[0210] (Embodiment 3) HEVC coding has data partitioning tools such as slicing or tiling to enable parallel processing in the decoding device, but PCC (Point Cloud Compression) coding does not yet have such tools.
[0211] In PCC, various data partitioning methods are possible depending on parallel processing, compression efficiency, and compression algorithms. This section explains the definitions of slices and tiles, data structures, and transmission / reception methods.
[0212] Figure 28 is a block diagram showing the configuration of a first encoding unit 4910 included in the three-dimensional data encoding device according to this embodiment. The first encoding unit 4910 generates encoded data (encoded stream) by encoding point cloud data using a first encoding method (GPCC (Geometry based PCC)). This first encoding unit 4910 includes a division unit 4911, a plurality of position information encoding units 4912, a plurality of attribute information encoding units 4913, an additional information encoding unit 4914, and a multiplexing unit 4915.
[0213] The division unit 4911 generates multiple divided data by dividing the point cloud data. Specifically, the division unit 4911 generates multiple divided data by dividing the space of the point cloud data into multiple subspaces. Here, a subspace is either a tile or a slice, or a combination of a tile and a slice. More specifically, the point cloud data includes location information, attribute information, and additional information. The division unit 4911 divides the location information into multiple divided location information and the attribute information into multiple divided attribute information. The division unit 4911 also generates additional information related to the division.
[0214] Multiple location information encoding units 4912 generate multiple encoded location information by encoding multiple divided location information. For example, multiple location information encoding units 4912 process multiple divided location information in parallel.
[0215] The multiple attribute information encoding units 4913 generate multiple encoded attribute information by encoding multiple divided attribute information. For example, the multiple attribute information encoding units 4913 process multiple divided attribute information in parallel.
[0216] The additional information encoding unit 4914 generates encoded additional information by encoding the additional information contained in the point cloud data and the additional information related to data division generated by the division unit 4911 during division.
[0217] The multiplexing unit 4915 generates encoded data (encoded stream) by multiplexing multiple encoded position information, multiple encoded attribute information, and encoded additional information, and transmits the generated encoded data. The encoded additional information is used during decoding.
[0218] In Figure 28, an example is shown where there are two location information encoding units 4912 and two attribute information encoding units 4913. However, the number of location information encoding units 4912 and attribute information encoding units 4913 may be one or three or more. Furthermore, multiple divided data may be processed in parallel within the same chip, such as multiple cores in a CPU, or in parallel across the cores of multiple chips, or across multiple cores on multiple chips.
[0219] Figure 29 is a block diagram showing the configuration of the first decoding unit 4920. The first decoding unit 4920 restores point cloud data by decoding encoded data (encoded stream) generated when point cloud data is encoded using a first encoding method (GPCC). This first decoding unit 4920 includes a demultiplexing unit 4921, multiple location information decoding units 4922, multiple attribute information decoding units 4923, an additional information decoding unit 4924, and a coupling unit 4925.
[0220] The demultiplexing unit 4921 generates multiple encoded position information, multiple encoded attribute information, and encoded additional information by demultiplexing the encoded data (encoded stream).
[0221] The multiple location information decoding units 4922 generate multiple segmented location information by decoding multiple encoded location information. For example, the multiple location information decoding units 4922 process multiple encoded location information in parallel.
[0222] The multiple attribute information decoding units 4923 generate multiple divided attribute information by decoding multiple encoded attribute information. For example, the multiple attribute information decoding units 4923 process multiple encoded attribute information in parallel.
[0223] The additional information decoding unit 4924 generates additional information by decoding the encoded additional information.
[0224] The coupling unit 4925 generates position information by combining multiple division position information using additional information. The coupling unit 4925 generates attribute information by combining multiple division attribute information using additional information.
[0225] In Figure 29, an example is shown where there are two location information decoding units 4922 and two attribute information decoding units 4923. However, the number of location information decoding units 4922 and attribute information decoding units 4923 may be one or three or more. Furthermore, multiple divided data may be processed in parallel within the same chip, such as multiple cores in a CPU, or in parallel across the cores of multiple chips, or across multiple cores on multiple chips.
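The parallel structure of the first encoding unit 4910 and the first decoding unit 4920 can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: `zlib` and `json` stand in for the actual position and attribute codecs, a thread pool stands in for the multiple encoding and decoding units, and the function names are hypothetical. Multiplexing and additional information are omitted.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_parts(parts, workers=2):
    """Encode each divided part independently (zlib is only a placeholder codec)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: zlib.compress(json.dumps(p).encode()), parts))


def decode_parts(encoded, workers=2):
    """Decode each encoded part independently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: json.loads(zlib.decompress(b)), encoded))


def combine(parts):
    """Concatenate the decoded parts back into a single list of points."""
    return [point for part in parts for point in part]
```

Because each part is encoded without reference to the others, the number of workers can be chosen freely on either side, which mirrors the statement that the number of encoding or decoding units may be one or three or more.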
[0226] Next, the configuration of the division unit 4911 will be described. Figure 30 is a block diagram of the division unit 4911. The division unit 4911 includes a slice division unit 4931, a location information tile division unit (geometry tile division unit) 4932, and an attribute information tile division unit (attribute tile division unit) 4933.
[0227] The slice division unit 4931 generates multiple slice position information by dividing position information (Position(Geometry)) into slices. The slice division unit 4931 also generates multiple slice attribute information by dividing attribute information (Attribute) into slices. Furthermore, the slice division unit 4931 outputs slice additional information (SliceMetaData) which includes information related to slice division and information generated during slice division.
[0228] The location information tile division unit 4932 generates multiple divided location information (multiple tile location information) by dividing multiple slice location information into tiles. The location information tile division unit 4932 also outputs location tile additional information (Geometry Tile MetaData) which includes information related to the tile division of the location information and information generated in the tile division of the location information.
[0229] The attribute information tile division unit 4933 generates multiple divided attribute information (multiple tile attribute information) by dividing multiple slice attribute information into tiles. The attribute information tile division unit 4933 also outputs attribute tile additional information (Attribute Tile MetaData) which includes information related to the tile division of attribute information and information generated in the tile division of attribute information.
[0230] The number of slices or tiles produced by division is one or more. In other words, slice division or tile division need not be performed.
[0231] Furthermore, while this example shows tile division after slicing, slicing may also be performed after tile division. In addition to slicing and tiling, new division types may be defined, and division may be performed using three or more division types.
[0232] The following describes methods for dividing point cloud data. Figure 31 shows examples of slicing and tiling.
[0233] First, the slicing method will be explained. The division unit 4911 divides the three-dimensional point cloud data into arbitrary point clouds in slice units. In slicing, the division unit 4911 does not separate the position information and attribute information that constitute a point, but rather divides the position information and attribute information together. That is, the division unit 4911 performs slicing so that the position information and attribute information of any given point belong to the same slice. Note that the number of divisions and the division method can be chosen freely as long as this constraint is satisfied. Also, the smallest unit of division is a point. For example, the number of divisions for position information and attribute information is the same. For example, the three-dimensional point corresponding to the position information after slicing and the three-dimensional point corresponding to the attribute information are included in the same slice.
[0234] Furthermore, the division unit 4911 generates slice supplemental information, which is additional information relating to the number of divisions and the division method, during slice division. The slice supplemental information is the same for both positional information and attribute information. For example, the slice supplemental information includes information indicating the reference coordinate position, size, or side length of the bounding box after division. The slice supplemental information also includes information indicating the number of divisions and the division type.
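A minimal sketch of slice division is shown below. It keeps the position and attribute of each point together and records, as slice additional information, the number of divisions, the division type, and a bounding box for each slice. Dividing at a single x-coordinate boundary is an assumption chosen only for brevity, and the function names and dictionary keys are hypothetical.

```python
def bounding_box(slice_points):
    """Return the reference coordinate (minimum corner) and size of a slice."""
    if not slice_points:
        return None
    xs, ys, zs = zip(*(position for position, _ in slice_points))
    origin = (min(xs), min(ys), min(zs))
    size = (max(xs) - origin[0], max(ys) - origin[1], max(zs) - origin[2])
    return {"origin": origin, "size": size}


def slice_by_x(points, boundary):
    """Divide (position, attribute) pairs into two slices at x = boundary."""
    slices = [[], []]
    for position, attribute in points:
        # Position and attribute of a point always go to the same slice.
        slices[0 if position[0] < boundary else 1].append((position, attribute))
    slice_metadata = {
        "num_slices": len(slices),
        "division_type": "x_boundary",
        "bounding_boxes": [bounding_box(s) for s in slices],
    }
    return slices, slice_metadata
```

Because each point is moved as a (position, attribute) pair, a single set of slice additional information serves both kinds of information.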
[0235] Next, the method of tile division will be explained. The division unit 4911 divides the sliced data into slice position information (G slice) and slice attribute information (A slice), and then divides the slice position information and slice attribute information into tile units.
[0236] Although Figure 31 shows an example of division using an octree structure, any number of divisions and any division method may be used.
[0237] Furthermore, the division unit 4911 may divide the position information and attribute information using different division methods, or it may divide them using the same division method. Also, the division unit 4911 may divide multiple slices into tiles using different division methods, or it may divide them into tiles using the same division method.
[0238] Furthermore, the division unit 4911 generates tile additional information related to the number of divisions and the division method when dividing into tiles. The tile additional information (position tile additional information and attribute tile additional information) is independent between the position information and the attribute information. For example, the tile additional information includes information indicating the reference coordinate position, size, or side length of the bounding box after division. The tile additional information also includes information indicating the number of divisions and the division type.
[0239] Next, an example of a method for dividing point cloud data into slices or tiles will be described. The division unit 4911 may use a predetermined method for slicing or tiling, or it may adaptively switch the method used depending on the point cloud data.
[0240] During slicing, the division unit 4911 divides the three-dimensional space collectively based on positional information and attribute information. For example, the division unit 4911 determines the shape of an object and divides the three-dimensional space into slices according to the object's shape. For example, the division unit 4911 extracts objects such as trees or buildings and divides them on an object-by-object basis. For example, the division unit 4911 performs slicing so that the entirety of one or more objects is included in one slice. Alternatively, the division unit 4911 divides a single object into multiple slices.
[0241] In this case, the encoding device may, for example, change the encoding method for each slice. For example, the encoding device may use a high-quality compression method for a specific object or a specific part of an object. In this case, the encoding device may store information indicating the encoding method for each slice in additional information (metadata).
[0242] Furthermore, the division unit 4911 may perform slicing based on map information or location information so that each slice corresponds to a predetermined coordinate space.
[0243] When dividing into tiles, the division unit 4911 divides the position information and attribute information independently. For example, the division unit 4911 divides a slice into tiles according to the amount of data or processing load. For example, the division unit 4911 determines whether the amount of data in a slice (for example, the number of three-dimensional points included in the slice) is greater than a predetermined threshold. If the amount of data in a slice is greater than the threshold, the division unit 4911 divides the slice into tiles. If the amount of data in a slice is less than the threshold, the division unit 4911 does not divide the slice into tiles.
[0244] For example, the division unit 4911 divides the slice into tiles so that the processing amount or processing time in the decoding device is within a certain range (less than or equal to a predetermined value). This ensures that the processing amount per tile in the decoding device is constant, facilitating distributed processing in the decoding device.
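The threshold-based tile division described above can be sketched as follows. The sketch assumes that the amount of data is measured as the number of points, that a slice whose size equals the threshold is left undivided (the text does not specify this case), and that tiles are formed by chunking the points in input order rather than spatially. The function name `tile_slice` is hypothetical.

```python
def tile_slice(slice_points, threshold):
    """Divide a slice into tiles of at most `threshold` points each."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if len(slice_points) <= threshold:
        return [list(slice_points)]      # small slice: no tile division
    return [
        list(slice_points[start:start + threshold])
        for start in range(0, len(slice_points), threshold)
    ]
```

Capping the number of points per tile is one simple way of keeping the per-tile processing amount in the decoding device at or below a predetermined value.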
[0245] Furthermore, if the processing load differs between location information and attribute information, for example, if the processing load of location information is greater than that of attribute information, the division unit 4911 may make the number of divisions for location information larger than the number of divisions for attribute information.
[0246] Furthermore, for example, depending on the content, the decoding device may need to decode and display location information quickly while decoding and displaying attribute information more slowly afterwards. In such a case, the division unit 4911 may divide the location information into more divisions than the attribute information. This allows the decoding device to increase the degree of parallelism for location information, thus enabling faster processing of location information than of attribute information.
[0247] Furthermore, the decoding device does not necessarily need to process the sliced or tiled data in parallel; it may decide whether or not to process them in parallel depending on the number or capacity of the decoding processing units.
[0248] By dividing the data in the manner described above, adaptive encoding can be achieved according to the content or object. Furthermore, parallel processing can be implemented in the decoding process. This improves the flexibility of the point cloud coding system or point cloud decoding system.
[0249] Figure 32 shows examples of slice and tile division patterns. In the figure, DU stands for Data Unit, representing data for a tile or slice. Each DU also includes a Slice Index and a Tile Index. The number in the upper right corner of the DU indicates the Slice Index, and the number in the lower left corner indicates the Tile Index.
[0250] In Pattern 1, the number of divisions and the division method are the same for G slices and A slices in slice partitioning. In tile partitioning, the number of divisions and the division method for G slices are different from those for A slices. Also, the same number of divisions and division method are used between multiple G slices. The same number of divisions and division method are used between multiple A slices.
[0251] In Pattern 2, the number of divisions and the division method are the same for G slices and A slices in slice partitioning. In tile partitioning, the number of divisions and the division method for G slices are different from those for A slices. Also, the number of divisions and the division method differ between multiple G slices. The number of divisions and the division method differ between multiple A slices.
[0252] Next, the method for encoding the divided data will be described. The three-dimensional data encoding device (first encoding unit 4910) encodes each of the divided data. When encoding attribute information, the three-dimensional data encoding device generates dependency information as additional information, indicating which configuration information (location information, additional information, or other attribute information) was used as the basis for encoding. In other words, the dependency information indicates, for example, the configuration information of the reference destination (the data depended upon). In this case, the three-dimensional data encoding device generates the dependency information based on the configuration information corresponding to the division shape of the attribute information. Note that the three-dimensional data encoding device may generate dependency information based on configuration information corresponding to multiple division shapes.
[0253] Dependency information may be generated by a three-dimensional data encoding device and sent to a three-dimensional data decoding device. Alternatively, the three-dimensional data decoding device may generate the dependency information, and the three-dimensional data encoding device may not send it. Furthermore, the dependencies used by the three-dimensional data encoding device may be predetermined, and the three-dimensional data encoding device may not send the dependency information.
[0254] Figure 33 shows an example of the dependencies between data. In the figure, the tip of an arrow indicates the dependency destination (the data depended upon), and the base of the arrow indicates the dependency source (the data that depends on it). The three-dimensional data decoding device decodes the data in order from dependency destination to dependency source. Also, in the figure, data shown with solid lines is data that is actually transmitted, and data shown with dotted lines is data that is not transmitted.
[0255] In the same figure, G indicates location information and A indicates attribute information. Gs1 indicates location information for slice number 1, and Gs2 indicates location information for slice number 2. Gs1t1 indicates location information for both slice number 1 and tile number 1, Gs1t2 indicates location information for both slice number 1 and tile number 2, Gs2t1 indicates location information for both slice number 2 and tile number 1, and Gs2t2 indicates location information for both slice number 2 and tile number 2. Similarly, As1 indicates attribute information for slice number 1, and As2 indicates attribute information for slice number 2. As1t1 indicates attribute information for both slice number 1 and tile number 1, As1t2 indicates attribute information for both slice number 1 and tile number 2, As2t1 indicates attribute information for both slice number 2 and tile number 1, and As2t2 indicates attribute information for both slice number 2 and tile number 2.
[0256] Mslice indicates slice additional information, MGtile indicates position tile additional information, and MAtile indicates attribute tile additional information. Ds1t1 indicates dependency information of attribute information As1t1, and Ds2t1 indicates dependency information of attribute information As2t1.
[0257] Furthermore, the three-dimensional data encoding device may rearrange the data in the order of decoding so that the three-dimensional data decoding device does not need to rearrange the data. Alternatively, the data may be rearranged in the three-dimensional data decoding device, or both the three-dimensional data encoding device and the three-dimensional data decoding device may rearrange the data.
[0258] Figure 34 shows an example of the data decoding order. In the example in Figure 34, decoding is performed sequentially from left to right. Among data having a dependency, the three-dimensional data decoding device decodes the dependency destination first. For example, the three-dimensional data encoding device arranges the data in this order in advance and sends it. Any order is acceptable as long as the dependency destination comes first. The three-dimensional data encoding device may also send additional information and dependency information before the data.
[0259] Figure 35 is a flowchart showing the processing flow by the three-dimensional data encoding device. First, the three-dimensional data encoding device encodes data from multiple slices or tiles as described above (S4901). Next, the three-dimensional data encoding device rearranges the data so that the dependency destination comes first, as shown in Figure 34 (S4902). Next, the three-dimensional data encoding device multiplexes the rearranged data (converts it into NAL units) (S4903).
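The rearrangement in step S4902 amounts to a topological ordering of the data by dependency. The following Python sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later). The particular dependency edges are assumptions made for illustration, since the arrows of Figure 33 are not reproduced in the text: tile additional information is assumed to depend on slice additional information, position tiles on position tile additional information, and the attribute tile As1t1 on its position tile, its tile additional information, and its dependency information Ds1t1.

```python
from graphlib import TopologicalSorter

# Each key depends on the items in its set (assumed edges, for illustration only).
DEPENDENCIES = {
    "Mslice": set(),
    "MGtile": {"Mslice"},
    "MAtile": {"Mslice"},
    "Gs1t1": {"MGtile"},
    "Gs1t2": {"MGtile"},
    "Ds1t1": set(),
    "As1t1": {"MAtile", "Gs1t1", "Ds1t1"},
}


def transmission_order(dependencies):
    """Return an order in which every dependency destination precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())
```

Any order returned by `transmission_order` satisfies the condition that the dependency destination comes first, so a decoding device receiving data in that order would not need to rearrange it.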
[0260] Next, the configuration of the coupling unit 4925 included in the first decoding unit 4920 will be described. Figure 36 is a block diagram showing the configuration of the coupling unit 4925. The coupling unit 4925 includes a position information tile joining unit 4941, an attribute information tile joining unit 4942, and a slice joining unit 4943.
[0261] The position information tile joining unit 4941 generates multiple slice position information by joining multiple divided position information using position tile additional information. The attribute information tile joining unit 4942 generates multiple slice attribute information by joining multiple divided attribute information using attribute tile additional information.
[0262] The slice joining unit 4943 generates position information by combining multiple slice position information using slice additional information. Furthermore, the slice joining unit 4943 generates attribute information by combining multiple slice attribute information using slice additional information.
[0263] The number of slices or tiles produced by division is one or more. In other words, slice division or tile division need not be performed.
[0264] Furthermore, while this example shows tile division after slicing, slicing may also be performed after tile division. In addition to slicing and tiling, new division types may be defined, and division may be performed using three or more division types.
[0265] Next, the structure of the sliced or tiled encoded data and the method of storing the encoded data in the NAL unit (multiplexing method) will be explained. Figure 37 shows the structure of the encoded data and the method of storing the encoded data in the NAL unit.
[0266] The encoded data (divided position information and divided attribute information) is stored in the payload of the NAL unit.
[0267] Encoded data includes a header and a payload. The header includes identification information to identify the data contained in the payload. This identification information includes, for example, the type of slice or tile division (slice_type, tile_type), index information to identify the slice or tile (slice_idx, tile_idx), location information of the data (slice or tile), or the address of the data (address). Index information to identify a slice is also written as SliceIndex. Index information to identify a tile is also written as TileIndex. The type of division can be, for example, a method based on the object shape as described above, a method based on map information or location information, or a method based on the amount of data or processing amount.
[0268] Furthermore, all or part of the above information may be stored in the header of one of the divided location information and the divided attribute information, and not in the header of the other. For example, if the same division method is used for both location information and attribute information, the division type (slice_type, tile_type) and index information (slice_idx, tile_idx) are the same for both. Therefore, this information may be included in the header of only one of the location information and the attribute information. For example, if attribute information depends on location information, the location information is processed first. Therefore, this information may be included in the header of the location information and omitted from the header of the attribute information. In this case, the three-dimensional data decoding device determines, for example, that the attribute information belongs to the same slice or tile as the location information on which it depends.
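As a non-limiting illustration, the handling of a header that omits its index information can be sketched as follows. The class `UnitHeader`, its `kind` labels, and the function `resolve_indices` are hypothetical names introduced for this example; only slice_idx and tile_idx come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnitHeader:
    kind: str                        # "position" or "attribute" (hypothetical labels)
    slice_idx: Optional[int] = None  # None means the field was omitted from the header
    tile_idx: Optional[int] = None


def resolve_indices(attribute: UnitHeader, position: UnitHeader) -> UnitHeader:
    """Fill indices omitted from an attribute header from the position header it depends on."""
    return UnitHeader(
        kind=attribute.kind,
        slice_idx=position.slice_idx if attribute.slice_idx is None else attribute.slice_idx,
        tile_idx=position.tile_idx if attribute.tile_idx is None else attribute.tile_idx,
    )


position_header = UnitHeader("position", slice_idx=1, tile_idx=2)
attribute_header = UnitHeader("attribute")  # indices omitted
resolved_header = resolve_indices(attribute_header, position_header)
```

Values that are present in the attribute header are kept as they are, so the same routine also works when nothing has been omitted.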
[0269] Furthermore, additional information related to slice or tile division (slice additional information, location tile additional information, or attribute tile additional information), and dependency information indicating dependencies, etc., may be stored in an existing parameter set (GPS, APS, location SPS, or attribute SPS, etc.) and transmitted. If the division method changes from frame to frame, information indicating the division method may be stored in the parameter set for each frame (GPS or APS, etc.). If the division method does not change within a sequence, information indicating the division method may be stored in the parameter set for each sequence (location SPS or attribute SPS). Moreover, if the same division method is used for location information and attribute information, information indicating the division method may be stored in the parameter set of the PCC stream (stream PS).
[0270] Furthermore, the above information may be stored in any of the parameter sets described above, or in multiple parameter sets. Alternatively, a parameter set for tiling or slicing may be defined, and the above information may be stored in that parameter set. Additionally, this information may be stored in the header of the encoded data.
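As a non-limiting illustration, one possible policy for choosing where to store the information indicating the division method is sketched below. The function name and the order in which the conditions are checked are assumptions for this example; as stated above, the information may equally be stored in any other parameter set, in multiple parameter sets, or in a header.

```python
def choose_parameter_set(changes_per_frame, same_for_position_and_attribute):
    """Return the name of a parameter set in which division information could be stored."""
    if changes_per_frame:
        return "GPS or APS"                      # parameter sets for each frame
    if same_for_position_and_attribute:
        return "stream PS"                       # parameter set of the PCC stream
    return "position SPS or attribute SPS"       # parameter sets for each sequence


per_frame_choice = choose_parameter_set(True, False)
shared_choice = choose_parameter_set(False, True)
per_sequence_choice = choose_parameter_set(False, False)
```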
[0271] Furthermore, the header of the encoded data includes identification information indicating dependencies. In other words, if there are dependencies between data, the header includes identification information for referring from the data that has the dependency (the dependency source) to the data that is depended upon (the dependency destination). For example, the header of the depended-upon data includes identification information for identifying that data. The header of the data that has the dependency includes identification information indicating the data on which it depends. Note that if the identification information for identifying the data, the additional information related to slice division or tile division, and the identification information indicating dependencies can be identified or derived from other information, this information may be omitted.
[0272] Next, the flow of the point cloud data encoding and decoding processes according to this embodiment will be described. Figure 38 is a flowchart of the point cloud data encoding process according to this embodiment.
[0273] First, the three-dimensional data encoding device determines the division method to be used (S4911). This division method includes whether or not to perform slicing division or tiling division. The division method may also include the number of divisions if slicing or tiling division is performed, and the type of division. The type of division refers to methods based on object shape, methods based on map information or location information, or methods based on data volume or processing volume, as described above. The division method may also be predetermined.
[0274] If slice splitting is performed (Yes in S4912), the three-dimensional data encoding device generates multiple slice location information and multiple slice attribute information by splitting the location information and attribute information together (S4913). The three-dimensional data encoding device also generates slice addition information related to the slice splitting. The three-dimensional data encoding device may also split the location information and attribute information independently.
[0275] If tile division is performed (Yes in S4914), the three-dimensional data encoding device generates multiple division position information and multiple division attribute information by independently dividing multiple slice position information and multiple slice attribute information (or position information and attribute information) (S4915). The three-dimensional data encoding device also generates position tile addition information and attribute tile addition information related to tile division. The three-dimensional data encoding device may divide the slice position information and slice attribute information together.
[0276] Next, the three-dimensional data encoding device generates multiple encoded location information and multiple encoded attribute information by encoding each of the multiple division location information and multiple division attribute information (S4916). The three-dimensional data encoding device also generates dependency information.
[0277] Next, the three-dimensional data encoding device generates encoded data (encoded stream) by NAL unitizing (multiplexing) multiple encoded position information, multiple encoded attribute information, and additional information (S4917). The three-dimensional data encoding device also transmits the generated encoded data.
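As a non-limiting illustration, the division steps S4913 and S4915 can be sketched as follows. For brevity, each point carries its position and attribute together, which corresponds to the case where both are divided in the same way. The function names and the key functions (an object label for slices, a 10-unit grid along x for tiles) are assumptions for this example.

```python
def split_points(points, key):
    """Group points by key(point) and return a dict mapping each key to its points."""
    groups = {}
    for p in points:
        groups.setdefault(key(p), []).append(p)
    return groups


def divide(points, slice_key=None, tile_key=None):
    """Slice division (S4913) followed by tile division (S4915); either step may be skipped."""
    slices = split_points(points, slice_key) if slice_key else {0: list(points)}
    units = {}
    for s_idx, s_points in slices.items():
        tiles = split_points(s_points, tile_key) if tile_key else {0: s_points}
        for t_idx, t_points in tiles.items():
            units[(s_idx, t_idx)] = t_points
    return units


# Each point is (x, y, z, object_label); the label stands in for attribute information.
cloud = [(1, 0, 0, "road"), (12, 0, 0, "road"), (3, 0, 0, "tree")]
division_units = divide(cloud,
                        slice_key=lambda p: p[3],            # slice per object (assumed)
                        tile_key=lambda p: int(p[0] // 10))  # tile per 10 units of x (assumed)
```

Each resulting unit would then be encoded separately (S4916) and stored in NAL units together with the additional information (S4917).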
[0278] Figure 39 is a flowchart of the point cloud data decoding process according to this embodiment. First, the three-dimensional data decoding device determines the division method by analyzing the additional information related to the division method (slice additional information, position tile additional information, and attribute tile additional information) contained in the encoded data (encoded stream) (S4921). This division method includes whether or not to perform slice division and whether or not to perform tile division. The division method may also include the number of divisions and the type of division when slice division or tile division is performed.
[0279] Next, the three-dimensional data decoding device generates partitioning location information and partitioning attribute information by decoding multiple encoded position information and multiple encoded attribute information contained in the encoded data using dependency information contained in the encoded data (S4922).
[0280] If the additional information indicates that tile division has been performed (Yes in S4923), the three-dimensional data decoding device generates multiple slice position information and multiple slice attribute information by combining the multiple division position information and the multiple division attribute information, each by its own method, based on the position tile additional information and the attribute tile additional information (S4924). The three-dimensional data decoding device may instead combine the multiple division position information and the multiple division attribute information by the same method.
[0281] If the additional information indicates that slice division has been performed (Yes in S4925), the three-dimensional data decoding device generates position information and attribute information by combining multiple slice position information and multiple slice attribute information (multiple division position information and multiple division attribute information) in the same way based on the slice additional information (S4926). The three-dimensional data decoding device may combine the multiple slice position information and multiple slice attribute information in different ways.
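As a non-limiting illustration, the combining steps S4924 and S4926 can be sketched as follows. The decoded division units are assumed to be held in a dictionary keyed by (slice index, tile index); this layout and the function names are assumptions for this example.

```python
def join_tiles(units):
    """S4924: combine decoded division units that share a slice index."""
    slices = {}
    for s_idx, t_idx in sorted(units):
        slices.setdefault(s_idx, []).extend(units[(s_idx, t_idx)])
    return slices


def join_slices(slices):
    """S4926: combine the slices into the full point cloud data."""
    points = []
    for s_idx in sorted(slices):
        points.extend(slices[s_idx])
    return points


decoded_units = {(1, 0): ["c"], (0, 1): ["b"], (0, 0): ["a"]}
joined_slices = join_tiles(decoded_units)
joined_points = join_slices(joined_slices)
```

When tile division or slice division was not performed, the corresponding step can simply be skipped, in line with the branches at S4923 and S4925.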
[0282] Furthermore, attribute information for tiles or slices (identifier, area information, address information, and location information, etc.) may be stored not only in SEI but also in other control information. For example, attribute information may be stored in control information that shows the overall structure of the PCC data, or it may be stored in control information for each tile or slice.
[0283] Furthermore, when a three-dimensional data encoding device (three-dimensional data transmission device) transmits PCC data to another device, it may convert control information such as SEI into control information specific to the protocol of that system and present it accordingly.
[0284] For example, when a three-dimensional data encoding device converts PCC data containing attribute information to ISOBMFF (ISO Base Media File Format), it may store the SEI together with the PCC data in an "mdat box," or it may store it in a "track box" that contains control information related to the stream. In other words, the three-dimensional data encoding device may store the control information in a table for random access. Furthermore, when the three-dimensional data encoding device packetizes and transmits PCC data, it may store the SEI in the packet header. By making attribute information available at the system layer in this way, access to attribute information and to tile data or slice data becomes easier, and the access speed can be improved.
[0285] In the configuration of the three-dimensional data decoding device, the memory management unit may determine in advance whether the information necessary for the decoding process is in memory, and if the information necessary for the decoding process is not available, it may obtain the information from storage or the network.
[0286] When a three-dimensional data decoding device acquires PCC data from storage or a network using Pull in a protocol such as MPEG-DASH, the memory management unit may identify the attribute information of the data necessary for decoding based on information from the localization unit or the like, request tiles or slices containing the identified attribute information, and acquire the necessary data (PCC stream). The identification of tiles or slices containing attribute information may be performed on the storage or network side, or by the memory management unit. For example, the memory management unit may acquire the SEI of all PCC data in advance and identify tiles or slices based on that information.
[0287] If all PCC data is transmitted from storage or the network using Push via the UDP protocol, the memory management unit may, based on information from the localization unit or the like, identify the attribute information of the data and the tiles or slices necessary for the decoding process, and obtain the desired data by filtering the desired tiles or slices out of the transmitted PCC data.
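As a non-limiting illustration, the filtering performed on pushed data can be sketched as follows. The representation of each received unit as a dictionary with slice_idx and tile_idx keys, and the function name, are assumptions for this example.

```python
def filter_pushed_units(pushed_units, wanted):
    """Keep only the units whose (slice_idx, tile_idx) pair is in `wanted`."""
    wanted = set(wanted)
    return [u for u in pushed_units
            if (u["slice_idx"], u["tile_idx"]) in wanted]


received = [
    {"slice_idx": 1, "tile_idx": 1, "payload": b"Gs1t1"},
    {"slice_idx": 1, "tile_idx": 2, "payload": b"Gs1t2"},
    {"slice_idx": 2, "tile_idx": 1, "payload": b"Gs2t1"},
]
needed = filter_pushed_units(received, wanted=[(1, 2), (2, 1)])
```

In the Pull case the same set of wanted indices would instead be used to build the request sent to the storage or network side.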
[0288] Furthermore, the three-dimensional data encoding device may determine, when acquiring data, whether the desired data exists, whether real-time processing is possible based on the data size, etc., or the communication status, etc. If the three-dimensional data encoding device determines, based on this determination result, that data acquisition is difficult, it may select and acquire a different slice or tile with a different priority or data volume.
[0289] Alternatively, the three-dimensional data decoding device may transmit information from the localization unit or other sources to a cloud server, which may then determine the necessary information based on that information.
[0290] (Embodiment 4) Next, the tile additional information will be described. The three-dimensional data encoding device generates tile additional information, which is metadata about the tile division method, and transmits the generated tile additional information to the three-dimensional data decoding device.
[0291] Figure 40 shows an example of the syntax for tile metadata (TileMetaData). As shown in Figure 40, for example, tile metadata includes division method information (type_of_divide), shape information (topview_shape), overlap flag (tile_overlap_flag), overlap information (type_of_overlap), height information (tile_height), number of tiles (tile_number), and tile position information (global_position, relative_position).
[0292] The division method information (type_of_divide) indicates how the tiles are divided. For example, the division method information indicates whether the tiles are divided based on map information, i.e., based on a top view (top_view), or something else (other).
[0293] Shape information (topview_shape) is included in the tile additional information when, for example, the tile division method is based on a top view. Shape information indicates the shape of the tile when viewed from above. For example, this shape includes squares and circles. This shape may also include ellipses, rectangles, polygons other than quadrilaterals, or other shapes. Furthermore, shape information is not limited to the shape of the tile when viewed from above, and may instead indicate the three-dimensional shape of the tile (for example, a cube or a cylinder).
[0294] The tile_overlap_flag indicates whether tiles overlap or not. For example, the tile overlap flag is included in the tile information when the tile division method is based on a top view. In this case, the tile overlap flag indicates whether tiles overlap in a top view. The tile overlap flag may also indicate whether tiles overlap in three-dimensional space.
[0295] The overlap information (type_of_overlap) is included in the tile information when tiles overlap, for example. The overlap information indicates how the tiles overlap, such as the size of the overlapping area.
[0296] The height information (tile_height) indicates the height of the tile. The height information may also include information indicating the shape of the tile. For example, if the tile is rectangular when viewed from above, this information may indicate the lengths of the sides (vertical and horizontal lengths) of that rectangle. Alternatively, if the tile is circular when viewed from above, this information may indicate the diameter or radius of that circle.
[0297] Furthermore, the height information may indicate the height of each tile, or it may indicate a common height for multiple tiles. Alternatively, multiple height types for roads and overpasses may be predefined, and the height information may indicate the height of each height type and the height type of each tile. Or, the height of each height type may be predefined, and the height information may indicate the height type of each tile. In other words, the height of each height type does not necessarily have to be indicated by the height information.
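As a non-limiting illustration, the use of height types can be sketched as follows. The table `PREDEFINED_HEIGHTS`, its numeric values, and the function name are hypothetical and are introduced for this example only.

```python
# Hypothetical predefined heights for each height type (values are illustrative only).
PREDEFINED_HEIGHTS = {"road": 5.0, "overpass": 15.0}


def resolve_tile_heights(tile_height_types, signalled_heights=None):
    """Return the height of each tile from its height type.

    signalled_heights: optional dict of heights carried in the height information;
    when given, it overrides the predefined value for that height type.
    """
    table = dict(PREDEFINED_HEIGHTS)
    if signalled_heights:
        table.update(signalled_heights)
    return [table[t] for t in tile_height_types]


heights_predefined = resolve_tile_heights(["road", "overpass", "road"])
heights_signalled = resolve_tile_heights(["road", "overpass"], {"overpass": 20.0})
```

When the heights are predefined, only the height type of each tile needs to be carried in the height information.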
[0298] The tile_number indicates the number of tiles. Note that tile information may also include information indicating the spacing between tiles.
[0299] Tile position information (global_position, relative_position) is information used to identify the location of each tile. For example, tile position information indicates the absolute or relative coordinates of each tile.
[0300] Furthermore, some or all of the above information may be provided for each tile, or for multiple tiles (for example, for each frame or for multiple frames).
[0301] The three-dimensional data encoding device may include the tile addition information in the SEI (Supplemental Enhancement Information) and send it. Alternatively, the three-dimensional data encoding device may store the tile addition information in an existing parameter set (PPS, GPS, or APS, etc.) and send it.
[0302] For example, if the tile information changes from frame to frame, the tile information may be stored in a parameter set for each frame (such as GPS or APS). If the tile information does not change within a sequence, the tile information may be stored in a parameter set for each sequence (location SPS or attribute SPS). Furthermore, if the same tile division information is used for both location information and attribute information, the tile information may be stored in the parameter set of the PCC stream (stream PS).
[0303] Furthermore, tile information may be stored in any of the parameter sets described above, or in multiple parameter sets. Additionally, tile information may be stored in the header of the encoded data. Furthermore, tile information may be stored in the header of the NAL unit.
[0304] Furthermore, all or part of the tile additional information may be stored in the header of one of the division location information and the division attribute information, and not in the header of the other. For example, if the same tile additional information is used for both location information and attribute information, the tile additional information may be included in the header of only one of them. For example, if attribute information depends on location information, the location information is processed first. Therefore, the header of the location information may contain this tile additional information, while the header of the attribute information may not. In this case, the three-dimensional data decoding device determines, for example, that the attribute information belongs to the same tile as the location information on which it depends.
[0305] The 3D data decoding device reconstructs the tiled point cloud data based on the tile additional information. If there is duplicated point cloud data, the 3D data decoding device identifies the multiple pieces of duplicated point cloud data and either selects one of them or merges them.
[0306] Furthermore, the three-dimensional data decoding device may perform decoding using tile-added information. For example, if multiple tiles overlap, the three-dimensional data decoding device may decode each tile, perform processing using the decoded data (e.g., smoothing or filtering), and generate point cloud data. This may enable highly accurate decoding.
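As a non-limiting illustration, the handling of duplicated points in overlapping tiles can be sketched as follows. Two points are assumed to be duplicates when their positions are identical, and merging is assumed to mean averaging a scalar attribute; both are assumptions for this example, as is the function name.

```python
def reconstruct_from_tiles(tiles, mode="select"):
    """Rebuild point cloud data from decoded tiles that may overlap.

    tiles: list of dicts mapping a position tuple to a scalar attribute value.
    mode: "select" keeps the first occurrence of a duplicated point,
          "merge" averages the attribute values of the duplicates.
    """
    collected = {}
    for tile in tiles:
        for position, attribute in tile.items():
            collected.setdefault(position, []).append(attribute)
    if mode == "select":
        return {pos: attrs[0] for pos, attrs in collected.items()}
    return {pos: sum(attrs) / len(attrs) for pos, attrs in collected.items()}


overlapping_tiles = [
    {(0, 0, 0): 10.0, (1, 0, 0): 20.0},
    {(1, 0, 0): 30.0},              # (1, 0, 0) lies in the overlapping area
]
selected = reconstruct_from_tiles(overlapping_tiles, mode="select")
merged = reconstruct_from_tiles(overlapping_tiles, mode="merge")
```

Averaging is only one possible form of the smoothing or filtering mentioned above.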
[0307] Figure 41 shows an example of a system configuration including a three-dimensional data encoding device and a three-dimensional data decoding device. The tile division unit 5051 divides point cloud data, including position information and attribute information, into a first tile and a second tile. The tile division unit 5051 also sends tile addition information related to tile division to the decoding unit 5053 and the tile joining unit 5054.
[0308] The encoding unit 5052 generates encoded data by encoding the first tile and the second tile.
[0309] The decoding unit 5053 reconstructs the first and second tiles by decoding the encoded data generated by the encoding unit 5052. The tile joining unit 5054 reconstructs the point cloud data (position information and attribute information) by joining the first and second tiles using the tile addition information.
[0310] Next, the slice additional information will be described. The three-dimensional data encoding device generates slice additional information, which is metadata about the slice division method, and transmits the generated slice additional information to the three-dimensional data decoding device.
[0311] Figure 42 shows an example of the syntax for slice metadata (SliceMetaData). As shown in Figure 42, for example, slice metadata includes division method information (type_of_divide), overlap flag (slice_overlap_flag), overlap information (type_of_overlap), number of slices (slice_number), slice position information (global_position, relative_position), and slice size information (slice_bounding_box_size).
[0312] The division method information (type_of_divide) indicates how the slice is divided. For example, the division method information indicates whether the slice is divided based on object information as shown in Figure 60 (object). Note that the slice supplement information may also include information indicating how the object is divided. For example, this information indicates whether one object is divided into multiple slices or assigned to one slice. This information may also indicate the number of divisions if one object is divided into multiple slices.
[0313] The overlap flag (slice_overlap_flag) indicates whether or not the slices overlap. The overlap information (type_of_overlap) is included in the slice append information, for example, if the slices overlap. The overlap information indicates how the slices overlap, for example, the size of the overlapping area.
[0314] The slice_number indicates the number of slices.
[0315] Slice position information (global_position, relative_position) and slice size information (slice_bounding_box_size) are information about the region of the slice. Slice position information is information used to identify the position of each slice. For example, slice position information indicates the absolute or relative coordinates of each slice. Slice size information (slice_bounding_box_size) indicates the size of each slice. For example, slice size information indicates the size of the bounding box of each slice.
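As a non-limiting illustration, the region of a slice can be derived from its points as follows. Taking the minimum corner of the bounding box as the slice position is an assumption for this example, as is the function name.

```python
def slice_region(points):
    """Return (position, size) of the axis-aligned bounding box of a slice.

    points: non-empty list of (x, y, z) tuples.
    position: minimum corner of the box (assumed convention).
    size: extent along each axis, as could be carried in slice_bounding_box_size.
    """
    xs, ys, zs = zip(*points)
    position = (min(xs), min(ys), min(zs))
    size = (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
    return position, size


region_position, region_size = slice_region([(0, 1, 2), (3, 1, 5), (1, 1, 4)])
```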
[0316] The three-dimensional data encoding device may include slice addition information in the SEI and send it out. Alternatively, the three-dimensional data encoding device may store the slice addition information in an existing parameter set (PPS, GPS, or APS, etc.) and send it out.
[0317] For example, if slice addition information changes from frame to frame, the slice addition information may be stored in a parameter set for each frame (such as GPS or APS). If the slice addition information does not change within a sequence, the slice addition information may be stored in a parameter set for each sequence (location SPS or attribute SPS). Furthermore, if the same slice division information is used for both location information and attribute information, the slice addition information may be stored in the parameter set of the PCC stream (stream PS).
[0318] Furthermore, slice addition information may be stored in any of the parameter sets mentioned above, or in multiple parameter sets. Also, slice addition information may be stored in the header of the encoded data. Additionally, slice addition information may be stored in the header of the NAL unit.
[0319] Furthermore, all or part of the slice addition information may be stored in one of the headers of the division location information and the division attribute information, but not in the other. For example, if the same slice addition information is used for both location information and attribute information, the slice addition information may be included in one of the headers of the location information or attribute information. For example, if attribute information depends on location information, the location information is processed first. Therefore, the header of the location information may contain this slice addition information, while the header of the attribute information may not. In this case, the three-dimensional data decoding device will determine, for example, that the attribute information that depends on the location information belongs to the same slice as the slice of the location information it depends on.
[0320] The 3D data decoding device reconstructs the sliced point cloud data based on the slice additional information. If there is duplicated point cloud data, the 3D data decoding device identifies the multiple pieces of duplicated point cloud data and either selects one of them or merges them.
[0321] Furthermore, the three-dimensional data decoding device may perform decoding using slice-added information. For example, if multiple slices overlap, the three-dimensional data decoding device may decode each slice, perform processing (e.g., smoothing or filtering) using the decoded data, and generate point cloud data. This may enable highly accurate decoding.
[0322] Figure 43 is a flowchart of the three-dimensional data encoding process, including the generation of tile-added information, using the three-dimensional data encoding device according to this embodiment.
[0323] First, the three-dimensional data encoding device determines the method for dividing the tiles (S5031). Specifically, the three-dimensional data encoding device determines whether to use a top-view-based division method or another method. The three-dimensional data encoding device also determines the shape of the tiles when using the top-view-based division method. Furthermore, the three-dimensional data encoding device determines whether or not a tile overlaps with other tiles.
[0324] If the tile division method determined in step S5031 is a division method based on a top view (Yes in S5032), the three-dimensional data encoding device indicates in the tile addition information that the tile division method is a division method based on a top view (top_view) (S5033).
[0325] On the other hand, if the tile division method determined in step S5031 is other than the division method based on the top view (No in S5032), the three-dimensional data encoding device indicates in the tile addition information that the tile division method is other than the division method based on the top view (top_view) (S5034).
[0326] Furthermore, if the shape of the tile viewed from above, as determined in step S5031, is a square (square in S5035), the three-dimensional data encoding device records that the shape of the tile viewed from above is a square in the tile supplement information (S5036). On the other hand, if the shape of the tile viewed from above, as determined in step S5031, is a circle (circle in S5035), the three-dimensional data encoding device records that the shape of the tile viewed from above is a circle in the tile supplement information (S5037).
[0327] Next, the three-dimensional data encoding device determines whether a tile overlaps with another tile (S5038). If a tile overlaps with another tile (Yes in S5038), the three-dimensional data encoding device records that the tile overlaps in the tile information (S5039). On the other hand, if a tile does not overlap with another tile (No in S5038), the three-dimensional data encoding device records that the tile does not overlap in the tile information (S5040).
[0328] Next, the three-dimensional data encoding device divides the tiles based on the tile division method determined in step S5031, encodes each tile, and sends out the generated encoded data and tile-related information (S5041).
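As a non-limiting illustration, steps S5032 to S5040 can be sketched as follows. Representing the tile additional information as a dictionary, recording the shape only in the top-view case, and the function name are assumptions for this example.

```python
def build_tile_additional_info(top_view, shape=None, overlaps=False):
    """Assemble tile additional information following steps S5032 to S5040."""
    info = {}
    if top_view:                                  # Yes in S5032
        info["type_of_divide"] = "top_view"       # S5033
        if shape in ("square", "circle"):         # S5035
            info["topview_shape"] = shape         # S5036 or S5037
    else:                                         # No in S5032
        info["type_of_divide"] = "other"          # S5034
    info["tile_overlap_flag"] = bool(overlaps)    # S5038 to S5040
    return info


top_view_info = build_tile_additional_info(True, shape="circle", overlaps=True)
other_info = build_tile_additional_info(False)
```

The resulting information would be sent together with the encoded data of each tile in step S5041.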
[0329] Figure 44 is a flowchart of the three-dimensional data decoding process using tile-added information by the three-dimensional data decoding device according to this embodiment.
[0330] First, the three-dimensional data decoding device analyzes the tile addition information contained in the bitstream (S5051).
[0331] If the tile information indicates that a tile does not overlap with other tiles (No in S5052), the 3D data decoding device generates point cloud data for each tile by decoding each tile (S5053). Next, the 3D data decoding device reconstructs point cloud data from the point cloud data of each tile based on the tile division method and tile shape indicated in the tile information (S5054).
[0332] On the other hand, if the tile addition information indicates that a tile overlaps with other tiles (Yes in S5052), the 3D data decoding device generates point cloud data for each tile by decoding each tile. The 3D data decoding device also identifies the overlapping portion of the tiles based on the tile addition information (S5055). The 3D data decoding device may use multiple pieces of overlapping information to perform the decoding process for the overlapping portion. Next, the 3D data decoding device reconstructs point cloud data from the point cloud data of each tile based on the tile division method, tile shape, and overlapping information indicated in the tile addition information (S5056).
[0333] The following describes variations related to slicing. The three-dimensional data encoding device may transmit additional information indicating the type of object (road, building, tree, etc.) or attributes (dynamic information, static information, etc.). Alternatively, encoding parameters may be predetermined according to the object, and the three-dimensional data encoding device may notify the three-dimensional data decoding device of the encoding parameters by sending the type of object or attributes.
[0334] The following methods may be used for the encoding order and transmission order of slice data. For example, the 3D data encoding device may encode slice data in order from data that is easy to recognize or cluster. Alternatively, the 3D data encoding device may encode slice data in order from slice data that has been clustered first. The 3D data encoding device may also transmit the encoded slice data in order. Alternatively, the 3D data encoding device may transmit slice data in order of the decoding priority in the application. For example, if the decoding priority of dynamic information is high, the 3D data encoding device may transmit slice data in order from slices grouped by dynamic information.
[0335] Furthermore, if the order of encoded data differs from the order of decoding priority, the three-dimensional data encoding device may rearrange the encoded data before sending it out. Also, when storing encoded data, the three-dimensional data encoding device may rearrange the encoded data before storing it.
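As a non-limiting illustration, rearranging encoded slice data into the order of decoding priority can be sketched as follows. The slice records, the priority table, and the function name are assumptions for this example.

```python
def transmission_order(slices, priority):
    """Sort slices by decoding priority; a lower number is sent earlier.

    Slices whose attribute has no entry in `priority` are sent last.
    The sort is stable, so slices with equal priority keep their encoded order.
    """
    return sorted(slices, key=lambda s: priority.get(s["attribute"], len(priority)))


encoded_slices = [
    {"id": 0, "attribute": "static"},
    {"id": 1, "attribute": "dynamic"},
    {"id": 2, "attribute": "static"},
]
# Assumed application policy: dynamic information is decoded first.
send_order = transmission_order(encoded_slices, {"dynamic": 0, "static": 1})
```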
[0336] The application (3D data decoding device) requests the server (3D data encoding device) to send slices containing the desired data. The server sends the slice data required by the application, and does not need to send unnecessary slice data.
[0337] The application requests the server to send tiles containing the desired data. The server sends the tile data that the application needs, and does not need to send any tile data that is not needed.
[0338] (Embodiment 5) This embodiment describes the processing of division units that do not contain points (e.g., tiles or slices). First, the method for dividing point cloud data will be described.
[0339] In video encoding standards such as HEVC, data exists for every pixel in a two-dimensional image. Therefore, even if the two-dimensional space is divided into multiple data regions, data exists in all data regions. On the other hand, in encoding three-dimensional point cloud data, the points themselves, which are elements of the point cloud data, are the data, and it is possible that data does not exist in some regions.
[0340] There are various methods for spatially dividing point cloud data, and these methods can be classified based on whether each divided data unit (e.g., tile or slice) always contains one or more points.
[0341] A division method in which each of the multiple division units contains at least one point data is called the first division method. One example of the first division method is dividing point cloud data while considering the encoding processing time or the size of the encoded data. In this case, the number of points in each division unit is approximately equal.
[0342] Figure 45 shows an example of a division method. For example, as a first division method, as shown in Figure 45(a), a method may be used to divide points belonging to the same space into two identical spaces. Alternatively, as shown in Figure 45(b), a space may be divided into multiple subspaces (division units) such that each division unit contains a point.
[0343] Because these methods involve point-based divisions, every division unit always contains at least one point.
[0344] A division method in which one or more division units may contain no point data is called a second division method. For example, as a second division method, a method of dividing the space equally can be used, as shown in Figure 45(c). In this case, points are not necessarily present in the division units. In other words, there may be cases where no points exist in the division units.
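The difference between the two kinds of division method can be illustrated with a minimal sketch. It assumes, for simplicity, that division is performed along the x axis only; the function names `equal_division` and `point_count_division` are hypothetical and are not taken from the disclosure.

```python
def equal_division(points, x_min, x_max, num_units):
    """Second division method: split [x_min, x_max) into equal-width units.

    The units are fixed by space alone, so some of them may contain no points.
    """
    width = (x_max - x_min) / num_units
    units = [[] for _ in range(num_units)]
    for p in points:
        idx = min(int((p[0] - x_min) / width), num_units - 1)
        units[idx].append(p)
    return units


def point_count_division(points, num_units):
    """First division method: split the sorted points into runs of nearly equal size.

    Assumes len(points) >= num_units, so every unit holds at least one point.
    """
    ordered = sorted(points)
    base, extra = divmod(len(ordered), num_units)
    units, start = [], 0
    for i in range(num_units):
        size = base + (1 if i < extra else 0)
        units.append(ordered[start:start + size])
        start += size
    return units


points = [(0.5, 0.0, 0.0), (1.0, 2.0, 0.0), (1.5, 1.0, 0.0), (9.0, 0.0, 1.0)]
print([len(u) for u in equal_division(points, 0.0, 10.0, 5)])  # [3, 0, 0, 0, 1]
print([len(u) for u in point_count_division(points, 2)])       # [2, 2]
```

With equal division, three of the five units are empty, whereas the point-based division yields no empty unit.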
[0345] When a three-dimensional data encoding device divides point cloud data, it may indicate in the division supplement information (metadata) related to the division (e.g., tile supplement information or slice supplement information) whether (1) a division method was used in which all of the multiple division units contain one or more point data, (2) a division method was used in which one or more of the multiple division units do not contain point data, or (3) a division method was used in which one or more of the multiple division units may not contain point data, and transmit the division supplement information.
[0346] Furthermore, the three-dimensional data encoding device may indicate the above information as the type of division method. Alternatively, the three-dimensional data encoding device may perform division using a predetermined division method and not transmit additional division information. In this case, whether the division method is the first division method or the second division method is specified in advance.
[0347] The following describes the second partitioning method and an example of generating and transmitting encoded data. While tiling is used as an example of a three-dimensional space partitioning method, it is not limited to tiling, and the following techniques can be applied to partitioning methods using different units than tiles. For example, tiling may be replaced with slicing.
[0348] Figure 46 shows an example of dividing point cloud data into six tiles. Figure 46 shows an example where the smallest unit is a point, and demonstrates how to divide both geometry and attribute information together. The same applies when geometry and attribute information are divided using separate division methods or numbers of divisions, when there is no attribute information, and when there are multiple pieces of attribute information.
[0349] In the example shown in Figure 46, after tile division, there are tiles that contain points (#1, #2, #4, #6) and tiles that do not contain points (#3, #5). Tiles that do not contain points are called null tiles.
[0350] Furthermore, any method of division is acceptable, not limited to dividing into six tiles. For example, the division unit may be a cube, or a non-cubic shape such as a rectangular prism or a cylinder. Multiple division units may be the same shape, or they may include different shapes. In addition, a predetermined method of division may be used, or different methods may be used for each predetermined unit (e.g., PCC frame).
[0351] In this partitioning method, when point cloud data is divided into tiles, if there is no data in a tile, a bitstream is generated that includes information indicating that the tile is a null tile.
[0352] The following describes the method for sending null tiles and the method for signaling null tiles. The three-dimensional data encoding device may generate and send the following information as additional information (metadata) related to data division. Figure 47 shows an example of the syntax of tile additional information (TileMetaData). The tile additional information includes division method information (type_of_divide), division method null information (type_of_divide_null), number of tile divisions (number_of_tiles), and tile null flag (tile_null_flag).
[0353] The division method information (type_of_divide) is information about the division method or division type. For example, the division method information indicates one or more division methods or division types. For example, division methods include top view division and equal division. Note that if there is only one definition for the division method, the division method information does not need to be included in the tile addition information.
[0354] The null information for the division method (type_of_divide_null) indicates whether the division method used is the first division method or the second division method described below. Here, the first division method is a division method in which all of the multiple division units always contain at least one point data. The second division method is a division method in which at least one of the multiple division units does not contain point data, or a division method in which there is a possibility that at least one of the multiple division units does not contain point data.
[0355] Furthermore, the tile information may include at least one of the following as division information for the entire tile: (1) information indicating the number of divisions of the tile (number of tile divisions (number_of_tiles)), or information for identifying the number of divisions of the tile; (2) information indicating the number of null tiles, or information for identifying the number of null tiles; and (3) information indicating the number of tiles other than null tiles, or information for identifying the number of tiles other than null tiles. In addition, the tile information may include information indicating the shape of the tile, or information indicating whether or not tiles overlap, as division information for the entire tile.
[0356] Furthermore, the tile information sequentially indicates the division information for each tile. For example, the order of the tiles is predetermined for each division method and is known to the three-dimensional data encoding device and the three-dimensional data decoding device. If the order of the tiles is not predetermined, the three-dimensional data encoding device may send information indicating the order to the three-dimensional data decoding device.
[0357] The tile division information includes a tile null flag (tile_null_flag) which indicates whether or not data (points) exist within the tile. Note that if there is no data within a tile, the tile null flag may also be included in the tile division information.
[0358] Furthermore, if a tile is not a null tile, the tile information includes segmentation information for each tile (position information (e.g., coordinates of the origin (origin_x, origin_y, origin_z)) and tile height information, etc.). However, if a tile is a null tile, the tile information does not include segmentation information for each tile.
[0359] For example, if the tile division information stores slice division information for each tile, the 3D data encoding device does not need to store slice division information for null tiles in the additional information.
[0360] In this example, the number of tiles (number_of_tiles) refers to the number of tiles, including null tiles. Figure 48 shows an example of tile index information (idx). In the example shown in Figure 48, index information is also assigned to null tiles.
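As a rough illustration of the tile additional information described above, the following sketch builds a structure that carries the named fields. The field names `type_of_divide`, `type_of_divide_null`, `number_of_tiles`, and `tile_null_flag` come from the text; the class layout, the value coding of `type_of_divide_null` (0 for the first method, 1 for the second), and the helper `build_tile_metadata` are assumptions made for illustration, not the syntax of Figure 47.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TileInfo:
    tile_null_flag: bool
    # Position information is stored only for tiles that contain points.
    origin: Optional[Tuple[float, float, float]] = None


@dataclass
class TileMetaData:
    type_of_divide: int
    type_of_divide_null: int   # assumed coding: 0 = first method, 1 = second method
    number_of_tiles: int       # counts null tiles as well
    tiles: List[TileInfo] = field(default_factory=list)


def build_tile_metadata(tile_points, origins, type_of_divide=0, type_of_divide_null=1):
    """Build tile additional information; null tiles carry only their flag."""
    tiles = []
    for pts, origin in zip(tile_points, origins):
        if pts:
            tiles.append(TileInfo(False, origin))
        else:
            tiles.append(TileInfo(True))
    return TileMetaData(type_of_divide, type_of_divide_null, len(tiles), tiles)


# Six tiles in which the third and fifth contain no points.
tile_points = [[(0, 0, 0)], [(1, 0, 0)], [], [(3, 0, 0)], [], [(5, 0, 0)]]
origins = [(float(i), 0.0, 0.0) for i in range(6)]
meta = build_tile_metadata(tile_points, origins)
null_tiles = [i + 1 for i, t in enumerate(meta.tiles) if t.tile_null_flag]
print(meta.number_of_tiles, null_tiles)  # 6 [3, 5]
```

Because every tile, including a null tile, occupies one entry, the position of an entry in `tiles` can serve as the tile index.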
[0361] Next, we will explain the data structure and transmission method of encoded data including null tiles. Figures 49 to 51 show the data structure when location information and attribute information are divided into six tiles, and no data exists in the third and fifth tiles.
[0362] Figure 49 shows an example of the dependencies between data. In the figure, the tip of the arrow indicates the dependency destination (the data that is depended upon), and the base of the arrow indicates the dependency source (the data that depends on it). Also in the figure, Gtn (n is 1 to 6) indicates the location information of tile number n, Atn indicates the attribute information of tile number n, and Mtile indicates additional tile information.
[0363] Figure 50 shows an example of the configuration of the output data, which is encoded data sent from a three-dimensional data encoding device. Figure 51 shows the configuration of the encoded data and the method of storing the encoded data in the NAL unit.
[0364] As shown in Figure 51, the headers of the location information (divided location information) and attribute information (divided attribute information) data each contain the tile index information (tile_idx).
[0365] Furthermore, as shown in Structure 1 of Figure 50, the three-dimensional data encoding device does not need to transmit location information or attribute information that constitutes a null tile. Alternatively, as shown in Structure 2 of Figure 50, the three-dimensional data encoding device may transmit information indicating that the tile is a null tile as data for the null tile. For example, the three-dimensional data encoding device may indicate that the type of the data is a null tile in the tile_type stored in the header of the NAL unit or in the header within the NAL unit payload (nal_unit_payload), and transmit the header. Note that the following explanation will assume Structure 1.
[0366] In Structure 1, if a null tile exists, the values of the tile index information (tile_idx) included in the headers of the location information data or attribute information data in the transmitted data will have gaps and will not be continuous.
[0367] Furthermore, when there are dependencies between data, the three-dimensional data encoding device sends out the data so that the referenced data can be decoded before the data that references it. Note that attribute information tiles have dependencies on location information tiles. Attribute information and location information with dependencies are assigned the same tile index number.
[0368] The tile addition information related to tile division may be stored in both the location information parameter set (GPS) and the attribute information parameter set (APS), or in either one. If tile addition information is stored in only one of GPS and APS, the other may store reference information indicating the GPS or APS to be referenced. Furthermore, if the tile division method differs between location information and attribute information, different tile addition information will be stored in GPS and APS, respectively. Also, if the tile division method is the same for a sequence (multiple PCC frames), the tile addition information may be stored in GPS, APS, or SPS (sequence parameter set).
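A minimal sketch of assembling the output data according to Structure 1 is given below. The tuple layout `(data_type, tile_idx, payload)` and the function name are hypothetical, and placing the tile additional information first is an assumption; the sketch only shows that null tiles are omitted and that location information precedes the attribute information that refers to it.

```python
def build_output_structure1(tiles):
    """Assemble transmitted units for Structure 1.

    tiles maps tile_idx to (geometry_bytes, attribute_bytes), or to None for a
    null tile. Null tiles are skipped, so tile_idx values in the output have
    gaps, and each geometry unit precedes the attribute unit with the same index.
    """
    units = [("Mtile", None, b"")]  # assumed placement of tile additional information
    for tile_idx in sorted(tiles):
        data = tiles[tile_idx]
        if data is None:
            continue  # nothing is sent for a null tile
        geometry, attribute = data
        units.append(("G", tile_idx, geometry))
        units.append(("A", tile_idx, attribute))
    return units


tiles = {
    1: (b"g1", b"a1"),
    2: (b"g2", b"a2"),
    3: None,
    4: (b"g4", b"a4"),
    5: None,
    6: (b"g6", b"a6"),
}
stream = build_output_structure1(tiles)
print([idx for kind, idx, _ in stream if kind == "G"])  # [1, 2, 4, 6]
```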
[0369] For example, if tile information is stored in both GPS and APS, the GPS will store tile information for location information, and the APS will store tile information for attribute information. Also, if tile information is stored in common information such as SPS, tile information used in common for both location information and attribute information may be stored, or tile information for location information and tile information for attribute information may be stored separately.
[0370] The following explains the combination of tile partitioning and slice partitioning. First, we will explain the data structure and data transmission when tile partitioning is performed after slice partitioning.
[0371] Figure 52 shows an example of the data dependencies when tiling is performed after slicing. In the figure, the tip of the arrow indicates the dependency destination (the data that is depended upon), and the base of the arrow indicates the dependency source (the data that depends on it). Also, in the figure, data shown with solid lines is data that is actually sent, and data shown with dotted lines is data that is not sent.
[0372] In the same figure, G represents location information and A represents attribute information. Gs1 represents the location information of slice number 1, and Gs2 represents the location information of slice number 2. Gs1t1 represents the location information of slice number 1 and tile number 1, and Gs2t2 represents the location information of slice number 2 and tile number 2. Similarly, As1 represents the attribute information of slice number 1, and As2 represents the attribute information of slice number 2. As1t1 represents the attribute information of slice number 1 and tile number 1, and As2t1 represents the attribute information of slice number 2 and tile number 1.
[0373] Mslice indicates slice additional information, MGtile indicates position tile additional information, and MAtile indicates attribute tile additional information. Ds1t1 indicates dependency information of attribute information As1t1, and Ds2t1 indicates dependency information of attribute information As2t1.
[0374] The three-dimensional data encoding device does not need to generate and transmit positional information and attribute information related to null tiles.
[0375] Furthermore, even if the number of tile divisions is the same for all slices, the number of tiles generated and sent between slices may differ. For example, if the number of tile divisions for location information and attribute information differs, a null tile may exist in one of the location information or attribute information while it does not exist in the other. In the example shown in Figure 52, the location information (Gs1) of slice 1 is divided into two tiles, Gs1t1 and Gs1t2, of which Gs1t2 is a null tile. On the other hand, the attribute information (As1) of slice 1 is not divided and there is one As1t1, and no null tiles exist.
[0376] Furthermore, the three-dimensional data encoding device generates and transmits attribute information dependency information if data exists in at least one attribute information tile, regardless of whether a null tile is included in the slice of position information. For example, when the three-dimensional data encoding device stores slice division information for each tile in the slice division information included in the slice addition information related to slice division, it stores in that information an indication of whether or not each tile is a null tile.
[0377] Figure 53 shows an example of the data decoding order. In the example in Figure 53, decoding is performed sequentially from left to right. Among data with dependencies, the 3D data decoding device first decodes the data that other data depends on. For example, the 3D data encoding device arranges the data in this order in advance and sends it. Any order is acceptable as long as the data that is depended upon comes first. The 3D data encoding device may also send additional information and dependency information before the data.
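One way to obtain an order in which depended-upon data always comes first is a topological sort. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later). The dependency table is a hypothetical example that only borrows the labels of Figure 52; the actual arrows of the figure are not reproduced here.

```python
from graphlib import TopologicalSorter

# Each key depends on the items in its value set (hypothetical example).
dependencies = {
    "Gs1t1": {"Mslice", "MGtile"},
    "Gs2t1": {"Mslice", "MGtile"},
    "Ds1t1": {"Mslice", "MAtile"},
    "Ds2t1": {"Mslice", "MAtile"},
    "As1t1": {"Gs1t1", "Ds1t1"},
    "As2t1": {"Gs2t1", "Ds2t1"},
}

# static_order() yields every item after all of the items it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders exist for the same table, which matches the statement that any order is acceptable as long as the depended-upon data comes first.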
[0378] Next, we will explain the data structure and data transmission when performing slice division after tile division.
[0379] Figure 54 shows an example of the data dependencies when slicing is performed after tiling. In the figure, the tip of the arrow indicates the dependency destination (the data that is depended upon), and the base of the arrow indicates the dependency source (the data that depends on it). Also, in the figure, data shown with solid lines is data that is actually sent, and data shown with dotted lines is data that is not sent.
[0380] In the same figure, G indicates location information and A indicates attribute information. Gt1 indicates the location information of tile number 1. Gt1s1 indicates the location information of tile number 1 and slice number 1, and Gt1s2 indicates the location information of tile number 1 and slice number 2. Similarly, At1 indicates the attribute information of tile number 1, and At1s1 indicates the attribute information of tile number 1 and slice number 1.
[0381] Mtile indicates tile information, MGslice indicates location slice information, and MAslice indicates attribute slice information. Dt1s1 indicates dependency information for attribute information At1s1, and Dt2s1 indicates dependency information for attribute information At2s1.
[0382] The three-dimensional data encoding device does not slice null tiles. Furthermore, it does not need to generate or transmit positional information, attribute information, or dependency information related to the null tiles.
[0383] Figure 55 shows an example of the data decoding order. In the example in Figure 55, decoding is performed sequentially from left to right. Among data with dependencies, the 3D data decoding device first decodes the data that other data depends on. For example, the 3D data encoding device arranges the data in this order in advance and sends it. Any order is acceptable as long as the data that is depended upon comes first. The 3D data encoding device may also send additional information and dependency information before the data.
[0384] Next, we will explain the process of dividing and joining point cloud data. While we will explain tile and slice division as examples, similar methods can be applied to other spatial divisions.
[0385] Figure 56 is a flowchart of the three-dimensional data encoding process, including data partitioning by a three-dimensional data encoding device. First, the three-dimensional data encoding device determines the partitioning method to be used (S5101). Specifically, the three-dimensional data encoding device decides whether to use the first partitioning method or the second partitioning method. For example, the three-dimensional data encoding device may determine the partitioning method based on a specification from the user or an external device (e.g., a three-dimensional data decoding device), or it may determine the partitioning method according to the input point cloud data. The partitioning method to be used may also be predetermined.
[0386] Here, the first division method is a division method in which all of the multiple division units (tiles or slices) always contain at least one point data. The second division method is a division method in which there is at least one division unit that does not contain point data among the multiple division units, or a division method in which there is a possibility that there is at least one division unit that does not contain point data among the multiple division units.
[0387] If the determined division method is the first division method (first division method in S5102), the three-dimensional data encoding device indicates that the division method used is the first division method in the division addition information (e.g., tile addition information or slice addition information), which is metadata related to data division (S5103). Then, the three-dimensional data encoding device encodes all division units (S5104).
[0388] On the other hand, if the determined division method is the second division method (second division method in S5102), the three-dimensional data encoding device indicates that the division method used in the division supplement information is the second division method (S5105). Then, the three-dimensional data encoding device encodes the division units from among the multiple division units, excluding division units that do not contain point data (e.g., null tiles) (S5106).
[0389] Figure 57 is a flowchart of the three-dimensional data decoding process, including data merging by the three-dimensional data decoding device. First, the three-dimensional data decoding device refers to the partitioning information contained in the bitstream and determines whether the partitioning method used is the first partitioning method or the second partitioning method (S5111).
[0390] If the division method used is the first division method (first division method in S5112), the three-dimensional data decoder receives encoded data for all division units and decodes the received encoded data to generate decoded data for all division units (S5113). Next, the three-dimensional data decoder reconstructs the three-dimensional point cloud using the decoded data for all division units (S5114). For example, the three-dimensional data decoder reconstructs the three-dimensional point cloud by combining multiple division units.
[0391] On the other hand, if the division method used is the second division method (second division method in S5112), the three-dimensional data decoding device receives encoded data of division units containing point data and encoded data of division units not containing point data, and generates decoded data by decoding the received encoded data of division units (S5115). Note that if no division units without point data are transmitted, the three-dimensional data decoding device does not need to receive and decode division units without point cloud data. Next, the three-dimensional data decoding device reconstructs the three-dimensional point cloud using the decoded data of division units containing point data (S5116). For example, the three-dimensional data decoding device reconstructs the three-dimensional point cloud by combining multiple division units.
[0392] The following describes other methods for dividing point cloud data. When dividing space equally as shown in Figure 45(c), there may be cases where no points exist in the divided space. In this case, the three-dimensional data encoding device combines the space without points with other spaces that do contain points. This allows the three-dimensional data encoding device to form multiple division units such that all division units contain at least one point.
[0393] Figure 58 is a flowchart of the data partitioning in this case. First, the three-dimensional data encoding device partitions the data in a specific way (S5121). For example, the specific way is the second partitioning method described above.
[0394] Next, the three-dimensional data encoding device determines whether or not the target division unit, which is the division unit to be processed, contains a point (S5122). If the target division unit contains a point (Yes in S5122), the three-dimensional data encoding device encodes the target division unit (S5123). On the other hand, if the target division unit does not contain a point (No in S5122), the three-dimensional data encoding device combines the target division unit with other division units that contain a point, and encodes the combined division unit (S5124). In other words, the three-dimensional data encoding device encodes the target division unit together with other division units that contain a point.
[0395] Here, we have described an example in which judgment and merging are performed for each division unit, but the processing method is not limited to this. For example, the three-dimensional data encoding device may determine whether each of the multiple division units contains a point, merge them so that there are no division units that do not contain a point, and then encode each of the merged multiple division units.
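The combining of point-free division units with units that contain points can be sketched as follows. The disclosure does not state which unit an empty unit is combined with; the rule used here (join the nearest preceding non-empty unit, or the next non-empty unit when none precedes) is an assumption, and `merge_empty_units` is a hypothetical name.

```python
def merge_empty_units(units):
    """Combine each point-free division unit with a unit that has points.

    units is a list of point lists in spatial order. The result is a list of
    (member_indices, points) pairs in which every entry holds at least one point.
    If no unit contains a point, the result is empty.
    """
    merged = []
    pending = []  # empty units seen before the first non-empty unit
    for idx, pts in enumerate(units):
        if pts:
            merged.append((pending + [idx], list(pts)))
            pending = []
        elif merged:
            merged[-1][0].append(idx)  # attach to the preceding non-empty unit
        else:
            pending.append(idx)
    return merged


units = [[], ["a"], [], [], ["b", "c"], []]
print(merge_empty_units(units))
# [([0, 1, 2, 3], ['a']), ([4, 5], ['b', 'c'])]
```

After merging, every resulting division unit can be encoded, so no null unit needs to be signaled.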
[0396] Next, we will explain how to send data that includes null tiles. The three-dimensional data encoding device does not send data for a target tile if that tile is a null tile. Figure 59 is a flowchart of the data transmission process.
[0397] First, the three-dimensional data encoding device determines the tile division method and divides the point cloud data into tiles using the determined division method (S5131).
[0398] Next, the three-dimensional data encoding device determines whether the target tile is a null tile (S5132). In other words, the three-dimensional data encoding device determines whether or not there is no data in the target tile.
[0399] If the target tile is a null tile (Yes in S5132), the 3D data encoding device indicates in the tile addition information that the target tile is a null tile, and does not indicate the information of the target tile (such as the tile's position and size) (S5133). In addition, the 3D data encoding device does not send out the target tile (S5134).
[0400] On the other hand, if the target tile is not a null tile (No in S5132), the 3D data encoding device indicates in the tile addition information that the target tile is not a null tile and indicates the information of the target tile (such as the tile's position and size) (S5135). The 3D data encoding device also sends out the target tile (S5136).
[0401] In this way, by not including information about null tiles in the tile addition information, the amount of information in the tile addition information can be reduced.
[0402] The following describes how to decode encoded data that includes null tiles. First, we will explain how to handle cases where there is no packet loss.
[0403] Figure 60 shows an example of transmitted data, which is encoded data sent from a three-dimensional data encoding device, and received data, which is input to a three-dimensional data decoding device. Note that this assumes a system environment without packet loss, and the received data is the same as the transmitted data.
[0404] In a system environment without packet loss, the three-dimensional data decoding device receives all of the transmitted data. Figure 61 is a flowchart of the processing performed by the three-dimensional data decoding device.
[0405] First, the three-dimensional data decoding device refers to the tile addition information (S5141) and determines whether each tile is a null tile or not (S5142).
[0406] If the tile addition information indicates that the target tile is not a null tile (No in S5142), the 3D data decoding device determines that the target tile is not a null tile and decodes the target tile (S5143). Next, the 3D data decoding device obtains tile information (tile position information (origin coordinates, etc.) and size, etc.) from the tile addition information and reconstructs the 3D data by combining multiple tiles using the obtained information (S5144).
[0407] On the other hand, if the tile addition information indicates that the target tile is a null tile (Yes in S5142), the 3D data decoding device determines that the target tile is a null tile and does not decode the target tile (S5145).
[0408] Furthermore, the three-dimensional data decoding device may determine that tiles whose index numbers are missing are null tiles by sequentially analyzing the index information shown in the header of the encoded data. Alternatively, the three-dimensional data decoding device may combine a determination method using tile addition information with a determination method using index information.
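The two determination methods for a loss-free stream, and their combination, can be sketched as below. The function name `plan_decoding` and the 1-based tile numbering are assumptions made for illustration.

```python
def plan_decoding(tile_null_flags, received_tile_idx):
    """Return (tiles_to_decode, null_tiles) for a stream without packet loss.

    tile_null_flags[i] is the tile_null_flag of tile i + 1 (1-based numbering
    assumed). received_tile_idx lists the tile_idx values found in the headers.
    """
    # Determination using the tile addition information.
    to_decode = [i for i, flag in enumerate(tile_null_flags, start=1) if not flag]
    null_tiles = [i for i, flag in enumerate(tile_null_flags, start=1) if flag]

    # Determination using index information: indices absent from the headers.
    present = set(received_tile_idx)
    missing = [i for i in range(1, len(tile_null_flags) + 1) if i not in present]

    # Combined use: without packet loss the two results are expected to agree.
    if missing != null_tiles:
        raise ValueError("header indices disagree with tile_null_flag values")
    return to_decode, null_tiles


flags = [False, False, True, False, True, False]
print(plan_decoding(flags, [1, 2, 4, 6]))  # ([1, 2, 4, 6], [3, 5])
```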
[0409] Next, we will explain how to handle cases where packet loss occurs. Figure 62 shows an example of transmitted data sent from a three-dimensional data encoding device and received data input to a three-dimensional data decoding device. Here, we assume a system environment where packet loss occurs.
[0410] In a system environment with packet loss, the 3D data decoder may not receive all of the transmitted data. In this example, packets Gt2 and At2 are lost.
[0411] Figure 63 is a flowchart of the processing of the three-dimensional data decoding device in this case. First, the three-dimensional data decoding device analyzes the continuity of the index information shown in the header of the encoded data (S5151) and determines whether or not the index number of the target tile exists (S5152).
[0412] If an index number exists for the target tile (Yes in S5152), the 3D data decoding device determines that the target tile is not a null tile and performs the decoding process for the target tile (S5153). Next, the 3D data decoding device obtains tile information (tile position information (origin coordinates, etc.) and size, etc.) from the tile addition information and reconstructs the 3D data by combining multiple tiles using the obtained information (S5154).
[0413] On the other hand, if index information for the target tile does not exist (No in S5152), the three-dimensional data decoding device determines whether the target tile is a null tile by referring to the tile addition information (S5155).
[0414] If the target tile is not a null tile (No in S5156), the 3D data decoding device determines that the target tile is lost (packet loss) and performs error decoding (S5157). Error decoding is, for example, a process that attempts to decode the original data on the assumption that the data was present. In this case, the 3D data decoding device may regenerate the 3D data and perform reconstruction of the 3D data (S5154).
[0415] On the other hand, if the target tile is a null tile (Yes in S5156), the 3D data decoding device does not perform decoding or reconstruction of the 3D data, treating the target tile as a null tile (S5158).
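The decision flow for an environment with packet loss can be summarized in a short sketch. The function name and the string labels are hypothetical, and tiles are assumed to be numbered from 1; the step numbers in the comments refer to the flowchart described above.

```python
def classify_tiles(received_tile_idx, tile_null_flags):
    """Classify each tile as 'decode', 'null', or 'lost'.

    received_tile_idx lists the tile_idx values found in the received headers.
    tile_null_flags[i] is the tile_null_flag of tile i + 1 from the tile
    addition information.
    """
    present = set(received_tile_idx)
    result = {}
    for tile_idx, is_null in enumerate(tile_null_flags, start=1):
        if tile_idx in present:
            result[tile_idx] = "decode"  # index exists: decode the tile (S5153)
        elif is_null:
            result[tile_idx] = "null"    # signaled as a null tile (S5158)
        else:
            result[tile_idx] = "lost"    # not null but missing: error decoding (S5157)
    return result


flags = [False, False, True, False, True, False]
# Tile 2 (Gt2 and At2) is lost in transit.
print(classify_tiles([1, 4, 6], flags))
# {1: 'decode', 2: 'lost', 3: 'null', 4: 'decode', 5: 'null', 6: 'decode'}
```

The tile addition information is what allows a missing index caused by a null tile to be told apart from one caused by packet loss.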
[0416] Next, we will explain the encoding method when null tiles are not explicitly defined. The three-dimensional data encoding device may generate encoded data and additional information using the following method.
[0417] The 3D data encoding device does not include information about null tiles in the tile addition information. The 3D data encoding device adds the index numbers of tiles excluding null tiles to the data header. The 3D data encoding device does not transmit null tiles.
[0418] In this case, the number of tile divisions (number_of_tiles) indicates the number of divisions excluding null tiles. The 3D data encoding device may also store information indicating the number of null tiles separately in the bitstream. Furthermore, the 3D data encoding device may include information about null tiles in the additional information, or may include only part of the information about null tiles.
[0419] Figure 64 is a flowchart of the three-dimensional data encoding process performed by the three-dimensional data encoding device in this case. First, the three-dimensional data encoding device determines the tile division method and divides the point cloud data into tiles using the determined division method (S5161).
[0420] Next, the three-dimensional data encoding device determines whether the target tile is a null tile (S5162). In other words, the three-dimensional data encoding device determines whether or not there is no data in the target tile.
[0421] If the target tile is not a null tile (No in S5162), the 3D data encoding device adds index information for the tiles excluding null tiles to the data header (S5163). Then, the 3D data encoding device sends out the target tile (S5164).
[0422] On the other hand, if the target tile is a null tile (Yes in S5162), the three-dimensional data encoding device does not add index information for the target tile to the data header, nor does it send out the target tile.
[0423] Figure 65 shows an example of index information (idx) added to the data header. As shown in Figure 65, index information is not added to null tiles, and sequential numbers are added to tiles other than null tiles.
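The assignment of sequential index numbers that skip null tiles can be sketched as follows. Starting the numbering at 1 is an assumption, and the function name is hypothetical.

```python
def assign_indices_excluding_null(tile_points):
    """Assign sequential index numbers to non-null tiles only.

    Returns (indices, number_of_tiles), where indices[i] is None for a null
    tile and number_of_tiles counts only the tiles that contain points.
    """
    indices = []
    next_idx = 1  # assumed starting value
    for pts in tile_points:
        if not pts:
            indices.append(None)  # no index and no data for a null tile
            continue
        indices.append(next_idx)
        next_idx += 1
    return indices, next_idx - 1


tile_points = [["p"], ["q"], [], ["r"], [], ["s"]]
print(assign_indices_excluding_null(tile_points))
# ([1, 2, None, 3, None, 4], 4)
```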
[0424] Figure 66 shows an example of the dependencies between data. In the figure, the tip of each arrow indicates the data that is depended upon, and the base of the arrow indicates the data that depends on it. In the figure, Gtn (n is 1 to 4) indicates the location information of tile number n, Atn indicates the attribute information of tile number n, and Mtile indicates additional tile information.
[0425] Figure 67 shows an example of the configuration of the transmitted data, which is encoded data sent out from a three-dimensional data encoding device.
[0426] The following describes the decoding method when null tiles are not explicitly specified. Figure 68 shows an example of transmitted data sent from a three-dimensional data encoding device and received data input to a three-dimensional data decoding device. This assumes a system environment with packet loss.
[0427] Figure 69 is a flowchart of the processing of the three-dimensional data decoding device in this case. First, the three-dimensional data decoding device analyzes the tile index information shown in the header of the encoded data and determines whether or not the index number of the target tile exists. The three-dimensional data decoding device also obtains the number of tile divisions from the tile addition information (S5171).
[0428] If an index number for the target tile exists (Yes in S5172), the 3D data decoding device performs the decoding process for the target tile (S5173). Next, the 3D data decoding device obtains tile information (tile position information (origin coordinates, etc.) and size, etc.) from the tile addition information and reconstructs the 3D data by combining multiple tiles using the obtained information (S5175).
[0429] On the other hand, if the index number of the target tile does not exist (No in S5172), the three-dimensional data decoding device determines that the target tile was lost to packet loss and performs error decoding (S5174). In addition, the three-dimensional data decoding device treats any space for which no data is present as a null tile and reconstructs the three-dimensional data.
[0430] Furthermore, by explicitly indicating null tiles, the three-dimensional data encoding device makes it possible to determine appropriately that a tile genuinely contains no points, rather than the points being missing because of measurement error, data loss during data processing, or packet loss.
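As a non-limiting illustration, the check of step S5172 can be sketched as follows, assuming that the indices run from 0 to number_of_tiles - 1 and that the received data is available as a mapping from index number to encoded tile. Both are assumptions made for this sketch, as is the function name.

```python
from typing import Dict, List, Tuple


def classify_tiles(number_of_tiles: int,
                   received: Dict[int, bytes]) -> Tuple[List[int], List[int]]:
    """Split the expected tile indices into decodable and lost ones."""
    decodable: List[int] = []
    lost: List[int] = []
    for idx in range(number_of_tiles):
        if idx in received:   # index number exists (Yes in S5172)
            decodable.append(idx)
        else:                 # index number missing (No in S5172): treat as packet loss
            lost.append(idx)
    return decodable, lost
```

Because null tiles never receive an index in this scheme, a missing index within the expected range can only be explained by loss, not by an empty tile.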
[0431] Furthermore, the three-dimensional data encoding device may use both a method that explicitly indicates null packets and a method that does not explicitly indicate null packets. In this case, the three-dimensional data encoding device may indicate in the tile addition information whether or not to explicitly indicate null packets. Alternatively, depending on the type of partitioning method, the device may decide in advance whether or not to explicitly indicate null packets, and the three-dimensional data encoding device may indicate whether or not to explicitly indicate null packets by indicating the type of partitioning method.
[0432] Furthermore, while Figure 47 and other figures show an example where the tile information includes information relating to all tiles, the tile information may also include information relating to some of the tiles among a group of tiles, or it may include information relating to the null tiles of some of the tiles among a group of tiles.
[0433] Furthermore, while we have described an example in which information related to segmented data (tiles), such as whether or not segmented data (tiles) exist, is stored in the tile addition information, some or all of this information may be stored in the parameter set or as data. If this information is stored as data, for example, a nal_unit_type may be defined to indicate whether or not segmented data exists, and this information may be stored in the NAL unit. Alternatively, this information may be stored in both the addition information and the data.
[0434] (Embodiment 6) The following describes the process of quantization performed for each tile.
[0435] Figure 70 shows an example of GPS syntax. As shown in Figure 70, the GPS includes UniqueBetweenTilesFlag. UniqueBetweenTilesFlag indicates whether or not overlapping (duplicate) points may exist between tiles.
[0436] Figure 71 is a flowchart of the three-dimensional data decoding process. First, the three-dimensional data decoding device decodes UniqueBetweenTilesFlag and MergeDuplicatedPointFlag from the metadata contained in the bitstream (S6261). Next, the three-dimensional data decoding device decodes the position information and attribute information for each tile and reconstructs the point cloud (S6262).
[0437] Next, the three-dimensional data decoder determines whether or not merging of duplicate points is necessary (S6263). For example, the three-dimensional data decoder determines whether or not merging is necessary depending on whether or not the application can handle the duplicate points, or whether or not it would be better to merge the duplicate points. Alternatively, the three-dimensional data decoder may decide to merge the duplicate points in order to smooth or filter the multiple attribute information corresponding to the duplicate points in order to remove noise or improve estimation accuracy.
[0438] If merging of duplicate points is necessary (Yes in S6263), the 3D data decoder determines whether duplicate points exist between tiles (S6264). For example, the 3D data decoder may determine the presence or absence of duplicates between tiles based on the decoding results of UniqueBetweenTilesFlag and MergeDuplicatedPointFlag. This eliminates the need for the 3D data decoder to search for duplicate points, thereby reducing the processing load on the 3D data decoder. Alternatively, the 3D data decoder may determine whether or not duplicate points exist by searching for duplicate points after reconstructing the tiles.
[0439] If there is overlap between tiles (Yes in S6264), the 3D data decoder merges the overlapping points between tiles (S6265). Next, the 3D data decoder merges the multiple overlapping attribute information (S6266).
[0440] After step S6266, or if there are no overlaps between tiles (No in S6264), the 3D data decoder executes the application using the point cloud without overlapping points (S6267).
[0441] On the other hand, if merging of duplicate points is not necessary (No in S6263), the 3D data decoding device does not merge duplicate points and executes the application using the point cloud containing the duplicate points (S6268).
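As a non-limiting illustration, the branch structure of steps S6263 to S6268 can be sketched as follows. The two boolean inputs stand for the outcomes of S6263 and S6264, and averaging is used as one possible way of merging attribute information. These choices, the function names, and the use of a single scalar attribute per point are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Position = Tuple[int, int, int]
PointWithAttr = Tuple[Position, float]


def merge_duplicate_points(points: Sequence[PointWithAttr]) -> List[PointWithAttr]:
    """Merge points that share a position (S6265) and average their attributes (S6266)."""
    groups: Dict[Position, List[float]] = defaultdict(list)
    for pos, attr in points:
        groups[pos].append(attr)
    return [(pos, sum(attrs) / len(attrs)) for pos, attrs in groups.items()]


def prepare_point_cloud(tiles: Sequence[Sequence[PointWithAttr]],
                        merge_needed: bool,
                        duplicates_between_tiles: bool) -> List[PointWithAttr]:
    """Combine decoded tiles, merging duplicates only when both checks say so."""
    points = [p for tile in tiles for p in tile]
    if merge_needed and duplicates_between_tiles:
        return merge_duplicate_points(points)   # then used as in S6267
    return points                               # used as is (S6267 or S6268)
```

When the flags already indicate that no duplicates exist between tiles, the sketch skips the grouping step entirely, which reflects the reduction in processing load described above.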
[0442] The following describes examples of applications. First, we will describe an example of an application that uses a point cloud with no overlapping points.
[0443] Figure 72 shows an example of an application. The example shown in Figure 72 illustrates a use case in which a moving object traveling from the area of tile A to the area of tile B downloads map point cloud data from a server in real time. The server stores encoded data of map point clouds for multiple overlapping areas. The moving object has already obtained map information for tile A and requests the server to obtain map information for tile B, which is located in the direction of travel.
[0444] In this process, the moving object determines that the data in the portion where tile A and tile B overlap is unnecessary, and instructs the server to delete from tile B the portion that overlaps tile A. The server deletes the overlapping portion from tile B and delivers tile B, with that portion removed, to the moving object. This reduces the amount of data transmitted and lowers the load of the decoding process.
[0445] The moving object may also confirm, based on a flag, that there are no duplicate points. Furthermore, if the moving object has not yet acquired tile A, it requests from the server data from which the overlapping portion has not been removed. In addition, if the server does not have a function for removing duplicate points, or if it is unclear whether duplicate points exist, the moving object may check the delivered data for duplicate points and, if any exist, merge them.
[0446] Next, we will describe an example of an application that uses point clouds with overlapping points. A moving object uploads map point cloud data acquired by LiDAR to a server in real time. For example, the moving object uploads the acquired data to the server tile by tile. In this case, there are overlapping areas between tile A and tile B, but the moving object, which performs the encoding, does not merge the overlapping points between tiles and sends the data to the server along with a flag indicating that there is overlap between tiles. The server stores the received data as is, without merging the overlapping data contained in the received data.
[0447] Furthermore, when transmitting or storing point cloud data using systems such as ISOBMFF, MPEG-DASH / MMT, or MPEG-TS, the device may replace the flags included in the GPS, which indicate whether or not there are overlapping points within a tile or between tiles, with descriptors or metadata in the system layer and store them in SI, MPD, moov, or moof boxes. This allows applications to utilize the system's functionality.
[0448] Furthermore, as shown in Figure 73, the three-dimensional data encoding device may divide tile B into multiple slices based on overlapping areas with other tiles. In the example shown in Figure 73, slice 1 is an area that does not overlap with any tile, slice 2 is an area that overlaps with tile A, and slice 3 is an area that overlaps with tile C. This makes it easier to separate the desired data from the encoded data.
[0449] Furthermore, the map information may be in the form of point cloud data or mesh data. The point cloud data may be tiled by region and stored on a server.
[0450] Figure 74 is a flowchart showing the processing flow in the above system. First, the terminal (e.g., a mobile device) detects its movement from area A to area B (S6271). Next, the terminal starts acquiring map information for area B (S6272).
[0451] If the terminal has already downloaded the information for area A (Yes in S6273), the terminal instructs the server to retrieve data for area B that does not overlap with area A (S6274). The server deletes the portion overlapping area A from area B and sends the resulting data of area B to the terminal (S6275). The server may also, in response to instructions from the terminal, encode the data for area B in real time so that it contains no overlapping points, and send that data.
[0452] Next, the terminal merges (combines) the map information of area B with the map information of area A and displays the merged map information (S6276).
[0453] On the other hand, if the terminal has not yet downloaded the information for area A (No in S6273), the terminal instructs the server to retrieve data for area B, including points of overlap with area A (S6277). The server sends the data for area B to the terminal (S6278). Next, the terminal displays map information for area B, including points of overlap with area A (S6279).
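As a non-limiting illustration, the server-side handling of steps S6274 to S6278 can be sketched as follows. Area A is modelled here as an axis-aligned box given by its minimum and maximum corners, which is an assumption made for this sketch, as are the function names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner)


def inside(point: Point, box: Box) -> bool:
    """Return True if the point lies within the box, boundaries included."""
    lo, hi = box
    return all(lo[i] <= point[i] <= hi[i] for i in range(3))


def serve_area_b(points_b: Sequence[Point], box_a: Box,
                 terminal_has_area_a: bool) -> List[Point]:
    """Return the data of area B that the server sends to the terminal."""
    if terminal_has_area_a:
        # S6275: remove the part of area B that overlaps area A before sending
        return [p for p in points_b if not inside(p, box_a)]
    # S6278: send area B as is, including the points that overlap area A
    return list(points_b)
```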
[0454] Figure 75 is a flowchart illustrating another example of the system's operation. The transmitting device (three-dimensional data encoding device) transmits the tile data in order (S6281). The transmitting device also adds a flag to the data of the tile to be transmitted, indicating whether or not the tile to be transmitted overlaps with the data of the tile transmitted immediately before it, and then sends out the data (S6282).
[0455] The receiving device (three-dimensional data decoding device) determines, based on a flag attached to the data, whether the tile of the received data overlaps with a tile of previously received data (S6283). If the tile of the received data overlaps with a tile of previously received data (Yes in S6283), the receiving device deletes or merges the overlapping points (S6284). On the other hand, if the tile of the received data does not overlap with a tile of previously received data (No in S6283), the receiving device does not delete or merge the overlapping points and terminates the process. This reduces the processing load on the receiving device and improves the accuracy of attribute information estimation. Note that the receiving device does not need to merge overlapping points if merging is not necessary.
[0456] (Embodiment 7) This embodiment describes a viewpoint-based display method, a random access method for encoded data, a point cloud data encoding method, and a decoding method for an application using point clouds.
[0457] With improvements in sensor performance, it has become possible to obtain high-quality three-dimensional point clouds. However, in order to view these high-quality three-dimensional points, a viewing device (viewer) capable of reproducing these high-quality three-dimensional point clouds is necessary. Specifically, it is desirable to be able to display high-quality, large-data-volume three-dimensional point clouds without delay. In this embodiment, a three-dimensional point cloud viewing device (first application) that can efficiently display high-density point cloud data using a scalable method employing point cloud compression is described.
[0458] Point cloud compression is performed using multiple data partitioning methods. For example, using LoD (Level of Detail), the resolution required to represent the point cloud data is calculated based on the distance between the virtual camera and the point cloud data. This enables the data to be separated or hierarchically layered.
[0459] A three-dimensional point cloud viewing device (also called a three-dimensional data decoding device) selects visible point clouds for rendering. At this time, it is preferable that the three-dimensional data decoding device confirms that all visible point clouds are actually scanned data, not just approximations.
[0460] Figure 76 is a block diagram showing an example configuration of a three-dimensional data encoding device. The three-dimensional data encoding device comprises a point cloud encoding unit 8701 and a file format generation unit 8702. The point cloud encoding unit 8701 generates encoded data (bitstream) by encoding point cloud data. For example, the point cloud encoding unit 8701 encodes point cloud data using a location information-based encoding method using an octree, or a video-based encoding method.
[0461] The file format generation unit 8702 converts the encoded data (bitstream) into data in a predetermined file format. For example, the file format may be ISOBMFF or MP4. The three-dimensional data encoding device may output the encoded data in the file format (for example, to transmit it to the three-dimensional data decoding device), or it may output the encoded data in the bitstream format of the encoding scheme.
[0462] Figure 77 is a block diagram showing an example configuration of the three-dimensional data decoding device 8705. The three-dimensional data decoding device 8705 generates point cloud data by decoding encoded data. Here, the encoded data is, for example, encoded data in bitstream format or MP4 format. Note that unencoded point cloud data may also be used.
[0463] A brick is a collection of all or part of the data in a point cloud. These bricks are sometimes also called partitioned data, tiles, or slices. Partitioned data can also be further subdivided.
[0464] The 3D data decoding device 8705 acquires camera viewpoint information, which indicates the camera's viewpoint (angle), from an external source. Based on the camera viewpoint information, the 3D data decoding device 8705 acquires part or all of the encoded data and generates point cloud data by decoding the acquired encoded data. For example, the camera viewpoint information indicates the camera's position and direction (orientation). Subsequently, the 3D data decoding device 8705 displays the decoded point cloud data.
[0465] The three-dimensional data decoding device 8705 comprises a point cloud decoding unit 8706 and a brick decoding control unit 8707. Camera viewpoint information (camera field of view) is input to the brick decoding control unit 8707. The brick decoding control unit 8707 selects bricks to decode based on the visibility of the bricks determined based on the camera viewpoint information. The point cloud decoding unit 8706 decodes the selected bricks and outputs the decoded bricks.
[0466] The configuration of the three-dimensional data encoding device according to this embodiment will be described below. Figure 78 is a block diagram showing the configuration of the three-dimensional data encoding device 8710 according to this embodiment. The three-dimensional data encoding device 8710 generates encoded data (encoded stream) by encoding point cloud data. This three-dimensional data encoding device 8710 includes a division unit 8711, a plurality of position information encoding units 8712, a plurality of attribute information encoding units 8713, an additional information encoding unit 8714, a multiplexing unit 8715, and a normal vector generation unit 8716.
[0467] The division unit 8711 generates multiple division data by dividing the point cloud data. Specifically, the division unit 8711 generates multiple division data by dividing the space of the point cloud data into multiple subspaces. Here, a subspace is one of bricks, tiles, and slices, or a combination of two or more of bricks, tiles, and slices. More specifically, the point cloud data includes location information, attribute information (such as color or reflectance), and additional information. The division unit 8711 generates multiple division location information by dividing the location information and generates multiple division attribute information by dividing the attribute information. The division unit 8711 also generates additional information related to the division.
[0468] Multiple location information encoding units 8712 generate multiple encoded location information by encoding multiple partitioned location information. For example, the location information encoding unit 8712 encodes partitioned location information using an N-ary tree structure such as an octree. Specifically, in an octree, the target space is divided into 8 nodes (subspaces), and 8 bits of information (an occupancy code) are generated to indicate whether or not each node contains points. Furthermore, each node that contains points is further divided into 8 nodes, and 8 bits of information are generated to indicate whether or not each of these 8 nodes contains points. This process is repeated until a predetermined level of the hierarchy is reached or the number of points contained in a node falls to or below a threshold. For example, multiple location information encoding units 8712 process multiple partitioned location information in parallel.
[0469] The attribute information encoding unit 8713 generates encoded attribute information, which is encoded data, by encoding attribute information using the configuration information generated by the location information encoding unit 8712. For example, the attribute information encoding unit 8713 determines the reference point (reference node) to be referenced in encoding the target point (target node) to be processed, based on the octree structure generated by the location information encoding unit 8712. For example, the attribute information encoding unit 8713 references a surrounding node or adjacent node whose parent node in the octree is the same as that of the target node. Note that the method for determining the reference relationship is not limited to this.
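As a non-limiting illustration, the generation of occupancy codes can be sketched as follows. The bit order of the eight child nodes, the breadth-first traversal, and the parameter names max_depth and threshold are assumptions made for this sketch. A practical bitstream would also need to signal how leaves are terminated, which is omitted here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def child_index(p: Point, origin: Point, half: float) -> int:
    """Map a point to one of 8 child nodes (assumed bit order: x, y, z)."""
    ix = 1 if p[0] >= origin[0] + half else 0
    iy = 1 if p[1] >= origin[1] + half else 0
    iz = 1 if p[2] >= origin[2] + half else 0
    return (ix << 2) | (iy << 1) | iz


def encode_octree(points: Sequence[Point], origin: Point, size: float,
                  max_depth: int, threshold: int = 1) -> List[int]:
    """Return the 8-bit occupancy codes of the subdivided nodes, breadth first."""
    codes: List[int] = []
    queue = [(list(points), origin, size, 0)]
    while queue:
        pts, org, sz, depth = queue.pop(0)
        # Stop when the predetermined level is reached or few enough points remain.
        if depth >= max_depth or len(pts) <= threshold:
            continue
        half = sz / 2.0
        children: List[List[Point]] = [[] for _ in range(8)]
        for p in pts:
            children[child_index(p, org, half)].append(p)
        code = 0
        for idx, child_pts in enumerate(children):
            if child_pts:
                code |= 1 << idx  # mark this child node as occupied
                child_org = (org[0] + half * ((idx >> 2) & 1),
                             org[1] + half * ((idx >> 1) & 1),
                             org[2] + half * (idx & 1))
                queue.append((child_pts, child_org, half, depth + 1))
        codes.append(code)
    return codes
```

Only occupied child nodes are queued for further division, so empty regions of space cost a single zero bit in the parent's occupancy code.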
[0470] Furthermore, the encoding process for location information or attribute information may include at least one of the following: quantization processing, prediction processing, and arithmetic encoding processing. In this case, a reference means using a reference node to calculate the predicted value of attribute information, or using the state of a reference node (for example, occupancy information indicating whether or not a point cloud is included in the reference node) to determine the encoding parameters. For example, encoding parameters are quantization parameters in the quantization processing, or context in arithmetic encoding, etc.
[0471] The normal vector generation unit 8716 calculates a normal vector for each divided data. Note that the input data does not necessarily have to be divided. In this case, the normal vector generation unit 8716 may calculate a normal vector for each point instead of a normal vector for each divided data. Alternatively, the normal vector generation unit 8716 may calculate both a normal vector for each divided data and a normal vector for each point.
[0472] The additional information encoding unit 8714 generates encoded additional information by encoding the additional information contained in the point cloud data, the additional information related to data division generated by the division unit 8711 during division, and the normal vectors generated by the normal vector generation unit 8716.
[0473] The multiplexing unit 8715 generates encoded data (encoded stream) by multiplexing multiple encoded position information, multiple encoded attribute information, and encoded additional information, and transmits the generated encoded data. The encoded additional information is used during decoding.
[0474] The configuration of the three-dimensional data decoding device according to this embodiment will be described below. Figure 79 is a block diagram showing the configuration of the three-dimensional data decoding device 8720. The three-dimensional data decoding device 8720 restores point cloud data by decoding the encoded data (encoded stream) generated when point cloud data is encoded. This three-dimensional data decoding device 8720 includes a demultiplexing unit 8721, a plurality of position information decoding units 8722, a plurality of attribute information decoding units 8723, an additional information decoding unit 8724, a coupling unit 8725, a normal vector extraction unit 8726, a random access control unit 8727, and a selection unit 8728.
[0475] The demultiplexing unit 8721 generates multiple encoded position information, multiple encoded attribute information, and encoded additional information by demultiplexing the encoded data (encoded stream). The additional information decoding unit 8724 generates additional information by decoding the encoded additional information.
[0476] The normal vector extraction unit 8726 extracts normal vectors from the additional information. The random access control unit 8727 determines which segmented data to extract based, for example, on the normal vector of each segmented data. The selection unit 8728 extracts multiple segmented data (multiple encoded position information and multiple encoded attribute information) determined by the random access control unit 8727 from the multiple segmented data (multiple encoded position information and multiple encoded attribute information). The selection unit 8728 may extract only one segmented data.
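As a non-limiting illustration, one possible selection rule for the random access control unit 8727 can be sketched as follows. Here a segmented data unit is selected when at least one of its normal vectors points back toward the camera, that is, when the dot product with the viewing direction is negative. This criterion, the function names, and the dictionary layout are assumptions made for this sketch and are not prescribed by the embodiment.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, float, float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two three-dimensional vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def select_segments(normals_by_segment: Dict[str, Sequence[Vector]],
                    view_direction: Vector) -> List[str]:
    """Return the ids of segments with a normal vector facing the camera."""
    selected: List[str] = []
    for seg_id, normals in normals_by_segment.items():
        # A surface faces the camera when its normal opposes the viewing direction.
        if any(dot(n, view_direction) < 0 for n in normals):
            selected.append(seg_id)
    return selected
```

Only the selected segments would then be passed to the position information decoding units and attribute information decoding units.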
[0477] The multiple location information decoding units 8722 generate multiple segmented location information by decoding the multiple encoded location information extracted by the selection unit 8728. For example, the multiple location information decoding units 8722 process the multiple encoded location information in parallel.
[0478] The multiple attribute information decoding units 8723 generate multiple segmented attribute information by decoding the multiple encoded attribute information extracted by the selection unit 8728. For example, the multiple attribute information decoding units 8723 process the multiple encoded attribute information in parallel.
[0479] The coupling unit 8725 generates position information by combining multiple division position information using additional information. The coupling unit 8725 generates attribute information by combining multiple division attribute information using additional information.
[0480] Next, we will describe a first example of generating and encoding point-specific normal vectors. Figure 80 shows an example of point cloud data. Figure 81 shows an example of point-specific normal vectors. Normal vector encoding can be performed independently for each three-dimensional point cloud. Figures 80 and 81 show a three-dimensional point cloud of a book and the normal vectors of that three-dimensional point cloud. As shown in Figure 81, there are multiple normal vectors extending in the upward, rightward, and forward directions. Here, the surface of the book is a plane, and multiple normal vectors of a given surface extend in the same direction. On the other hand, if the surface is rounded, the normal vectors extend in multiple directions according to the surface normal.
[0481] Figure 82 shows an example of the syntax for normal vectors in a bitstream. In the normal vector NormalVector[i][face] shown in Figure 82, "i" is a counter over the three-dimensional points, and [face] identifies the x, y, or z axis. In other words, NormalVector represents the magnitude of the normal vector along each axis.
[0482] Figure 83 is a flowchart of the three-dimensional data encoding process. First, the three-dimensional data encoding device encodes location information (geometry) and attribute information for each point (S8701). For example, the three-dimensional data encoding device encodes location information for each point. In addition, if attribute information corresponding to a point exists, the three-dimensional data encoding device may encode the attribute information for each point.
[0483] Next, the three-dimensional data encoding device encodes the normal vector (x, y, z) for each point (S8702). The three-dimensional data encoding device may encode the normal vector for each point. The three-dimensional data encoding device may also encode difference information, for example, showing the difference between the normal vector of the point being processed and the normal vectors of other points. This can reduce the amount of data. The three-dimensional data encoding device may encode the normal vector by including it in the position information or by including it in the attribute information. The three-dimensional data encoding device may also encode the normal vector independently of the position information and attribute information. If there are multiple normal vectors for a single point, the three-dimensional data encoding device may encode multiple normal vectors for each point.
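As a non-limiting illustration, the encoding of difference information can be sketched as follows. The sketch takes the difference from the normal vector of the immediately preceding point and assumes integer-valued components. Both are assumptions made for this sketch, since the embodiment only states that the difference from the normal vectors of other points may be used.

```python
from typing import List, Sequence, Tuple

IntVector = Tuple[int, int, int]


def encode_normal_deltas(normals: Sequence[IntVector]) -> List[IntVector]:
    """Replace each normal vector by its difference from the previous one."""
    deltas: List[IntVector] = []
    prev: IntVector = (0, 0, 0)  # assumed predictor for the first point
    for n in normals:
        deltas.append((n[0] - prev[0], n[1] - prev[1], n[2] - prev[2]))
        prev = n
    return deltas


def decode_normal_deltas(deltas: Sequence[IntVector]) -> List[IntVector]:
    """Rebuild the normal vectors by accumulating the differences."""
    normals: List[IntVector] = []
    prev: IntVector = (0, 0, 0)
    for d in deltas:
        prev = (prev[0] + d[0], prev[1] + d[1], prev[2] + d[2])
        normals.append(prev)
    return normals
```

On a flat surface, where neighbouring points share the same normal vector, most differences become zero, which is what allows the amount of data to be reduced by a subsequent entropy coding stage.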
[0484] Figure 84 is a flowchart of the three-dimensional data decoding process. First, the three-dimensional data decoding device decodes the position information and attribute information from the bitstream point by point (S8706). Next, the three-dimensional data decoding device decodes the normal vector from the bitstream point by point (S8707).
[0485] Note that the processing order shown in Figures 83 and 84 is just one example, and the encoding order and decoding order may be changed.
[0486] Furthermore, the three-dimensional data encoding device may reduce the amount of data by encoding the normal vector using positional information or the correlation of positional information. In that case, the three-dimensional data decoding device decodes the normal vector using positional information. By the above method, the normal vector of each point in the point cloud can be encoded and decoded.
[0487] Next, we will describe a second example of generating and encoding the normal vector for each point. Another method for encoding the normal vector for each point is to encode the normal vector as one item of attribute information. Below, we will describe an example in which the normal vector is encoded or decoded as one item of attribute information by the attribute information encoding unit or the attribute information decoding unit.
[0488] For example, a three-dimensional data encoding device encodes color information as the first attribute information and the normal vector as the second attribute information. Figure 85 shows an example of the bitstream configuration. For example, Attr(0) in Figure 85 is the encoded data for the first attribute information, and Attr(1) is the encoded data for the second attribute information. Metadata related to encoding is stored in a parameter set (APS). A three-dimensional data decoding device decodes the encoded data by referring to the APS corresponding to the encoded data.
[0489] Furthermore, the SPS stores identification information (attribute_type=Normal Vector) indicating that the second attribute information is a normal vector. If the attribute information is a normal vector, information indicating that the normal vector is data with three elements for each point may also be stored in the SPS. Additionally, the SPS stores identification information (attribute_type=Color) indicating that the first attribute information is color information.
[0490] Figure 86 shows an example of point cloud information containing positional information, color information, and normal vectors. The three-dimensional data encoding device encodes the uncompressed point cloud data shown in Figure 86.
[0491] The range of values for the normal vector is from -1 to 1 in floating-point terms. For ease of representation, the 3D data encoder may convert floating-point values to integers depending on the required precision. For example, the 3D data encoder may convert floating-point values to values from -127 to 128 using an 8-bit representation. In other words, the 3D data encoder may convert floating-point values to integers or positive integer values. Since the normal vector is treated as a single attribute, different quantization processes can be applied. For example, different quantization parameters can be used for each attribute. This allows for different levels of precision to be achieved. For example, the quantization parameters are stored in the APS.
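As a non-limiting illustration, the conversion from floating-point components to integers can be sketched as follows. The sketch uses a symmetric signed range (for 8 bits, -127 to 127) so that -1, 0, and 1 map exactly. This range, the rounding rule, and the function names are assumptions made for this sketch, and other mappings are equally possible.

```python
from typing import Tuple

Vector = Tuple[float, float, float]
IntVector = Tuple[int, int, int]


def quantize_normal(normal: Vector, bits: int = 8) -> IntVector:
    """Map components in [-1, 1] to signed integers of the given bit width."""
    max_int = (1 << (bits - 1)) - 1          # 127 for 8 bits
    clamped = [max(-1.0, min(1.0, c)) for c in normal]
    q = [int(round(c * max_int)) for c in clamped]
    return (q[0], q[1], q[2])


def dequantize_normal(q: IntVector, bits: int = 8) -> Vector:
    """Map quantized integers back to floating-point components."""
    max_int = (1 << (bits - 1)) - 1
    return (q[0] / max_int, q[1] / max_int, q[2] / max_int)
```

A smaller bit width gives a coarser representation and fewer bits per component, which corresponds to selecting a different level of precision for the normal vector attribute.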
[0492] Figure 87 is a flowchart of the three-dimensional data encoding process. First, the three-dimensional data encoding device encodes position information and attribute information (color information, etc.) for each point (S8711). The three-dimensional data encoding device also encodes the normal vector for each point as attribute information with attribute_type="normal vector" using a predetermined method (S8712).
[0493] Figure 88 is a flowchart of the three-dimensional data decoding process. The three-dimensional data decoding device decodes positional information and attribute information for each point from the bitstream (S8716). The three-dimensional data decoding device also decodes the normal vector for each point from the bitstream as attribute information with attribute_type="normal vector" using a predetermined method (S8717).
[0494] Note that the processing order shown in Figures 87 and 88 is just one example, and the encoding order and decoding order may be changed.
[0495] Next, we will explain an example of generating normal vectors for each data unit containing multiple points. The 3D data encoding device divides point cloud data into multiple objects or regions based on the positional information and features of the point cloud. The divided data may be, for example, tiles, slices, or layered data. The 3D data encoding device generates normal vectors for these divided data units, that is, data units containing one or more points.
[0496] Here, visibility can be determined by the normal vector representation of the object within the brick. Figures 89 and 90 illustrate this process. For example, as shown in Figure 89, the three-dimensional data encoding device divides the normal vector direction into angles at 30° intervals with respect to the horizontal and vertical axes. A simpler method is to divide the normal vector into six directions, as shown in Figure 90: (0,0), (0,90), (0,-90), (90,0), (-90,0), and (180,180).
[0497] Furthermore, the three-dimensional data encoding device may calculate the effective normal vector of a data unit using the median, the mean, or another more suitable algorithm. Alternatively, the three-dimensional data encoding device may use a representative value, or a value obtained by another method, as the effective normal vector.
[0498] Furthermore, the normal vector for each segmented data can either show the original x, y, and z values as they are, or it can be quantized every 30 degrees as described above, or it can be quantized every 90 degrees. Quantization can reduce the amount of information.
[0499] Figure 91 shows an example of point cloud data, specifically an example of a face object. Figure 92 shows an example of a normal vector in this case. As shown in Figure 92, the normal vectors of the face object shown in Figure 91 point in the (0,0) and (90,0) directions. A three-dimensional data encoding device can use one bit for each direction to indicate whether or not the object's normal vector is present in that direction.
[0500] Thus, there may be two or more normal vectors for a single divided data unit. In that case, multiple normal vectors may be shown for a single divided data unit.
[0501] For example, in the data containing the face object shown in Figures 91 and 92, the normal vectors are expressed in units of 90 degrees, giving six possible normal vectors, one for each face. In this example, the two normal vectors in the (0,0) and (90,0) directions are the normal vectors of this segmented data.
[0502] Alternatively, each of the six normal vectors can be represented by 1 bit of information. Figure 93 shows an example of this normal vector information. If the segmented data has the corresponding normal vector, the 1 bit of information is set to a value of 1; otherwise, it is set to 0. This reduces the amount of information compared to simply showing the x, y, and z values, because the data is quantized.
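The one-bit-per-direction representation can be illustrated with a short sketch. It assumes that each normal vector of a divided data unit is snapped to the nearest of six axis-aligned directions; the ordering of the six directions in `AXES` and the function names are hypothetical and are not taken from Figure 93.

```python
# Hypothetical ordering of the six quantized directions: +x, -x, +y, -y, +z, -z.
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def nearest_axis(normal):
    """Return the index of the axis direction with the largest dot product."""
    dots = [normal[0] * a[0] + normal[1] * a[1] + normal[2] * a[2] for a in AXES]
    return dots.index(max(dots))


def direction_flags(normals):
    """Build six 1-bit flags: 1 if the unit has a normal in that direction, else 0."""
    flags = [0] * 6
    for n in normals:
        flags[nearest_axis(n)] = 1
    return flags


# A unit whose points face mostly +x and -z.
print(direction_flags([(0.9, 0.1, 0.0), (0.0, 0.0, -1.0)]))
```

Six bits per divided data unit replace three numeric components per normal vector, which is where the reduction in information comes from.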
[0503] The following describes a simpler representation of normal vectors. A six-faced cube is used to represent the normal vector and its visibility from a specific camera viewpoint. Figures 94 to 97 illustrate this representation. Figure 94 shows an example of a six-faced cube. Figures 95, 96, and 97 show the front and back faces a and b, the left and right faces c and d, and the top and bottom faces e and f, respectively. Depending on the orientation of the object relative to the field of view, the normal vector faces at least one and at most three faces. Six 1-bit flags, one for each of the six faces (abcdef) of the cube, can be used to represent the direction. For example, (100000) is generated when viewed from the front, (001000) when viewed from the side, and (000001) when viewed from below. In this representation, magnitude is not important; only direction is represented. An object may also have three faces specified. Face a is opposite face b, face c is opposite face d, and face e is opposite face f. Therefore, it is impossible to view face a and face b simultaneously. In other words, the normal vector can be represented using three flags (ace).
[0504] Thus, if the camera viewpoint (camera angle) is known in advance, the normal vector information can be represented with 3 bits. Figure 98 shows the visibility when viewing the objects of slice A or slice B from the direction of face c. Slice A is visible from the direction of face c, so it is represented as ace=(010). On the other hand, slice B is hidden by slice A when viewed from the direction of face c, so it is represented as ace=(000).
[0505] Next, a first method for encoding and decoding normal vectors for each brick will be described. Figure 99 shows an example of the bitstream configuration in this case. In the example shown in Figure 99, the normal vector information is stored in the slice header of the position information for each slice. Note that the normal vector information may be stored in the attribute information header, or it may be stored in metadata independent of the position information and attribute information.
[0506] Figure 100 shows a syntax example of the geometry slice header of the position information. The geometry slice header includes normal_vector_number, normal_vector_x, normal_vector_y, and normal_vector_z.
[0507] normal_vector_number indicates the number of normal vectors corresponding to the slice data. normal_vector_x, normal_vector_y, and normal_vector_z indicate the elements (x, y, z) of the normal vector corresponding to the slice data, respectively.
[0508] In this example, the number of normal vectors is variable. As many normal vectors as indicated by normal_vector_number are signaled.
[0509] If the information of the normal vector is common for all slices, normal_vector_number may be stored in GPS or SPS that can store common information for multiple slices.
[0510] Also, the x, y, and z values of the normal vectors may be quantized. For example, the three-dimensional data encoding device may quantize the values of the original normal vector by shifting them right by a common bit amount s (bits), and send information indicating the bit amount s and information indicating the quantized normal vector (normal_vector_x>>s, normal_vector_y>>s, normal_vector_z>>s). Thereby, the bit amount can be reduced.
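A minimal sketch of shift-based quantization follows. It assumes that the normal vector components are already integers, that the encoder discards the s least significant bits with a right shift, and that the decoder restores the scale with a left shift; the function names are hypothetical.

```python
def quantize_by_shift(normal_int, s):
    """Drop the s least significant bits of each integer component."""
    return tuple(c >> s for c in normal_int)


def dequantize_by_shift(quantized, s):
    """Restore the original scale; the dropped bits are not recovered."""
    return tuple(c << s for c in quantized)


s = 2  # common bit amount, signaled alongside the quantized values
q = quantize_by_shift((127, -127, 0), s)
print(q, dequantize_by_shift(q, s))
```

Python's `>>` on negative integers rounds toward negative infinity, so -127 becomes -32 rather than -31. An encoder that needs symmetric rounding would have to handle the sign separately.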
[0511] Figure 101 shows another syntax example of the geometry slice header of the position information. In this example, the normal vectors of each divided data unit are simplified (quantized) to the six faces, and whether or not a normal vector exists is indicated for each face.
[0512] This geometry slice header includes is_normal_vector. For each face, is_normal_vector is set to 1 if a normal vector corresponding to the slice data exists, and to 0 if no normal vector exists. For example, the order of the multiple faces is predetermined.
[0513] The precision of quantization and the number or order of normal vectors are not limited to these. They may be fixed or variable.
[0514] Figure 102 is a flowchart of the three-dimensional data encoding process. First, the three-dimensional data encoding device generates multiple divided data by dividing the point cloud data (S8721). Next, the three-dimensional data encoding device encodes positional information and attribute information for each divided data (S8722). Next, the three-dimensional data encoding device stores the normal vector for each divided data in the slice header (S8723).
[0515] Figure 103 is a flowchart of the three-dimensional data decoding process. First, the three-dimensional data decoding device decodes position information and attribute information from the bitstream for each divided data (S8726). Next, the three-dimensional data decoding device decodes the normal vector for each divided data from the slice header for each divided data (S8727). Next, the three-dimensional data decoding device combines the multiple divided data (S8728).
[0516] Figure 104 is a flowchart of the three-dimensional data decoding process when partially decoding data. First, the three-dimensional data decoding device decodes the normal vector for each divided data from the slice header for each divided data (S8731). Next, the three-dimensional data decoding device determines the divided data to be decoded based on the normal vector and decodes the determined divided data (S8732). Next, the decoded multiple divided data are combined (S8733).
[0517] Next, a second method for encoding and decoding normal vectors for each brick is described. Another method for encoding normal vector information is to use metadata (e.g., SEI: Supplemental Enhancement Information). Figure 105 shows an example of a bitstream structure. As shown in Figure 105, the SEI may be included in the bitstream, or it may be generated as a separate file from the main encoded bitstream, depending on how the SEI is implemented in both the encoding and decoding devices.
[0518] Figure 106 shows an example of the syntax of slice information included in SEI. The slice information includes number_of_slice, bounding_box_origin_x, bounding_box_origin_y and bounding_box_origin_z, bounding_box_width, bounding_box_height and bounding_box_depth, normalVector_QP, number_of_normal_vector, normalVector_x, normalVector_y and normalVector_z.
[0519] `number_of_slice` indicates the number of sliced data. `bounding_box_origin_x`, `bounding_box_origin_y`, and `bounding_box_origin_z` indicate the origin coordinates of the bounding box for the slice data. `bounding_box_width`, `bounding_box_height`, and `bounding_box_depth` indicate the width, height, and depth of the bounding box for the slice data, respectively.
[0520] `normalVector_QP` indicates the scale information or bit shift information of the quantization if `normal_vector` is quantized. `number_of_normal_vector` indicates the number of normal vectors included in the slice data. `normalVector_x`, `normalVector_y`, and `normalVector_z` indicate the elements (x, y, z) of the normal vector, respectively.
[0521] Figure 107 shows another example of slice information included in SEI. The example shown in Figure 107 shows the normal vectors, simplified (quantized) for each of the six faces, for each divided data. Whether or not a normal vector exists for each face is indicated.
[0522] This slice information includes an is_normal_vector. is_normal_vector is set to 1 if a normal vector exists for the slice data, and to 0 if no normal vector exists. For example, the order of multiple faces is predetermined.
[0523] Furthermore, slice information may include a flag indicating whether or not the slice information includes bounding box information (origin, width, height, and depth) for each slice. In this case, if the flag is on (e.g., 1), the slice information includes bounding box information for each slice, and if the flag is off (e.g., 0), the slice information does not include bounding box information for each slice. Additionally, slice information may include a flag indicating whether or not the slice information includes normal vector information for each slice. In this case, if the flag is on (e.g., 1), the slice information includes normal vector information for each slice, and if the flag is off (e.g., 0), the slice information does not include normal vector information for each slice.
[0524] Next, random access and partial decoding will be described. The three-dimensional data decoding device decodes data independently for each slice, using information for each slice, such as the bounding box information and / or the normal vector of the slice.
[0525] Figure 108 is a flowchart of the three-dimensional data decoding process. First, the three-dimensional data decoding device determines the slices to be decoded and the decoding order of the slices using a predetermined method (S8741). Next, the three-dimensional data decoding device decodes specific slices in the determined order (S8742).
[0526] Figure 109 shows an example of this partial decoding process. For example, a three-dimensional data decoder receives the sliced encoded data shown in Figure 109(a). As shown in Figure 109(b), the three-dimensional data decoder decodes the encoded data of some of the slices and not the encoded data of the other slices. Alternatively, as shown in Figure 109(c), the three-dimensional data decoder performs decoding by rearranging the order of the encoded data.
[0527] Figure 110 shows an example of the configuration of a three-dimensional data decoding device. As shown in Figure 110, the three-dimensional data decoding device comprises an attribute information decoding unit 8731 and a random access control unit 8732. The attribute information decoding unit 8731 extracts bounding box information and normal vectors for each slice from the encoded data. The random access control unit 8732 determines the number and order of the slices to be decoded based on the bounding box information and normal vectors for each slice, and sensor information acquired from an external source, such as camera angle (camera orientation) and camera position.
[0528] Figures 111 and 112 show examples of processing by the random access control unit 8732. As shown in Figure 111, for example, the random access control unit 8732 may calculate distance information indicating the distance from the camera for each slice from the bounding box and camera position for each slice. Alternatively, as shown in Figure 112, the random access control unit 8732 may derive visibility information indicating whether or not an object is visible from the camera for each slice from the normal vector and camera angle for each slice. The random access control unit 8732 may calculate either the distance information or the visibility information, or it may calculate both.
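The distance calculation from a slice's bounding box and the camera position can be sketched as follows. The sketch assumes the distance is measured from the camera to the center of the bounding box; the source does not specify the reference point, and the function names are hypothetical.

```python
import math


def bbox_center(origin, size):
    """Center of an axis-aligned bounding box given its origin and (width, height, depth)."""
    return tuple(o + s / 2.0 for o, s in zip(origin, size))


def distance_to_camera(origin, size, camera_pos):
    """Euclidean distance from the camera position to the bounding box center."""
    return math.dist(bbox_center(origin, size), camera_pos)


# Box from (0,0,0) to (2,2,2); camera at (1,1,4).
print(distance_to_camera((0, 0, 0), (2, 2, 2), (1, 1, 4)))
```

Using the nearest point of the box instead of its center would be an equally plausible choice and only changes `bbox_center`.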
[0529] The following explains the relationship between visibility information and distance information. Figure 113 shows an example of the relationship between distance and resolution. For example, only what is visible to the camera is decoded (frustum culling). Furthermore, the decoded resolution depends on the distance between the virtual camera and the point cloud data.
[0530] In other words, the 3D data decoder determines whether a slice is visible from the camera based on the normal vector of each slice and the camera viewpoint (camera angle), and decodes the slices that are visible from the camera. Furthermore, the 3D data decoder calculates the distance of the slice to be decoded from the camera, and if the distance from the camera is close, it may decode high-resolution data, and if the distance from the camera is far, it may decode low-resolution data.
[0531] In this case, the encoded data is encoded in a layered manner, and the three-dimensional data decoding device can decode the low-resolution data independently. Furthermore, when decoding high-resolution data, the three-dimensional data decoding device decodes the difference information between the low-resolution data and the high-resolution data, and adds the difference information to the low-resolution data to generate the high-resolution data. If the encoded data is not encoded in a layered manner, the three-dimensional data decoding device does not have to perform this process, or it may decide whether or not to perform this process depending on whether or not the data is layered.
[0532] Next, we will explain how to determine visibility using normal vectors. Figure 114 shows an example of bricks and normal vectors. In the example shown in Figure 114, the two bricks (e.g., slices) on the front facing the camera (viewing frustum), that is, the bricks whose normal vectors are pointing towards the camera, are decoded.
[0533] First, the 3D data decoding device determines, for each slice, whether the metadata contains one or more normal vectors that point opposite to the camera direction. If the slice data of the target slice has a normal vector that points opposite to the camera direction, the 3D data decoding device determines that the target slice is visible and selects the target slice for decoding.
[0534] Furthermore, the 3D data decoder may determine that a target slice is invisible (not visible) if another slice exists between the camera and the target slice. In addition, the 3D data decoder may determine whether a slice is visible or not by determining whether the relationship between the normal vector and the camera direction is within a predetermined angular range, rather than determining whether the normal vector and the camera direction are completely opposite.
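The angular-range visibility test can be sketched as follows. The sketch treats a slice as visible when at least one of its normal vectors lies within a threshold angle of the direction opposite to the camera direction. The threshold of 60 degrees and the name `is_slice_visible` are assumptions for illustration, and occlusion by other slices is not handled.

```python
import math


def is_slice_visible(normals, camera_dir, max_angle_deg=60.0):
    """True if any normal is within max_angle_deg of the reversed camera direction."""
    cam_len = math.sqrt(sum(c * c for c in camera_dir))
    if cam_len == 0.0:
        return False
    cos_limit = math.cos(math.radians(max_angle_deg))
    for n in normals:
        n_len = math.sqrt(sum(c * c for c in n))
        if n_len == 0.0:
            continue  # skip degenerate normals
        d = sum(a * b for a, b in zip(n, camera_dir))
        cos_angle = -d / (n_len * cam_len)  # cosine of angle to the reversed camera direction
        if cos_angle >= cos_limit:
            return True
    return False


camera_dir = (0.0, 0.0, 1.0)  # camera looks along +z
print(is_slice_visible([(0.0, 0.0, -1.0)], camera_dir))  # surface faces the camera
print(is_slice_visible([(0.0, 0.0, 1.0)], camera_dir))   # surface faces away
```

Setting `max_angle_deg` close to 0 reproduces the strict "completely opposite" test, while larger values admit surfaces seen at a grazing angle.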
[0535] Next, we will explain processing using LoD (Level of Detail). Below, we will explain an example of decoding processing according to layers with different resolutions.
[0536] Figure 115 shows an example of Level of Detail (LoD). Figure 116 shows an example of an octree structure. Each brick is divided into layers to control the level of resolution to be decoded. For example, a level is the depth of the division when dividing into an octree. As shown in Figure 115, the number of voxels in each level may be defined as 2^(3 × level). Note that a different definition may be used for the division method or the number of voxels depending on the level.
[0537] By using Level of Detail (LoD), the 3D data decoder can achieve high-speed visibility determination and distance calculation. Decoding time affects real-time rendering. Using LoD allows for the display of intermediate bricks, enabling smoother integration with real-time rendering.
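As a worked example of the voxel count per level, the following sketch assumes the count is 2 raised to the power 3 × level, as expected for an octree in which every node splits into eight children. The function name is hypothetical.

```python
def voxels_at_level(level):
    """Number of voxels at a given octree depth, assuming full subdivision."""
    return 2 ** (3 * level)


for level in range(4):
    print(level, voxels_at_level(level))
```

Each additional level multiplies the voxel count by eight, which is why limiting the decoded level has a large effect on processing load.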
[0538] Figure 117 is a flowchart of the three-dimensional data decoding process using LoD. First, the three-dimensional data decoding device determines the level to be decoded according to the purpose (S8751). Next, the three-dimensional data decoding device decodes the first level (level 0) (S8752). Next, the three-dimensional data decoding device determines whether or not the decoding of all levels to be decoded has been completed (S8753). If the decoding of all levels has not been completed (No in S8753), the three-dimensional data decoding device decodes the next level (S8754). At this time, the three-dimensional data decoding device may decode the next level using the data from the previous level. If the decoding of all levels to be decoded has been completed (Yes in S8753), the three-dimensional data decoding device displays the decoded data (S8755).
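The level-by-level loop of Figure 117 can be sketched as follows. The sketch assumes a simplified data model in which `layers[k]` is a list of the points added at level k, so that decoding the next level means appending its points to the result of the previous level. The function name and the data model are hypothetical.

```python
def decode_up_to_level(layers, target_level):
    """Decode level 0, then refine one level at a time up to target_level."""
    decoded = list(layers[0])  # S8752: decode the first level (level 0)
    level = 0
    # S8753: repeat until all levels to be decoded are done
    while level < target_level and level + 1 < len(layers):
        level += 1
        decoded = decoded + list(layers[level])  # S8754: next level builds on the previous one
    return decoded  # S8755: data handed over for display


layers = [["p0"], ["p1", "p2"], ["p3"]]
print(decode_up_to_level(layers, 1))
```

Levels above `target_level` are never touched, which reflects the point that data beyond the determined level is neither decoded nor displayed.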
[0539] Thus, the 3D data decoding device decodes data up to a determined level and does not decode data beyond that level. This reduces the processing load involved in decoding and improves processing speed. Furthermore, the 3D data decoding device displays data up to the determined level and does not display data beyond that level. This reduces the processing load involved in display and improves processing speed. The 3D data decoding device may determine the level to be decoded for a brick based on, for example, the distance of the brick from the camera, or whether the brick is visible from the camera or not.
[0540] Next, we will explain an example of implementing processing using LoD. Figure 118 is a flowchart of the three-dimensional data decoding process. First, the three-dimensional data decoding device acquires encoded data (S8761). For example, the encoded data is point cloud data that has been encoded and compressed using an arbitrary encoding scheme. The encoded data may be in a bitstream format or a file format.
[0541] Next, the three-dimensional data decoding device obtains the normal vector and position information of the brick to be processed from the encoded data (S8762). For example, the three-dimensional data decoding device obtains the normal vector and position information of each brick from the metadata (SEI or data header) included in the encoded data. The three-dimensional data decoding device may also determine the distance between the brick and the camera from the brick's position information and the camera's position information. Furthermore, the three-dimensional data decoding device may determine the visibility of the brick (whether the brick is facing the camera or not) from the normal vector and the camera direction.
[0542] Next, the three-dimensional data decoding device determines which brick to decode and decodes the first level (level 0) of the determined brick (S8763). Figure 119 shows an example of bricks to be decoded. As shown in Figure 119, the three-dimensional data decoding device decodes all visible bricks at a resolution of level 0.
[0543] Next, the three-dimensional data decoding device determines, based on the position information, whether or not to decode the next level of each brick, and decodes the next level of the brick that it has determined to decode (S8764). This process is repeated until the decoding of all levels is completed (S8765). Specifically, the resolution of bricks closer to the virtual camera's position is set higher. For example, depending on resources such as memory, levels are gradually added to be decoded, prioritizing bricks closer to the camera.
[0544] Figure 120 shows an example of the level of decoding for each brick. As shown in Figure 120, the 3D data decoder decodes bricks closer to the camera with a higher resolution and bricks further away from the camera with a lower resolution, depending on their distance from the camera. The 3D data decoder also does not decode bricks that are not visible.
[0545] If decoding of all levels is complete (Yes in S8765), the 3D data decoder outputs the resulting 3D point cloud (S8766).
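The per-brick level choice illustrated in Figure 120 can be sketched as follows. The distance thresholds, the maximum level, and the function name are assumptions chosen only for illustration; the source does not give concrete values.

```python
def level_for_brick(distance, visible, thresholds=(10.0, 30.0), max_level=2):
    """Return the highest level to decode for a brick, or None if it is skipped."""
    if not visible:
        return None  # invisible bricks are not decoded
    level = max_level
    for limit in thresholds:  # thresholds are in ascending order
        if distance <= limit:
            return level
        level -= 1
    return max(level, 0)  # far bricks still get level 0


for d in (5.0, 20.0, 50.0):
    print(d, level_for_brick(d, visible=True))
print(level_for_brick(5.0, visible=False))
```

A resource-aware decoder could lower `max_level` when memory is tight, so that bricks near the camera are still refined first.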
[0546] Up to this point, we have described a method in which the three-dimensional data encoding device calculates and encodes normal vectors and bounding box information for each slice data, and the three-dimensional data decoding device calculates visibility and distance information based on this information and sensor input information and determines the slice to be decoded. Below, we will describe an example in which visibility and distance information according to the camera direction are pre-calculated and encoded in the three-dimensional data encoding device for each slice data.
[0547] Figure 121 shows an example of the syntax for a geometry slice header. The geometry slice header includes number_of_angle, view_angle, and visibility.
[0548] `number_of_angle` indicates the number of camera angles (camera directions). `view_angle` indicates the camera angle, for example, a vector of camera angles. `visibility` indicates whether the slice is visible from the corresponding camera angle. The number of `view_angle` may be variable or a predetermined fixed value. Furthermore, if the number and value of `view_angle` are predetermined, `view_angle` may be omitted.
[0549] Furthermore, while this example shows visibility according to the camera angle, another example is that the three-dimensional data encoding device may pre-calculate visibility according to the camera position or camera parameters and store the calculated visibility in the encoded data.
[0550] Figure 122 is a flowchart of the three-dimensional data encoding process. First, the three-dimensional data encoding device divides the point cloud data into segmented data (e.g., slices) (S8771). Next, the three-dimensional data encoding device encodes positional information and attribute information for each segmented data unit (S8772). The three-dimensional data encoding device also stores visibility information corresponding to the camera angle as metadata for each segmented data unit (S8773).
[0551] Figure 123 is a flowchart of the three-dimensional data decoding process. First, the three-dimensional data decoding device obtains visibility information corresponding to the camera angle from the metadata of each segmented data (S8776). Next, based on the visibility information, the three-dimensional data decoding device determines which segmented data is visible from the desired camera angle and decodes the segmented data that is visible (S8777).
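Selecting slices from pre-calculated visibility can be sketched as follows. The sketch assumes each slice header carries a list of `view_angle` unit vectors and a parallel list of `visibility` flags, and that the decoder picks the stored angle closest to the desired camera direction. The `slice_id` key and the function names are hypothetical.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def nearest_angle_index(view_angles, camera_dir):
    """Index of the stored view angle best aligned with the camera direction."""
    scores = [dot(v, camera_dir) for v in view_angles]
    return scores.index(max(scores))


def select_visible_slices(slice_headers, camera_dir):
    """Return the ids of slices flagged visible for the closest stored angle."""
    selected = []
    for header in slice_headers:
        i = nearest_angle_index(header["view_angle"], camera_dir)
        if header["visibility"][i] == 1:
            selected.append(header["slice_id"])
    return selected


angles = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
headers = [
    {"slice_id": 0, "view_angle": angles, "visibility": [1, 0]},
    {"slice_id": 1, "view_angle": angles, "visibility": [0, 1]},
]
print(select_visible_slices(headers, (0.1, 0.0, 0.9)))
```

The decoder only performs a table lookup here, so no visibility computation from normal vectors is needed on the decoding side.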
[0552] Figures 124 and 125 show examples of point cloud data. In these figures, a, c, d, and e represent planes. Therefore, a three-dimensional data encoding device can perform slicing by utilizing the fact that the three-dimensional points of each slice have normal vectors in the same direction. A similar method can be applied to tiling.
[0553] Figures 126 to 129 show examples of system configurations including a three-dimensional data encoding device, a three-dimensional data decoding device, and a display device.
[0554] In the example shown in Figure 126, the three-dimensional data encoding device generates encoded data by encoding slice data, normal vectors for each slice, and bounding box information. The three-dimensional data decoding device identifies the data to be decoded from the encoded data and sensor information, and generates decoded slice data by decoding the identified data. The display device displays the decoded slice data. In this configuration, the three-dimensional data decoding device can flexibly decide whether to display information and whether to decode it.
[0555] In the example shown in Figure 127, the 3D data encoding device generates encoded data by encoding slice data, normal vectors for each slice, and bounding box information. The 3D data decoding device determines the data to be decoded and the order in which to decode from the encoded data and sensor information, and decodes the determined data in the determined order. In this configuration, the 3D data decoding device can decode first the data that should be displayed first (for example, 3, 4, and 5), thus improving the viewing experience.
[0556] In the example shown in Figure 128, the 3D data encoding device generates encoded data by encoding slice data and visibility information for each camera angle. The 3D data decoding device identifies the data to be decoded from the encoded data information and sensor information, and decodes the identified data. The 3D data decoding device may also determine the decoding order. In this configuration, the 3D data decoding device does not need to calculate visibility information, thus reducing the processing load on the 3D data decoding device.
[0557] In the example shown in Figure 129, the 3D data decoding device notifies the 3D data encoding device of the camera angle or camera position of the 3D data decoding device via communication or other means. The 3D data encoding device calculates the visibility information for each slice, determines the data to be encoded and its order, and generates encoded data by encoding the determined data in the determined order. The 3D data decoding device decodes the transmitted slice data as is. With this interactive configuration, the amount of processing and the communication bandwidth can be reduced by encoding and decoding only the necessary parts.
[0558] Furthermore, if the camera position or camera angle changes, the 3D data decoding device may re-determine the slice to be decoded if the amount of change exceeds a predetermined value. In this case, high-speed decoding and display are possible by decoding the difference data other than the data that has already been decoded.
[0559] The following describes how to store encoded data in a file format such as ISOBMFF. Figure 130 shows an example of a bitstream configuration. Figure 131 shows an example of a three-dimensional data encoding device configuration. The three-dimensional data encoding device includes an encoding unit 8741 and a file conversion unit 8742. The encoding unit 8741 generates a bitstream containing encoded data and control information by encoding point cloud data. The file conversion unit 8742 converts the bitstream into a file format.
[0560] Figure 132 shows an example of the configuration of a three-dimensional data decoding device. The three-dimensional data decoding device includes a file inverse conversion unit 8751 and a decoding unit 8752. The file inverse conversion unit 8751 converts the file format into a bitstream containing encoded data and control information. The decoding unit 8752 generates point cloud data by decoding the bitstream.
[0561] Figure 133 shows the basic structure of ISOBMFF. Figure 134 is a protocol stack diagram when the NAL unit common to the PCC codec is stored in ISOBMFF. Here, the NAL unit of the PCC codec is stored in ISOBMFF.
[0562] NAL units include data NAL units and metadata NAL units. Data NAL units include geometry slice data and attribute slice data. Metadata NAL units include SPS, GPS, APS, and SEI.
[0563] ISOBMFF (ISO base media file format) is a file format standard defined in ISO / IEC 14496-12. It specifies a format that can store various media such as video, audio, and text in multiplexed formats, and is a media-independent standard.
[0564] The basic unit in ISOBMFF is the box. A box consists of type, length, and data, and a file is a collection of boxes of various types. A file mainly consists of boxes such as ftyp, which indicates the file's brand using 4CC, moov, which stores metadata such as control information, and mdat, which stores data.
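The box structure can be illustrated with a small parser for top-level boxes. It relies on the standard ISOBMFF box header layout: a 32-bit big-endian size followed by a four-character type, where a size of 1 means a 64-bit size follows and a size of 0 means the box extends to the end of the file. The function name `iter_boxes` and the brand `pcc1` in the sample data are hypothetical.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in an ISOBMFF byte string."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header_len = 8
        if size == 1:  # 64-bit largesize follows the type field
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header_len = 16
        elif size == 0:  # box runs to the end of the file
            size = len(data) - pos
        if size < header_len or pos + size > len(data):
            break  # malformed or truncated box
        yield box_type.decode("ascii", errors="replace"), data[pos + header_len:pos + size]
        pos += size


# Sample file: an ftyp box with a made-up brand, then an mdat box with 4 payload bytes.
sample = (struct.pack(">I4s", 16, b"ftyp") + b"pcc1" + b"\x00" * 4
          + struct.pack(">I4s", 12, b"mdat") + b"abcd")
for box_type, payload in iter_boxes(sample):
    print(box_type, len(payload))
```

Boxes such as moov contain nested boxes, which could be walked by calling `iter_boxes` again on the payload.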
[0565] The method for storing each type of media in ISOBMFF is specified separately; for example, the storage method for AVC video and HEVC video is specified in ISO / IEC 14496-15. Furthermore, it is conceivable to extend the functionality of ISOBMFF to store and transmit PCC encoded data.
[0566] When storing NAL units for metadata in ISOBMFF, the SEI may be stored in the "mdat box" along with the PCC data, or in the "track box" which contains control information about the stream. Furthermore, when transmitting data in packets, the SEI may be stored in the packet header. By indicating the SEI at the system layer, access to attribute information, tile, and slice data becomes easier, and access speed is improved.
[0567] Next, the method for generating a PCC random access table will be explained. The three-dimensional data encoding device generates a random access table using metadata that includes bounding box information and normal vector information for each slice. Figure 135 shows an example of converting a bitstream to a file format.
[0568] The 3D data encoding device stores the data of each slice in the mdat box of the file format. The 3D data encoding device calculates the storage location of each slice's data as offset information from the beginning of the file (offsets 1-4 in Figure 135) and includes the calculated offset information in the random access table (PCC random access table).
[0569] Figure 136 shows an example of the syntax for slice information. Figures 137 to 139 show examples of the syntax for a PCC random access table.
[0570] The PCC random access table includes bounding box information (bounding_box_info), normal vector information (normal_vector_info), and offset information (offset), which are stored in slice information (slice_information).
[0571] The 3D data decoding device analyzes the PCC random access table and identifies the slice to be decoded. The 3D data decoding device can access the desired data by obtaining offset information from the PCC random access table.
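A random access table of this kind can be sketched as follows. The field names `bounding_box_info`, `normal_vector_info`, and `offset` follow the text above. The `size` field, the class name `SliceEntry`, and the helper functions are assumptions added so that the example can read a slice back; they are not taken from Figures 136 to 139.

```python
from dataclasses import dataclass


@dataclass
class SliceEntry:
    bounding_box_info: tuple   # (origin, size) of the slice's bounding box
    normal_vector_info: tuple  # normal vectors of the slice
    offset: int                # byte offset of the slice data from the start of the file
    size: int                  # assumed field: byte length of the slice data


def build_table(payload_start, slices):
    """Compute one entry per slice, with offsets counted from the start of the file."""
    table = []
    offset = payload_start
    for bbox, normals, payload in slices:
        table.append(SliceEntry(bbox, normals, offset, len(payload)))
        offset += len(payload)
    return table


def read_slice(file_bytes, entry):
    """Jump directly to one slice using its table entry."""
    return file_bytes[entry.offset:entry.offset + entry.size]


slices = [
    (((0, 0, 0), (1, 1, 1)), ((0, 0, 1),), b"AAAA"),
    (((1, 0, 0), (1, 1, 1)), ((1, 0, 0),), b"BB"),
    (((2, 0, 0), (1, 1, 1)), ((0, 1, 0),), b"CCC"),
]
header = b"H" * 8  # stand-in for everything stored before the slice data
file_bytes = header + b"".join(p for _, _, p in slices)
table = build_table(len(header), slices)
print([e.offset for e in table], read_slice(file_bytes, table[2]))
```

A decoder can filter `table` by bounding box or normal vector first and then read only the selected entries, without scanning the slice data itself.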
[0572] As described above, the three-dimensional data encoding device according to this embodiment performs the processing shown in Figure 140. The three-dimensional data encoding device generates a bitstream by encoding the position information and one or more pieces of attribute information of each of the multiple three-dimensional points included in the point cloud data (S8781). In the encoding (S8781), the normal vector of each of the multiple three-dimensional points is encoded as one piece of attribute information included in the one or more pieces of attribute information.
[0573] According to this, the three-dimensional data encoding device can process normal vectors in the same way as other attribute information by encoding them as attribute information. Therefore, the three-dimensional data encoding device can reduce the amount of processing required. In other words, the three-dimensional data encoding device can encode normal vectors as attribute information without changing the definition of attribute information, etc.
[0574] For example, in the encoding process (S8781), the three-dimensional data encoding device converts the normal vector, which is represented as a floating-point number, into an integer before encoding. This allows the three-dimensional data encoding device to process the normal vector in the same way as other attribute information, for example, when other attribute information is represented as an integer.
[0575] For example, the bitstream includes control information (e.g., SPS) common to the position information and the one or more pieces of attribute information, and the control information (e.g., SPS) includes at least one of the following: information indicating that one of the one or more pieces of attribute information represents a normal vector (e.g., attribute_type=Normal Vector), or information indicating that the normal vector is data with three elements for each point.
[0576] For example, a three-dimensional data encoding device comprises a processor and memory, and the processor uses the memory to perform the above processing.
[0577] Furthermore, the three-dimensional data decoding device according to this embodiment performs the processing shown in Figure 141. The three-dimensional data decoding device obtains a bitstream generated by encoding the positional information and one or more pieces of attribute information of each of the multiple three-dimensional points included in the point cloud data, in which the normal vector of each of the multiple three-dimensional points is encoded as one piece of attribute information included in the one or more pieces of attribute information (S8786), and obtains the normal vector by decoding that piece of attribute information from the bitstream (S8787).
[0578] According to this, the 3D data decoding device can process the normal vector in the same way as other attribute information by decoding it as attribute information. Therefore, the 3D data decoding device can reduce the amount of processing required.
[0579] For example, in the acquisition of the normal vector (S8787), the three-dimensional data decoder obtains the normal vector, which is represented by an integer. This allows the three-dimensional data decoder to process the normal vector in the same way as other attribute information, for example, when other attribute information is represented by an integer.
[0580] For example, the bitstream includes positional information and control information (e.g., SPS) common to the one or more pieces of attribute information, and the control information (e.g., SPS) includes at least one of the following: information indicating that one of the one or more pieces of attribute information represents a normal vector (e.g., attribute_type=Normal Vector), or information indicating that the normal vector is data with three elements for each point.
[0581] For example, a three-dimensional data decoding device comprises a processor and memory, and the processor uses the memory to perform the above processing.
[0582] Furthermore, the three-dimensional data encoding device according to this embodiment performs the processing shown in Figure 142. The three-dimensional data encoding device divides the point cloud data into a plurality of segmented data (e.g., bricks, slices, or tiles) (S8791), and generates a bitstream by encoding the plurality of segmented data (S8792). The bitstream includes information indicating the normal vector of each of the plurality of segmented data.
[0583] According to this, the three-dimensional data encoding device can reduce the amount of processing and the amount of code by encoding a normal vector for each segmented data, compared to encoding a normal vector for each point. For example, each of the multiple segmented data is a random access unit.
[0584] For example, a three-dimensional data encoding device comprises a processor and memory, and the processor uses the memory to perform the above processing.
[0585] Furthermore, the three-dimensional data decoding device according to this embodiment performs the processing shown in Figure 143. The three-dimensional data decoding device obtains a bitstream generated by encoding multiple divided data (e.g., bricks, slices, or tiles) generated by dividing the point cloud data (S8796), and obtains information indicating the normal vector of each of the multiple divided data from the bitstream (S8797).
[0586] According to this, a three-dimensional data decoding device can reduce processing load by decoding the normal vector for each divided data set compared to decoding the normal vector for each point. For example, each of the multiple divided data sets is a random access unit.
[0587] For example, a three-dimensional data decoding device further determines the target data segment to be decoded from multiple data segments based on the normal vector, and then decodes the target data segment.
[0588] For example, a three-dimensional data decoding device further determines the decoding order of multiple divided data based on the normal vector, and decodes the multiple divided data in the determined decoding order.
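As one possible reading of the two examples above, a decoder could use the per-segment normal vectors together with a viewing direction to pick and order the segments to decode. The sketch below is hypothetical: the rule "a segment is a decoding target when its normal faces the viewer", the ordering by how directly it faces the viewer, and the names `select_and_order` and `view_dir` are assumptions of this example and are not stated in the text.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3-element vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def select_and_order(segment_normals: Dict[int, Vec3], view_dir: Vec3) -> List[int]:
    """Return IDs of segments facing the viewer, most directly facing first.

    A segment is treated as facing the viewer when its normal points against
    the viewing direction (negative dot product). This criterion is an
    assumption of the sketch.
    """
    facing = []
    for segment_id, normal in segment_normals.items():
        alignment = dot(normal, view_dir)
        if alignment < 0.0:
            facing.append((alignment, segment_id))
    facing.sort()  # most negative alignment (most directly facing) first
    return [segment_id for _, segment_id in facing]
```

For example, with a viewing direction of (0, 0, 1), a segment whose normal is (0, 0, -1) is decoded before one whose normal is (0.6, 0, -0.8), and a segment whose normal is (0, 0, 1) is skipped.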
[0589] For example, a three-dimensional data decoding device comprises a processor and memory, and the processor uses the memory to perform the above processing.
[0590] (Embodiment 8) Next, the configuration of the three-dimensional data creation device 810 according to this embodiment will be described. Figure 144 is a block diagram showing an example of the configuration of the three-dimensional data creation device 810 according to this embodiment. This three-dimensional data creation device 810 is mounted on a vehicle, for example. The three-dimensional data creation device 810 transmits and receives three-dimensional data with an external traffic monitoring cloud, a preceding vehicle, or a following vehicle, and also creates and stores three-dimensional data.
[0591] The three-dimensional data creation device 810 includes a data receiving unit 811, a communication unit 812, a reception control unit 813, a format conversion unit 814, a plurality of sensors 815, a three-dimensional data creation unit 816, a three-dimensional data synthesis unit 817, a three-dimensional data storage unit 818, a communication unit 819, a transmission control unit 820, a format conversion unit 821, and a data transmission unit 822.
[0592] The data receiving unit 811 receives three-dimensional data 831 from a traffic monitoring cloud or a preceding vehicle. The three-dimensional data 831 includes information such as a point cloud, visible light images, depth information, sensor position information, or speed information, including areas that cannot be detected by the vehicle's sensors 815.
[0593] The communication unit 812 communicates with the traffic monitoring cloud or the preceding vehicle and sends data transmission requests and other messages to the traffic monitoring cloud or the preceding vehicle.
[0594] The reception control unit 813 exchanges information such as the supported format with the communication destination via the communication unit 812 and establishes communication with the communication destination.
[0595] The format conversion unit 814 generates three-dimensional data 832 by performing format conversion on the three-dimensional data 831 received by the data receiving unit 811. Furthermore, if the three-dimensional data 831 is compressed or encoded, the format conversion unit 814 performs decompression or decoding.
[0596] The plurality of sensors 815 is a group of sensors, such as LiDAR, visible light cameras, or infrared cameras, that acquire information from outside the vehicle and generate sensor information 833. For example, if a sensor 815 is a laser sensor such as LiDAR, the sensor information 833 is three-dimensional data such as a point cloud. Note that the number of sensors 815 need not be plural.
[0597] The three-dimensional data creation unit 816 generates three-dimensional data 834 from the sensor information 833. The three-dimensional data 834 includes information such as a point cloud, visible light image, depth information, sensor position information, or velocity information.
[0598] The three-dimensional data synthesis unit 817 synthesizes three-dimensional data 835, which includes the space in front of the preceding vehicle that cannot be detected by the vehicle's sensors 815, by combining three-dimensional data 834 created based on the vehicle's sensor information 833 with three-dimensional data 832 created by the traffic monitoring cloud or the preceding vehicle.
[0599] The three-dimensional data storage unit 818 stores the generated three-dimensional data 835, etc.
[0600] The communication unit 819 communicates with the traffic monitoring cloud or following vehicles and sends data transmission requests, etc., to the traffic monitoring cloud or following vehicles.
[0601] The transmission control unit 820 exchanges information such as the supported format with the communication destination via the communication unit 819 and establishes communication with the communication destination. The transmission control unit 820 also determines the transmission area, which is the space of the three-dimensional data to be transmitted, based on the three-dimensional data construction information of the three-dimensional data 832 generated by the three-dimensional data synthesis unit 817 and the data transmission request from the communication destination.
[0602] Specifically, the transmission control unit 820 determines a transmission area that includes the space in front of its own vehicle that cannot be detected by the sensors of the following vehicle, in response to a data transmission request from the traffic monitoring cloud or a following vehicle. The transmission control unit 820 also determines the transmission area by determining, based on the three-dimensional data construction information, whether the transmittable space or the already transmitted space has been updated. For example, the transmission control unit 820 determines the transmission area to be the area that is specified in the data transmission request and in which the corresponding three-dimensional data 835 exists. The transmission control unit 820 then notifies the format conversion unit 821 of the format supported by the communication destination and the transmission area.
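A minimal sketch of this determination is shown below. It assumes, purely for illustration, that space is divided into grid cells identified by integer triples and that the three-dimensional data construction information can be reduced to a per-cell version number; `determine_transmission_area`, `stored_versions`, and `sent_versions` are hypothetical names, and combining the "requested and existing" rule with the update check in one function is a choice of this example.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int, int]


def determine_transmission_area(
    requested: Set[Cell],
    stored_versions: Dict[Cell, int],
    sent_versions: Dict[Cell, int],
) -> Set[Cell]:
    """Return the cells to transmit.

    A cell is included when it was requested, data for it exists, and it has
    either never been sent or has been updated since it was last sent.
    """
    area = set()
    for cell in requested:
        if cell not in stored_versions:
            continue  # no corresponding three-dimensional data exists
        if sent_versions.get(cell) != stored_versions[cell]:
            area.add(cell)  # new or updated since the last transmission
    return area
```

Passing an empty `sent_versions` reduces the function to the simpler rule in the text: the requested area in which data exists.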
[0603] The format conversion unit 821 generates three-dimensional data 837 by converting the three-dimensional data 836 in the transmission area from the three-dimensional data 835 stored in the three-dimensional data storage unit 818 to a format supported by the receiving side. The format conversion unit 821 may also reduce the amount of data by compressing or encoding the three-dimensional data 837.
[0604] The data transmission unit 822 transmits three-dimensional data 837 to a traffic monitoring cloud or following vehicles. This three-dimensional data 837 includes, for example, information such as a point cloud in front of the vehicle, including areas that are blind spots for following vehicles, visible light images, depth information, or sensor position information.
[0605] Although this example describes a case where format conversion is performed by the format conversion units 814 and 821, format conversion is not required.
[0606] With this configuration, the three-dimensional data creation device 810 acquires three-dimensional data 831 from an external source for areas that cannot be detected by the vehicle's sensors 815, and generates three-dimensional data 835 by combining the three-dimensional data 831 with three-dimensional data 834 based on sensor information 833 detected by the vehicle's sensors 815. In this way, the three-dimensional data creation device 810 can generate three-dimensional data for areas that cannot be detected by the vehicle's sensors 815.
[0607] Furthermore, the three-dimensional data creation device 810 can transmit three-dimensional data, including the space in front of its own vehicle that cannot be detected by the sensors of the following vehicle, to the traffic monitoring cloud or following vehicle in response to a data transmission request from the traffic monitoring cloud or following vehicle.
[0608] Next, the procedure for transmitting three-dimensional data to a following vehicle using the three-dimensional data creation device 810 will be described. Figure 145 is a flowchart showing an example of the procedure for transmitting three-dimensional data to a traffic monitoring cloud or a following vehicle using the three-dimensional data creation device 810.
[0609] First, the three-dimensional data creation device 810 generates and updates three-dimensional data 835 of the space including the space on the road in front of the vehicle (S801). Specifically, the three-dimensional data creation device 810 constructs three-dimensional data 835 that includes the space in front of the vehicle ahead, which cannot be detected by the vehicle's sensors 815, by combining three-dimensional data 834 created based on the vehicle's sensor information 833 with three-dimensional data 831 created by the traffic monitoring cloud or the vehicle ahead.
[0610] Next, the three-dimensional data creation device 810 determines whether the three-dimensional data 835 contained in the transmitted space has changed (S802).
[0611] If a vehicle or person enters the transmitted space from the outside, causing a change in the three-dimensional data 835 contained in that space (Yes in S802), the three-dimensional data creation device 810 transmits the three-dimensional data, including the three-dimensional data 835 of the space where the change occurred, to the traffic monitoring cloud or the following vehicle (S803).
[0612] The 3D data creation device 810 may transmit the 3D data of the space where the change has occurred in accordance with the transmission timing of the 3D data transmitted at predetermined intervals, or it may transmit it immediately after detecting the change. In other words, the 3D data creation device 810 may transmit the 3D data of the space where the change has occurred with priority over the 3D data transmitted at predetermined intervals.
[0613] Furthermore, the three-dimensional data creation device 810 may transmit all of the three-dimensional data of the space in which the change occurred, or it may transmit only the difference in the three-dimensional data (for example, information on three-dimensional points that have appeared or disappeared, or displacement information of three-dimensional points).
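The difference-only option can be illustrated with a short sketch. It assumes that three-dimensional points are represented as integer coordinate triples (for example, after quantization) so that they can be compared exactly; the function names `encode_difference` and `apply_difference` are hypothetical, and displacement information is not modeled here.

```python
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[int, int, int]


def encode_difference(previous: Iterable[Point], current: Iterable[Point]) -> Dict[str, List[Point]]:
    """Describe the change between two point sets as appeared/disappeared points."""
    prev_set, cur_set = set(previous), set(current)
    return {
        "appeared": sorted(cur_set - prev_set),
        "disappeared": sorted(prev_set - cur_set),
    }


def apply_difference(previous: Iterable[Point], diff: Dict[str, List[Point]]) -> Set[Point]:
    """Reconstruct the current point set on the receiving side."""
    points = set(previous)
    points -= set(diff["disappeared"])
    points |= set(diff["appeared"])
    return points
```

When only a few points change between transmissions, the difference is much smaller than the full data of the space, at the cost of requiring the receiver to hold the previous state.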
[0614] Furthermore, the three-dimensional data creation device 810 may transmit metadata related to its own vehicle's hazard avoidance actions, such as sudden braking warnings, to following vehicles prior to the three-dimensional data of the space where the change has occurred. This allows following vehicles to recognize sudden braking by the preceding vehicle earlier and initiate hazard avoidance actions such as deceleration earlier.
[0615] If no change has occurred in the three-dimensional data 835 contained in the transmitted space (No in S802), or after step S803, the three-dimensional data creation device 810 transmits the three-dimensional data contained in a space of a predetermined shape located at a distance L in front of its own vehicle to the traffic monitoring cloud or a following vehicle (S804).
[0616] Furthermore, for example, the processes in steps S801 to S804 are repeated at predetermined time intervals.
[0617] Furthermore, if there is no difference between the three-dimensional data 835 of the space to be transmitted and the three-dimensional map, the three-dimensional data creation device 810 does not need to transmit the three-dimensional data 837 of the space.
[0618] In this embodiment, the client device transmits sensor information obtained from the sensor to the server or another client device.
[0619] First, the system configuration according to this embodiment will be described. Figure 146 is a diagram showing the configuration of the three-dimensional map and sensor information transmission and reception system according to this embodiment. This system includes a server 901 and client devices 902A and 902B. When client devices 902A and 902B are not specifically distinguished, they will also be referred to as client device 902.
[0620] The client device 902 is, for example, an in-vehicle device mounted on a moving object such as a vehicle. The server 901 is, for example, a traffic monitoring cloud and is capable of communicating with multiple client devices 902.
[0621] Server 901 transmits a three-dimensional map composed of point clouds to client device 902. Note that the composition of the three-dimensional map is not limited to point clouds; it may also represent other three-dimensional data, such as a mesh structure.
[0622] The client device 902 transmits sensor information acquired by the client device 902 to the server 901. The sensor information includes, for example, at least one of the following: LiDAR acquisition information, visible light image, infrared image, depth image, sensor position information, and velocity information.
[0623] The data transmitted and received between the server 901 and the client device 902 may be compressed to reduce data size, or it may be left uncompressed to maintain data accuracy. When data is compressed, a three-dimensional compression method based on an octree structure, for example, can be used for point clouds. In addition, a two-dimensional image compression method can be used for visible light images, infrared images, and depth images. A two-dimensional image compression method is, for example, MPEG-4 AVC or HEVC, which are standardized by MPEG.
[0624] Furthermore, in response to a request from the client device 902 to send a 3D map, the server 901 sends a 3D map managed by the server 901 to the client device 902. The server 901 may also send a 3D map without waiting for a request from the client device 902. For example, the server 901 may broadcast a 3D map to one or more client devices 902 located in a predetermined space. Alternatively, the server 901 may send a 3D map appropriate to the location of the client device 902 at regular intervals after receiving a transmission request from the client device 902. The server 901 may also send a 3D map to the client device 902 whenever the 3D map managed by the server 901 is updated.
[0625] The client device 902 sends a request to the server 901 to send a three-dimensional map. For example, if the client device 902 wants to perform self-position estimation while driving, the client device 902 sends a request to the server 901 to send a three-dimensional map.
[0626] Furthermore, the client device 902 may request the server 901 to send a 3D map in the following cases. If the 3D map held by the client device 902 is outdated, the client device 902 may request the server 901 to send a 3D map. For example, if a certain period of time has elapsed since the client device 902 acquired the 3D map, the client device 902 may request the server 901 to send a 3D map.
[0627] The client device 902 may request the server 901 to send the three-dimensional map a certain time before the client device 902 leaves the space represented by the three-dimensional map held by the client device 902. For example, the client device 902 may request the server 901 to send the three-dimensional map if the client device 902 is within a predetermined distance from the boundary of the space represented by the three-dimensional map held by the client device 902. Furthermore, if the movement path and speed of the client device 902 are known, the time at which the client device 902 will leave the space represented by the three-dimensional map held by the client device 902 may be predicted based on these.
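The prediction of the exit time can be sketched as follows, under simplifying assumptions that are not part of the text: the space covered by the held map is an axis-aligned box, the client moves at a constant velocity, and a request is issued when the predicted exit time falls within a lead time. The names `time_until_exit`, `should_request_map`, and `lead_time_s` are hypothetical.

```python
import math
from typing import Sequence


def time_until_exit(
    position: Sequence[float],
    velocity: Sequence[float],
    box_min: Sequence[float],
    box_max: Sequence[float],
) -> float:
    """Seconds until a point moving at constant velocity leaves an axis-aligned box.

    Returns 0.0 if the point is already outside, and math.inf if it never leaves.
    """
    t_exit = math.inf
    for p, v, lo, hi in zip(position, velocity, box_min, box_max):
        if p < lo or p > hi:
            return 0.0
        if v > 0:
            t_exit = min(t_exit, (hi - p) / v)
        elif v < 0:
            t_exit = min(t_exit, (lo - p) / v)
    return t_exit


def should_request_map(position, velocity, box_min, box_max, lead_time_s: float) -> bool:
    """Request a new map when the predicted exit is within the lead time."""
    return time_until_exit(position, velocity, box_min, box_max) <= lead_time_s
```

For instance, a client at (10, 50) moving at (20, 0) m/s inside a box from (0, 0) to (100, 100) leaves after 4.5 seconds, so a lead time of 5 seconds triggers a request while a lead time of 3 seconds does not.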
[0628] If the error in the alignment between the three-dimensional data created by the client device 902 from sensor information and the three-dimensional map exceeds a certain level, the client device 902 may request the server 901 to send the three-dimensional map.
[0629] The client device 902 transmits sensor information to the server 901 in response to a request for transmission of sensor information sent from the server 901. The client device 902 may also send sensor information to the server 901 without waiting for a request for transmission of sensor information from the server 901. For example, once the client device 902 receives a request for transmission of sensor information from the server 901, it may periodically transmit sensor information to the server 901 for a certain period. Furthermore, if the error in the alignment between the three-dimensional data created by the client device 902 based on the sensor information and the three-dimensional map obtained from the server 901 exceeds a certain level, the client device 902 may determine that a change has occurred in the three-dimensional map around the client device 902 and transmit this information, along with the sensor information, to the server 901.
[0630] Server 901 requests client device 902 to transmit sensor information. For example, server 901 receives location information of client device 902, such as GPS, from client device 902. Based on the location information of client device 902, if server 901 determines that client device 902 is approaching an area with little information on the three-dimensional map managed by server 901, it requests client device 902 to transmit sensor information in order to generate a new three-dimensional map. Server 901 may also request sensor information transmission if it wants to update the three-dimensional map, check road conditions during snowfall or disasters, check traffic congestion, or check incidents and accidents.
[0631] Furthermore, the client device 902 may set the amount of sensor information data to send to the server 901 depending on the communication status or bandwidth at the time of receiving the sensor information transmission request from the server 901. Setting the amount of sensor information data to send to the server 901 means, for example, increasing or decreasing the data itself, or selecting an appropriate compression method.
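One way to picture this adjustment is the sketch below. It is not taken from the disclosure: the idea of a time budget, the fixed compression ratio `ASSUMED_COMPRESSION_RATIO`, and the function name `plan_sensor_upload` are all assumptions introduced for illustration.

```python
# Hypothetical compression ratio, assumed for this sketch only.
ASSUMED_COMPRESSION_RATIO = 4.0


def plan_sensor_upload(raw_size_bytes: float, bandwidth_bytes_per_s: float, time_budget_s: float) -> dict:
    """Decide whether to compress and how much of the data to keep.

    Returns a dict with:
      compress   - whether a compression method should be applied
      keep_ratio - fraction of the sensor data to send (1.0 means all of it)
    """
    budget = bandwidth_bytes_per_s * time_budget_s
    if raw_size_bytes <= budget:
        # Enough bandwidth: send everything uncompressed to keep accuracy.
        return {"compress": False, "keep_ratio": 1.0}
    compressed_size = raw_size_bytes / ASSUMED_COMPRESSION_RATIO
    if compressed_size <= budget:
        # Compression alone is sufficient.
        return {"compress": True, "keep_ratio": 1.0}
    # Still too large: also reduce the data itself.
    return {"compress": True, "keep_ratio": budget / compressed_size}
```

The three branches correspond to the options named in the text: leaving the data as is, selecting a compression method, and decreasing the data itself.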
[0632] Figure 147 is a block diagram showing an example configuration of the client device 902. The client device 902 receives a three-dimensional map composed of a point cloud, etc., from the server 901, and estimates its own position from the three-dimensional data created based on the sensor information of the client device 902. The client device 902 also transmits the acquired sensor information to the server 901.
[0633] The client device 902 includes a data receiving unit 1011, a communication unit 1012, a reception control unit 1013, a format conversion unit 1014, a plurality of sensors 1015, a three-dimensional data creation unit 1016, a three-dimensional image processing unit 1017, a three-dimensional data storage unit 1018, a format conversion unit 1019, a communication unit 1020, a transmission control unit 1021, and a data transmission unit 1022.
[0634] The data receiving unit 1011 receives the three-dimensional map 1031 from the server 901. The three-dimensional map 1031 is data that includes point clouds such as WLD or SWLD. The three-dimensional map 1031 may contain either compressed or uncompressed data.
[0635] The communication unit 1012 communicates with the server 901 and sends data transmission requests (for example, a request to transmit a 3D map) to the server 901.
[0636] The reception control unit 1013 exchanges information such as the supported format with the communication destination via the communication unit 1012 and establishes communication with the communication destination.
[0637] The format conversion unit 1014 generates a three-dimensional map 1032 by performing format conversion on the three-dimensional map 1031 received by the data receiving unit 1011. Furthermore, if the three-dimensional map 1031 is compressed or encoded, the format conversion unit 1014 performs decompression or decoding. However, if the three-dimensional map 1031 is uncompressed data, the format conversion unit 1014 does not perform decompression or decoding.
[0638] The plurality of sensors 1015 is a group of sensors, such as LiDAR, visible light cameras, infrared cameras, or depth sensors, that acquire information from outside the vehicle on which the client device 902 is mounted and generate sensor information 1033. For example, if a sensor 1015 is a laser sensor such as LiDAR, the sensor information 1033 is three-dimensional data such as a point cloud (point cloud data). Note that the number of sensors 1015 need not be plural.
[0639] The three-dimensional data creation unit 1016 creates three-dimensional data 1034 of the vehicle's surroundings based on the sensor information 1033. For example, the three-dimensional data creation unit 1016 uses information acquired by LiDAR and visible light images obtained by a visible light camera to create point cloud data with color information of the vehicle's surroundings.
[0640] The three-dimensional image processing unit 1017 uses the received three-dimensional map 1032, such as a point cloud, and the three-dimensional data 1034 of the vehicle's surroundings generated from sensor information 1033 to perform self-position estimation processing for the vehicle. Alternatively, the three-dimensional image processing unit 1017 may create three-dimensional data 1035 of the vehicle's surroundings by combining the three-dimensional map 1032 and the three-dimensional data 1034, and then perform self-position estimation processing using the created three-dimensional data 1035.
[0641] The three-dimensional data storage unit 1018 stores the three-dimensional map 1032, three-dimensional data 1034, and three-dimensional data 1035, etc.
[0642] The format conversion unit 1019 generates sensor information 1037 by converting the sensor information 1033 to a format supported by the receiving side. The format conversion unit 1019 may also reduce the amount of data by compressing or encoding the sensor information 1037. Furthermore, the format conversion unit 1019 may omit processing if format conversion is not necessary. The format conversion unit 1019 may also control the amount of data transmitted according to the specified transmission range.
[0643] The communication unit 1020 communicates with the server 901 and receives data transmission requests (sensor information transmission requests), etc., from the server 901.
[0644] The transmission control unit 1021 exchanges information such as the supported format with the communication destination via the communication unit 1020 and establishes communication.
[0645] The data transmission unit 1022 transmits sensor information 1037 to the server 901. The sensor information 1037 includes information acquired by multiple sensors 1015, such as information acquired by LiDAR, brightness images acquired by a visible light camera, infrared images acquired by an infrared camera, depth images acquired by a depth sensor, sensor position information, and velocity information.
[0646] Next, the configuration of server 901 will be described. Figure 148 is a block diagram showing an example configuration of server 901. Server 901 receives sensor information transmitted from client device 902 and creates three-dimensional data based on the received sensor information. Server 901 updates the three-dimensional map it manages using the created three-dimensional data. In addition, in response to a request from client device 902 to transmit the three-dimensional map, server 901 transmits the updated three-dimensional map to client device 902.
[0647] Server 901 comprises a data receiving unit 1111, a communication unit 1112, a reception control unit 1113, a format conversion unit 1114, a three-dimensional data creation unit 1116, a three-dimensional data synthesis unit 1117, a three-dimensional data storage unit 1118, a format conversion unit 1119, a communication unit 1120, a transmission control unit 1121, and a data transmission unit 1122.
[0648] The data receiving unit 1111 receives sensor information 1037 from the client device 902. The sensor information 1037 includes, for example, information acquired by LiDAR, brightness images acquired by a visible light camera, infrared images acquired by an infrared camera, depth images acquired by a depth sensor, sensor position information, and velocity information.
[0649] The communication unit 1112 communicates with the client device 902 and sends data transmission requests (for example, requests to transmit sensor information) to the client device 902.
[0650] The reception control unit 1113 exchanges information such as the supported format with the communication destination via the communication unit 1112 and establishes communication.
[0651] The format conversion unit 1114 generates sensor information 1132 by decompressing or decoding the received sensor information 1037 if it is compressed or encoded. However, the format conversion unit 1114 does not perform decompression or decoding if the sensor information 1037 is uncompressed data.
[0652] The three-dimensional data creation unit 1116 creates three-dimensional data 1134 of the area around the client device 902 based on the sensor information 1132. For example, the three-dimensional data creation unit 1116 uses information acquired by LiDAR and visible light images obtained by a visible light camera to create point cloud data with color information of the area around the client device 902.
[0653] The three-dimensional data synthesis unit 1117 updates the three-dimensional map 1135 managed by the server 901 by synthesizing the three-dimensional data 1134, which was created based on the sensor information 1132, with the three-dimensional map 1135.
[0654] The three-dimensional data storage unit 1118 stores three-dimensional maps 1135, etc.
[0655] The format conversion unit 1119 generates a three-dimensional map 1031 by converting the three-dimensional map 1135 to a format supported by the receiving side. The format conversion unit 1119 may also reduce the amount of data by compressing or encoding the three-dimensional map 1135. Furthermore, the format conversion unit 1119 may omit processing if format conversion is not necessary. The format conversion unit 1119 may also control the amount of data transmitted according to the specified transmission range.
[0656] The communication unit 1120 communicates with the client device 902 and receives data transmission requests (such as requests to transmit a three-dimensional map) from the client device 902.
[0657] The transmission control unit 1121 exchanges information such as the supported format with the communication destination via the communication unit 1120 and establishes communication.
[0658] The data transmission unit 1122 transmits the three-dimensional map 1031 to the client device 902. The three-dimensional map 1031 is data that includes point clouds such as WLD or SWLD. The three-dimensional map 1031 may contain either compressed or uncompressed data.
[0659] Next, the operation flow of the client device 902 will be described. Figure 149 is a flowchart showing the operation of the client device 902 when acquiring a three-dimensional map.
[0660] First, the client device 902 requests the server 901 to transmit a three-dimensional map (such as a point cloud) (S1001). At this time, the client device 902 may also transmit its own location information obtained by GPS or the like, and request the server 901 to transmit a three-dimensional map related to that location information.
[0661] Next, the client device 902 receives a three-dimensional map from the server 901 (S1002). If the received three-dimensional map is compressed data, the client device 902 decodes the received three-dimensional map to generate an uncompressed three-dimensional map (S1003).
[0662] Next, the client device 902 creates three-dimensional data 1034 of the area around the client device 902 from sensor information 1033 obtained from multiple sensors 1015 (S1004). Then, the client device 902 estimates its own position using the three-dimensional map 1032 received from the server 901 and the three-dimensional data 1034 created from the sensor information 1033 (S1005).
[0663] Figure 150 is a flowchart showing the operation of the client device 902 when transmitting sensor information. First, the client device 902 receives a request to transmit sensor information from the server 901 (S1011). Upon receiving the transmission request, the client device 902 transmits sensor information 1037 to the server 901 (S1012). If the sensor information 1033 includes multiple pieces of information obtained from multiple sensors 1015, the client device 902 may generate sensor information 1037 by compressing each piece of information using a compression method suitable for each piece of information.
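The per-information choice of compression method can be sketched as a simple lookup. The mapping below follows the methods mentioned elsewhere in this description (an octree-based method for point clouds, a two-dimensional image codec such as HEVC for images), but the key names, the table `COMPRESSION_BY_KIND`, and the function `choose_compression` are hypothetical and no actual codec is invoked.

```python
from typing import Dict

# Hypothetical mapping from a kind of sensor information to a compression method.
COMPRESSION_BY_KIND: Dict[str, str] = {
    "point_cloud": "octree",
    "visible_light_image": "HEVC",
    "infrared_image": "HEVC",
    "depth_image": "HEVC",
}


def choose_compression(sensor_information: Dict[str, bytes]) -> Dict[str, str]:
    """Pick a compression method for each piece of sensor information.

    Kinds without an entry in the table (for example, position or velocity
    values) are left uncompressed.
    """
    return {kind: COMPRESSION_BY_KIND.get(kind, "none") for kind in sensor_information}
```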
[0664] Next, the operation flow of server 901 will be described. Figure 151 is a flowchart showing the operation of server 901 when acquiring sensor information. First, server 901 requests client device 902 to send sensor information (S1021). Next, server 901 receives sensor information 1037 sent from client device 902 in response to the request (S1022). Next, server 901 creates three-dimensional data 1134 using the received sensor information 1037 (S1023). Next, server 901 reflects the created three-dimensional data 1134 in three-dimensional map 1135 (S1024).
[0665] Figure 152 is a flowchart illustrating the operation of server 901 when transmitting a three-dimensional map. First, server 901 receives a request to transmit a three-dimensional map from client device 902 (S1031). Upon receiving the request, server 901 transmits the three-dimensional map 1031 to client device 902 (S1032). At this time, server 901 may extract a three-dimensional map of the vicinity of client device 902 according to the location information of client device 902 and transmit the extracted three-dimensional map. Alternatively, server 901 may compress the three-dimensional map composed of a point cloud using, for example, an octree compression method, and transmit the compressed three-dimensional map.
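The vicinity extraction and octree compression mentioned above can be illustrated with a minimal sketch. It is not the method of this disclosure: the function names `extract_vicinity` and `octree_occupancy`, the spherical radius test, and the breadth-first one-byte-per-node occupancy layout are all assumptions made for illustration.

```python
from math import dist

def extract_vicinity(map_points, client_pos, radius):
    """Keep only the map points within `radius` of the client's reported position.
    (Hypothetical helper; the source does not define how the vicinity is chosen.)"""
    return [p for p in map_points if dist(p, client_pos) <= radius]

def octree_occupancy(points, origin, size, depth):
    """Return breadth-first 8-bit occupancy codes for a cube with corner `origin`
    and edge length `size`. Bit i of a code is set when child octant i holds at
    least one point. Only occupied octants are subdivided further."""
    codes = []
    level = [(origin, size, list(points))]
    for _ in range(depth):
        next_level = []
        for (ox, oy, oz), s, pts in level:
            half = s / 2
            children = [[] for _ in range(8)]
            for x, y, z in pts:
                # Octant index: bit 0 = upper x half, bit 1 = upper y, bit 2 = upper z.
                idx = int(x >= ox + half) | (int(y >= oy + half) << 1) | (int(z >= oz + half) << 2)
                children[idx].append((x, y, z))
            code = 0
            for i, child in enumerate(children):
                if child:
                    code |= 1 << i
                    child_origin = (ox + half * (i & 1),
                                    oy + half * ((i >> 1) & 1),
                                    oz + half * ((i >> 2) & 1))
                    next_level.append((child_origin, half, child))
            codes.append(code)
        level = next_level
    return bytes(codes)
```

In this sketch the server would first call `extract_vicinity` with the location reported by the client and then pass the result to `octree_occupancy`; a practical codec would additionally entropy-code the occupancy bytes.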
[0666] Modifications of this embodiment will be described below.
[0667] Server 901 uses sensor information 1037 received from client device 902 to create three-dimensional data 1134 of the area around client device 902. Next, server 901 calculates the difference between the created three-dimensional data 1134 and the three-dimensional map 1135 of the same area managed by server 901 by matching them. If the difference is greater than or equal to a predetermined threshold, server 901 determines that some kind of abnormality has occurred around client device 902. For example, when ground subsidence occurs due to a natural disaster such as an earthquake, a large difference may occur between the three-dimensional map 1135 managed by server 901 and the three-dimensional data 1134 created based on sensor information 1037.
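The threshold check described above can be sketched as follows. The source only states that a difference is obtained by matching; the mean nearest-neighbor distance used here, and the names `mean_nearest_distance` and `abnormality_detected`, are assumptions chosen for illustration, and the two point sets are assumed to be already aligned in the same coordinate system.

```python
from math import dist

def mean_nearest_distance(data_points, map_points):
    """Average distance from each sensed point to its nearest map point.
    Brute-force search; assumed difference measure, not taken from the source."""
    total = sum(min(dist(p, q) for q in map_points) for p in data_points)
    return total / len(data_points)

def abnormality_detected(data_points, map_points, threshold):
    """Report an abnormality when the difference is at or above the threshold."""
    return mean_nearest_distance(data_points, map_points) >= threshold
```

For example, if the sensed ground surface lies 0.5 m below the stored map everywhere, the measure evaluates to 0.5, and a threshold of 0.3 would flag an abnormality.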
[0668] The sensor information 1037 may include information indicating at least one of the following: the type of sensor, the performance of the sensor, and the model number of the sensor. Furthermore, a class ID corresponding to the sensor's performance may be added to the sensor information 1037. For example, if the sensor information 1037 is information acquired by a LiDAR, identifiers may be assigned according to the sensor's performance, such as Class 1 for sensors that can acquire information with accuracy in the millimeter range, Class 2 for sensors that can acquire information with accuracy in the centimeter range, and Class 3 for sensors that can acquire information with accuracy in the meter range. The server 901 may also estimate the sensor's performance information from the model number of the client device 902. For example, if the client device 902 is mounted in a vehicle, the server 901 may determine the sensor's specifications from the vehicle's make and model. In this case, the server 901 may have previously acquired information about the vehicle's make and model, or this information may be included in the sensor information. The server 901 may also use the acquired sensor information 1037 to switch the degree of correction applied to the three-dimensional data 1134 created using the sensor information 1037. For example, if the sensor performance is high precision (Class 1), the server 901 does not perform any correction on the three-dimensional data 1134. If the sensor performance is low precision (Class 3), the server 901 applies a correction to the three-dimensional data 1134 according to the accuracy of the sensor. For example, the lower the accuracy of the sensor, the stronger the degree (intensity) of the correction applied by the server 901.
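As a rough illustration of the class ID and the accuracy-dependent correction, the sketch below maps a sensor's accuracy to a class and to a correction strength. The boundaries (under 1 cm for Class 1, under 1 m for Class 2), the linear strength formula, the treatment of Class 2 (which the source leaves unspecified), and the names `performance_class` and `correction_strength` are all assumptions.

```python
def performance_class(accuracy_m):
    """Map a sensor's accuracy in meters to a class ID (assumed boundaries)."""
    if accuracy_m < 0.01:
        return 1  # millimeter-range accuracy
    if accuracy_m < 1.0:
        return 2  # centimeter-range accuracy
    return 3      # meter-range accuracy

def correction_strength(accuracy_m, full_scale_m=10.0):
    """Return a correction strength in [0, 1] (hypothetical scale).
    Class 1 data is left uncorrected; otherwise the strength grows as
    accuracy worsens, saturating at `full_scale_m`."""
    if performance_class(accuracy_m) == 1:
        return 0.0
    return min(1.0, accuracy_m / full_scale_m)
```

The only properties taken from the text are that Class 1 data receives no correction and that lower accuracy leads to stronger correction; any monotonic mapping would serve equally well.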
[0669] Server 901 may simultaneously send requests for the transmission of sensor information to multiple client devices 902 located in a given space. When server 901 receives multiple pieces of sensor information from multiple client devices 902, it need not use all of them to create the three-dimensional data 1134. For example, server 901 may select which sensor information to use according to the performance of the sensors. For example, when updating the three-dimensional map 1135, server 901 may select high-precision sensor information (Class 1) from the multiple pieces of sensor information received and use the selected sensor information to create the three-dimensional data 1134.
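The selection step above can be sketched as a simple filter over the received sensor information. The `SensorInfo` record, its field names, and `select_sensor_info` are hypothetical and only illustrate choosing Class 1 data before creating the three-dimensional data.

```python
from dataclasses import dataclass

@dataclass
class SensorInfo:
    client_id: str   # identifier of the sending client device (assumed field)
    class_id: int    # 1 = highest precision, 3 = lowest
    payload: bytes   # compressed sensor data (opaque in this sketch)

def select_sensor_info(received, required_class=1):
    """Keep only the sensor information whose class is at least as precise
    as `required_class` (a smaller class ID means higher precision)."""
    return [info for info in received if info.class_id <= required_class]
```

With `required_class=1`, only Class 1 data is used for the map update; relaxing it to 2 would also admit centimeter-range sensors.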
[0670] Server 901 is not limited to a server such as a traffic monitoring cloud, and may instead be another client device (an in-vehicle device). Figure 153 shows the system configuration in this case.
[0671] For example, client device 902C requests sensor information from a nearby client device 902A and obtains the sensor information from client device 902A. Client device 902C then uses the sensor information obtained from client device 902A to create three-dimensional data and updates its own three-dimensional map. In this way, client device 902C can use its own performance to generate a three-dimensional map of the space that can be sensed by client device 902A. This case is likely to arise, for example, when client device 902C has high performance.
[0672] In this case, client device 902A, which provided the sensor information, is granted the right to acquire the high-precision three-dimensional map generated by client device 902C. Client device 902A receives the high-precision three-dimensional map from client device 902C in accordance with that right.
[0673] Furthermore, client device 902C may send requests for the transmission of sensor information to multiple nearby client devices 902 (client devices 902A and 902B). If the sensor of client device 902A or client device 902B is high-performance, client device 902C can create three-dimensional data using the sensor information obtained from this high-performance sensor.
[0674] Figure 154 is a block diagram showing the functional configuration of server 901 and client device 902. Server 901 includes, for example, a three-dimensional map compression / decoding processing unit 1201 that compresses and decodes three-dimensional maps, and a sensor information compression / decoding processing unit 1202 that compresses and decodes sensor information.
[0675] The client device 902 comprises a three-dimensional map decoding processing unit 1211 and a sensor information compression processing unit 1212. The three-dimensional map decoding processing unit 1211 receives encoded data of the compressed three-dimensional map, decodes the encoded data, and obtains the three-dimensional map. The sensor information compression processing unit 1212 compresses the sensor information itself instead of the three-dimensional data created from the acquired sensor information, and sends the encoded data of the compressed sensor information to the server 901. With this configuration, the client device 902 only needs to incorporate a processing unit (device or LSI) that decodes the three-dimensional map (point cloud, etc.), and does not need to incorporate a processing unit that compresses the three-dimensional data of the three-dimensional map (point cloud, etc.). This reduces the cost and power consumption of the client device 902.
[0676] As described above, the client device 902 according to this embodiment is mounted on a mobile body and creates three-dimensional data 1034 of the surrounding area of the mobile body from sensor information 1033 indicating the surrounding conditions of the mobile body obtained by a sensor 1015 mounted on the mobile body. The client device 902 estimates the self-position of the mobile body using the created three-dimensional data 1034. The client device 902 transmits the acquired sensor information 1033 to the server 901 o...
Claims
1. A three-dimensional data encoding method performed by an encoding device for encoding three-dimensional data including a plurality of three-dimensional points arranged in a plurality of regions, the method comprising: generating a bitstream containing encoded data of the plurality of three-dimensional points, wherein the bitstream includes additional information, the additional information includes connection information corresponding to each of the plurality of regions, each piece of connection information includes one or more pieces of other-region identification information identifying one or more other regions determined to be related to the region corresponding to that connection information, and the determination is made based on visibility with respect to a viewpoint within the region corresponding to the connection information.
2. The three-dimensional data encoding method according to claim 1, wherein each piece of connection information further includes information indicating the number of the one or more other regions determined to be related to the region corresponding to that connection information.
3. The three-dimensional data encoding method according to claim 1 or 2, wherein each piece of connection information further includes information indicating a priority of each of the one or more other regions.
4. The three-dimensional data encoding method according to any one of claims 1 to 3, wherein the determination is made based on whether another region is within the field of view from the viewpoint.
5. The three-dimensional data encoding method according to any one of claims 1 to 3, wherein the determination is made based on at least one of: another region overlapping the region corresponding to the connection information; and another region being adjacent to the region corresponding to the connection information.
6. The three-dimensional data encoding method according to any one of claims 1 to 5, wherein each piece of connection information includes information indicating that, among the one or more other regions, the closer a region is to the region corresponding to that connection information, the earlier the encoded three-dimensional points of that region are to be decoded.
7. A three-dimensional data decoding method performed by a decoding device for decoding three-dimensional data including a plurality of three-dimensional points arranged in a plurality of regions, the method comprising: obtaining a bitstream containing encoded data of the plurality of three-dimensional points, wherein the bitstream includes additional information, the additional information includes connection information corresponding to each of the plurality of regions, and each piece of connection information includes one or more pieces of other-region identification information identifying one or more other regions determined to be related to the region corresponding to that connection information; and decoding the encoded data based on the connection information, wherein the determination is made based on visibility with respect to a viewpoint within the region corresponding to the connection information.
8. The three-dimensional data decoding method according to claim 7, wherein each piece of connection information further includes information indicating the number of the one or more other regions determined to be related to the region corresponding to that connection information.
9. The three-dimensional data decoding method according to claim 7 or 8, wherein each piece of connection information further includes information indicating a priority of each of the one or more other regions.
10. The three-dimensional data decoding method according to any one of claims 7 to 9, wherein the determination is made based on whether another region is within the field of view from the viewpoint.
11. The three-dimensional data decoding method according to any one of claims 7 to 9, wherein the determination is made based on at least one of: another region overlapping the region corresponding to the connection information; and another region being adjacent to the region corresponding to the connection information.
12. The three-dimensional data decoding method according to any one of claims 7 to 11, wherein each piece of connection information includes information indicating that, among the one or more other regions, the closer a region is to the region corresponding to that connection information, the earlier the encoded three-dimensional points of that region are to be decoded.
13. A three-dimensional data encoding device for encoding three-dimensional data including a plurality of three-dimensional points arranged in a plurality of regions, the device comprising: a processor; and memory, wherein the processor, using the memory, generates a bitstream containing encoded data of the plurality of three-dimensional points, the bitstream includes additional information, the additional information includes connection information corresponding to each of the plurality of regions, each piece of connection information includes one or more pieces of other-region identification information identifying one or more other regions determined to be related to the region corresponding to that connection information, and the determination is made based on visibility with respect to a viewpoint within the region corresponding to the connection information.
14. A three-dimensional data decoding device for decoding three-dimensional data including a plurality of three-dimensional points arranged in a plurality of regions, the device comprising: a processor; and memory, wherein the processor, using the memory, obtains a bitstream containing encoded data of the plurality of three-dimensional points, the bitstream includes additional information, the additional information includes connection information corresponding to each of the plurality of regions, each piece of connection information includes one or more pieces of other-region identification information identifying one or more other regions determined to be related to the region corresponding to that connection information, the processor, using the memory, further decodes the encoded data based on the connection information, and the determination is made based on visibility with respect to a viewpoint within the region corresponding to the connection information.