Point cloud semantic segmentation method, device and equipment and storage medium

By performing multi-scale semantic segmentation and fusion of point clouds and optical images, the problem of low efficiency in 3D feature annotation in traditional high-precision maps has been solved, achieving high-precision, efficient automated annotation and large-scale mass production.

CN115512115B · Active · Publication Date: 2026-06-23 · BEIJING BAIDU NETCOM SCI & TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD
Filing Date: 2022-10-31
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Traditional high-precision maps rely on manual operation for 3D feature annotation, which is inefficient and cannot be mass-produced.

Method used

A point cloud semantic segmentation method is adopted, which automatically labels the elements in the point cloud by performing multi-scale semantic segmentation and optical image segmentation on the point cloud to be segmented, combined with multi-scale feature fusion.

Benefits of technology

It improves the accuracy of point cloud semantic segmentation, ensures the precision of high-precision map production, and enables automated and large-scale mass production.

Abstract

The present disclosure provides a point cloud semantic segmentation method and device, equipment and storage medium, relates to the technical field of artificial intelligence, in particular to the technical field of semantic segmentation and deep learning. The specific implementation scheme is: performing semantic segmentation on a to-be-segmented point cloud to obtain point cloud segmentation results of at least two scales; fusing the segmentation results of the at least two scales to obtain a first segmentation result of the to-be-segmented point cloud; performing semantic segmentation on an optical image to obtain an image segmentation result, the optical image being an image obtained by photographing a scanning region corresponding to the point cloud by an optical imaging device; determining a second segmentation result of the to-be-segmented point cloud according to the to-be-segmented point cloud and the image segmentation result; and determining a target semantic segmentation result of the to-be-segmented point cloud according to the first segmentation result and the second segmentation result. The present disclosure can automatically segment the point cloud, improve the labeling efficiency of elements in the point cloud, and facilitate large-scale production.

Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, and in particular to the fields of semantic segmentation and deep learning technology, specifically to a point cloud semantic segmentation method, apparatus, device and storage medium.

Background Technology

[0002] With the continuous development of autonomous driving and advanced driver assistance systems, high-precision maps are becoming an indispensable component. Relying on high-precision maps, the intelligent capabilities of vehicles can be further enhanced. For example, on snow-covered roads, where visibility is obstructed by other vehicles, the capabilities of visual or lidar systems are greatly reduced. With high-precision maps, vehicle safety is significantly improved.

[0003] Traffic lights, streetlights, and other 3D elements are the most important parts of high-precision maps. Vehicles can achieve precise positioning based on these high-precision 3D elements, which is significant in ensuring driving safety and efficiency.

[0004] Currently, the traditional method for creating 3D feature annotations in high-precision maps typically relies on high-precision data acquisition vehicles to collect images and point cloud information, followed by manual annotation of features in the point cloud based on the images. This method is inefficient and cannot be scaled up for mass production.

Summary of the Invention

[0005] This disclosure provides a point cloud semantic segmentation method, apparatus, device, and storage medium, which can perform automated semantic segmentation of point clouds, improve the annotation efficiency of elements in point clouds, and facilitate large-scale mass production.

[0006] According to a first aspect of this disclosure, a point cloud semantic segmentation method is provided, comprising: performing semantic segmentation on a point cloud to be segmented to obtain point cloud segmentation results at at least two scales; fusing the segmentation results at at least two scales to obtain a first segmentation result of the point cloud to be segmented; performing semantic segmentation on an optical image to obtain an image segmentation result, wherein the optical image is an image obtained by capturing a scanning area corresponding to the point cloud through an optical imaging device; determining a second segmentation result of the point cloud to be segmented based on the point cloud to be segmented and the image segmentation result; and determining a target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result.

[0007] According to a second aspect of this disclosure, a point cloud semantic segmentation apparatus is provided, comprising: a first segmentation module, configured to perform semantic segmentation on a point cloud to be segmented to obtain point cloud segmentation results at at least two scales; to fuse the segmentation results at at least two scales to obtain a first segmentation result of the point cloud to be segmented; a second segmentation module, configured to perform semantic segmentation on an optical image to obtain an image segmentation result, wherein the optical image is an image obtained by capturing a scanning area corresponding to the point cloud through an optical imaging device; to determine a second segmentation result of the point cloud to be segmented based on the point cloud to be segmented and the image segmentation result; and a fusion module, configured to determine a target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result.

[0008] According to a third aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method provided in the first aspect.

[0009] According to a fourth aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions for causing a computer to perform the method provided according to the first aspect.

[0010] According to a fifth aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method provided according to the first aspect.

[0011] Given point cloud data and optical images of a given area, this disclosure performs multi-scale semantic segmentation on the point cloud and obtains a corresponding segmentation result (the first segmentation result) through multi-scale feature fusion. It also associates the point cloud with the semantic segmentation result of the optical images to obtain a corresponding point cloud segmentation result (the second segmentation result). The final target semantic segmentation result is then determined based on the first and second segmentation results. This allows for automated semantic segmentation of point clouds and annotation of the various elements within them. Furthermore, fusing the first segmentation result, obtained through direct semantic segmentation of the point cloud, with the second segmentation result, obtained by associating the semantic segmentation of the optical images with the point cloud, makes the target semantic segmentation result more accurate. This improves the precision of point cloud semantic segmentation and, in turn, the accuracy of the high-precision map ultimately produced from these segmentation results.

[0012] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Attached Figure Description

[0013] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:

[0014] Figure 1 is a first flowchart of the point cloud semantic segmentation method provided in the embodiments of this disclosure;

[0015] Figure 2 is a second flowchart of the point cloud semantic segmentation method provided in the embodiments of this disclosure;

[0016] Figure 3 is a schematic diagram of the composition of the point cloud semantic segmentation apparatus provided in the embodiments of this disclosure;

[0017] Figure 4 shows a schematic block diagram of an example electronic device 400 that can be used to implement embodiments of the present disclosure.

Detailed Implementation

[0018] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0019] The point cloud semantic segmentation method and apparatus disclosed herein are applicable to situations where point clouds are semantically segmented to label different features within the point cloud. The point cloud semantic segmentation method provided herein can be executed by the point cloud semantic segmentation apparatus, which can be implemented in software and / or hardware and specifically configured in an electronic device. This electronic device can be a mobile terminal (such as a mobile phone, tablet, etc.), a server, a computer, in-vehicle equipment, a microcontroller, or other computing devices; no limitations are imposed here.

[0020] The point cloud semantic segmentation method provided in this disclosure will be described in detail below.

[0021] With the continuous development of autonomous driving and advanced driver assistance systems, high-precision maps are becoming an indispensable component. Relying on high-precision maps, the intelligent capabilities of vehicles can be further enhanced. For example, on snow-covered roads, where visibility is obstructed by other vehicles, the capabilities of visual or lidar systems are greatly reduced. With high-precision maps, vehicle safety is significantly improved.

[0022] Traffic lights, streetlights, and other 3D elements are the most important parts of high-precision maps. Vehicles can achieve precise positioning based on these high-precision 3D elements, which is significant in ensuring driving safety and efficiency.

[0023] Currently, the traditional method for creating 3D feature annotations in high-precision maps typically relies on high-precision data acquisition vehicles to collect images and point cloud information, followed by manual annotation of features in the point cloud based on the images. This method is inefficient and cannot be scaled up for mass production.

[0024] To address this, this disclosure provides a point cloud semantic segmentation method, comprising: performing semantic segmentation on the point cloud to be segmented to obtain point cloud segmentation results at at least two scales; fusing the point cloud segmentation results at at least two scales to obtain a first segmentation result of the point cloud to be segmented; performing semantic segmentation on an optical image to obtain an image segmentation result, wherein the optical image is an image obtained by capturing a scanning area corresponding to the point cloud through an optical imaging device; determining a second segmentation result of the point cloud to be segmented based on the point cloud to be segmented and the image segmentation result; and determining a target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result.

[0025] Given point cloud data and optical images of a given area, this disclosure performs multi-scale semantic segmentation on the point cloud and obtains a corresponding segmentation result (the first segmentation result) through multi-scale feature fusion. It also associates the point cloud with the semantic segmentation result of the optical images to obtain a corresponding point cloud segmentation result (the second segmentation result). The final target semantic segmentation result is then determined based on the first and second segmentation results. This allows for automated semantic segmentation of point clouds and annotation of the various elements within them. Furthermore, fusing the first segmentation result, obtained through direct semantic segmentation of the point cloud, with the second segmentation result, obtained by associating the semantic segmentation of the optical images with the point cloud, makes the target semantic segmentation result more accurate. This improves the precision of point cloud semantic segmentation and, in turn, the accuracy of the high-precision map ultimately produced from these segmentation results.

[0026] Figure 1 is a flowchart of the point cloud semantic segmentation method provided in the embodiments of this disclosure. As shown in Figure 1, the method may include the following S101-S105.

[0027] S101. Perform semantic segmentation on the point cloud to be segmented to obtain point cloud segmentation results at at least two scales.

[0028] Semantic segmentation of the point cloud to be segmented can be performed by multi-level semantic segmentation (or multi-scale semantic segmentation). That is, multiple downsampling and semantic segmentation are performed on the point cloud to be segmented.

[0029] For example, a first semantic segmentation can be performed on the point cloud to be segmented to obtain a first-scale point cloud segmentation result. Then, based on the first-scale segmentation result, it can be downsampled once, followed by a second semantic segmentation to obtain a second-scale point cloud segmentation result. Thus, point cloud segmentation results at two scales can be obtained.

[0030] For example, a first semantic segmentation can be performed on the point cloud to be segmented, yielding a first-scale point cloud segmentation result. Then, based on this first-scale segmentation result, a downsampling is performed, followed by a second semantic segmentation, yielding a second-scale point cloud segmentation result. This second-scale segmentation result is then downsampled again, followed by a third semantic segmentation, yielding a third-scale point cloud segmentation result. Finally, based on this third-scale segmentation result, a downsampling is performed, followed by a fourth semantic segmentation, yielding a fourth-scale point cloud segmentation result, thus obtaining point cloud segmentation results at four scales.

[0031] Alternatively, based on the aforementioned example, before performing the first semantic segmentation, the point cloud to be segmented can be downsampled once, and then the first semantic segmentation can be performed.

[0032] In the examples above, the downsampling factor can be the same or different each time; there is no restriction here. For example, it can be double downsampling each time, or a different downsampling factor can be used each time, or some downsampling can use the same downsampling factor while others use different downsampling factors, etc.

[0033] Of course, the above are only examples. In practical applications, the number of semantic segmentation and downsampling steps is not restricted; that is, the number of scales at which point cloud segmentation results are obtained can be set according to the actual situation, as long as results are obtained at at least two scales.
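The alternating segment-then-downsample loop described in the examples above can be sketched as follows. This is only an illustrative sketch: `toy_segment` is a hypothetical stand-in for a trained segmentation model, `downsample` simply keeps every n-th point, and the per-step `factors` are assumed values; none of these names come from the patent.

```python
def downsample(points, factor=2):
    """Keep every `factor`-th point; a stand-in for any real downsampling scheme."""
    return points[::factor]


def toy_segment(points):
    """Hypothetical placeholder for a trained model: label each point by height."""
    return [1 if z > 1.0 else 0 for (_x, _y, z) in points]


def multi_scale_segment(points, num_scales=3, factors=(2, 2)):
    """Return [(points_at_scale, labels_at_scale), ...], finest scale first."""
    if num_scales < 2:
        raise ValueError("at least two scales are required")
    results = []
    current = points
    for level in range(num_scales):
        # Segment at the current scale, then downsample for the next scale.
        results.append((current, toy_segment(current)))
        if level < num_scales - 1:
            current = downsample(current, factors[level])
    return results


# Eight toy points whose height grows with the index.
cloud = [(float(i), 0.0, 0.25 * i) for i in range(8)]
scales = multi_scale_segment(cloud, num_scales=3, factors=(2, 2))
print([len(pts) for pts, _ in scales])
```

The `factors` tuple holds one entry per downsampling step, so the factor may be the same at every step or differ between steps.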

[0034] Optionally, semantic segmentation of the point cloud to be segmented can be achieved using a semantic segmentation model. This semantic segmentation model can be trained based on a multi-level semantic segmentation neural network. For example, based on a basic semantic segmentation model, a semantic segmentation model capable of achieving the above functions can be obtained by training on a training set consisting of point clouds labeled with various elements.

[0035] Optionally, when performing semantic segmentation on the point cloud to be segmented, the point cloud can first be voxelized, that is, divided into cubes of a certain size (e.g., 0.2 m × 0.2 m × 0.2 m). Subsequent semantic segmentation can then treat each cube as the smallest unit, which reduces the amount of data to be processed and improves efficiency. The features of each cube (or voxel) can be defined as the geometric information of the points within the voxel, such as the positional offset of each point relative to the voxel center, the number of points in the voxel, and the average position coordinates of the points within the voxel.

[0036] S102. Fuse the point cloud segmentation results at at least two scales to obtain the first segmentation result of the point cloud to be segmented.

[0037] For example, fusing the point cloud segmentation results obtained at at least two scales can be achieved by first aligning the scales of the point cloud segmentation results, and then weighting and fusing the aligned point cloud segmentation results at each scale to obtain the first segmentation result.
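A minimal sketch of the voxelization step, assuming the point cloud is a list of (x, y, z) tuples in metres. The function name `voxelize` and the layout of the returned dictionary are illustrative choices, not part of the disclosure.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size=0.2):
    """Group (x, y, z) points into cubic voxels and compute simple geometric features."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)

    features = {}
    for key, pts in buckets.items():
        n = len(pts)
        mean = tuple(sum(p[i] for p in pts) / n for i in range(3))
        center = tuple((k + 0.5) * voxel_size for k in key)
        # Offset of each point relative to the voxel center.
        offsets = [tuple(p[i] - center[i] for i in range(3)) for p in pts]
        features[key] = {"count": n, "mean": mean, "offsets": offsets}
    return features


cloud = [(0.05, 0.05, 0.05), (0.15, 0.05, 0.05), (0.45, 0.05, 0.05)]
voxels = voxelize(cloud)
for key, feat in sorted(voxels.items()):
    print(key, feat["count"], feat["mean"])
```

The first two points share voxel (0, 0, 0), while the third lands in voxel (2, 0, 0), so later stages would process two voxels instead of three points.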

[0038] For example, consider a point cloud segmentation result comprising a first-scale segmentation result, a second-scale segmentation result obtained by downsampling and semantic segmentation based on the first-scale segmentation result, and a third-scale segmentation result obtained by downsampling and semantic segmentation based on the second-scale segmentation result. To fuse these three scales, the first and second-scale segmentation results can be downsampled separately to align their scales with the third-scale segmentation result. Then, the aligned segmentation results can be fused using the following formula.

[0039] f(x1,x2,x3)=a×x1+b×x2+c×x3

[0040] Where x1, x2, and x3 are the scale-aligned results of point cloud segmentation at different scales, a, b, and c are the coefficient weights, and f(x1, x2, x3) is the final fused segmentation result.

[0041] In practical applications, the coefficient weights corresponding to the point cloud segmentation results at each scale can be fixed values set in advance, or they can be values that are adaptively adjusted based on the point cloud segmentation results at each scale (such as adaptively allocating the coefficient weights corresponding to the point cloud segmentation results at each scale based on the attention weight allocation mechanism). There are no restrictions here.
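As a rough illustration of the align-then-weight scheme, the sketch below represents each scale's result as a list of cells, each holding per-class scores, aligns the finer scales to the coarsest one by average pooling, and then applies f(x1, x2, x3) = a×x1 + b×x2 + c×x3. The one-dimensional cell layout, the pooling choice, and the weights (0.2, 0.3, 0.5) are assumptions made for the example.

```python
def avg_pool(scores, factor):
    """Average consecutive groups of `factor` cells (each cell is a list of class scores)."""
    pooled = []
    for i in range(0, len(scores), factor):
        group = scores[i:i + factor]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def fuse_aligned(results, weights):
    """Element-wise weighted sum over scale-aligned score maps."""
    n_cells = len(results[0])
    n_classes = len(results[0][0])
    return [
        [sum(w * r[i][k] for w, r in zip(weights, results)) for k in range(n_classes)]
        for i in range(n_cells)
    ]


x1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # first scale, 4 cells
x2 = [[0.8, 0.2], [0.4, 0.6]]                          # second scale, 2 cells
x3 = [[0.5, 0.5]]                                      # third scale, 1 cell

# Downsample the first and second scales to match the third.
aligned = [avg_pool(x1, 4), avg_pool(x2, 2), x3]
fused = fuse_aligned(aligned, weights=(0.2, 0.3, 0.5))
print(fused)
```

With fixed weights the fusion is a plain linear combination; an attention-style mechanism would instead compute the weights from the results themselves.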

[0042] As another example, fusing the point cloud segmentation results at at least two scales can start from the point cloud segmentation result at the smallest scale: it is upsampled to be scale-aligned with the point cloud segmentation result at the previous scale and then weighted and fused with that result, and this is repeated until the fusion with the point cloud segmentation result at the largest scale yields the final first segmentation result.

[0043] For example, consider a point cloud segmentation result comprising a first-scale segmentation result, a second-scale segmentation result obtained by downsampling and semantic segmentation based on the first-scale segmentation result, and a third-scale segmentation result obtained by downsampling and semantic segmentation based on the second-scale segmentation result. The fusion of these three scales can be achieved by first upsampling the third-scale segmentation result to align it with the scale of the second-scale segmentation result, then performing a weighted fusion with the second-scale segmentation result (which can be called the original point cloud segmentation result). Next, the weighted fusion result is upsampled to align with the scale of the first-scale segmentation result, and then this upsampled result is weighted and fused with the first-scale segmentation result (which can be called the original point cloud segmentation result) to finally obtain the first segmentation result.

[0044] The weighted fusion of the upsampled point cloud segmentation result and the corresponding original point cloud segmentation result can be performed using the following formula:

[0045] f(x_m, x_o) = α × x_m + (1 - α) × x_o

[0046] Where x_m is the original point cloud segmentation result corresponding to the upsampled point cloud segmentation result, x_o is the upsampled point cloud segmentation result, and α is the weight.

[0047] In practical applications, the weight α can be a preset fixed value, or a preset initial value that is adaptively updated through the continuous feedback of the convolutional network. There are no restrictions here.
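The coarse-to-fine variant can be sketched as below, assuming each scale has twice as many cells as the next coarser one, nearest-neighbour upsampling, a fixed α of 0.5, and that α weights the original-scale result while (1 - α) weights the upsampled one. All of these are assumptions for illustration only.

```python
def upsample(scores, factor=2):
    """Nearest-neighbour upsampling: repeat each cell `factor` times."""
    return [list(cell) for cell in scores for _ in range(factor)]


def blend(original, upsampled, alpha):
    """alpha * original + (1 - alpha) * upsampled, cell by cell and class by class."""
    return [
        [alpha * o + (1 - alpha) * u for o, u in zip(o_cell, u_cell)]
        for o_cell, u_cell in zip(original, upsampled)
    ]


def coarse_to_fine(results, alpha=0.5):
    """`results` is ordered from the largest (finest) scale to the smallest (coarsest)."""
    fused = results[-1]
    for original in reversed(results[:-1]):
        # Upsample the running result, then blend it with the result at that scale.
        fused = blend(original, upsample(fused), alpha)
    return fused


x1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # first scale, 4 cells
x2 = [[0.8, 0.2], [0.4, 0.6]]                          # second scale, 2 cells
x3 = [[0.5, 0.5]]                                      # third scale, 1 cell
first_result = coarse_to_fine([x1, x2, x3], alpha=0.5)
print(first_result)
```

Unlike the single weighted sum, this form returns a result at the finest scale, so no detail from the first-scale segmentation is pooled away.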

[0048] As another example, the two approaches described above can be combined to fuse the point cloud segmentation results at at least two scales. For instance, the point cloud segmentation results at each scale can first be scale-aligned to the smallest-scale segmentation result and then weighted and fused. Then, starting from this fused result, it is upsampled to be scale-aligned with the point cloud segmentation result at the previous scale and weighted and fused with that result, and so on until it is finally fused with the largest-scale point cloud segmentation result, thereby obtaining the first segmentation result.

[0049] For example, continuing with point cloud segmentation results that include a first-scale point cloud segmentation result, a second-scale point cloud segmentation result obtained by downsampling and semantic segmentation based on the first-scale result, and a third-scale point cloud segmentation result obtained by downsampling and semantic segmentation based on the second-scale result, the three scales can be fused as follows. First, the first-scale and second-scale point cloud segmentation results are each downsampled to align their scales with the third-scale point cloud segmentation result, and the aligned results are fused. Next, the fused result is upsampled to align with the scale of the second-scale point cloud segmentation result and is weighted and fused with the second-scale point cloud segmentation result (which can be called the original point cloud segmentation result). Finally, this weighted fusion result is upsampled to align with the scale of the first-scale point cloud segmentation result and is weighted and fused with the first-scale point cloud segmentation result (which can be called the original point cloud segmentation result) to obtain the first segmentation result. For the fusion of the aligned first-, second-, and third-scale point cloud segmentation results, and for the weighted fusion of an upsampled point cloud segmentation result with the corresponding original point cloud segmentation result, refer to the descriptions in the examples above; these are not limited here.

[0050] S103. Perform semantic segmentation on the optical image to obtain the image segmentation result. The optical image is an image obtained by taking pictures of the scanning area corresponding to the point cloud through an optical imaging device.

[0051] The optical imaging device can be a camera, a video camera, a photosensitive device, or any other device that can obtain an optical image of the corresponding area through optical imaging.

[0052] Semantic segmentation of optical images can be achieved using image semantic segmentation models from related technologies, and no restrictions are imposed here.

[0053] S104. Based on the point cloud to be segmented and the image segmentation results, determine the second segmentation result of the point cloud to be segmented.

[0054] For example, a depth map containing the correspondence between the point cloud and the optical image can be generated based on the point cloud to be segmented, the optical image, and the positional relationship between the point cloud scanning device (such as a LiDAR) and the optical imaging device. The image segmentation result obtained by semantic segmentation of the optical image is then mapped onto the depth map, thereby determining the second segmentation result of the point cloud to be segmented.
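One way to picture this mapping is the sketch below, which assumes a pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and a LiDAR-to-camera rotation `R` and translation `t`; the patent does not specify the camera model. Each point is projected into the image, a sparse depth map keeps the nearest point per pixel, and that point receives the pixel's class from the image segmentation mask.

```python
def label_points_from_image(points, seg_mask, R, t, fx, fy, cx, cy):
    """Return one label per point (None if the point is not visible in the image)."""
    height, width = len(seg_mask), len(seg_mask[0])
    depth = {}  # sparse depth map: pixel (u, v) -> (depth, point index)
    for idx, p in enumerate(points):
        # LiDAR frame -> camera frame: X_cam = R @ p + t
        xc, yc, zc = (sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
        if zc <= 0:
            continue  # behind the camera
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep only the nearest point per pixel; farther ones are occluded.
            if (u, v) not in depth or zc < depth[(u, v)][0]:
                depth[(u, v)] = (zc, idx)

    labels = [None] * len(points)
    for (u, v), (_z, idx) in depth.items():
        labels[idx] = seg_mask[v][u]
    return labels


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask = [[0, 0, 0],
        [0, 7, 3],
        [0, 0, 0]]  # toy 3x3 image segmentation result (class ids)
pts = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
labels = label_points_from_image(pts, mask, identity, (0.0, 0.0, 0.0), 1.0, 1.0, 1.0, 1.0)
print(labels)
```

Keeping a single point per pixel is a simplification; a fuller version might label every point whose depth is close to the nearest depth at its pixel.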

[0055] S105. Based on the first segmentation result and the second segmentation result, determine the target semantic segmentation result of the point cloud to be segmented.

[0056] Determining the target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result can be done as follows: among the bounding boxes at corresponding positions in the first segmentation result and the second segmentation result, the bounding box with the highest point cloud density is used as the bounding box at that position in the target semantic segmentation result.

[0057] Alternatively, the first segmentation result and the second segmentation result can be fused to obtain the target semantic segmentation result.

[0058] For example, if the first segmentation result and the second segmentation result each include bounding boxes labeled with point clouds, then the bounding boxes at corresponding positions in the two segmentation results can be merged to obtain the target semantic segmentation result.

[0059] For example, take a first bounding box in the first segmentation result and the corresponding second bounding box in the second segmentation result. Determining the target semantic segmentation result of the point cloud to be segmented based on the first and second segmentation results may then, as shown in Figure 2, include:

[0060] S201. Determine the fused bounding box based on the overlapping area of the first bounding box in the first segmentation result and the second bounding box in the second segmentation result.

[0061] The position of the first bounding box in the first segmentation result corresponds to the position of the second bounding box in the second segmentation result.

[0062] S202. By adjusting the orientation of the fused bounding box, determine the target orientation at which the fused bounding box encloses the maximum number of point cloud points.

[0063] S203. Use the adjusted fused bounding box as the bounding box in the target semantic segmentation result.

[0064] That is, the adjusted fused bounding box is used as the bounding box at the corresponding position in the target semantic segmentation result.

[0065] The orientation of the fused bounding box can be adjusted by rotating the bounding box. The center of rotation can be the average coordinates of all point cloud points enclosed within the bounding box. That is, the x-coordinate of the rotation center is the average of the x-coordinates of all point cloud points enclosed within the bounding box, and the y-coordinate of the rotation center is the average of the y-coordinates of all point cloud points enclosed within the bounding box. Of course, the geometric center of the bounding box can also be used as the center of rotation.
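Steps S201 to S203 can be sketched in two dimensions as follows. The sketch assumes axis-aligned input boxes given as (xmin, ymin, xmax, ymax), takes their intersection as the fused box, and searches candidate angles in 5-degree steps; the box representation, the angle step, and the function names are all illustrative assumptions.

```python
import math


def overlap_box(a, b):
    """Intersection of two axis-aligned boxes (xmin, ymin, xmax, ymax), or None."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def count_inside(points, center, width, height, angle):
    """Count points inside a box of the given size rotated by `angle` about `center`."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    n = 0
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        # Express the point in the box's own (rotated) frame.
        lx = cos_a * dx + sin_a * dy
        ly = -sin_a * dx + cos_a * dy
        if abs(lx) <= width / 2 and abs(ly) <= height / 2:
            n += 1
    return n


def best_orientation(points, box, step_deg=5):
    """Return (rotation center, angle in radians) enclosing the most points."""
    xmin, ymin, xmax, ymax = box
    width, height = xmax - xmin, ymax - ymin
    inside = [(x, y) for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]
    if inside:
        # Rotation center: mean coordinates of the enclosed points.
        center = (sum(x for x, _ in inside) / len(inside),
                  sum(y for _, y in inside) / len(inside))
    else:
        # Fall back to the geometric center of the box.
        center = ((xmin + xmax) / 2, (ymin + ymax) / 2)
    best_deg = max(range(0, 180, step_deg),
                   key=lambda d: count_inside(points, center, width, height, math.radians(d)))
    return center, math.radians(best_deg)


# Toy data: points along a 45-degree line through the origin.
c = math.cos(math.pi / 4)
line = [(i / 10 * c, i / 10 * c) for i in range(-19, 20)]
fused_box = overlap_box((-3, -0.1, 2, 1), (-2, -1, 3, 0.1))
center, angle = best_orientation(line, fused_box)
print(fused_box, center, math.degrees(angle))
```

Because a rectangle is symmetric under a 180-degree rotation, searching the half-open range from 0 to 180 degrees covers every distinct orientation.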

[0066] In this way, the bounding boxes at corresponding positions in the first and second segmentation results can be fused separately, thereby improving the accuracy of the bounding boxes in the final semantic segmentation result and thus improving the accuracy of semantic segmentation of point clouds.

[0067] Optionally, before using the adjusted fused bounding box as the bounding box in the target semantic segmentation result, the method further includes:

[0068] Based on the point cloud density in the fused bounding box, adjust the size of the fused bounding box to the target size. The point cloud density in the fused bounding box of the target size is greater than or equal to the point cloud density in the fused bounding box before adjustment.

[0069] Adjusting the size of the fused bounding box to the target size based on the point cloud density within it can be done as follows, using the point cloud density of the bounding box (e.g., the average number of points within a 0.1 m × 0.1 m area of the bounding box). The bounding box is expanded while maintaining its aspect ratio; if the point cloud density after expansion is similar to that before adjustment (e.g., the absolute value of the difference is less than a preset threshold) or increases by more than the threshold, the size of the bounding box is adjusted to the current size. This is repeated until the point cloud density of the bounding box decreases by more than the threshold after expansion. Likewise, the bounding box is shrunk while maintaining its aspect ratio; if the point cloud density after shrinking is similar to that before adjustment or increases by more than the threshold, the size of the bounding box is adjusted to the current size. This is repeated until the point cloud density of the bounding box decreases by more than the threshold after shrinking.

[0070] Alternatively, adjusting the size of the fused bounding box to the target size based on the point cloud density within it can be done by adjusting the width and the height of the fused bounding box separately, based on the point cloud density in the width direction (e.g., the average number of points per 0.1 m along the width direction) and in the height direction (e.g., the average number of points per 0.1 m along the height direction). Taking the height direction as an example: when the height is extended by a preset length, such as 0.1 m, if the point cloud density in the height direction is similar to that before adjustment (e.g., the absolute value of the difference is less than a preset threshold) or increases by more than the threshold, the height of the fused bounding box is adjusted to the current size. This is repeated until, after the height is extended, the point cloud density in the height direction decreases by more than the threshold. Likewise, when the height is shrunk by a preset length, such as 0.1 m, if the point cloud density in the height direction is similar to that before adjustment or increases by more than the threshold, the height of the fused bounding box is adjusted to the current size. This is repeated until, after the height is shrunk, the point cloud density in the height direction decreases by more than the threshold compared with before adjustment. The width of the fused bounding box is adjusted in the same way as the height and is not elaborated here.

[0071] In this way, the size of the fused bounding box can be adjusted to increase the density of the point cloud it encloses, so that the fused bounding box encloses, as far as possible, only points belonging to the same content. This improves the accuracy of the fused bounding box and thus the accuracy of the target semantic segmentation result.
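The expansion half of this procedure might look like the sketch below. It works in two dimensions, scales the box about its center by an assumed factor of 1.1 per step, and stops when the density falls by more than an assumed relative tolerance of 10% compared with the previous step; the patent only speaks of a preset threshold, so the relative form and every constant here are assumptions. Shrinking would follow the same pattern with a factor below 1.

```python
def density(points, box):
    """Points per unit area inside an axis-aligned box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    inside = sum(1 for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax)
    return inside / ((xmax - xmin) * (ymax - ymin))


def scale_box(box, factor):
    """Scale a box about its center, which keeps the aspect ratio unchanged."""
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    half_w = (xmax - xmin) * factor / 2
    half_h = (ymax - ymin) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def grow_box(points, box, factor=1.1, rel_tol=0.1, max_steps=50):
    """Expand the box while the density stays similar or rises."""
    current = box
    current_density = density(points, current)
    for _ in range(max_steps):
        candidate = scale_box(current, factor)
        candidate_density = density(points, candidate)
        if candidate_density < (1 - rel_tol) * current_density:
            break  # density dropped by more than the tolerance: stop expanding
        current, current_density = candidate, candidate_density
    return current


# Toy cluster: a regular grid of points filling the square [-1, 1] x [-1, 1].
coords = [(i + 0.5) * 0.02 for i in range(-50, 50)]
cluster = [(x, y) for x in coords for y in coords]
grown = grow_box(cluster, (-0.5, -0.5, 0.5, 0.5))
print(grown)
```

Starting from a box that covers only the middle of the cluster, the loop stops once the box reaches roughly the extent of the cluster, because any further growth adds empty area and lowers the density.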

[0072] Optionally, before using the adjusted fused bounding box as the bounding box in the target semantic segmentation result, the method further includes:

[0073] Adjusting the dimensions of the fused bounding box according to the actual aspect ratio of the element type annotated by the fused bounding box, so that the aspect ratio of the fused bounding box is the same as the actual aspect ratio.

[0074] Optionally, the actual aspect ratio of an element type can be preset according to each element type.

[0075] Specifically, adjusting the dimensions of the fused bounding box based on the actual aspect ratio of the element type it annotates can be achieved as follows: when the height of the fused bounding box exceeds an adjustment threshold (which can be set based on the actual height of the corresponding element type), the width is kept constant while the height is adjusted until the aspect ratio matches the actual aspect ratio (the match can be exact or within a certain threshold range). Similarly, when the width of the fused bounding box exceeds an adjustment threshold (which can be set based on the actual width of the corresponding element type), the height is kept constant while the width is adjusted until the aspect ratio matches the actual aspect ratio (again, exactly or within a certain threshold range).
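A minimal sketch of this rule is given below. It assumes that the aspect ratio is expressed as width divided by height and that an exact match is wanted; the function name `fit_aspect_ratio` and its parameters are hypothetical.

```python
from typing import Tuple


def fit_aspect_ratio(width: float, height: float, actual_ratio: float,
                     max_width: float, max_height: float) -> Tuple[float, float]:
    """Return (width, height) corrected toward the preset aspect ratio.

    `actual_ratio` is width / height for the element type (assumed convention).
    `max_width` and `max_height` play the role of the adjustment thresholds.
    """
    if height > max_height:
        # Height is abnormal: keep the width and recompute the height.
        height = width / actual_ratio
    elif width > max_width:
        # Width is abnormal: keep the height and recompute the width.
        width = height * actual_ratio
    return width, height
```

For instance, with a hypothetical square sign type (`actual_ratio=1.0`, thresholds of 1.2 m), a 0.8 m by 2.0 m box would be corrected to 0.8 m by 0.8 m, while a box within both thresholds is returned unchanged.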

[0076] In this way, by adjusting the aspect ratio of the fused bounding box according to the actual aspect ratio, the size of the fused bounding box can be prevented from being abnormal and deviating from the actual size of the corresponding element, thereby improving the accuracy of the fused bounding box and thus improving the accuracy of the target semantic segmentation result.

[0077] In an exemplary embodiment, this disclosure also provides a point cloud semantic segmentation apparatus, which can be used to implement the point cloud semantic segmentation method as described in the foregoing embodiments.

[0078] Figure 3 is a schematic diagram of the composition of a point cloud semantic segmentation device provided in an embodiment of this disclosure.

[0079] As shown in Figure 3, the point cloud semantic segmentation device includes:

[0080] The first segmentation module 301 is used to perform semantic segmentation on the point cloud to be segmented, and obtain point cloud segmentation results at at least two scales; and to fuse the point cloud segmentation results at at least two scales to obtain the first segmentation result of the point cloud to be segmented.
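The fusion step performed by the first segmentation module can be illustrated with the following sketch. It assumes per-cell class scores stored as nested lists, nearest-neighbor upsampling as the scale alignment, and fixed fusion weights. These choices, and the names `upsample` and `fuse_scales`, are assumptions for illustration and are not specified by this disclosure.

```python
from typing import List, Sequence

Scores = List[List[float]]  # scores[i][c]: score of class c at cell i


def upsample(scores: Scores, factor: int) -> Scores:
    """Nearest-neighbor alignment: repeat every coarse cell `factor` times."""
    return [row for row in scores for _ in range(factor)]


def fuse_scales(scales: Sequence[Scores], factors: Sequence[int],
                weights: Sequence[float]) -> List[int]:
    """Align all scales to the finest grid, then take a weighted sum per class.

    Returns the index of the best-scoring class for every fine cell.
    """
    aligned = [upsample(s, f) for s, f in zip(scales, factors)]
    n_cells = len(aligned[0])
    if any(len(a) != n_cells for a in aligned):
        raise ValueError("scales do not align to the same number of cells")
    n_classes = len(aligned[0][0])
    labels = []
    for i in range(n_cells):
        fused = [sum(w * a[i][c] for w, a in zip(weights, aligned))
                 for c in range(n_classes)]
        labels.append(max(range(n_classes), key=fused.__getitem__))
    return labels
```

In this sketch the coarse scale acts as a smoothing prior: a fine cell whose own scores are ambiguous takes the class favored by the coarse cell that covers it.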

[0081] The second segmentation module 302 is used to perform semantic segmentation on the optical image to obtain an image segmentation result. The optical image is an image obtained by taking a picture of the scanning area corresponding to the point cloud through an optical imaging device. Based on the point cloud to be segmented and the image segmentation result, the second segmentation result of the point cloud to be segmented is determined.
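One possible way to carry the image segmentation result over to the point cloud is sketched below. It assumes points already expressed in the camera coordinate frame, a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, and a sparse depth map that keeps only the nearest point per pixel. All of these, including the function name, are assumptions made for illustration and are not taken from this disclosure.

```python
from typing import Dict, List, Optional, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in the camera frame, z forward


def label_points_from_image(points: List[Point3D], mask: List[List[int]],
                            fx: float, fy: float,
                            cx: float, cy: float) -> List[Optional[int]]:
    """Project points into the image and copy the class of the pixel they hit.

    Points behind the camera, outside the image, or hidden behind a nearer
    point at the same pixel receive None.
    """
    rows, cols = len(mask), len(mask[0])
    # Sparse depth map: pixel -> (depth, index of the nearest point).
    depth_map: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for idx, (x, y, z) in enumerate(points):
        if z <= 0:
            continue  # behind the camera
        col = int(round(fx * x / z + cx))
        row = int(round(fy * y / z + cy))
        if 0 <= row < rows and 0 <= col < cols:
            if (row, col) not in depth_map or z < depth_map[(row, col)][0]:
                depth_map[(row, col)] = (z, idx)
    labels: List[Optional[int]] = [None] * len(points)
    for (row, col), (_, idx) in depth_map.items():
        labels[idx] = mask[row][col]
    return labels
```

Bounding boxes for the second segmentation result could then be fitted around the points that share a label; that step is not shown here.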

[0082] The fusion module 303 is used to determine the target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result.

[0083] In some possible implementations, the first segmentation result and the second segmentation result each include bounding boxes annotated with point clouds; the fusion module 303 is specifically used to determine a fused bounding box based on the overlapping area of the first bounding box in the first segmentation result and the second bounding box in the second segmentation result, wherein the position of the first bounding box in the first segmentation result corresponds to the position of the second bounding box in the second segmentation result; by adjusting the orientation of the fused bounding box, the target orientation corresponding to the maximum number of point cloud points enclosed by the fused bounding box is determined; and the adjusted fused bounding box is used as the bounding box in the target semantic segmentation result.
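The orientation search described above can be sketched as a brute-force scan over candidate angles. The sketch assumes a 2D box rotated about its own center and a fixed angular step; the names `count_inside` and `best_orientation` and the 5-degree step are hypothetical. How the fused bounding box is derived from the overlapping area is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def count_inside(points: List[Point], cx: float, cy: float,
                 width: float, height: float, angle: float) -> int:
    """Count points inside a box rotated by `angle` radians about its center."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    n = 0
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Express the point in the box's own (rotated) coordinate frame.
        u = dx * cos_a + dy * sin_a
        v = -dx * sin_a + dy * cos_a
        if abs(u) <= width / 2 and abs(v) <= height / 2:
            n += 1
    return n


def best_orientation(points: List[Point], cx: float, cy: float,
                     width: float, height: float,
                     step_deg: int = 5) -> Tuple[int, int]:
    """Return (angle in degrees, enclosed point count) for the best angle.

    Only 0..179 degrees are scanned because a rectangle rotated by 180
    degrees covers the same area. Ties keep the smallest angle.
    """
    best_deg, best_count = 0, -1
    for deg in range(0, 180, step_deg):
        count = count_inside(points, cx, cy, width, height, math.radians(deg))
        if count > best_count:
            best_deg, best_count = deg, count
    return best_deg, best_count
```

For points lying along a 45-degree line, a long thin box encloses the most points when it is rotated to 45 degrees, which is the angle the scan returns.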

[0084] In some possible implementations, the fusion module 303 is further configured to adjust the size of the fusion bounding box to a target size based on the point cloud density in the fusion bounding box, wherein the point cloud density in the fusion bounding box of the target size is greater than or equal to the point cloud density in the fusion bounding box before adjustment.

[0085] In some possible implementations, the fusion module 303 is also used to adjust the size of the adjusted fusion bounding box according to the actual aspect ratio of the feature type marked by the fusion bounding box, so that the aspect ratio of the fusion bounding box is the same as the actual aspect ratio.

[0086] The acquisition, storage, and application of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0087] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.

[0088] In an exemplary embodiment, an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method as described in the above embodiments.

[0089] In an exemplary embodiment, the readable storage medium may be a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method described in the above embodiments.

[0090] In an exemplary embodiment, the computer program product includes a computer program that, when executed by a processor, implements the method described in the above embodiments.

[0091] Figure 4 shows a schematic block diagram of an example electronic device 400 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, in-vehicle devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0092] As shown in Figure 4, device 400 includes a computing unit 401, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 402 or a computer program loaded from storage unit 408 into random access memory (RAM) 403. RAM 403 may also store various programs and data required for the operation of device 400. The computing unit 401, ROM 402, and RAM 403 are interconnected via bus 404. Input / output (I / O) interface 405 is also connected to bus 404.

[0093] Multiple components in device 400 are connected to I / O interface 405, including: input unit 406, such as keyboard, mouse, etc.; output unit 407, such as various types of monitors, speakers, etc.; storage unit 408, such as disk, optical disk, etc.; and communication unit 409, such as network card, modem, wireless transceiver, etc. Communication unit 409 allows device 400 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0094] The computing unit 401 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the various methods and processes described above, such as the point cloud semantic segmentation method. For example, in some embodiments, the point cloud semantic segmentation method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and / or installed on device 400 via ROM 402 and / or communication unit 409. When the computer program is loaded into RAM 403 and executed by the computing unit 401, one or more steps of the point cloud semantic segmentation method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the point cloud semantic segmentation method by any other suitable means (e.g., by means of firmware).

[0095] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0096] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0097] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0098] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0099] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0100] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact via a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. Servers can be cloud servers, servers in distributed systems, or servers incorporating blockchain technology.

[0101] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0102] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A point cloud semantic segmentation method, characterized in that it comprises: performing multi-scale semantic segmentation on the point cloud to be segmented to obtain point cloud segmentation results at at least two scales, wherein the multi-scale semantic segmentation includes voxelizing the point cloud to be segmented, and performing multiple downsampling and semantic segmentation based on the features of each voxel; scale-aligning the point cloud segmentation results at the at least two scales, and performing weighted fusion of the aligned point cloud segmentation results at each scale to obtain the first segmentation result of the point cloud to be segmented, wherein the first segmentation result includes a first bounding box that annotates the features in the point cloud; performing semantic segmentation on the optical image to obtain the image segmentation result, wherein the optical image is an image obtained by taking a picture of the scanning area corresponding to the point cloud through an optical imaging device; generating a depth map based on the spatial correspondence between the point cloud to be segmented and the optical image, mapping the image segmentation result onto the depth map, and determining a second segmentation result of the point cloud to be segmented based on the mapping result, wherein the second segmentation result includes a second bounding box that annotates the corresponding elements in the point cloud; and determining the target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result, wherein determining the target semantic segmentation result includes: fusing the first bounding box in the first segmentation result and the second bounding box at the corresponding position in the second segmentation result to obtain an optimized bounding box as the target semantic segmentation result.

2. The method according to claim 1, characterized in that the first segmentation result and the second segmentation result each include bounding boxes annotating the point cloud; and the step of determining the target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result includes: determining a fused bounding box based on the overlapping area of the first bounding box in the first segmentation result and the second bounding box in the second segmentation result, wherein the position of the first bounding box in the first segmentation result corresponds to the position of the second bounding box in the second segmentation result; determining, by adjusting the orientation of the fused bounding box, the target orientation corresponding to the maximum number of point cloud points enclosed by the fused bounding box; and using the adjusted fused bounding box as the bounding box in the target semantic segmentation result.

3. The method according to claim 2, characterized in that, before using the adjusted fused bounding box as the bounding box in the target semantic segmentation result, the method further includes: adjusting the size of the fused bounding box to a target size based on the point cloud density in the fused bounding box, wherein the point cloud density in the fused bounding box of the target size is greater than or equal to the point cloud density in the fused bounding box before adjustment.

4. The method according to claim 2 or 3, characterized in that, before using the adjusted fused bounding box as the bounding box in the target semantic segmentation result, the method further includes: adjusting the size of the fused bounding box based on the actual aspect ratio of the feature type marked by the fused bounding box, so that the aspect ratio of the fused bounding box is the same as the actual aspect ratio.

5. A point cloud semantic segmentation device, characterized in that it comprises: a first segmentation module, used to perform multi-scale semantic segmentation on the point cloud to be segmented to obtain point cloud segmentation results at at least two scales, to scale-align the point cloud segmentation results at the at least two scales, and to perform weighted fusion of the aligned point cloud segmentation results at each scale to obtain the first segmentation result of the point cloud to be segmented, wherein the multi-scale semantic segmentation includes voxelizing the point cloud to be segmented, and performing multiple downsampling and semantic segmentation based on the features of each voxel, and the first segmentation result includes a first bounding box annotating the features in the point cloud; a second segmentation module, used to perform semantic segmentation on the optical image to obtain an image segmentation result, wherein the optical image is an image obtained by taking a picture of the scanning area corresponding to the point cloud through an optical imaging device, to generate a depth map based on the spatial correspondence between the point cloud to be segmented and the optical image, to map the image segmentation result to the depth map, and to determine a second segmentation result of the point cloud to be segmented according to the mapping result, wherein the second segmentation result includes a second bounding box that annotates the corresponding elements in the point cloud; and a fusion module, used to determine the target semantic segmentation result of the point cloud to be segmented based on the first segmentation result and the second segmentation result, wherein determining the target semantic segmentation result includes: fusing the first bounding box in the first segmentation result and the second bounding box at the corresponding position in the second segmentation result to obtain an optimized bounding box as the target semantic segmentation result.

6. The apparatus according to claim 5, characterized in that the first segmentation result and the second segmentation result each include bounding boxes that annotate the point cloud; the fusion module is specifically used to determine a fused bounding box based on the overlapping area of the first bounding box in the first segmentation result and the second bounding box in the second segmentation result, wherein the position of the first bounding box in the first segmentation result corresponds to the position of the second bounding box in the second segmentation result; to determine, by adjusting the orientation of the fused bounding box, the target orientation corresponding to the maximum number of point cloud points enclosed by the fused bounding box; and to use the adjusted fused bounding box as the bounding box in the target semantic segmentation result.

7. The apparatus according to claim 6, characterized in that the fusion module is further configured to adjust the size of the fused bounding box to a target size based on the point cloud density in the fused bounding box, wherein the point cloud density in the fused bounding box of the target size is greater than or equal to the point cloud density in the fused bounding box before adjustment.

8. The apparatus according to claim 6 or 7, characterized in that the fusion module is further configured to adjust the size of the adjusted fused bounding box according to the actual aspect ratio of the feature type marked by the fused bounding box, so that the aspect ratio of the fused bounding box is the same as the actual aspect ratio.

9. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.

10. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-4.

11. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-4.