Devices, systems, methods, and media for adaptive augmentation of point cloud datasets for training

By injecting appropriate point cloud object instances into the point cloud dataset using adaptive augmentation techniques, the problems of sparsity and insufficient diversity in the point cloud dataset are addressed, and the accuracy and generalization ability of the model in segmentation and object detection tasks are improved.

CN117015813B · Active · Publication Date: 2026-06-12 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2021-07-22
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing point cloud datasets suffer from sparsity, disorder, and lack of diversity when training segmentation and object detection models, resulting in insufficient accuracy of the models in prediction tasks, especially in the recognition of some uncommon object classes and scenes.

Method used

Adaptive enhancement techniques are employed, which selectively inject reasonable point cloud object instances into point cloud frames through confusion matrix analysis and scene dictionary generation strategies. Enhanced point cloud frames are generated by utilizing ground truth augmentation, random flipping, world scaling, and global translation noise techniques to improve the model's prediction accuracy.

Benefits of technology

It improves the model's accuracy in recognizing diverse scenes and object categories, especially for the recognition of uncommon objects and scenes, reduces computational costs and time complexity, and enhances the model's generalization ability.

✦ Generated by Eureka AI based on patent content.

Abstract

Devices, systems, methods, and media are described for adaptive scene augmentation of point cloud frames contained in labeled point cloud datasets used to train machine learning models for prediction tasks such as object detection or segmentation of point cloud frames. A formalized approach is described for generating new augmented point cloud frames from pre-existing, annotated, large-scale labeled point cloud frames contained in point cloud datasets. Detailed quantitative metrics such as confusion matrices are used to generate a strategy for large-scale data augmentation. The strategy is a set of detailed, sequential rules, processes, and / or conditions that can be used to generate augmentation data specifically to reduce existing inaccuracies in a trained model. The augmented point cloud frames can then be used to further train the model to improve the prediction accuracy of the model.

Description

Technical Field

[0001] This application generally relates to data augmentation for machine learning, and more particularly to devices, systems, methods, and media for adaptive augmentation of point cloud frames used to train models for prediction tasks such as segmentation and object detection.

Background Technology

[0002] LiDAR (Light Detection and Ranging, also referred to herein as "Lidar" or "LIDAR") sensors generate point cloud data representing a three-dimensional (3D) environment (also called a "scene") scanned by the LiDAR sensor. A single scan of a LiDAR sensor generates a "frame" (hereinafter referred to as a "point cloud frame") of point cloud data, consisting of a set of points scanned by the lasers of the LiDAR sensor from one or more points in space, over a period of time representing the time taken for the LiDAR sensor to perform one scan. Some LiDAR sensors, such as rotating scanning LiDAR sensors, include an array of lasers emitting light in an arc, with the LiDAR sensor rotating around a single location to generate point cloud frames; other LiDAR sensors, such as solid-state LiDAR, emit light from one or more locations and integrate the reflected light from each location to form a point cloud frame. Each laser in the laser array is used to generate multiple points during each scan, and each point corresponds to an object that reflected the light emitted by the laser at a certain point in space within the environment. Each point is typically stored as a set of spatial coordinates (X, Y, Z) and other data indicating values such as intensity (i.e., the reflectivity of the object that reflected the laser light). In some implementations, the other data may be represented as an array of values. In a rotating scan LiDAR sensor, the Z-axis of the point cloud frame is typically defined by the rotation axis of the rotating scan LiDAR sensor; in most cases, this rotation axis is approximately orthogonal to the orientation of each laser (although some LiDAR sensors may tilt some lasers slightly upward or downward relative to a plane orthogonal to the rotation axis).

[0003] Point cloud frames can also be generated using other scanning technologies such as high-definition radar; theoretically, any technology that uses energy scanning beams, such as electromagnetic energy or acoustic energy, can be used to generate point cloud data.

[0004] LiDAR sensors are among the primary sensors used in autonomous vehicles to sense the surrounding environment (i.e., the scene) of the autonomous vehicle. Autonomous vehicles typically include an automated driving system (ADS) or an advanced driver-assistance system (ADAS). The ADS or ADAS includes a perception subsystem that processes point cloud frames to generate predictions that can be used by other modules or subsystems of the ADS or ADAS for vehicle localization, route planning, motion planning, or trajectory generation. However, due to the sparsity and unordered nature of point cloud frames, collecting and labeling them at the point level is both time-consuming and costly. Points in the point cloud frames must be aggregated, segmented, or grouped (e.g., using object detection, semantic segmentation, instance segmentation, or panoptic segmentation) so that the set of points in the point cloud frame can be labeled using object classes (e.g., "pedestrian" or "motorcycle") or instances of object classes (e.g., "pedestrian #3"). These labeled point cloud frames are used to train models for prediction tasks such as object detection or various types of segmentation. The cumbersome process of labeling point cloud frames results in limited availability of labeled point cloud frames that represent the diverse road and traffic scenarios required for training highly accurate models for prediction tasks using machine learning.

[0005] Examples of point cloud datasets that include labeled point cloud frames used to train models for prediction tasks such as segmentation and object detection include: the SemanticKITTI dataset (as described by J. Behley et al., “SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences,” IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 2019, pp. 9296-9306, doi:10.1109/ICCV.2019.00939); the KITTI360 dataset (as described by J. Xie, M. Kiefel, M. Sun, and A. Geiger, “Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, USA, 2016, pp. 3688-3697, doi:10.1109/CVPR.2016.401); and the nuScenes-lidarseg dataset (as described by H. Caesar et al., “nuScenes: A Multimodal Dataset for Autonomous Driving,” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington, USA, 2020, pp. 11618-11628, doi:10.1109/CVPR42600.2020.01164). These datasets may be the only available datasets of point cloud frames with semantic information, i.e., point cloud frames labeled with semantic data, for training models for prediction tasks such as segmentation and object detection.

[0006] However, these available labeled point cloud datasets typically do not include enough point cloud frames to cover all of the objects or classes needed to train models for prediction tasks such as segmentation and object detection, and these labeled point cloud datasets exhibit a lack of diversity in their point cloud frames. Some object classes appear only a limited number of times in the point cloud frames included in these labeled point cloud datasets, and these object classes typically do not appear together in a single point cloud frame. For example, in the SemanticKITTI dataset, cyclists and motorcyclists appear together in only a few of the 19,000 point cloud frames used to train segmentation or object detection models. There is therefore a lack of co-occurrence of class combinations in the point cloud frames included in the SemanticKITTI dataset, especially for dynamic objects such as vehicles and pedestrians, and a model does not receive enough point cloud frames that include certain combinations of object classes during training. This negatively impacts the performance of models trained on the SemanticKITTI dataset for prediction tasks such as segmentation and object detection. When models for such prediction tasks are trained on point cloud datasets that lack the required generality and diversity of intra-frame object class combinations, they tend to learn very similar features to identify two object classes (e.g., "bicycle" and "motorcycle"), because the object classes can have many features in common. This limits the ability of such models to learn the discriminative features needed to classify objects into two distinct object classes.

[0007] Traditionally, the problem of sparse point cloud datasets for training models in prediction tasks such as segmentation and object detection has been addressed through data augmentation. Data augmentation can be viewed as the process of generating new training samples (e.g., new labeled point cloud frames) from labeled point cloud frames contained in an existing training dataset using any technique that can help improve model training for prediction tasks to achieve higher model accuracy (e.g., better predictions generated by the model). Existing data augmentation methods tend to focus on inserting images of objects into existing video frames (i.e., 2D images) and may include tasks such as rotation, translation, scaling, adding noise, and offsetting the image before insertion into the video frame. These tasks can be formalized and defined by "augmentation parameters," which are values used to define the transformations performed on the image of the object when inserting it into an existing video frame. In the case of point cloud datasets, only a limited number of data augmentation techniques are known. PointAugment (as described by Li, Ruihui, Li, Xianzhi, Heng, Pheng-Ann and Fu, Chi-Wing, 2020, “PointAugment: An Auto-Augmentation Framework for Point Cloud Classification”, pp. 6377-6386, doi:10.1109/CVPR42600.2020.00641) aims to augment point cloud datasets used to train models for prediction tasks by generating new data samples (i.e., new point cloud frames) that are more difficult to detect at the feature level than the original samples (i.e., point cloud frames contained in the point cloud dataset). Another paper by Cheng et al. (Cheng, S., Leng, Z., Cubuk, E. D., Zoph, B., Bai, C., Ngiam, J., Song, Y., Caine, B., Vasudevan, V., Li, C., Le, Q. V., Shlens, J., and Anguelov, D., (2020), "Improving 3D Object Detection through Progressive Population Based Augmentation", ArXiv, abs/2004.00831) describes data augmentation for large-scale point cloud datasets. It utilizes an optimization task to find the optimal augmentation parameters to inject point cloud object instances into randomly selected point cloud frames, at the cost of training a large number of models to explore and utilize different augmentation parameter values.

[0008] These existing data augmentation techniques focus on optimizing data augmentation parameters when injecting new objects into large-scale point clouds (i.e., point cloud frames). These techniques do not adequately consider other metrics such as scene understanding (i.e., the model's accuracy in understanding and deciphering the scene) or the presence of scenes and objects; the choice of which objects to inject into which point cloud frames and where to inject them within those frames is largely independent of the specific needs of the model being trained. Furthermore, the techniques described by Cheng et al. are highly computationally intensive because they require training a large number of models to optimize the augmentation parameters.

[0009] Therefore, data augmentation techniques for point cloud datasets used to train models for prediction tasks are needed to overcome one or more limitations of the existing methods mentioned above.

Summary of the Invention

[0010] This invention describes devices, systems, methods, and media for adaptively enhancing point cloud frames used to train machine learning models for prediction tasks such as segmentation and object detection. The examples described herein provide a formal method for generating enhanced point cloud frames from a pre-existing, annotated, large-scale point cloud dataset, which includes point cloud frames labeled with scene information and / or containing labeled point cloud object instances. The enhanced point cloud frames can then be used to train models for prediction tasks such as segmentation or object detection to improve the accuracy of the models.

[0011] Examples of the methods and systems described herein can use detailed quantitative metrics such as confusion matrices to generate strategies for large-scale data augmentation of point cloud frames. The strategy (referred to as the “master strategy”) can be a detailed, progressive set of rules, procedures, and / or conditions used to generate augmented point cloud frames specifically designed to reduce existing inaccuracies in predictions output by a trained model (i.e., a model trained using a point cloud dataset). An exemplary method may include the following general steps:

[0012] 1. Perform confusion analysis on the predictions of the model using quantitative analysis (e.g., based on a confusion matrix) to identify which object classes and scene types may confuse the model due to the lack of labeled point cloud frames that include the object classes and / or scene types.

[0013] 2. Using the information from step 1 and other relevant prior knowledge, generate three sub-policies of the main policy to answer the following questions:

[0014] 2a. Which object instances should be injected into the point cloud frames of the point cloud dataset?

[0015] 2b. In which scene types should the object instance be injected? At what location within that scene should the object instance be injected?

[0016] 2c. Which other object classes (if any) should be injected along with the injected instance?

[0017] 3. The generation system uses the information from steps 1 and 2 above to apply a set of object instance transformation techniques (referred to as "sub-strategies") to inject selected object instances into selected point cloud frames, thereby generating enhanced point cloud frames. These enhanced point cloud frames can be added to the point cloud dataset used to retrain the prediction task model, thereby reducing existing prediction inaccuracies exhibited by predictions output by the model trained using a previous point cloud dataset (e.g., the point cloud dataset excluding the enhanced point cloud frames).
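
As a minimal sketch of step 1, the confusion analysis could be approximated by computing per-class recall from a confusion matrix and reporting, for each low-recall class, the class it is most often mistaken for. The function name `find_confused_classes`, the nested-dictionary layout, the `min_recall` threshold, and the sample counts are illustrative assumptions, not details taken from this description.

```python
from typing import Dict, List, Tuple

# Assumed layout: confusion[gt][pred] = number of samples whose ground-truth
# class is `gt` and which the model labeled as `pred`.
Confusion = Dict[str, Dict[str, int]]


def find_confused_classes(
    confusion: Confusion, min_recall: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (deficient_class, most_confused_with, recall), worst first."""
    results = []
    for gt, row in confusion.items():
        total = sum(row.values())
        if total == 0:
            continue  # class never observed; nothing to measure
        recall = row.get(gt, 0) / total
        if recall < min_recall:
            # Off-diagonal entries show where the errors go.
            wrong = {p: n for p, n in row.items() if p != gt and n > 0}
            worst = max(wrong, key=wrong.get) if wrong else ""
            results.append((gt, worst, recall))
    return sorted(results, key=lambda r: r[2])


# Made-up counts for illustration only.
confusion = {
    "car": {"car": 95, "bicycle": 2, "motorcycle": 3},
    "bicycle": {"car": 5, "bicycle": 60, "motorcycle": 35},
    "motorcycle": {"car": 4, "bicycle": 26, "motorcycle": 70},
}
deficient = find_confused_classes(confusion)
for cls, other, recall in deficient:
    print(f"{cls}: recall {recall:.2f}, most often predicted as {other}")
```

With these sample counts, "bicycle" and "motorcycle" are flagged as mutually confused, which is the kind of signal that steps 2a to 2c would then turn into injection decisions.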

[0018] As described above, existing data augmentation techniques focus on optimizing augmentation parameters, which define the transformations performed on the object when it is inserted into an existing point cloud frame, such as rotation, translation, and scaling. Object instance transformation techniques (i.e., the sub-strategies used in the examples described herein) may include:

[0019] 1. Ground Truth Enhancement: Adds two or more point cloud object instances of the same object together.

[0020] 2. Random Flip: Flip point cloud object instances, for example, flip them horizontally.

[0021] 3. World Scaling: Scaling the size of the point cloud object instance.

[0022] 4. Global translation noise: The point cloud object instance is translated to different positions.

[0023] 5. Frustum dropout: Removes a region of the visible surface of a point cloud object instance, for example, to simulate partial occlusion.

[0024] 6. Frustum noise: Randomly perturbs the position of points in the point cloud object instance, for example, to simulate slightly different surface details.

[0025] 7. Random Rotation: Rotate the point cloud object instance around a certain axis.

[0026] 8. Random point drop: Deletes a randomly selected subset of points from the point cloud object instance, for example, to simulate low-resolution scanning.
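
Several of the object instance transformations listed above can be illustrated with a short sketch. It assumes each point is a tuple `(x, y, z, intensity)` and uses only the Python standard library; the function names, the choice to scale about the instance centroid, and the parameter ranges are assumptions made for illustration, not the implementation described here.

```python
import math
import random
from typing import List, Tuple

# Assumed point layout: (x, y, z, intensity).
Point = Tuple[float, float, float, float]


def centroid(points: List[Point]) -> Tuple[float, float, float]:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def flip_y(points: List[Point]) -> List[Point]:
    """Flip: mirror the instance across the X-Z plane."""
    return [(x, -y, z, i) for x, y, z, i in points]


def scale_about_centroid(points: List[Point], factor: float) -> List[Point]:
    """Scaling: resize the instance without moving its centroid."""
    cx, cy, cz = centroid(points)
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor,
             cz + (z - cz) * factor, i) for x, y, z, i in points]


def translate(points: List[Point], dx: float, dy: float, dz: float) -> List[Point]:
    """Translation: move the instance to a different position."""
    return [(x + dx, y + dy, z + dz, i) for x, y, z, i in points]


def rotate_z(points: List[Point], angle_rad: float) -> List[Point]:
    """Rotation about the Z axis (assumed here to be the vertical axis)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z, i) for x, y, z, i in points]


def drop_points(points: List[Point], keep_prob: float, rng: random.Random) -> List[Point]:
    """Random point drop: keep each point independently with probability keep_prob."""
    return [p for p in points if rng.random() < keep_prob]


rng = random.Random(0)  # seeded so the example is repeatable
instance = [(0.0, 0.0, 0.0, 0.4), (2.0, 0.0, 0.0, 0.6), (2.0, 1.0, 0.5, 0.5)]
augmented = scale_about_centroid(instance, rng.uniform(0.95, 1.05))
augmented = rotate_z(augmented, rng.uniform(-math.pi, math.pi))
augmented = drop_points(augmented, keep_prob=0.9, rng=rng)
```

Each function returns a new list, so transformations can be chained in any order without modifying the stored object instance.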

[0027] Existing data augmentation techniques for point cloud datasets typically generate object instances with limited variability, use augmentation parameters to transform the object instances, and inject the object instances into random locations within the point cloud dataset (e.g., into random locations in and / or within randomly selected frames).

[0028] The examples described herein can use existing techniques (e.g., sub-strategies) to generate plausible, realistic point cloud object instances for injection into point cloud frames. However, the methods and systems described herein provide techniques for generating a master strategy that applies existing sub-strategies in a systematic, global manner to address data augmentation problems by identifying specific model inaccuracies and generating augmented point cloud data designed to reduce these inaccuracies.

[0029] When collecting and labeling large-scale point cloud datasets to train models for prediction tasks such as segmentation or object detection, it is difficult to collect point cloud frames representing all possible scenes in the real world. This is mainly because point cloud frames are recorded for a limited time and remain unchanged for a long time after labeling. However, the essence of a good model is its ability to generalize, that is, to perform segmentation or object detection on as many real-world scenes as possible. The examples described herein can alleviate the following problems, for example:

[0030] 1. A lack of labeled point cloud frames for special situations (i.e., situations that occur infrequently and are missing from the point cloud dataset).

[0031] 2. A lack of point cloud frames in which multiple objects appear simultaneously (e.g., cars and animals do not appear simultaneously in any point cloud frame in most existing labeled point cloud datasets).

[0032] 3. Most point cloud frames lack specific objects (e.g., in point cloud datasets used to train models for prediction tasks in autonomous driving applications, there are not many instances of trucks or buses).

[0033] 4. It is difficult to train generalized machine learning models for prediction tasks such as segmentation or object detection (e.g., using deep learning).

[0034] Furthermore, compared to existing data augmentation techniques, some of the examples described herein can demonstrate the following advantages, for example:

[0035] 1. The exemplary embodiments of the method of the present invention may operate using only pre-existing labeled point cloud frames (e.g., frames and object instances generated by LiDAR sensors), but may also be used to augment existing point cloud frames with additional sources of point cloud object instances from other point cloud datasets when needed. For example, point cloud object instances obtained by LiDAR sensors outside of an autonomous driving environment (e.g., point cloud object instances of an animal body) may be used to generate additional point cloud object instances for injection to augment the autonomous driving dataset, thereby further improving the model's accuracy in identifying objects that are not frequently encountered in a given application domain (e.g., autonomous driving).

[0036] 2. Unlike some existing data augmentation techniques that use graphical simulation to generate the entire scene, the example of the method of the present invention can reuse scene and background information contained in existing point cloud frames. By using pre-existing point cloud frames based on actual scene scans performed by sensors (e.g., LiDAR sensors), a better world model can be provided, thereby improving the predictive accuracy of the model.

[0037] 3. By adjusting the master strategy and the sub-strategies, the examples of the methods of the present invention described herein can handle many edge cases that are critical to the prediction task of the model (e.g., placing a car at an intersection, modeling a distant scene, etc.).

[0038] 4. Examples of the methods of the present invention described herein can provide a low-cost, fast, and simple technique to enhance point cloud frames used for training prediction tasks using available quantitative metrics (e.g., confusion matrices).

[0039] 5. A model trained using a point cloud dataset that includes augmented point cloud frames generated by the exemplary methods described herein can be improved not only in overall prediction accuracy, but also in the specific prediction areas identified as problematic or deficient. This is because the augmentation is driven by specific uncertainties in the predictions output by the model, rather than by an average metric (e.g., the global mean intersection-over-union (mIoU)) that does not identify specific potential deficiencies in the point cloud dataset used to train the model.

[0040] In this invention, the term "LIDAR" (also written "LiDAR" or "Lidar") refers to light detection and ranging, a sensing technology in which a sensor emits a light beam and determines the location, and potentially other features, of light-reflecting objects in the surrounding environment based on the reflected light received from those objects.

[0041] In this invention, the term "point cloud object instance," or simply "object instance" or "instance," refers to a point cloud containing a single definable instance of an object such as a car, house, or pedestrian that can be defined as a single object. For example, a road is typically not an object instance; instead, a road can be defined within a point cloud frame as a scene type or region that defines the frame.

[0042] In this invention, the term "injection" refers to the process of adding a point cloud object instance to a point cloud frame.

[0043] In this invention, unless otherwise stated, the term "frame" refers to a point cloud frame.

[0044] In some aspects, the present invention describes a non-transitory processor-readable medium having instructions tangibly stored thereon. When executed by a processor device, the instructions cause the processor device to perform the method steps described above.

[0045] According to one aspect of the present invention, a method for enhancing a point cloud dataset is provided, the point cloud dataset comprising a plurality of point cloud frames and a plurality of labeled point cloud object instances. The method includes several steps: obtaining prediction accuracy information of a machine learning model trained to perform a prediction task using the point cloud dataset; the prediction accuracy information indicating: the prediction accuracy generated by the machine learning model; and a defect object class; obtaining a scene dictionary based on a plurality of labels of the plurality of labeled point cloud object instances and the plurality of point cloud frames; processing the prediction accuracy information and the scene dictionary to generate a master policy; applying the master policy to the point cloud dataset to identify: a target point cloud frame among the plurality of point cloud frames; and a target point cloud object instance selected from the plurality of labeled point cloud object instances, the target point cloud object instance being labeled using the defect object class; and generating an enhanced point cloud frame by injecting the target point cloud object instance into the target point cloud frame.
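
The structure of the scene dictionary is not specified in this passage. As one hedged possibility, it could map each scene type to the frames labeled with that scene type and to counts of the object classes observed in those frames. The field names (`frame_id`, `scene_type`, `object_classes`) and the function `build_scene_dictionary` below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_scene_dictionary(frames: List[dict]) -> Dict[str, dict]:
    """Group frames by scene type and count the object classes seen in each."""
    scenes = defaultdict(lambda: {"frames": [], "class_counts": Counter()})
    for frame in frames:
        entry = scenes[frame["scene_type"]]
        entry["frames"].append(frame["frame_id"])
        entry["class_counts"].update(frame["object_classes"])
    return dict(scenes)


# Hypothetical frame labels for illustration only.
frames = [
    {"frame_id": 0, "scene_type": "intersection", "object_classes": ["car", "pedestrian"]},
    {"frame_id": 1, "scene_type": "highway", "object_classes": ["car", "truck"]},
    {"frame_id": 2, "scene_type": "intersection", "object_classes": ["car", "bicycle"]},
]
scene_dict = build_scene_dictionary(frames)
print(scene_dict["intersection"]["frames"])        # frame ids labeled "intersection"
print(scene_dict["intersection"]["class_counts"])  # class occurrences in those frames
```

A dictionary of this kind lets a policy look up which scene types a class already appears in, and which scene types lack it.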

[0046] According to another aspect of the present invention, a system for enhancing a point cloud dataset is provided, the point cloud dataset comprising a plurality of point cloud frames and a plurality of labeled point cloud object instances. The system includes: a processor device; and a memory storing machine-executable instructions, which, when executed by the processor device, cause the system to perform a plurality of steps. These steps include: acquiring prediction accuracy information of a machine learning model trained to perform a prediction task using the point cloud dataset; the prediction accuracy information indicating: the prediction accuracy generated by the machine learning model; and a defect object class; acquiring a scene dictionary based on a plurality of labels of the plurality of labeled point cloud object instances and the plurality of point cloud frames; processing the prediction accuracy information and the scene dictionary to generate a master policy; applying the master policy to the point cloud dataset to identify: a target point cloud frame among the plurality of point cloud frames; and a target point cloud object instance selected from the plurality of labeled point cloud object instances, the target point cloud object instance being labeled using the defect object class; and generating an enhanced point cloud frame by injecting the target point cloud object instance into the target point cloud frame.

[0047] In some exemplary aspects of the method and the system, the plurality of point cloud frames may include the plurality of labeled point cloud object instances.

[0048] In some exemplary aspects of the method and system, obtaining the prediction accuracy information may include: obtaining a plurality of ground truth object class labels as input to the machine learning model; and for each of the plurality of ground truth object class labels, determining the distribution of the predicted object class labels. The prediction accuracy information may include the distribution of the predicted object class labels for each ground truth object class label.

[0049] In some exemplary aspects of the method and the system, the prediction accuracy information may include at least one confusion matrix indicating the distribution of the predicted object class label for at least one ground truth object class label among the plurality of ground truth object class labels.
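
The distribution of predicted object class labels for each ground truth label is a row-normalized confusion matrix. A minimal sketch, assuming the ground truth and predicted labels are available as two equal-length lists (the function name and sample labels are illustrative):

```python
from collections import Counter, defaultdict
from typing import Dict, List


def predicted_label_distribution(
    gt_labels: List[str], pred_labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """For each ground truth class, the fraction of samples given each predicted label."""
    counts = defaultdict(Counter)
    for gt, pred in zip(gt_labels, pred_labels):
        counts[gt][pred] += 1
    return {
        gt: {pred: n / sum(row.values()) for pred, n in row.items()}
        for gt, row in counts.items()
    }


gt_labels = ["bicycle", "bicycle", "bicycle", "bicycle", "car", "car"]
pred_labels = ["bicycle", "motorcycle", "bicycle", "motorcycle", "car", "car"]
distribution = predicted_label_distribution(gt_labels, pred_labels)
print(distribution["bicycle"])  # half of the bicycle samples were predicted as motorcycle
```

Each row sums to 1, so a class whose row puts substantial mass off the diagonal is a candidate defect object class.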

[0050] In some exemplary aspects of the method and system, the master policy may also indicate a second defect object class. Applying the master policy may also identify a second target object instance selected from the plurality of labeled point cloud object instances. The second target object instance may be labeled using the second defect object class. Generating the enhanced point cloud frame may include injecting the target object instance and the second target object instance into the target point cloud frame.

[0051] In some exemplary aspects of the method and the system, injecting the target point cloud object instance into the target point cloud frame includes: using the master policy to identify a position in the target point cloud frame; and injecting the target point cloud object instance into the target point cloud frame at the position.

[0052] In some exemplary aspects of the method and the system, injecting the target point cloud object instance into the target point cloud frame includes: using one or more sub-strategies to transform the target point cloud object instance to generate a transformed point cloud object instance; and injecting the transformed point cloud object instance into the target point cloud frame.
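
Injection at a chosen position can be sketched as translating the (optionally transformed) object instance so that its centroid lands on the position, then appending its points and labels to the frame. The point layout `(x, y, z, intensity)`, the per-point label list, and the function name `inject_instance` are assumptions; this sketch also ignores occlusion handling and ground alignment.

```python
from typing import List, Tuple

Point = Tuple[float, float, float, float]  # assumed layout: (x, y, z, intensity)


def inject_instance(
    frame_points: List[Point],
    frame_labels: List[str],
    instance_points: List[Point],
    instance_label: str,
    position: Tuple[float, float, float],
) -> Tuple[List[Point], List[str]]:
    """Return a new frame with the instance centered at `position`."""
    n = len(instance_points)
    cx = sum(p[0] for p in instance_points) / n
    cy = sum(p[1] for p in instance_points) / n
    cz = sum(p[2] for p in instance_points) / n
    dx, dy, dz = position[0] - cx, position[1] - cy, position[2] - cz
    moved = [(x + dx, y + dy, z + dz, i) for x, y, z, i in instance_points]
    # The original frame is left untouched; the enhanced frame is a new pair of lists.
    return frame_points + moved, frame_labels + [instance_label] * n


frame_points = [(0.0, 0.0, 0.0, 0.1), (1.0, 1.0, 0.0, 0.2)]
frame_labels = ["road", "road"]
bicycle = [(0.0, 0.0, 0.0, 0.5), (2.0, 0.0, 0.0, 0.5)]
new_points, new_labels = inject_instance(
    frame_points, frame_labels, bicycle, "bicycle", position=(10.0, 5.0, 0.0)
)
```

Returning new lists keeps the original frame in the dataset unchanged, so the enhanced frame can be added alongside it.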

[0053] In some exemplary aspects of the method and system, processing the prediction accuracy information and the scene dictionary to generate the master policy includes: processing the class-specific confusion matrix and the scene dictionary to generate the master policy. The master policy may indicate multiple defect object classes; for each of the multiple defect object classes, the master policy may include an injection condition under which a point cloud object instance of the defect object class should be injected into a given point cloud frame. Applying the master policy to the point cloud dataset to identify the target point cloud frame includes: determining whether the target point cloud frame satisfies the injection condition of the master policy relative to the target object class.
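
One way to picture a master policy with per-class injection conditions is as a mapping from each defect object class to a condition that a candidate frame must satisfy. In this sketch the condition consists of allowed scene types and object classes that must already be present in the frame; `InjectionCondition`, its fields, and the sample policy are hypothetical and only illustrate the frame-selection check.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class InjectionCondition:
    allowed_scene_types: FrozenSet[str]
    required_classes: FrozenSet[str] = frozenset()


def frame_satisfies(frame: dict, condition: InjectionCondition) -> bool:
    """True if the frame's scene type is allowed and all required classes are present."""
    return (frame["scene_type"] in condition.allowed_scene_types
            and condition.required_classes <= set(frame["object_classes"]))


def select_target_frames(
    frames: List[dict], master_policy: Dict[str, InjectionCondition], defect_class: str
) -> List[int]:
    condition = master_policy[defect_class]
    return [f["frame_id"] for f in frames if frame_satisfies(f, condition)]


# Hypothetical policy: inject bicycles only where a motorcycle is already present,
# so the model sees the two confusable classes in the same frame.
master_policy = {
    "bicycle": InjectionCondition(frozenset({"road", "intersection"}), frozenset({"motorcycle"})),
    "truck": InjectionCondition(frozenset({"highway"})),
}
frames = [
    {"frame_id": 0, "scene_type": "intersection", "object_classes": ["car", "motorcycle"]},
    {"frame_id": 1, "scene_type": "parking", "object_classes": ["car", "motorcycle"]},
    {"frame_id": 2, "scene_type": "road", "object_classes": ["car"]},
    {"frame_id": 3, "scene_type": "highway", "object_classes": []},
]
bicycle_targets = select_target_frames(frames, master_policy, "bicycle")
truck_targets = select_target_frames(frames, master_policy, "truck")
```

Here only frame 0 qualifies for a bicycle injection, and only frame 3 qualifies for a truck injection.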

[0054] In some examples, the method may further include: after using the enhanced point cloud frame to further train the machine learning model, repeating the following steps once or more: obtaining the prediction accuracy information; generating the master policy; identifying the target point cloud frame; identifying the target object instance; generating the enhanced point cloud frame; and further training the machine learning model.

[0055] In some exemplary aspects of the method and system, after the enhanced point cloud frame is generated, the enhanced point cloud frame can be used to further train the machine learning model.

[0056] In some exemplary aspects of the method and system, after training the machine learning model using the enhanced point cloud frame, the following steps may be repeated once or multiple times: obtaining the prediction accuracy information; generating the master policy; identifying the target point cloud frame; identifying the target object instance; generating the enhanced point cloud frame; and further training the machine learning model.
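
The repeated cycle of evaluating, generating a policy, augmenting, and retraining can be sketched as a loop over caller-supplied functions. All names here are hypothetical, and the toy stand-ins below only simulate a model whose recall improves as augmented frames are added; they are not a real training procedure.

```python
from typing import Callable, List, Tuple


def adaptive_augmentation_loop(
    model: dict,
    dataset: List[str],
    evaluate: Callable,
    make_policy: Callable,
    augment: Callable,
    train: Callable,
    max_rounds: int = 5,
) -> Tuple[dict, List[str]]:
    for _ in range(max_rounds):
        accuracy_info = evaluate(model, dataset)      # obtain prediction accuracy information
        policy = make_policy(accuracy_info)           # generate the master policy
        if not policy:                                # no defect classes remain
            break
        dataset = dataset + augment(dataset, policy)  # add enhanced frames
        model = train(model, dataset)                 # further train the model
    return model, dataset


# Toy stand-ins (assumptions for illustration only).
def evaluate(model, dataset):
    return model["recall"]

def make_policy(recall):
    return [cls for cls, r in recall.items() if r < 0.8]

def augment(dataset, policy):
    return [f"augmented:{cls}" for cls in policy]

def train(model, dataset):
    n_aug = sum(1 for frame in dataset if frame.startswith("augmented:"))
    return {"recall": {"bicycle": min(1.0, 0.5 + 0.2 * n_aug)}}


model, dataset = adaptive_augmentation_loop(
    {"recall": {"bicycle": 0.5}}, ["frame0", "frame1"],
    evaluate, make_policy, augment, train,
)
```

The loop stops either when the policy finds no remaining defect class or after `max_rounds` iterations, whichever comes first.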

[0057] According to another aspect of the invention, a non-transitory processor-readable medium is provided on which a point cloud dataset is stored, the point cloud dataset comprising one or more enhanced point cloud frames generated by various aspects of the methods described above.

[0058] According to another aspect of the present invention, a non-transitory processor-readable medium is provided, on which machine-executable instructions are stored, which, when executed by a processor device of a device, cause the device to perform multiple steps. These steps include: acquiring prediction accuracy information of a machine learning model trained to perform a prediction task using a point cloud dataset; the prediction accuracy information indicating: the prediction accuracy generated by the machine learning model; and a defect object class; acquiring a scene dictionary based on multiple labels of multiple labeled point cloud object instances and the multiple point cloud frames; processing the prediction accuracy information and the scene dictionary to generate a master policy; applying the master policy to the point cloud dataset to identify: a target point cloud frame among the multiple point cloud frames; and a target point cloud object instance selected from the multiple labeled point cloud object instances, the target point cloud object instance being labeled using the defect object class; and generating an enhanced point cloud frame by injecting the target point cloud object instance into the target point cloud frame.

Attached Figure Description

[0059] The accompanying drawings, which illustrate exemplary embodiments of this application, are now described by way of example, wherein:

[0060] Figure 1 is a top-right front perspective view of an exemplary simplified point cloud frame, providing an operational context for the embodiments described herein;

[0061] Figure 2 is a block diagram of some components of an exemplary system for augmenting labeled point cloud data, as described in the examples herein;

[0062] Figure 3 is a flowchart illustrating the operation of the scene enhancement module shown in Figure 2;

[0063] Figure 4 is a flowchart illustrating the steps of an exemplary method for enhancing labeled point cloud data that can be executed by the scene enhancement module of Figure 3;

[0064] Figure 5 is a flowchart showing the sub-steps of the main strategy generation step 414 of Figure 4.

[0065] The same reference numerals may be used to denote the same components in different figures.

Detailed Description

[0066] This invention describes exemplary devices, systems, methods, and media for adaptive scene enhancement used in training machine learning models to perform point cloud segmentation and / or object detection.

[0067] Figure 1 shows an exemplary simplified point cloud frame 100, in which points are mapped to a three-dimensional coordinate system 102 with axes X, Y, and Z, where the Z dimension extends upwards and is typically defined by the rotation axis of the LIDAR sensor or other panoramic sensor that generated the point cloud frame 100. The point cloud frame 100 includes a plurality of points, each of which can be represented by a set of coordinates (x, y, z) within the frame 100 together with a vector of other values (e.g., an intensity value indicating the reflectivity of the object corresponding to the point). Each point represents a reflection of the laser at a location in space, relative to the LIDAR sensor, given by the point coordinates. Although the exemplary point cloud frame 100 is shown as a box or rectangular prism, it should be understood that a point cloud frame captured by a panoramic LIDAR sensor is typically a 360-degree panoramic view of the environment surrounding the LIDAR sensor, extending to the entire detection range of the LIDAR sensor's laser. The exemplary point cloud frame 100 is therefore better understood as a small portion of an actual point cloud frame generated by the LIDAR sensor, used here for illustrative purposes.

[0068] The points in the point cloud frame 100 are clustered in space, where objects in the environment reflect the laser of the LIDAR sensor, thereby forming point clusters corresponding to the surfaces of objects visible to the LIDAR. A first point cluster 112 corresponds to the reflection of a car. In the exemplary point cloud frame 100, the first point cluster 112 is surrounded by a bounding box 122 and associated with an object class label (in this case, label "car" 132). A second point cluster 114 is surrounded by a bounding box 122 and associated with an object class label "cyclist" 134; a third point cluster 116 is surrounded by a bounding box 122 and associated with an object class label "pedestrian" 136. Therefore, each point cluster (112, 114, 116) corresponds to an object instance: the object class instances "car," "cyclist," and "pedestrian," respectively. The entire point cloud frame 100 is associated with scene type label 140 "intersection", which indicates that the point cloud frame 100 as a whole corresponds to the environment near the intersection (therefore, cars, pedestrians and cyclists are close to each other).
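The labeled frame just described (points, bounding boxes, object class labels, and a scene type label) can be pictured as a small data structure. The following is only an illustrative sketch: the class names `BoundingBox`, `ObjectInstance`, and `PointCloudFrame` are hypothetical, and the bounding box is assumed to be axis-aligned for simplicity, whereas practical datasets usually also store a heading angle.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed point layout: (x, y, z, intensity)
Point = Tuple[float, float, float, float]


@dataclass
class BoundingBox:
    """Axis-aligned 3D box (a simplifying assumption)."""
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]  # extents along X, Y, Z

    def contains(self, p: Point) -> bool:
        # A point is inside if it lies within half the extent on every axis.
        return all(abs(p[i] - self.center[i]) <= self.size[i] / 2 for i in range(3))


@dataclass
class ObjectInstance:
    label: str          # object class label, e.g. "car"
    box: BoundingBox    # bounding box surrounding the point cluster


@dataclass
class PointCloudFrame:
    scene_type: str                                   # scene type label
    points: List[Point] = field(default_factory=list)
    instances: List[ObjectInstance] = field(default_factory=list)

    def cluster(self, inst: ObjectInstance) -> List[Point]:
        """Return the points that fall inside the instance's bounding box."""
        return [p for p in self.points if inst.box.contains(p)]


frame = PointCloudFrame(
    scene_type="intersection",
    points=[(1.0, 2.0, 0.5, 0.8), (1.2, 2.1, 0.6, 0.7), (9.0, 9.0, 0.0, 0.1)],
    instances=[ObjectInstance("car", BoundingBox((1.0, 2.0, 0.5), (2.0, 2.0, 2.0)))],
)
car_points = frame.cluster(frame.instances[0])
```

Here `car_points` holds the two points that fall inside the "car" box; the third point lies outside every labeled box.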

[0069] In some examples, a single point cloud frame may include multiple scenes, each of which may be associated with a different scene type label 140. Therefore, a single point cloud frame can be segmented into multiple scenes, each associated with its own scene type label 140. This document will generally describe exemplary embodiments in conjunction with a single frame associated only with a single scene type; however, it should be understood that some embodiments may use the data augmentation methods and systems described herein to consider each scene in a frame used to inject point cloud object instances individually.

[0070] The size and position of each bounding box 122 are determined, each object label (132, 134, 136) is associated with each point cluster (i.e., a cluster of points), and scene labels are associated with the point cloud frame 100 using data labeling techniques known in the field of machine learning to generate labeled point cloud frames, which may be included in a point cloud dataset used to train a model for a prediction task (e.g., segmentation or object detection) using machine learning. As mentioned above, these labeling techniques are typically time- and resource-intensive; in some examples, the data augmentation methods and systems described herein can be used to increase the number of labeled point cloud object instances within the point cloud frame 100, thereby reducing the time and resources required to manually identify and label point cloud object instances in the point cloud frame.

[0071] The labels and bounding boxes of the exemplary point cloud frame 100 shown in Figure 1 correspond to labels applied in the context of prediction tasks such as instance segmentation or object detection. Therefore, the point cloud frame 100 can be included in a point cloud dataset used to train an object detection model (e.g., a model that receives point cloud frames as input and predicts the object class of objects in the point cloud frames). However, the data augmentation methods and systems described herein are equally applicable to point cloud frames included in point cloud datasets used not only to train object detection models but also to train segmentation models, including semantic segmentation models, instance segmentation models, or panoptic segmentation models.

[0072] Figure 2 shows a block diagram of a computing system 200 (hereinafter referred to as system 200) for enhancing a point cloud dataset including point cloud frames. Although exemplary embodiments of system 200 are shown and discussed below, other embodiments may be used to implement the examples disclosed herein, which may include components different from those shown. Figure 2 shows a single instance of each component of the system 200, but each component shown may have multiple instances.

[0073] The system 200 includes one or more processors 202, such as a central processing unit, microprocessor, application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), dedicated logic circuit, tensor processing unit, neural processing unit, dedicated artificial intelligence processing unit, or a combination thereof. The one or more processors 202 may be collectively referred to as a "processor device" or "processor 202".

[0074] The system 200 includes one or more memories 208 (collectively referred to as "memory 208"), which may include volatile or non-volatile memories (e.g., flash memory, random access memory (RAM), and / or read-only memory (ROM)). The memory 208 may store machine-executable instructions that are executed by the processor 202. A set of machine-executable instructions 220 defining the scene enhancement module 300 is shown stored in the memory 208, and this set of machine-executable instructions 220 can be executed by the processor 202 to perform the steps of the method for data augmentation of a point cloud dataset described herein. The operation of the system 200 in executing the set of machine-executable instructions 220 defining the scene enhancement module 300 is described below in conjunction with Figure 3. The set of machine-executable instructions 220 defining the scene enhancement module 300 can be executed by the processor 202 to perform the functions of each of its subsystems (312, 314, 316, 318, 320, 322). The memory 208 may include other machine-executable instructions executed by the processor 202, such as machine-executable instructions for implementing operating systems and other applications or functions.

[0075] The memory 208 stores the point cloud dataset 210. The point cloud dataset 210 includes multiple point cloud frames 212 and multiple labeled point cloud object instances 214, as described above with reference to Figure 1. In some embodiments, some or all of the labeled point cloud object instances 214 are included in and / or derived from the point cloud frames 212: for example, each point cloud frame 212 may include zero or more labeled point cloud object instances 214, as described above with reference to Figure 1. In some embodiments, some or all of the labeled point cloud object instances 214 are stored separately from the point cloud frames 212, and each labeled point cloud object instance 214 may or may not originate from one of the point cloud frames 212.

[0076] The memory 208 may also store other data, information, rules, strategies, and machine-executable instructions described herein, including: a machine learning model 224 for prediction tasks such as segmentation or object detection, a main strategy 222, a target point cloud frame 226, a target point cloud object instance 228, a transformed point cloud object instance 232, and an enhanced point cloud frame 230; as well as the data and information described below in conjunction with Figure 3 (not shown in Figure 2), including prediction data 302, prediction accuracy information 304, and scene dictionary 306.

[0077] In some examples, the system 200 may further include one or more electronic storage units (not shown), such as solid-state drives, hard disk drives, disk drives, and / or optical disk drives. In some examples, one or more datasets and / or modules may be provided by external memory (e.g., an external drive that communicates with the system 200 via wired or wireless communication) or by transient or non-transient computer-readable media. Examples of non-transient computer-readable media include RAM, ROM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, CD-ROM, or other portable storage. The storage units and / or external memory may be used in conjunction with the memory 208 to implement the data storage, retrieval, and caching functions of the system 200.

[0078] For example, components of system 200 may communicate with each other via a bus. In some embodiments, system 200 is a distributed computing system and may include multiple computing devices communicating with each other via a network, as well as (optionally) one or more additional components. In some embodiments, the various operations described herein may be performed by different computing devices of the distributed system. In some embodiments, system 200 is a virtual machine provided by a cloud computing platform.

[0079] Figure 3 shows a block diagram illustrating an exemplary embodiment of the scene enhancement module 300. Machine-executable instructions defining the scene enhancement module 300 are executable by the processor 202 of the system 200. The illustrated embodiment of the scene enhancement module 300 includes several functional submodules or subsystems: a confusion analysis subsystem 312, a policy generation subsystem 314, a sample selection subsystem 316, a transformation subsystem 318, an instance injection subsystem 320, and a dictionary generation subsystem 322. In other exemplary embodiments of the scene enhancement module 300, one or more of the subsystems (312, 314, 316, 318, 320, and 322) may be combined, split into multiple subsystems, and / or one or more of their functions or operations may be redistributed among other subsystems. Some exemplary embodiments of the scene enhancement module 300 may include additional operations or submodules, or one or more of the illustrated subsystems (312, 314, 316, 318, 320, and 322) may be omitted.

[0080] The operation of the various subsystems (312, 314, 316, 318, 320 and 322) of the scene enhancement module 300 shown in Figure 3 will now be described with reference to the exemplary point cloud data augmentation method 400 shown in Figure 4.

[0081] Figure 4 shows a flowchart illustrating an example of the steps of the adaptive data augmentation method 400 (hereinafter referred to as method 400) proposed in this invention. As described here, the steps of method 400 are executed by the various subsystems of the scene enhancement module 300 and the other subsystems shown in Figure 3. However, it should be understood that the method 400 can be executed using any suitable information processing technology.

[0082] The method 400 begins at step 402, in which the dictionary generation subsystem 322 generates a scene dictionary 306 based on the point cloud dataset 210. In some embodiments, the scene dictionary 306 is a list of scene types, each scene type being associated with a list of objects that may or should appear in a given scene type to ensure appropriate diversity in the training point cloud dataset. The content of the scene dictionary 306 can be generated by the dictionary generation subsystem 322 by compiling, for each scene type, a list of the object classes represented in the labeled point cloud object instances 214 present in each point cloud frame 212 of that scene type. Table 1 below shows a simplified truncated example of the scene dictionary 306 generated from the point cloud dataset 210, where the object list for a given scene type is used for an autonomous vehicle environment:

[0083] Table 1: Example Scene Dictionary

[0084]
Scene type | Dynamic object classes that may appear in the scene
Freedom Highway | All objects
Intersection | Pedestrians, cyclists, cars, trucks
Free pedestrian walkway | Cyclists, pedestrians
Two-way highway | Cars, trucks, buses
Parking spaces | All objects
Highway | Cars, trucks
…… | ……
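As a rough illustration of how a scene dictionary of this kind could be compiled from labeled frames, the sketch below collects, for each scene type, the set of object classes seen in frames of that type. The function name `build_scene_dictionary` and the `(scene_type, labels)` tuple layout are assumptions made for the example, not structures defined in this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed frame summary: (scene type label, labels of the object instances in the frame)
LabeledFrame = Tuple[str, List[str]]


def build_scene_dictionary(frames: Iterable[LabeledFrame]) -> Dict[str, Set[str]]:
    """Map each scene type to the object classes observed in frames of that type."""
    scene_dict: Dict[str, Set[str]] = defaultdict(set)
    for scene_type, labels in frames:
        scene_dict[scene_type].update(labels)
    return dict(scene_dict)


frames = [
    ("intersection", ["car", "pedestrian"]),
    ("intersection", ["cyclist", "truck", "car"]),
    ("highway", ["car", "truck"]),
]
scene_dictionary = build_scene_dictionary(frames)
```

With these three toy frames, "intersection" maps to pedestrians, cyclists, cars, and trucks, and "highway" maps to cars and trucks, mirroring two rows of Table 1.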

[0085] In step 404, the confusion analysis subsystem 312 performs confusion analysis on the machine learning model 224.

[0086] The machine learning model 224 can be an artificial neural network or other models trained using a training dataset and machine learning techniques (e.g., supervised learning) to perform a prediction task on a point cloud frame. The prediction task can be any prediction task, such as object detection (e.g., identifying objects in the point cloud frame and classifying the identified objects (i.e., predicting the object class label for each identified object in the point cloud frame)) or segmentation (e.g., segmenting the point cloud frame by object class), including semantic segmentation, instance segmentation, or panoptic segmentation. In some embodiments, the machine learning model 224 is trained using the point cloud dataset 210 as a training dataset: that is, the machine learning model 224 is trained using the point cloud dataset 210 and a supervised learning algorithm to perform a prediction task for point cloud frames, such as performing object detection or segmentation on the point cloud frames 212, thereby generating a prediction 302 for each object identified in the point cloud frame 212, including a predicted object class label and / or a predicted instance label, to be associated with zero or more subsets or clusters of points within each point cloud frame 212, wherein the predicted object class and / or instance label is associated with a point cloud object instance 214 for each label in a given point cloud frame 212, which is used to generate a strategy that will be described in further detail below. In other embodiments, the machine learning model 224 is trained using different training datasets and supervised learning algorithms. 
However, it should be understood that the systems and methods described herein for augmenting labeled point cloud frames can exhibit unique advantages when used to augment point cloud frames 212 contained in the point cloud dataset 210 (used to train the machine learning model 224): the data augmentation systems and methods described herein can augment point cloud frames 212 to address specific defects identified in the point cloud dataset 210 that cause specific inaccuracies in the performance of the machine learning model 224, which has been trained using the point cloud dataset 210, such that retraining the machine learning model 224 using augmented point cloud frames (or augmented point cloud datasets that include point cloud frames 212 and the augmented point cloud frames) can compensate for those specific inaccuracies in the accuracy of the machine learning model 224 in predicting labels (i.e., object class labels and / or instance labels).

[0087] Step 404 includes sub-step 406, wherein the confusion analysis subsystem 312 generates the prediction accuracy information 304, indicating the accuracy of the predictions (i.e., predicted labels) generated by the machine learning model 224, and indicating at least one defective object class. A defective object class is an object class for which the machine learning model 224 does not generate accurate predictions 302, for example, because the machine learning model 224 generates too many false negative predictions and / or too many false positive predictions when detecting or otherwise identifying instances of the object class.

[0088] Sub-step 406 includes further sub-steps 408 and 410. In sub-step 408, the machine learning model 224 generates a prediction 302, which the confusion analysis subsystem 312 uses to generate the prediction accuracy information 304. In some embodiments, in sub-step 408, the prediction 302 may be retrieved or otherwise obtained (e.g., from the memory 208) without being generated by the machine learning model 224 as part of the method 400. For example, when the prediction task of the machine learning model 224 is object detection, the prediction 302 may be a predicted object class label, an instance class label of each object detected in a point cloud frame, or a 3D bounding box associated with the object class label (e.g., as shown in the object detection example of Figure 1). When the prediction task of the machine learning model 224 is segmentation, the prediction 302 can be a point cloud frame segmentation prediction, such as object class labels associated with each point in the point cloud frame (for semantic segmentation), object class labels and instance identifier labels associated with each point in the point cloud frame (for panoptic segmentation), or instance identifier labels associated with each point in the point cloud frame (for instance segmentation). The confusion analysis subsystem 312 compares the prediction data 302 generated by the machine learning model 224 for each corresponding point cloud frame 212 input to the machine learning model 224 with the ground truth labels of the corresponding point cloud frame 212 to generate a confusion matrix and other prediction accuracy information 304, which indicates the accuracy of the prediction 302 generated by the machine learning model 224. 
The confusion matrix is an error matrix that cross-references the predicted object class labels contained in the prediction 302 generated by the machine learning model 224 on one or more point cloud frames 212 against the pre-existing ground truth labels associated with point cloud object instances in those point cloud frames 212. Therefore, the confusion matrix indicates the distribution of predicted object class labels for each ground truth object class label (i.e., the ground truth label of the point cloud frame 212) input to the machine learning model 224. An exemplary confusion matrix is shown below:

[0089] Table 2: Exemplary Confusion Matrix

[0090]

[0091] For a fully accurate machine learning model 224, the confusion analysis subsystem 312 generates prediction information 304 corresponding to a confusion matrix, wherein only the diagonal cells of the matrix have non-zero values; that is, for each object class, all ground truth "car" tags are predicted as "car" tags, and so on. However, in the example shown above, the machine learning model 224 exhibits poor accuracy: for example, out of 81 "car" ground truth tags associated with point cloud object instances, the machine learning model 224 only predicts 75 points in the point cloud frame as "car" tags, 2 points as "cyclist" tags, and 4 points as "motorcycle rider" tags. However, it can be observed that the greatest degree of inaccuracy and confusion exists between the "cyclist" and "motorcycle rider" classes, with the machine learning model 224 typically misidentifying both. The probability that the object instance "cyclist" is incorrectly identified as "motorcycle rider" is (8 / 23 = 34.7%), and the probability that the object instance "motorcycle rider" is incorrectly identified as "cyclist" is (4 / 10 = 40%). This inaccuracy may be due to the following reasons: the training dataset used to train the machine learning model 224 contains too few instances of the object classes "cyclist" and / or "motorcycle rider"; or instances of the two object classes ("cyclist" and "motorcycle rider") appear in too few point cloud frames in the context that helps train the machine learning model 224 to distinguish between the two object classes. 
The adaptive data augmentation method and system described herein enable the systematic and automatic analysis of specific confusion patterns exhibited by a machine learning model 224 trained on a specific point cloud dataset, thereby generating a master policy to augment specific point cloud frames in the point cloud dataset by injecting one or more instances of a specific object class into the point cloud frame at a specific location within the point cloud frame using a specific transformation of the injected instances. This reduces or eliminates the confusion patterns exhibited by the machine learning model 224 after the machine learning model 224 is further trained using the augmented point cloud frame (or an augmented training dataset that includes the point cloud dataset 210 and the augmented point cloud frame).

[0092] The number of instances of each object class shown in the confusion matrix above is for illustrative purposes only. This number is greater than the number of instances of each object class typically found in a single point cloud frame, and less than the number typically found in the entire point cloud dataset used to train the machine learning model 224. Instead, the number of instances can be considered as representing a portion of the point cloud frames in the point cloud dataset used to train the machine learning model 224, such as a single driving sequence (i.e., a sequence of point cloud frames captured by an onboard LiDAR sensor during a driving session).
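The cross-referencing of ground truth and predicted labels described above can be sketched in a few lines. This is a minimal illustration that assumes each predicted label has already been matched one-to-one with a ground truth object instance (the matching step itself is omitted); the function name `confusion_matrix` and the toy label lists are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(gt: Sequence[str], pred: Sequence[str],
                     classes: List[str]) -> List[List[int]]:
    """Rows index ground truth classes, columns index predicted classes."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must be aligned one-to-one")
    counts = Counter(zip(gt, pred))  # (ground truth, predicted) -> count
    return [[counts[(g, p)] for p in classes] for g in classes]


classes = ["car", "cyclist", "motorcycle rider"]
gt = ["car", "car", "cyclist", "cyclist", "motorcycle rider"]
pred = ["car", "cyclist", "cyclist", "motorcycle rider", "cyclist"]
cm = confusion_matrix(gt, pred, classes)
```

For a fully accurate model, only the diagonal of `cm` would be non-zero; every off-diagonal entry records one kind of confusion between two classes.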

[0093] In sub-step 410, the confusion analysis subsystem 312 identifies one or more defective object classes and indicates the identified defective object classes in the prediction accuracy information 304. The confusion analysis subsystem 312 can use the overall confusion matrix of the machine learning model 224 to identify the one or more defective object classes, generate a class-specific confusion matrix for each defective object class based on the overall confusion matrix, and include the class-specific confusion matrix as part of the prediction accuracy information 304. In some embodiments, a class-specific confusion matrix $\text{CM}_c$ is generated for each object class $c$ whose prediction accuracy is below an accuracy threshold, and the confusion analysis subsystem 312 thereby identifies that object class as a defective object class. Similarly, the defective object class can be identified based on the elements of the confusion matrix, i.e., the distribution of predicted object class labels corresponding to the ground truth object class labels of the defective object class. Prediction accuracy indicates the accuracy of the predictions generated by the machine learning model 224. Prediction accuracy can be measured by the mean intersection over union (mIoU) for a given object class, which can be calculated as the quotient of the intersection of ground truth labels and predicted labels (i.e., the total number of object instances in the point cloud dataset 210 whose ground truth label and predicted label are both of the object class) and the union of ground truth labels and predicted labels (i.e., the total number of object instances in the point cloud dataset 210 whose predicted label or ground truth label is of the object class). 
The mIoU can also be expressed as (the number of true positive predictions for the object class) / (the number of true positive predictions + the number of false positive predictions for the object class + the number of false negative predictions for the object class). Therefore, in the simplified exemplary confusion matrix in Table 2, the mIoU of the object class "cyclist" is (15 / (15+2+4+1+8)) = 0.50, while the mIoU of the object class "motorcycle rider" is (6 / (6+8+4+4+0)) = 0.27, and the mIoU of the object class "car" is (75 / (75+1+0+2+4)) = 0.91. The mIoU can be the mean of the intersection over union values calculated over multiple point cloud datasets or subsets of a point cloud dataset used for validation, for example, the mean of the intersection over union calculated for each point cloud frame 212 in the point cloud dataset 210.
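The per-class arithmetic above can be checked with a short sketch. The 3×3 matrix used here is an assumption: it is reconstructed from the true positive, false positive, and false negative counts quoted in the three worked quotients, with rows as ground truth classes and columns as predicted classes. Under this reconstruction the "cyclist" row sums to 24 instances. The names `CM`, `CLASSES`, and `per_class_iou` are hypothetical.

```python
from typing import Dict, List

CLASSES = ["car", "cyclist", "motorcycle rider"]

# Assumed reconstruction of the exemplary confusion matrix:
# rows = ground truth class, columns = predicted class.
CM = [
    [75, 2, 4],   # ground truth "car"
    [1, 15, 8],   # ground truth "cyclist"
    [0, 4, 6],    # ground truth "motorcycle rider"
]


def per_class_iou(cm: List[List[int]], classes: List[str]) -> Dict[str, float]:
    """IoU per class = TP / (TP + FP + FN)."""
    ious = {}
    for i, name in enumerate(classes):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                  # rest of the row: missed instances
        fp = sum(row[i] for row in cm) - tp   # rest of the column: wrong detections
        denom = tp + fp + fn
        ious[name] = tp / denom if denom else 0.0
    return ious


iou = per_class_iou(CM, CLASSES)
```

Rounded to two decimals, `iou` reproduces the three values stated in the text (0.91, 0.50, and 0.27), so this reconstruction is at least consistent with the quoted quotients.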

[0094] For each class $c \in C$ in the overall confusion matrix CM, the confusion analysis subsystem 312 can determine whether to generate the class-specific confusion matrix $\text{CM}_c$ (i.e., the confusion matrix for a specific object class). If the mIoU or other accuracy measure of the class $c$ is below the accuracy threshold $\text{th}_c$, then a class-specific confusion matrix $\text{CM}_c$ is generated as the top N elements of the confusion matrix associated with object class $c$. For the simplified exemplary confusion matrix in Table 2, the accuracy threshold $\text{th}_c$ can be defined as 0.80, and the number of confusion matrix elements N can be defined as 1. In this example, class-specific confusion matrices $\text{CM}_{\text{bicyclist}}$ and $\text{CM}_{\text{motorcyclist}}$ will be generated for the object classes "cyclist" and "motorcycle rider" respectively, because the mIoU of both object classes is below 0.80. The top confusion matrix element for the object class "cyclist," calculated based on the number of false negative and / or false positive predictions between "cyclist" and other object classes, will be "motorcycle rider" (with 8 false negative and 4 false positive predictions, more than the object class "car"); similarly, the top confusion matrix element for the object class "motorcycle rider" will be "cyclist" (with 4 false negative and 8 false positive predictions). Therefore, the class-specific confusion matrix of the defective object class can indicate the distribution of the predicted object class labels corresponding to the ground truth object class labels of the defective object class.

[0095] After generating the class-specific confusion matrix, the confusion analysis subsystem 312 can store each defective object class $c$ and its corresponding class-specific confusion matrix $\text{CM}_c$ in the memory 208.
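The thresholding and top-N selection described in this paragraph could be sketched as follows. Two assumptions are made loudly: the matrix values are reconstructed from the counts quoted in the text (rows are ground truth, columns are predictions), and confused classes are ranked by the sum of false negatives and false positives, which is only one reading of "false negative and / or false positive". The function name `class_specific_confusion` is hypothetical.

```python
from typing import Dict, List, Tuple

CLASSES = ["car", "cyclist", "motorcycle rider"]
CM = [            # assumed reconstruction; rows = ground truth, columns = predicted
    [75, 2, 4],
    [1, 15, 8],
    [0, 4, 6],
]


def class_specific_confusion(
    cm: List[List[int]], classes: List[str], threshold: float, top_n: int
) -> Dict[str, List[Tuple[str, int, int]]]:
    """For each class below the accuracy threshold, keep its top-N confused
    classes as (other class, false negatives, false positives)."""
    result = {}
    for i, name in enumerate(classes):
        tp = cm[i][i]
        fn_total = sum(cm[i]) - tp
        fp_total = sum(row[i] for row in cm) - tp
        denom = tp + fn_total + fp_total
        iou = tp / denom if denom else 0.0
        if iou >= threshold:
            continue  # accurate enough: not a defective object class
        confusions = [
            (classes[j], cm[i][j], cm[j][i])  # FN: class i predicted as j; FP: j predicted as i
            for j in range(len(classes)) if j != i
        ]
        # Assumed ranking rule: most confused = largest FN + FP.
        confusions.sort(key=lambda t: t[1] + t[2], reverse=True)
        result[name] = confusions[:top_n]
    return result


defective = class_specific_confusion(CM, CLASSES, threshold=0.80, top_n=1)
```

With a threshold of 0.80 and N = 1, `defective` contains only "cyclist" and "motorcycle rider", each pointing at the other as its top confused class, which matches the worked example.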

[0096] In some embodiments, the prediction accuracy information 304 may include other information indicating the accuracy of the prediction 302 generated by the machine learning model 224.

[0097] Referring again to Figure 4 and Figure 3, in step 412, the strategy generation subsystem 314 processes the prediction accuracy information 304 and the scene dictionary 306 to generate the main strategy 222. In step 414, the main strategy 222 is generated; the multiple sub-steps of step 414 are shown in detail in Figure 5.

[0098] Figure 5 The sub-steps of the main policy generation step 414 in method 400 are shown. The main policy 222 is generated through three steps, each generating a different sub-policy. In this document, the main policy 222 can be referred to as π, the first sub-policy (referred to as the target object sub-policy) can be referred to as π1, the second sub-policy (referred to as the target frame sub-policy) can be referred to as π2, and the third sub-policy (referred to as the additional target object sub-policy) can be referred to as π3.

[0099] First, in step 502, the policy generation subsystem 314 generates the target object sub-policy π1. Step 502 includes sub-steps 504 and 506. In 504, defective object classes are identified in the prediction accuracy information 304. If the accuracy (e.g., mIoU) of an object class is lower than an accuracy threshold, the target object sub-policy identifies the object class as a defective object class and indicates this by setting a policy parameter value: for example, if $\text{mIoU}_c < \text{th}_c$, then $\pi_{1,c} = \text{True}$. In 506, for each defective object class $c$ identified in 504, the class-specific confusion matrix $\text{CM}_c$ of the defective object class is obtained, for example by retrieving it from the memory 208. It should be understood that in some embodiments, sub-step 504 and sub-step 410 of method 400 can be combined or otherwise coordinated. For example, a list of defective object classes can be identified in sub-step 410 and obtained in sub-step 504 to avoid recalculating the accuracy (e.g., $\text{mIoU}_c$) of each object class.

[0100] Secondly, in step 508, the policy generation subsystem 314 generates the target frame sub-policy π2. For each defective object class (e.g., each object class for which $\text{mIoU}_c < \text{th}_c$), the strategy generation subsystem 314 executes sub-steps 510, 512, and 514. In 510, the strategy generation subsystem 314 generates a scene condition $\text{cond}_c$, expressed in terms of scene type, from the scene dictionary 306. The scene condition indicates the conditions under which a point cloud object instance of the current defective object class should potentially be injected into a given point cloud frame. For example, referring to the exemplary scene dictionary 306 shown in Table 1 above, the scene condition for the object class "pedestrian" could be the following: the point cloud frame under consideration is labeled with the scene type "intersection", "free pedestrian walkway", or "parking spaces". The scene condition can be considered an "injection condition", that is, a necessary condition for performing injection.

[0101] In 512, the strategy generation subsystem 314 generates the injection probability $p_{\text{inject},c}$. The injection probability indicates the probability of injecting a point cloud object instance of the current defective object class into the point cloud frame under consideration when $\text{cond}_c$ is satisfied. The injection probability $p_{\text{inject},c}$ can be viewed as a further injection condition: a point cloud object instance of the current defective object class will only be injected into a random subset of the frames (e.g., point cloud frames 212), where the size of the subset relative to the size of the entire point cloud dataset 210 is defined by the injection probability.

[0102] In 514, the strategy generation subsystem 314 generates the range $R_c$ within which point cloud object instances of the defective object class $c$ should be injected. The range indicates a spatial region, such as a range of distances from an observer (e.g., the LiDAR sensor used to generate the point cloud frame). In embodiments using point cloud frames containing multiple scenes, each of which may have a different scene type, the range $R_c$ can act as a further injection condition by limiting the scenes within the point cloud frame into which a point cloud object instance of the current defective object class may be injected, and by preventing the injection of a point cloud object instance of the current defective object class into a given point cloud frame if no scene of the required scene type (i.e., satisfying the scene condition) exists within the required range.

[0103] Therefore, for each defective object class, the strategy generation subsystem 314 can generate three injection conditions under which a point cloud object instance of the defective object class should be injected into a given point cloud frame: the scene condition, the injection probability, and the range.
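The three injection conditions (scene condition, injection probability, and range) can be combined into a single check, sketched below under stated assumptions. The names `FramePolicy`, `scene_condition`, and `should_inject` are hypothetical, the range is assumed to be a distance interval in metres from the sensor, and the toy scene dictionary omits the "all objects" rows of Table 1 for brevity.

```python
import random
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class FramePolicy:
    """Assumed container for the target frame sub-policy of one defective class."""
    allowed_scene_types: Set[str]          # scene condition
    p_inject: float                        # injection probability
    distance_range: Tuple[float, float]    # range, assumed (min, max) in metres


def scene_condition(scene_dict: Dict[str, Set[str]], object_class: str) -> Set[str]:
    """Scene types in which the scene dictionary lists the object class."""
    return {scene for scene, classes in scene_dict.items() if object_class in classes}


def should_inject(policy: FramePolicy, scene_type: str, distance: float,
                  rng: random.Random) -> bool:
    """Apply the three injection conditions in turn."""
    if scene_type not in policy.allowed_scene_types:
        return False                         # scene condition not met
    lo, hi = policy.distance_range
    if not (lo <= distance <= hi):
        return False                         # outside the permitted range
    return rng.random() < policy.p_inject    # random subset of eligible frames


scene_dict = {
    "intersection": {"pedestrian", "cyclist", "car", "truck"},
    "free pedestrian walkway": {"cyclist", "pedestrian"},
    "highway": {"car", "truck"},
}
cond_pedestrian = scene_condition(scene_dict, "pedestrian")
policy = FramePolicy(cond_pedestrian, p_inject=1.0, distance_range=(5.0, 40.0))
rng = random.Random(0)
decision = should_inject(policy, "intersection", 20.0, rng)
```

Setting `p_inject` below 1.0 makes the final check pass for only a fraction of the eligible frames, which is how the injection probability restricts injection to a random subset.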

[0104] In 516, the strategy generation subsystem 314 generates the additional target object sub-strategy π3. For each defect object class (e.g., mIoU) c <th c (Object class), the strategy generation subsystem 314 executes sub-steps 518 and 520. In 518, occurrence information is generated based on false negative values. The strategy generation subsystem 314 uses the class-specific confusion matrix CM of the current defect class c. c To identify all object classes c with FN (false negative) values ​​greater than a specific threshold. FN Consider an object class c whose identifier can be called a highly obfuscated additional object class. FN This is to potentially inject point cloud object instances along with the current defective object class into the point cloud frame; the highly obfuscated additional object class c identified in step 516 can be used. FN The list is represented as {c FN1 ,c FN2 ,…}。 Add an additional object class c for each highly obfuscated object. FN Assigning expected probability p occurrence The instruction indicates that a given highly obfuscated additional object class c should be injected along with a point cloud object instance of the current defective object class. FN The frequency of point cloud object instances. Generally, the higher the degree of obfuscation between the current defective object class and a given highly obfuscated additional object class (e.g., the larger the false negative value), the more likely it is to be assigned to p. occurrence The higher the value, the more likely point cloud object instances of the additional object class, exhibiting more false negative values, are to be injected into the point cloud frame relative to the current defective object class. 
This allows point cloud object instances of both object classes to coexist in a single enhanced point cloud frame, thereby improving the ability of the machine learning model 224 to distinguish between the two object classes when retraining using the enhanced point cloud frame. The list of highly obfuscated additional object classes and their corresponding occurrence probabilities can be referred to as occurrence information. Each highly obfuscated additional object class can be referred to as an additional target object class.
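The generation of occurrence information can be sketched as follows. The description only requires that a larger false negative value lead to a larger occurrence probability; the particular normalization used here (each retained class's share of the retained false negative counts) is an assumption, as are the function name `occurrence_info` and the example counts.

```python
def occurrence_info(fn_counts, fn_threshold):
    """Return {class: p_occurrence} for classes whose FN count exceeds the threshold.

    fn_counts maps each other object class to the number of instances of the
    current defective class that the model mislabeled as that class.
    Assumption: p_occurrence is the class's share of the retained FN counts.
    """
    retained = {cls: n for cls, n in fn_counts.items() if n > fn_threshold}
    total = sum(retained.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in retained.items()}


# Hypothetical FN counts for a defective class "cyclist".
info = occurrence_info({"pedestrian": 30, "motorcyclist": 10, "pole": 2}, fn_threshold=5)
```

With these invented counts, "pole" falls below the threshold and is dropped, and "pedestrian" receives a higher occurrence probability than "motorcyclist" because it accounts for more false negatives.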

[0105] In step 520, additional target object class acceptance information is generated. The strategy generation subsystem 314 generates a list L of object class combinations, each of which contains the defective object class c: each subset in L consists of the defective object class c combined with zero or more of its corresponding highly obfuscated additional object classes. The strategy generation subsystem 314 assigns to each subset in L an expected acceptance probability p_accept, which indicates the probability that point cloud object instances of the given object class combination should be injected into a point cloud frame. In some examples, p_accept can be generated by multiplying the p_occurrence values of the highly obfuscated additional object classes in the combination. In other examples, the expected acceptance probability can be generated based on the accuracy (e.g., mIoU) of each highly obfuscated additional object class: the lower the mIoU of each highly obfuscated additional object class, the higher the acceptance probability. This can be expressed as:

[0106] $$p_{\text{accept}} \propto G(\text{mIoU}_{c_1}, \text{mIoU}_{c_2}, \ldots)$$

[0107] Where G(.) is an arbitrary function, for example:

[0108]

[0109] Where a, b, and c are the mIoU values of the respective highly obfuscated additional object classes, each being a number between 0 and 1.

[0110] Therefore, the lower the mIoU of a given highly obfuscated additional object class, the more weight it carries in calculating the expected acceptance probability of the object class combination. If the machine learning model 224 predicts each object class in a given combination with low accuracy (i.e., low mIoU), more point cloud frames can be augmented by injecting point cloud object instances of the given combination. The list of object class combinations, together with the corresponding expected acceptance probability p_accept of each combination, can be referred to as additional target object class acceptance information.
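The two ways of deriving the expected acceptance probability can be sketched as below. The product of occurrence probabilities follows the description directly. The mIoU-based function is only one possible choice of G, picked here so that a lower mIoU yields a higher value; it is an assumption and is not the example function of the disclosure. All function names are hypothetical, and treating the combination that contains only the defective class as always accepted is likewise an assumption.

```python
from itertools import combinations
from math import prod


def combination_list(defective_class, additional_classes):
    """List L: the defective class together with zero or more additional classes."""
    combos = []
    for k in range(len(additional_classes) + 1):
        for extra in combinations(additional_classes, k):
            combos.append((defective_class,) + extra)
    return combos


def p_accept_from_occurrence(combo, p_occurrence):
    """Product of p_occurrence over the additional classes in the combination."""
    return prod(p_occurrence[cls] for cls in combo[1:])


def p_accept_from_miou(combo, miou):
    """Hypothetical G: mean of (1 - mIoU) over the additional classes."""
    extras = combo[1:]
    if not extras:
        return 1.0  # assumption: the defective class alone is always accepted
    return sum(1.0 - miou[cls] for cls in extras) / len(extras)


L = combination_list("cyclist", ["pedestrian", "motorcyclist"])
```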

[0111] Referring again to Figure 4, following the policy generation step 412, the method 400 proceeds to the sample selection step 420. In step 420, the sample selection subsystem 316 applies the main policy 222 to the point cloud dataset 210 to select one or more point cloud frames from the point cloud dataset 210 for enhancement. Step 420 includes sub-steps 422 and 424.

[0112] In step 422, the sample selection subsystem 316 identifies target point cloud frames 226 for data augmentation from the plurality of point cloud frames 212. In some embodiments, step 422 can be performed by applying the target frame sub-policy to the plurality of point cloud frames 212 or a subset of the plurality of point cloud frames 212 to select one or more target point cloud frames 226. In some examples, all point cloud frames 212 may be used to identify target point cloud frames 226 for data augmentation, while in other examples, only a subset of the point cloud frames 212 may be used to identify target point cloud frames 226 for data augmentation. For example, some embodiments may perform method 400 alternately, where half of the point cloud frames 212 are considered for data augmentation, the augmented point cloud frames being used to retrain the machine learning model 224, and then the process is repeated for the other half of the point cloud frames 212. It should be understood that in any given iteration of method 400, other methods may be used to consider only a subset of the point cloud frames 212 for data augmentation.

[0113] For each point cloud frame considered for data augmentation, a series of operations are performed to apply the main policy 222 to identify the target point cloud frame 226 for data augmentation. First, the defect object class identified by the main policy 222 is determined by referring to the target object sub-policy π1. Then, relative to the current frame being considered, for each defect object class c, the following operations of step 422 are performed.

[0114] For each defect object class c, the target frame sub-policy π2 is consulted to determine whether a point cloud object instance of the current defect object class should be injected into the current point cloud frame. If the scene condition cond_c is false for the current defect object class and the current point cloud frame (e.g., based on the scene type marker 140 of the current point cloud frame, or for each scene type in the current point cloud frame), then the next defect object class is considered. However, if the scene condition cond_c is true for the current defect object class and the current point cloud frame, then the injection probability p_inject,c and the range R_c are considered to determine whether a point cloud object instance of the current defect object class should be injected into the current point cloud frame. For example, a random or pseudo-random number generator can generate a random value between 0 and 1, which is compared with the injection probability p_inject,c to determine whether to proceed with injecting a point cloud object instance of the current defect object class into the current point cloud frame; the range R_c is then compared with the spatial characteristics and / or component scenes of the current point cloud frame to determine whether the current point cloud frame contains a location where a point cloud object instance of the current defect object class can be injected. If all three injection conditions (i.e., scene condition cond_c, injection probability p_inject,c, and range R_c) are met, the current point cloud frame is selected as the target point cloud frame 226, and the current defect object class is selected as the target object class. In the following step 424, the point cloud object instance of the target object class (i.e., the target point cloud object instance 228) will be identified.
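The three checks described above can be sketched as a single decision function. This is a simplified illustration, not the disclosed implementation: the function name `should_inject`, the representation of a frame as a list of scene type labels plus a list of distances at which free space is available, and the interpretation of the scene condition as set membership are all assumptions.

```python
import random


def should_inject(scene_types, free_ranges, allowed_scenes, p_inject, range_c, rng):
    """Return True if the frame passes all three injection conditions.

    scene_types: scene type labels present in the frame
    free_ranges: distances (assumed metres) at which the frame has free space
    allowed_scenes, p_inject, range_c: scene condition, injection probability
        and range (min, max) of one defect object class
    """
    # 1. Scene condition: assumed to mean "some scene type of the frame is allowed".
    if not any(s in allowed_scenes for s in scene_types):
        return False
    # 2. Injection probability: compare a random value in [0, 1) with p_inject.
    if rng.random() >= p_inject:
        return False
    # 3. Range: the frame must offer a free location inside (min, max).
    lo, hi = range_c
    return any(lo <= d <= hi for d in free_ranges)
```

Passing the random number generator as a parameter (for example `random.Random(0)`) keeps the selection reproducible, which can be convenient when comparing augmentation runs.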

[0115] However, before proceeding to step 424, the sample selection subsystem 316 applies the additional target object sub-policy π3 to identify any additional target object class that should be injected into the target point cloud frame 226 along with the identified target object class. Using the same injection conditions described above, the sample selection subsystem 316 applies the additional target object sub-policy to determine whether any subset of list L (i.e., any object class combination) can be used to inject point cloud object instances into the current point cloud frame. For example, a first object class combination, consisting of the defective object class c (i.e., the target object class) and a single highly obfuscated additional object class, is examined by checking the injection conditions of that highly obfuscated additional object class (i.e., its scene condition, injection probability, and range). If these injection conditions are met, the expected acceptance probability of the combination is then checked (e.g., using a random number generator as described above); if the expected acceptance probability is also met, method 400 proceeds to step 424 to identify point cloud object instances of both object classes for injection into the current point cloud frame. The combination of object classes selected for injection can be referred to as the selected object class combination. Otherwise, the next object class combination is considered for joint injection. If no object class combination beyond the defective object class c alone is satisfied, method 400 proceeds to step 424 to identify only a point cloud object instance of the defective object class c for injection.
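A minimal sketch of this combination check is given below. It assumes that the injection conditions of each additional object class have already been evaluated to a boolean, and that the combinations are visited in list order; the function name `select_combination` and its parameters are hypothetical.

```python
import random


def select_combination(combos, conditions_met, p_accept, rng):
    """Return the first object class combination accepted for joint injection.

    combos: combinations in list order, each starting with the defective class
    conditions_met: {class: bool}, whether that class's injection conditions hold
    p_accept: {combination: expected acceptance probability}
    """
    defective_only = None
    for combo in combos:
        if len(combo) == 1:
            defective_only = combo  # fallback: inject the defective class alone
            continue
        # Every additional class in the combination must pass its own conditions.
        if not all(conditions_met.get(cls, False) for cls in combo[1:]):
            continue
        # Acceptance check: compare a random value in [0, 1) with p_accept.
        if rng.random() < p_accept[combo]:
            return combo
    return defective_only
```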

[0116] In some examples, method 400 may determine in step 422 that point cloud object instances of multiple different object class combinations should be injected into a single point cloud frame, thereby generating multiple enhanced point cloud frames. For example, if the conditions are met for the defective object class c alone, for a first object class combination, and for a second object class combination, the remaining steps of method 400 can be executed in three branches, one for each of three different injection operations, thereby generating three different enhanced point cloud frames.

[0117] In step 424, the sample selection subsystem 316 identifies one or more target point cloud object instances 228 selected from the plurality of labeled point cloud object instances 214. Each target point cloud object instance 228 is a labeled point cloud object instance 214 that is labeled with one of the object classes included in the object class combination selected in step 422. Therefore, one of the target point cloud object instances 228 is labeled with the defect object class c, and in some examples, additional target point cloud object instances 228 are labeled with the highly obfuscated additional object classes contained in the selected object class combination.

[0118] The target point cloud object instance 228 can be identified using any suitable technique, such as randomly selecting any labeled point cloud object instance 214 with the correct object class label. In some embodiments, other sources can be used to select the target point cloud object instance 228, such as additional point cloud datasets specified by a user of the system 200, simulation datasets, etc.
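Random selection by object class label, the example technique named above, might look like the following sketch. The pool is assumed to be a list of (object class, instance identifier) pairs, and the function name `pick_instances` is hypothetical.

```python
import random


def pick_instances(labeled_instances, selected_classes, rng):
    """Pick one labeled instance per selected object class, uniformly at random.

    labeled_instances: list of (object_class, instance_id) pairs
    selected_classes: object classes of the selected object class combination
    """
    picked = {}
    for cls in selected_classes:
        candidates = [inst for label, inst in labeled_instances if label == cls]
        if candidates:  # skip classes that have no labeled instance in the pool
            picked[cls] = rng.choice(candidates)
    return picked
```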

[0119] In step 426, the transformation subsystem 318 transforms each target point cloud object instance 228 using a sub-strategy. The transformation subsystem 318 uses several different types of information from other subsystems or other sources to determine the transformation (i.e., the sub-strategy) to be applied to each target point cloud object instance 228, as well as the values of various enhancement parameters. This information may include: the object class of each target point cloud object instance 228; the range R_c of each object class; the scene type within the target point cloud frame 226 into which each target point cloud object instance 228 is injected; the other object classes included in the selected object class combination; and the expected acceptance probability of the selected object class combination. The transformation subsystem 318 can use this information to set the enhancement parameters and apply a sub-strategy based on known point cloud frame data augmentation techniques. The sub-strategy is applied to each target point cloud object instance 228 to generate a corresponding transformed point cloud object instance 232.

[0120] In step 428, each transformed point cloud object instance 232 is injected into the target point cloud frame 226 to generate an enhanced point cloud frame 230 using known point cloud data augmentation techniques.
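Steps 426 and 428 rely on known augmentation techniques that are not detailed here. As a simplified stand-in, the sketch below applies a rotation about the vertical axis, a uniform scaling, and a translation to an object instance, then appends its points and labels to the frame. The function names, the point representation as (x, y, z) tuples, and the omission of occlusion or collision handling are all assumptions made for brevity.

```python
import math


def transform_instance(points, yaw, scale, offset):
    """Rotate about the z axis, scale uniformly, then translate a list of (x, y, z)."""
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    ox, oy, oz = offset
    out = []
    for x, y, z in points:
        rx, ry = cos_y * x - sin_y * y, sin_y * x + cos_y * y
        out.append((scale * rx + ox, scale * ry + oy, scale * z + oz))
    return out


def inject(frame_points, frame_labels, instance_points, object_class):
    """Return new point and label lists with the instance appended to the frame."""
    return (frame_points + instance_points,
            frame_labels + [object_class] * len(instance_points))


# Invented example: move a one-point "instance" 10 units along x and double its size.
moved = transform_instance([(1.0, 0.0, 0.0)], yaw=0.0, scale=2.0, offset=(10.0, 0.0, 0.0))
points, labels = inject([(0.0, 0.0, 0.0)], ["road"], moved, "cyclist")
```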

[0121] After generating the enhanced point cloud frame 230 in step 428, method 400 may return to step 420 to perform one or more iterations of steps 420 to 428, thereby generating one or more additional enhanced point cloud frames 230 using the current target point cloud frame 226 or other target point cloud frames. For example, the sample selection subsystem 316 may continue to loop among the defective object classes by referring to the target object sub-strategy of the main strategy 222, continue to loop among the point cloud frames 212 by referring to the target frame sub-strategy, and continue to loop among the object class combinations by referring to the additional target object sub-strategy, until all defective object classes, frames, and object class combinations have been considered for injection and appropriately used to generate the enhanced point cloud frame 230.

[0122] In step 430, the augmented point cloud frames 230 are used to retrain the machine learning model 224. In some embodiments, each augmented point cloud frame 230 is used to retrain the machine learning model 224 when it is generated. In other embodiments, the main strategy 222 is applied to generate multiple augmented point cloud frames 230 before the generated set of augmented point cloud frames 230 is used to retrain the machine learning model 224. In some embodiments, the augmented point cloud frames 230 are added to the point cloud dataset 210 to form an augmented labeled point cloud dataset, and the augmented labeled point cloud dataset is used to retrain (or further train) the machine learning model 224.

[0123] After step 430, method 400 may return to step 404 one or more times to identify inaccuracies in the currently retrained machine learning model 224 and generate additional augmented point cloud frames to address these remaining inaccuracies. In some embodiments, method 400 may instead return to step 402 to regenerate the scene dictionary 306 before performing the confusion analysis; however, in the embodiments described herein, this is unnecessary because the augmented point cloud frame 230 is generated based on the scene dictionary 306, so the content of the scene dictionary 306 will not be changed even after the augmented point cloud frame 230 is added to the point cloud dataset 210.
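The repeated evaluate, augment, and retrain cycle can be sketched as a loop over caller-supplied functions. The stopping rule used here (stop once every class reaches a target mIoU, or after a maximum number of rounds) is an assumption, since the description only states that the method may be repeated one or more times. The three callables and the function name `adaptive_augmentation` are hypothetical placeholders.

```python
def adaptive_augmentation(evaluate, generate_frames, retrain, dataset, max_rounds, target_miou):
    """Repeat evaluate -> augment -> retrain until every class reaches target_miou.

    evaluate(dataset) -> {class: mIoU}
    generate_frames(dataset, deficient_classes) -> list of augmented frames
    retrain(dataset) -> None
    Returns the final dataset and the number of retraining rounds performed.
    """
    rounds = 0
    for _ in range(max_rounds):
        miou = evaluate(dataset)
        deficient = [cls for cls, value in miou.items() if value < target_miou]
        if not deficient:
            break  # assumed stopping rule: no deficient class remains
        dataset = dataset + generate_frames(dataset, deficient)
        retrain(dataset)
        rounds += 1
    return dataset, rounds
```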

[0124] Although the present invention describes methods and processes using steps in a certain order, one or more steps of the methods and processes may be omitted or modified as appropriate. One or more steps may be performed in an order other than that described, as appropriate.

[0125] Although the invention has been described at least partially in relation to methods, those skilled in the art will understand that the invention also relates to various components for performing at least some aspects and features of the described methods, whether by hardware components, software, or any combination of both. Therefore, the technical solutions of the invention can be embodied in the form of a software product. Suitable software products can be stored in pre-recorded storage devices or other similar non-volatile or non-transitory computer-readable media, including DVDs, CD-ROMs, USB flash drives, portable hard drives, or other storage media. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, server, or network device) to perform the method examples disclosed herein.

[0126] This invention may be embodied in other specific forms without departing from the subject matter of the claims. The exemplary embodiments described are to be regarded in all respects as illustrative rather than restrictive. Selected features from one or more of the foregoing embodiments may be combined to create alternative embodiments not explicitly described, and features suitable for such combinations are to be understood to fall within the scope of this invention.

[0127] All values and sub-ranges within the scope of the disclosure are also disclosed. Furthermore, while the systems, devices, and processes disclosed and illustrated herein may include a specific number of elements / components, these systems, devices, and components may be modified to include more or fewer such elements / components. For example, while any element / component disclosed may be a single quantity, embodiments disclosed herein may be modified to include multiple such elements / components. The subject matter described herein is intended to cover and include all suitable technical modifications.

Claims

1. A method for enhancing a point cloud dataset, characterized in that the point cloud dataset includes a plurality of point cloud frames and a plurality of labeled point cloud object instances, and the method includes:
obtaining prediction accuracy information of a machine learning model trained to perform a prediction task using the point cloud dataset, the prediction accuracy information indicating: the accuracy of the predictions generated by the trained machine learning model; and a defective object class;
obtaining a scene dictionary based on a plurality of labels of the plurality of labeled point cloud object instances and the plurality of point cloud frames;
processing the prediction accuracy information and the scene dictionary to generate a main strategy;
applying the main strategy to the point cloud dataset to identify: a target point cloud frame among the plurality of point cloud frames; and a target point cloud object instance selected from the plurality of labeled point cloud object instances, the target point cloud object instance being labeled with the defective object class; and
generating an enhanced point cloud frame by injecting the target point cloud object instance into the target point cloud frame.

2. The method according to claim 1, characterized in that the plurality of point cloud frames include the plurality of labeled point cloud object instances.

3. The method according to claim 1 or 2, characterized in that obtaining the prediction accuracy information includes:
obtaining a plurality of ground truth object class labels from the input of the machine learning model; and
for each ground truth object class label among the plurality of ground truth object class labels, determining the distribution of the predicted object class labels;
wherein the prediction accuracy information includes the distribution of the predicted object class labels for each ground truth object class label.

4. The method according to claim 3, characterized in that the prediction accuracy information includes at least one confusion matrix, which indicates the distribution of the predicted object class labels of at least one ground truth object class label among the plurality of ground truth object class labels.

5. The method according to claim 3, characterized in that obtaining the prediction accuracy information also includes:
identifying the defective object class based on the distribution of the predicted object class labels corresponding to the ground truth object class labels of the defective object class; and
generating a class-specific confusion matrix for the defective object class, including the distribution of the predicted object class labels corresponding to the ground truth object class labels of the defective object class.

6. The method according to any one of claims 1 to 2 and 4 to 5, characterized in that:
the main strategy also specifies a second defective object class;
the main strategy is applied to further identify a second target point cloud object instance selected from the plurality of labeled point cloud object instances, the second target point cloud object instance being labeled with the second defective object class; and
generating the enhanced point cloud frame includes injecting the target point cloud object instance and the second target point cloud object instance into the target point cloud frame.

7. The method according to any one of claims 1 to 2 and 4 to 5, characterized in that injecting the target point cloud object instance into the target point cloud frame includes:
using the main strategy to identify a location in the target point cloud frame; and
injecting the target point cloud object instance into the target point cloud frame at the location.

8. The method according to any one of claims 1 to 2 and 4 to 5, characterized in that injecting the target point cloud object instance into the target point cloud frame includes:
transforming the target point cloud object instance using one or more sub-strategies to generate a transformed point cloud object instance; and
injecting the transformed point cloud object instance into the target point cloud frame.

9. The method according to claim 5, characterized in that:
processing the prediction accuracy information and the scene dictionary to generate the main strategy includes processing the class-specific confusion matrix and the scene dictionary to generate the main strategy, wherein the main strategy indicates a plurality of defective object classes and, for each of the plurality of defective object classes, the main strategy includes an injection condition under which a point cloud object instance of the defective object class should be injected into a given point cloud frame; and
applying the main strategy to the point cloud dataset to identify the target point cloud frame includes determining whether the target point cloud frame satisfies the injection condition of the main strategy relative to the target object class.

10. The method according to claim 9, characterized in that the method also includes: after generating the enhanced point cloud frame, using the enhanced point cloud frame to further train the machine learning model.

11. The method according to claim 10, characterized in that the method also includes, after using the enhanced point cloud frame to further train the machine learning model, repeating the following steps one or more times:
obtaining the prediction accuracy information;
generating the main strategy;
identifying the target point cloud frame;
identifying the target object instance;
generating the enhanced point cloud frame; and
further training the machine learning model.

12. A system for enhancing a point cloud dataset, characterized in that the point cloud dataset includes a plurality of point cloud frames and a plurality of labeled point cloud object instances, and the system includes:
a processor device; and
a memory storing machine-executable instructions that, when executed by the processor device, cause the system to perform the method according to any one of claims 1 to 11.

13. A system, characterized in that the system includes means for performing the method according to any one of claims 1 to 11.

14. A computer-readable medium, characterized in that the computer-readable medium includes instructions stored thereon that, when executed by a processor device of a device, cause the device to perform the method according to any one of claims 1 to 11.

15. A computer program, characterized in that the computer program includes instructions which, when executed by a processor device of a device, cause the device to perform the method according to any one of claims 1 to 11.