Method and device for generating training data for a recognition model for recognizing objects in sensor data, training method and manipulation method

By combining automated recognition models and simulated training data, the problem of time-consuming and inconsistent manual labeling of sensor data is solved, enabling the efficient generation of high-quality training data and improving the recognition accuracy of autonomous robots in environmental perception.

CN112668603B · Active · Publication Date: 2026-06-19 · ROBERT BOSCH GMBH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ROBERT BOSCH GMBH
Filing Date
2020-10-16
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, manually labeling sensor data to train recognition models is time-consuming and inconsistent, resulting in unstable training data quality that is difficult to meet the environmental perception needs of autonomous robots.

Method used

By using automated methods and employing recognition models with additional sensors, objects can be identified in overlapping areas and sensor data can be synchronized. By combining simulated training data and actual sensor data, high-quality training data can be generated, reducing the reliance on manual labeling.

Benefits of technology

It enables the efficient and low-cost generation of large amounts of high-quality training data, improving the training efficiency and recognition accuracy of the recognition model, and reducing the workload of manual labeling and data inconsistency.

Figure CN112668603B_ABST

Abstract

The present invention relates to a method for generating training data (126) for a recognition model (124) for recognizing an object (120) in sensor data (108) of a sensor (104), wherein, in additional sensor data (114) of an additional sensor (110) mapping at least one overlapping region (116), the object (120) and object attributes (122) are recognized using a trained additional recognition model (118), and the object attributes (122) of the object (120) identified in the overlapping region (116) are transferred to the sensor data (108) mapping at least the overlapping region (116) to generate training data (126).

Description

Technical Field

[0001] The present invention relates to a method and apparatus for generating training data for a recognition model used to identify objects in sensor data, particularly from sensors of a vehicle. The invention also relates to a method for training the recognition model and a method for manipulating an autonomous robot.

Background Technology

[0002] A recognition model for automatically identifying objects in sensor data can be trained using training data. Training data can be generated manually by having a human consider the recorded sensor data and search for objects within it. Labels can then be assigned to the found objects. These labels can contain information about the object; this information can be called object attributes. The training data can be called labeled samples.

Summary of the Invention

[0003] Against this backdrop, by means of the solutions proposed herein, according to the technical solutions of the present invention, a method and corresponding apparatus for generating training data for a recognition model used to identify objects in sensor data are proposed. A method for training the recognition model and a method for manipulating an autonomous robot are also proposed. Finally, a corresponding computer program product and a machine-readable storage medium are proposed. Advantageous extensions and improvements to the solutions proposed herein are derived from the specification and described in the extensions of the invention.

[0004] Advantages of the present invention

[0005] The embodiments of the present invention can advantageously achieve the automatic generation of training data for training the recognition model. Automatic generation enables the production of a large amount of training data, thereby improving the training of the recognition model. It also reduces the overhead of manual work. Because the same recognition criteria are always used and inconsistencies are avoided through automatic generation, consistent quality of the training data can be achieved.

[0006] A method is proposed for generating training data for a recognition model used to identify objects in sensor data of a sensor, wherein, in additional sensor data mapping at least one overlapping region of an additional sensor, objects and object attributes are identified using a trained additional recognition model, and the object attributes of the objects identified in the overlapping region are transferred to the sensor data mapping the at least one overlapping region in order to generate training data.

[0007] Furthermore, the ideas behind the embodiments of the present invention can be considered as being based on the concepts and understanding described below.

[0008] The recognition model can be called a pattern recognition algorithm. It can be trained using training data to identify objects in sensor data and assign object attributes to these objects. The training data can be called labeled samples and can be based on recorded sensor data. To convert sensor data into training data, objects mapped in the sensor data can be labeled, and object attributes can be assigned to the labeled objects. Object attributes can be called labels. In the proposed scheme, objects and object attributes are automatically identified in other sensor data using a trained recognition model, and the object attributes are transferred to the recorded sensor data. The other sensor data is detected using an additional sensor and is therefore called additional sensor data. Accordingly, the trained recognition model is called the additional recognition model. Object attributes can also be transferred from the additional sensor data to further sensor data in order to obtain further training data.

[0009] Sensors and additional sensors can share a common operating principle. However, they can also have different operating principles. This operating principle can be referred to as a mode or modality. Depending on their operating principles, sensors and additional sensors can be referred to as multimodal sensors. Sensors and additional sensors can detect real objects in a real-world environment. In overlapping areas, sensors and additional sensors can detect objects essentially simultaneously. Both sensors and additional sensors can be implemented as sensor systems and can fuse data from multiple sensor units of the same type. Sensors and additional sensors can be mechanically coupled to each other. For example, sensors and additional sensors can be arranged on the same robot. The robot can be designed as an autonomous robot. The robot can be a vehicle, especially an autonomous or partially autonomous vehicle. Additional sensors can be temporarily arranged on the robot and pointed to the overlapping area.

[0010] To transfer object attributes, the object attributes of additional data points in the additional sensor data can be assigned to the corresponding data points in the sensor data that lie within a location tolerance. Data points can be, for example, image points, and can have image coordinates and intensity values. Data points can also be spatial points, and have orientation and distance values. Multiple data points within a region can represent a common object. A trained recognition model can identify which data points belong to the same object.
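
The assignment within a location tolerance can be sketched as a nearest-neighbor lookup. This is only a minimal illustration under stated assumptions, not the method claimed in the patent: the function name `transfer_labels`, the 2D Cartesian point format, and the default tolerance of 0.5 (in coordinate units) are hypothetical choices made for the example.

```python
import math

def transfer_labels(sensor_points, labeled_points, tolerance=0.5):
    """Give each sensor data point the attributes of the nearest labeled
    additional data point, provided it lies within `tolerance`.
    Points without a match receive None.

    sensor_points:  list of (x, y) tuples from the sensor to be labeled.
    labeled_points: list of ((x, y), attributes) from the additional sensor.
    (Both formats are assumptions for this sketch.)
    """
    result = []
    for sx, sy in sensor_points:
        best_attrs, best_dist = None, tolerance
        for (ax, ay), attrs in labeled_points:
            dist = math.hypot(sx - ax, sy - ay)
            if dist <= best_dist:
                best_attrs, best_dist = attrs, dist
        result.append(((sx, sy), best_attrs))
    return result

# Hypothetical data: two radar points, two lidar points labeled by the
# additional recognition model.
radar = [(10.0, 1.0), (30.0, -4.0)]
lidar = [((10.2, 1.1), {"type": "car"}), ((50.0, 0.0), {"type": "pedestrian"})]
training_data = transfer_labels(radar, lidar)
```

In this example the first radar point receives the "car" attributes, while the second remains unlabeled because no labeled lidar point lies within the tolerance.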

[0011] Objects and object attributes can be synchronized with the sensor data. The additional sensor data and the sensor data can be detected at different sampling frequencies, which may result in different detection times at the sensor and the additional sensor. If the object moves relative to the sensors between the detection times, it is detected at different locations in the overlapping region. For synchronization, the recording times of the two sensors can be synchronized. In that case, synchronization takes place before recording, and the objects and object attributes are synchronized as well, because they correspond to the recording time of the additional sensor.

[0012] To achieve synchronization, sensor motion information can be used to compensate for the sensor motion between the additional detection time of the object (via an additional sensor) and the detection time of the object (via a sensor). For example, sensor motion information can be provided by the robot's control equipment. Similarly, sensor motion information can be provided by motion sensors, such as inertial sensors.
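
As an illustration of compensating sensor motion, the sketch below re-expresses a static point, seen at the additional detection time, in the platform frame at the later detection time. It assumes planar motion with constant forward speed and constant yaw rate over the interval; the function name `compensate_ego_motion` and the frame convention (x forward, y to the left) are assumptions for this example and are not taken from the patent.

```python
import math

def compensate_ego_motion(point, speed, yaw_rate, dt):
    """Express a static point, given in the platform frame at the additional
    detection time, in the platform frame `dt` seconds later.

    Assumes planar motion with constant forward speed (m/s) and constant
    yaw rate (rad/s) during dt; x points forward, y to the left.
    """
    theta = yaw_rate * dt  # heading change of the platform
    if abs(yaw_rate) < 1e-9:
        # Straight-line motion.
        tx, ty = speed * dt, 0.0
    else:
        # Motion along a circular arc of radius speed / yaw_rate.
        radius = speed / yaw_rate
        tx = radius * math.sin(theta)
        ty = radius * (1.0 - math.cos(theta))
    # Shift into the new origin, then rotate by -theta into the new heading.
    dx, dy = point[0] - tx, point[1] - ty
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)

# Hypothetical values: 10 m/s straight ahead, 50 ms between the two detections.
compensated = compensate_ego_motion((20.0, 0.0), speed=10.0, yaw_rate=0.0, dt=0.05)
```

Here a point 20 m ahead ends up roughly 19.5 m ahead, because the platform has moved about 0.5 m toward it. The speed and yaw rate would come from the sources named above, such as the robot's control equipment or an inertial sensor.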

[0013] Alternatively or additionally, for synchronization, the object's motion between the object's additional detection moment (via the additional sensor) and the object's detection moment (via the sensor) can be compensated using object motion attributes. The object's trajectory can be obtained from at least two time-staggered detection moments and / or additional detection moments. The object's motion can then be interpolated to the detection moment or the additional detection moment.
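
A simple way to picture this interpolation is a linear model between two time-staggered detections. The sketch below assumes constant velocity between the two moments; the function name `interpolate_position` and the sample values are hypothetical, and a real system might use a more elaborate motion model.

```python
def interpolate_position(t, t0, p0, t1, p1):
    """Linearly interpolate (or extrapolate) an object position to time t
    from two detections p0 at t0 and p1 at t1.
    Assumes constant velocity between the detections."""
    if t1 == t0:
        raise ValueError("detection times must differ")
    alpha = (t - t0) / (t1 - t0)
    return tuple(a + alpha * (b - a) for a, b in zip(p0, p1))

# Hypothetical timing: the additional sensor saw the object at t = 0.00 s and
# t = 0.10 s; the sensor frame to be labeled was recorded at t = 0.05 s.
label_position = interpolate_position(0.05, 0.00, (10.0, 0.0), 0.10, (11.0, 0.5))
```

The interpolated position lies halfway between the two detections, at approximately (10.5, 0.25), and could then be used as the location of the label in the sensor data.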

[0014] Both sensor data and additional sensor data can be detected using a common sensor platform. The sensor and additional sensor can be mechanically coupled to each other. In this way, the sensor and additional sensor can move in a substantially synchronized manner. Through the common sensor platform, the sensor and additional sensor can have substantially the same field of view in the overlapping area. Thus, object properties can be easily transmitted.

[0015] Sensor data can be read in from a sensor of a first modality. Additional sensor data can be read in from an additional sensor of a second modality. The sensor and the additional sensor can have different operating principles. The sensor can be, for example, a radar sensor. The additional sensor can be, for example, a lidar sensor. An additional recognition model can be easily trained for the additional sensor.

[0016] An additional recognition model can be trained using simulated training data before recognizing object attributes. Simulated training data can be artificially generated. It can map typical situations and objects. Defined boundary conditions can be selected within the simulated training data.

[0017] Simulated training data can be generated using a generative model. The generative model can include a sensor model for additional sensors, a propagation model, and an object model for at least one virtual object. Using the sensor model, wave emission from the additional sensors can be simulated. Using the propagation model, the transmission of the emitted wave through the virtual environment to the object can be simulated as an arriving wave. Using the object model, the reflection of the transmitted wave emission at the object can be simulated. Using the propagation model, the transmission of this reflection through the virtual environment to the additional sensors can be simulated as an arriving wave. Using the sensor model, the detection of the transmitted reflection by the additional sensors can be simulated. At least one object attribute provided by the object model of the virtual object can be assigned to the detected reflection to generate simulated training data. The generative model can be a highly realistic mapping of reality. Situations that can only occur randomly in reality can be generated in the generative model. This allows for the generation of simulated training data for challenging situations. This improves the recognition quality of the additional recognition model.

[0018] This method can be implemented, for example, in software or hardware, or in a hybrid form of software and hardware, such as in a control device.

[0019] The proposed solution also provides an apparatus configured to perform, control, or implement the steps of a variant of the proposed method in corresponding devices.

[0020] The device may be an electrical device having at least one computing unit for processing signals or data, at least one storage unit for storing signals or data, and at least one interface and / or communication interface for reading or outputting data embedded in a communication protocol. The computing unit may be, for example, a signal processor, a so-called system ASIC, or a microcontroller, for processing sensor signals and outputting data signals based on the sensor signals. The storage unit may be, for example, flash memory, EPROM, or magnetic storage. The interface may be configured as a sensor interface for reading sensor signals from a sensor and / or as an actuator interface for outputting data signals and / or control signals to an actuator. The communication interface may be configured to read or output data in a wireless and / or wired manner. The interface may also be a software module that is present, for example, on a microcontroller alongside other software modules.

[0021] It is also advantageous to have a computer program product or computer program having program code that can be stored on a machine-readable carrier or storage medium (e.g., semiconductor memory, hard disk memory or optical memory), and for performing, implementing and / or manipulating the steps of the method according to any one of the above embodiments, especially when the program product or program is implemented on a computer or device.

[0022] Furthermore, a method for training a recognition model is proposed, wherein the recognition model is trained based on training data generated using one of the methods described above.

[0023] Furthermore, a method for controlling autonomous robots, particularly at least partially automated vehicles, is proposed, wherein the robot is controlled based on data generated by a recognition model trained in this way.

[0024] During manipulation, for example, the robot's longitudinal and / or lateral dynamics can be adjusted. For instance, evasive maneuvers and / or emergency stops can be initiated based on the identification of objects along the vehicle's planned trajectory.

[0025] It should be noted that several possible features and advantages of the invention are described herein with reference to different embodiments. Those skilled in the art will recognize that features of the devices and methods can be combined, matched, or interchanged in a suitable manner to obtain other embodiments of the invention.

Attached Figure Description

[0026] Embodiments of the present invention are described below with reference to the accompanying drawings, which should not be construed as limiting the invention.

[0027] Figure 1 shows an illustration of a vehicle having a device according to one embodiment;

[0028] Figure 2 shows a flowchart of a method according to one embodiment;

[0029] Figure 3a shows an illustration of a generation model for a method according to one embodiment;

[0030] Figure 3b shows a flowchart of a generation model for a method according to one embodiment.

[0031] The accompanying drawings are schematic only and not to scale. In the drawings, the same reference numerals denote the same or equivalent features.

Detailed Implementation

[0032] Figure 1 shows an illustration of a vehicle 100 having a device 102 according to one embodiment. The vehicle 100 may be, for example, an autonomous or partially autonomous motor vehicle. The vehicle 100 has a sensor 104 that detects a detection area 106 in front of the vehicle and maps the detection area 106 to sensor data 108. The vehicle 100 also has an additional sensor 110. The additional sensor 110 similarly detects an additional detection area 112 in front of the vehicle 100 and maps the additional detection area 112 to additional sensor data 114. The detection area 106 and the additional detection area 112 overlap in an overlapping region 116. The additional sensor 110 may be temporarily disposed on the vehicle 100. The additional sensor 110 may also be part of the sensor system of the vehicle 100.

[0033] The trained additional recognition model 118 for the additional sensor 110 identifies objects 120 mapped in the additional sensor data 114 within the additional detection region 112 and assigns object attributes 122 to the objects 120. Here, the additional recognition model 118 has an algorithm that has been trained to recognize objects 120 in the additional sensor data 114. This training is implemented using labeled samples. Regions with identifiable objects are labeled in the labeled samples, and the corresponding object attributes are additionally stored. The additional recognition model has learned to recognize objects 120 based on the labeled samples.

[0034] A recognition model 124 is assigned to sensor 104 for recognizing object 120 in sensor data 108. Device 102 is configured to generate training data 126 for recognition model 124. Training data 126 may be referred to as labeled samples. Once collected, training data 126 can be used to train recognition model 124. Multiple instances of device 102 can also be used to first generate multiple sets of training data 126, which are then merged into a larger labeled sample of training data 126. To generate the training data, device 102 has a transmission device 128. Transmission device 128 transfers the object attributes 122, or labels, of the object 120 recognized in overlapping region 116 by additional recognition model 118 to sensor data 108 in order to generate training data 126 for recognition model 124. The training data or labeled samples 126 can also be used for purposes other than training recognition model 124, for example for evaluating, optimizing, or adapting a recognition model for sensor 104 that is untrained or was trained by other means.

[0035] Sensor 104 can be, for example, a radar sensor, while the additional sensor 110 can be a lidar sensor. Sensor 104 and additional sensor 110 can therefore have different operating principles.

[0036] Typically, samples are first recorded using one vehicle or a group of vehicles. Then, training is performed outside the vehicle, for example, on a server. Recognition using the additional recognition model 118 and the transfer or interpolation of labels onto the samples using the transmission device 128 can also be performed afterward, outside the vehicle.

[0037] While training the recognition model 124 is the most important application of automatically generated labeled samples, it is not the only application.

[0038] Figure 2 shows a flowchart of a method according to one embodiment. The method can be carried out, for example, on a device as in Figure 1. It includes a recognition step 200 and a transmission step 202. In the recognition step 200, an object 120 and its object attributes 122 are recognized in the additional sensor data 114 of an additional sensor using an additional recognition model 118. In the transmission step 202, the object attributes 122 of the objects recognized in the overlapping region are automatically transferred to the sensor data 108 of the sensor to generate training data 126 for the recognition model 124 of the sensor.

[0039] In transmission step 202, the coordinates of data points in the additional sensor data 114 (where the object 120 is identified by the additional recognition model 118) are used to assign the object attribute 122 of the identified object 120 to data points in the sensor data 108 that have substantially the same coordinates. In other words, the object attribute 122 of the identified object 120 is assigned to substantially the same area in the sensor data 108 where the object 120 is identified in the additional sensor data 114.
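
Comparing coordinates across two sensors presupposes that both data sets are expressed in a common frame. As a hedged illustration, the sketch below converts a spatial data point given as orientation (azimuth) and distance into Cartesian coordinates of a shared platform frame. The function name `to_platform_frame` and the mounting parameters `mount_x`, `mount_y`, and `mount_yaw_deg` are assumptions for this example; the patent does not specify a coordinate convention.

```python
import math

def to_platform_frame(azimuth_deg, distance,
                      mount_x=0.0, mount_y=0.0, mount_yaw_deg=0.0):
    """Convert a polar measurement (azimuth in degrees, distance in metres)
    from a sensor into Cartesian coordinates of a common platform frame.

    mount_x, mount_y, mount_yaw_deg describe where the sensor sits on the
    platform and how it is rotated (hypothetical calibration values).
    """
    angle = math.radians(azimuth_deg + mount_yaw_deg)
    return (mount_x + distance * math.cos(angle),
            mount_y + distance * math.sin(angle))

# Hypothetical setup: a lidar mounted 1.5 m ahead of the platform origin and
# a radar at the origin observe the same object.
lidar_point = to_platform_frame(0.0, 10.0, mount_x=1.5)
radar_point = to_platform_frame(0.0, 11.5)
```

Both measurements map to approximately (11.5, 0.0) in the platform frame, so an object attribute found at `lidar_point` could be assigned to `radar_point`.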

[0040] In one embodiment, the additional sensor data 114 is synchronized with the sensor data 108 to reduce displacement caused by different detection times at different sensors when there is relative movement between the object and the sensors.

[0041] In the preceding detection step 204, sensor data 108 and additional sensor data 114 are detected using both the sensor and additional sensors. In the subsequent training step 206, the recognition model 124 is trained using training data 126.

[0042] In one embodiment, the additional recognition model 118 is trained in advance using simulated training data 210 in an additional training step 208. The simulated training data 210 can be generated synthetically in a generation step 212 using a generative model.

[0043] Figure 3a shows an illustration of a generation model 300 for a method according to one embodiment. The generation model 300 may be used, for example, in a generation step to generate additional training data. The generation model 300 includes a sensor model 302, a propagation model 304, and an object model 306.

[0044] Sensor model 302 virtually maps the additional sensor 110. Propagation model 304 virtually maps the wave emission 308 of sensor model 302. Object model 306 virtually maps at least one object 310 characterized by its object attributes.

[0045] Wave emission 308 at the additional sensor 110 is simulated using sensor model 302. Here, optical effects at optical elements are simulated in the transmission path of the additional sensor 110, or electromagnetic effects at elements that affect electromagnetic waves are generally simulated, or acoustic effects at acoustic elements are simulated, for example, in the case of an ultrasonic sensor. Optical effects or generally electromagnetic or acoustic effects are, for example, attenuation, refraction, scattering, and / or reflection.

[0046] Using propagation model 304, the propagation of wave emission 308 through the virtual environment to the virtual object 310 is simulated. Here, for example, optical effects at air components (e.g., particles and aerosols) between the additional sensor 110 and the virtual object 310 are simulated. These optical effects are determined by the distance between the object 310 and the additional sensor 110, which is one of the object attributes.

[0047] The reflection 312 of the arriving wave emission 308 is simulated using object model 306. A virtual object 310 is defined by pre-defined object attributes. These attributes include, for example, the type of object 310, the color of object 310, the surface structure of object 310, the orientation of object 310 relative to the additional sensor 110, and / or the velocity of object 310 relative to the additional sensor 110. Optical effects on the surface of object 310 are simulated here, for example.

[0048] On the opposite path, the propagation of the reflection 312 through the virtual environment to the additional sensor 110 is simulated using propagation model 304.

[0049] Then, the reception of the reflection 312 arriving via the additional sensor 110 is simulated using sensor model 302. Here, for example, the optical effects at the optical elements in the reception path of the additional sensor 110 are simulated. Therefore, the output of model 300 substantially corresponds to the original data actually generated. Additionally, the supplementary training data generated by model 300 includes at least one object attribute of object 310.

[0050] Figure 3b shows a flowchart of a generation model for a method according to one embodiment. Here, generation model 300 substantially corresponds to the generation model in Figure 3a. Sensor model 302 calculates the wave 308 emitted by the additional sensor. From this, propagation model 304 calculates the wave 309 arriving at the object. Object model 306 calculates the wave 312 reflected by the object. Propagation model 304 calculates the wave 313 arriving at the additional sensor, and sensor model 302 calculates the raw data output by the sensor, for example a point cloud.
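
The chain of sensor model, propagation model, and object model can be pictured with a deliberately simplified toy simulation. Everything in the sketch below is an assumption for illustration only: the exponential attenuation, the single reflectivity factor, the detection threshold, and all names (`VirtualObject`, `sensor_emit`, `propagate`, `reflect`, `sensor_receive`, `simulate_labeled_sample`). A realistic generative model would capture far more physical effects.

```python
import math
from dataclasses import dataclass

@dataclass
class VirtualObject:
    distance: float      # metres along the beam
    reflectivity: float  # fraction of the arriving power that is reflected
    attributes: dict     # object attributes known exactly in the simulation

def sensor_emit(power=1.0):
    """Sensor model, transmit path: power of the emitted wave (308)."""
    return power

def propagate(power, distance, attenuation=0.01):
    """Propagation model: toy exponential attenuation per metre."""
    return power * math.exp(-attenuation * distance)

def reflect(power, obj):
    """Object model: power of the wave reflected by the object (312)."""
    return power * obj.reflectivity

def sensor_receive(power, distance, threshold=0.05):
    """Sensor model, receive path: raw measurement, or None if too weak."""
    if power < threshold:
        return None
    return {"range": distance, "intensity": power}

def simulate_labeled_sample(obj):
    emitted = sensor_emit()
    arriving = propagate(emitted, obj.distance)     # wave 309 at the object
    reflected = reflect(arriving, obj)              # wave 312
    returning = propagate(reflected, obj.distance)  # wave 313 at the sensor
    raw = sensor_receive(returning, obj.distance)
    if raw is None:
        return None
    # The label comes for free: the simulation knows the object attributes.
    return {"raw": raw, "label": obj.attributes}

sample = simulate_labeled_sample(VirtualObject(20.0, 0.5, {"type": "car"}))
```

The point of the sketch is the data flow: each simulated measurement is paired with the attributes of the virtual object that produced it, which is what turns the simulation output into labeled training data.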

[0051] Further detailed description of the invention

[0052] In other words, a method is proposed for training a recognition model for one sensor modality of an autonomous robot using synthetic data of another modality.

[0053] A key issue in robotics is environmental perception, or cognition. This involves using sensors to detect and pattern recognition methods to identify the environment surrounding an autonomous or partially autonomous machine—essentially converting sensor data into a symbolic description of relevant aspects of the environment. This symbolic description then forms the basis for actions taken within the environment, corresponding to the machine's application or intended use. The machine could be, for example, an autonomous or partially autonomous vehicle, or more generally, a robot capable of autonomous or partially autonomous action. A typical example of symbolic description of the environment is using attributes to describe static and dynamic objects, characterizing, for example, the position, shape, size, and / or velocity of individual objects. For example, objects might involve obstacles, with which collisions should be avoided.

[0054] This kind of environmental perception is typically based on data provided by a single sensor or a combination of multiple sensors. For example, these sensors—cameras, radar, lidar, and ultrasonic sensors—are combined into multi-mode sensor arrays.

[0055] Processing this sensor data to generate symbolic representations of the surrounding environment is a complex problem in pattern recognition. The best recognition performance (i.e., the lowest probability of error) is usually achieved with trained methods, especially with artificial “deep” neural networks (deep neural networks, deep learning), whose architectures have a large number of hidden layers.

[0056] To train such a method and achieve good recognition performance, a labeled sample of a certain size is needed. This sample consists of recorded sensor measurements and their associated labels (i.e., symbolic descriptions of the objects detected by the sensors). Furthermore, labeled samples are required to safeguard, evaluate, and validate such an environmental recognition method.

[0057] To date, manual labeling methods have been commonly used, in which a human operator generates reference labels for the surrounding environment based on image data and / or visualizations of non-image-based sensor data from their own vehicle. These manual methods are both time-consuming and expensive. Consequently, the amount of labeled sensor data that can be generated in this way is limited. Furthermore, manually labeled sensor data is prone to inaccuracies due to human operator errors, and inconsistent due to manual labeling being performed differently by different human operators.

[0058] The proposed scheme can improve the recognition performance of trained methods.

[0059] To this end, a symbolic representation, or labeling, of the robot's surrounding environment is automatically generated (i.e., without human operator intervention or manual labeling) using a two-stage method. By assigning labels to recorded sensor data, labeled samples of the sensor data can be generated automatically.

[0060] This method is based on the following: In a first stage, a model for pattern recognition (hereinafter referred to as the "Stage One Recognition Model") is generated from sensor data of a first modality through training. After training, if sensor data of the first modality is available, the Stage One Recognition Model allows for the automatic generation of labels for symbolic descriptions of the surrounding environment. Here, training data (i.e., labeled samples of sensor data of the first modality) is synthesized, and the training data thus represents the result of the simulation. The model of the first modality is used within the simulation to synthesize sensor data of that modality from the simulated representation of the environment. This model is called the "Generation Model". The first sensor modality may, for example, involve LiDAR.

[0061] In the second stage of this method, real, non-simulated sensor data is recorded using a vehicle or robot equipped with sensors. A sensor of the first modality and another sensor of a different, second modality are used. The field of view of the first modality should be at least as large as that of the second modality. The fields of view of multiple sensors of the same modality can be combined. The recognition model from stage one is then used to process the sensor data of the first modality in order to generate labels for the surrounding environment. Because the field of view of the second sensor is no larger than that of the first sensor, these labels can be transferred to the data of the second sensor. Temporal interpolation of the labels may be necessary at this stage.
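
The field-of-view condition can be illustrated with a small filter that keeps only those labels from the first modality that fall inside the field of view of the second sensor. This is a sketch under assumptions: a field of view described by a maximum range and a symmetric half opening angle, labels carrying a 2D `position` in the second sensor's frame, and the hypothetical names `in_field_of_view` and `transferable_labels`.

```python
import math

def in_field_of_view(point, max_range, half_angle_deg):
    """True if a 2D point (x forward, y left) lies inside a sensor's field of
    view, modeled here as a circular sector (an assumption for this sketch)."""
    x, y = point
    distance = math.hypot(x, y)
    azimuth = math.degrees(math.atan2(y, x))
    return distance <= max_range and abs(azimuth) <= half_angle_deg

def transferable_labels(labels, max_range, half_angle_deg):
    """Keep only labels whose position the second sensor can observe."""
    return [label for label in labels
            if in_field_of_view(label["position"], max_range, half_angle_deg)]

# Hypothetical labels generated from first-modality data.
labels = [
    {"position": (30.0, 0.0), "type": "car"},
    {"position": (5.0, 20.0), "type": "pedestrian"},  # far off to the side
]
kept = transferable_labels(labels, max_range=100.0, half_angle_deg=45.0)
```

Only the first label is kept here, since the second lies at an azimuth of roughly 76 degrees and thus outside the assumed 45 degree half angle.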

[0062] Transferring these labels to the data recorded by the second sensor produces a labeled sample for the second modality. This labeled sample can be used to train a model for recognizing objects in sensor data of the second modality (the "stage-two recognition model"). For this second model, therefore, only real, non-synthetic sensor data and its labels are used.

[0063] The second modality could be, for example, radar, and the stage-two recognition model could be, for example, a deep neural network.

[0064] The advantage of this method is that it can generate labeled training data quickly and at a relatively low cost, while still achieving high-quality labeling of these training samples. This results in the generation of a relatively rich pool of training samples. Furthermore, the recognition model (e.g., a deep neural network (DNN)) generated using these labeled samples can achieve high recognition accuracy and reliability.

[0065] The advantage of being able to generate labeled training data quickly and cost-effectively applies not only to the first phase of this method but also to the second phase. Because labeled sensor data is generated through simulation in the first phase, neither vehicles equipped with sensors and devices for recording sensor data nor human drivers are required. Manual labeling by human operators is also unnecessary.

[0066] The quality of the sensor data in the first stage rests on the following: a sensor modality is selected for which a generative model can be defined that closely approximates the physical characteristics of the sensor modality. This yields synthesized sensor data of good quality, in the sense that the synthesized data largely matches the real data that would be obtained with a real physical sensor of this modality. The quality of the labels is also high because the actual properties of the static and dynamic objects in the simulation are directly available. Additionally, if necessary or helpful for training the recognition model for modality one, the associations between the sensor data and the objects or their properties can also be used, as these associations can likewise be provided by the simulation.

[0067] Similarly, in the second stage, the advantage of this method is that it can generate labeled samples for the second sensor modality relatively quickly and cost-effectively. While it is necessary to equip one or more vehicles or robots with sensors for both modalities and devices for recording this sensor data in the second stage, the recorded data does not require costly manual labeling. This is because the recognition model (i.e., the result of the first stage) can be applied to the sensor data of the first modality to generate labels. These labels are then transferred to the second modality.

[0068] Another advantage of the two-stage approach is that the choice of sensor modality for the second stage is not restricted by the requirement that a generative model closely approximating the sensor's real behavior be achievable. This is necessary only for the modality of the first stage. This is a significant advantage because, for practical use in series vehicles, modalities are typically preferred for which an accurate generative model cannot be achieved or can be achieved only at high cost. For example, lidar is recommended for the first stage because lidar laser beams are reflected at the surface of an object, so lidar point clouds can be calculated synthetically relatively easily from the simulated vehicle surroundings. In contrast, simulating radar sensors requires relatively complex physical effects and object characteristics (including material properties) to be considered, because radar waves are not simply reflected at surfaces. However, this poses no problem when radar is used in the second stage, because no simulation is needed there; instead, the labeled samples of radar data (in this example) are obtained by applying recognition to the lidar data recorded in parallel. On the other hand, using radar in series vehicles has the advantage that radar sensors already tested and established in practice are available at a relatively favorable cost compared to lidar.

[0069] A further advantage of the method is that the second stage can be extended to additional sensor modalities. This can be significant because automatically generated labels can be transferred to any further modality whose field of view overlaps with that of the first modality, so that no manual labeling is required. Multi-modal sensor sets are desirable for autonomous or partially autonomous driving applications because the redundancy achieved in this way improves the robustness and reliability of the system, especially when conditions are poor for one modality and this can be compensated for by another modality.

[0070] Figure 2 illustrates the sequence of the method with the two phases already mentioned. In phase one, synthetic labeled samples of sensor data for modality one are generated through simulation using a simulation tool that includes a generation model. Phase two can be carried out after the recognition model for the first modality has been trained.

[0071] Unlabeled samples of multimodal sensor data, covering both modalities, are recorded using a device (e.g., a vehicle equipped with sensors and a unit for recording and storing sensor data). Labeled samples of sensor data for both modalities are then generated automatically using the recognition model for modality one.

[0072] Training the recognition model for modality two on these labeled samples enables it to recognize objects as well.

[0073] In the first stage, synthetic labeled samples of sensor data for the first modality are generated. This is done using a simulation tool that simulates not only the vehicle's own motion but also the motion of other vehicles in its surrounding environment. Additionally, a static surrounding environment is simulated, thus generating both static and dynamic surrounding environments for the vehicle at any given time, where object attributes can be appropriately selected and thus relevant labels for the objects can be derived. Synthetic sensor data for these objects is generated by a generative model, which is part of the simulation tool.

[0074] The generative model is based on an accurate mathematical and algorithmic description of the physical characteristics of the first sensor modality, implemented as a software module. This software module calculates the expected sensor measurement data from the properties of the simulated objects, the characteristics of the respective implementation of the physical sensor, and the position of the virtual sensor in the simulation.

[0075] Within the generative model, different sub-models or corresponding software components can be distinguished.

[0076] Here, lidar is described as the simulated sensor. First, the sensor model describes and calculates the emission of sensor waves, taking into account the sensor characteristics. The sensor characteristics are modality-specific and also depend on the individual sensor type and variant. Second, the sensor model describes and calculates the reception of sensor waves reflected by the object.

[0077] Here, laser light is described as a sensor wave. The sensor wave propagation model calculates the propagation of the sensor wave from its emission by the sensor until it illuminates the relevant object (e.g., scattering, attenuation), and similarly, the propagation of the sensor wave reflected by the object until it is detected by the sensor.

[0078] The simulated dynamic objects can be, for example, vehicles or pedestrians. The simulated static objects can be, for example, obstacles, guardrails, or traffic signs. The object model calculates, for example by tracing the laser beams in the lidar case and on the basis of object characteristics such as surface properties, how the sensor wave behaves and is reflected when it strikes the object.
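The beam tracing mentioned above can be illustrated with a small sketch. It is not the patent's implementation: it assumes, purely for illustration, that every simulated object is an axis-aligned box carrying a label, and it uses the standard slab method for the ray/box intersection. The names `ray_box_range` and `simulate_scan` and the dictionary keys are hypothetical. The point it shows is that in simulation each range measurement comes with its label for free.

```python
def ray_box_range(origin, direction, box_min, box_max):
    """Slab method: distance along the ray to an axis-aligned box, or None on a miss."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None
    return t_near


def simulate_scan(origin, directions, objects):
    """Return one (range, label) pair per beam; (None, None) if nothing is hit.

    `objects` is a list of dicts with the hypothetical keys
    'label', 'box_min' and 'box_max'.
    """
    scan = []
    for direction in directions:
        best = (None, None)
        for obj in objects:
            rng = ray_box_range(origin, direction, obj["box_min"], obj["box_max"])
            if rng is not None and (best[0] is None or rng < best[0]):
                best = (rng, obj["label"])  # nearest surface wins
        scan.append(best)
    return scan
```

A beam that hits a pedestrian standing in front of a car returns the pedestrian's range and label, because only the nearest surface is visible to the sensor.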

[0079] The sensor model depends on the modality used (e.g., lidar). In addition, it is specific to the individual type and, where applicable, the hardware and software version or configuration of the sensor actually used in the second phase. For example, the lidar sensor model simulates the laser beams emitted by the respective implementation of the lidar sensor, taking into account the specific characteristics of the physical lidar sensor used in phase two of the method. These characteristics include, for example, the number of lidar layers (i.e., vertical resolution), horizontal resolution, rotation speed (if a rotating lidar is involved) or frequency, and horizontal and vertical emission angles or fields of view. The sensor model also simulates the detection of sensor waves reflected from an object, which ultimately leads to sensor measurements.
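To make the role of these characteristics concrete, the following sketch enumerates the beam directions of one scan from the number of layers, the horizontal resolution, and the two fields of view. The class `LidarSpec`, the function `beam_directions`, the even spacing of layers, and the axis convention are all assumptions made for illustration; a real sensor's beam pattern would come from its data sheet.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LidarSpec:
    """Hypothetical parameter set; field names and values are illustrative only."""
    layers: int                 # number of vertical layers
    horizontal_step_deg: float  # horizontal resolution
    horizontal_fov_deg: float   # horizontal field of view
    vertical_fov_deg: float     # vertical field of view


def beam_directions(spec):
    """Return unit vectors (x forward, y left, z up) for all beams of one scan."""
    n_azimuth = int(round(spec.horizontal_fov_deg / spec.horizontal_step_deg))
    directions = []
    for layer in range(spec.layers):
        # Assumption: layers are spread evenly over the vertical field of view.
        if spec.layers == 1:
            elevation_deg = 0.0
        else:
            step = spec.vertical_fov_deg / (spec.layers - 1)
            elevation_deg = -spec.vertical_fov_deg / 2 + layer * step
        for k in range(n_azimuth):
            azimuth_deg = -spec.horizontal_fov_deg / 2 + k * spec.horizontal_step_deg
            el, az = math.radians(elevation_deg), math.radians(azimuth_deg)
            directions.append((math.cos(el) * math.cos(az),
                               math.cos(el) * math.sin(az),
                               math.sin(el)))
    return directions
```

Changing `layers` or `horizontal_step_deg` changes the density of the synthetic point cloud, which is why the sensor model should match the physical sensor used in phase two.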

[0080] The sensor wave propagation model is also part of the generation model. This model describes and calculates the changes in the sensor wave, both along the path from the sensor to the relevant object and along the path back from the object to the sensor's detection unit. Physical effects, such as attenuation depending on the path traveled or scattering depending on the characteristics of the surrounding environment, are considered here.
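A minimal sketch of path-dependent attenuation follows. The patent does not give a formula; the exponential (Beer-Lambert) loss used here is one common simplification, and geometric spreading and scattering are deliberately left out. The function names, the `attenuation_per_m` coefficient, and the scalar `reflectivity` standing in for the object model are assumptions.

```python
import math


def propagate(power, distance_m, attenuation_per_m=0.0):
    """Attenuate wave power along a path.

    Assumption: exponential (Beer-Lambert) loss only.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return power * math.exp(-attenuation_per_m * distance_m)


def round_trip(emitted_power, distance_m, attenuation_per_m, reflectivity):
    """Sensor -> object -> sensor, with a scalar stand-in for the object model."""
    arriving = propagate(emitted_power, distance_m, attenuation_per_m)
    reflected = arriving * reflectivity  # placeholder for the object model
    return propagate(reflected, distance_m, attenuation_per_m)
```

The same `propagate` function is applied on the outbound and the return path, mirroring the two uses of the propagation model described above.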

[0081] Finally, the generation model also includes at least one object model, whose task is to calculate the altered sensor waves from the sensor waves arriving at each relevant object. These altered sensor waves arise because a portion of the wave emitted by the sensor is reflected back by the object. The object model considers object properties that affect the reflection of the sensor waves. In the lidar example, surface characteristics such as color are relevant, as is the object shape, which determines the angle at which the laser strikes the surface.
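How surface properties and angle of incidence could enter an object model can be sketched with a Lambertian (diffuse) reflection rule. This is an assumed simplification, not the model described in the patent: `reflectivity` stands for a color-dependent surface property, and the surface normal encodes the object shape. The function names are hypothetical.

```python
import math


def _unit(vector):
    """Scale a vector to length one."""
    norm = math.sqrt(sum(c * c for c in vector))
    return tuple(c / norm for c in vector)


def reflected_power(arriving_power, ray_direction, surface_normal, reflectivity):
    """Power sent back toward the sensor under an assumed Lambertian model.

    The cosine of the incidence angle is taken between the reversed ray
    direction and the surface normal; surfaces facing away reflect nothing.
    """
    d, n = _unit(ray_direction), _unit(surface_normal)
    cos_incidence = -sum(a * b for a, b in zip(d, n))
    return arriving_power * reflectivity * max(0.0, cos_incidence)
```

A beam hitting a surface head-on returns the full `reflectivity` fraction, while an oblique hit returns less, which is one way shape influences the simulated measurement.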

[0082] The descriptions of these components apply to sensor modalities in which the sensor actively emits sensor waves, such as lidar, radar, or ultrasonic sensors. For passive sensor modalities (e.g., cameras), the generation model can likewise be subdivided into the described components, but the calculations differ in part. For example, wave generation is omitted from the sensor model and replaced by a model for generating ambient waves.

[0083] The recognition model in phase one can be, for example, a deep neural network (DNN) for recognizing objects in lidar point clouds. The attribute to be recognized for dynamic objects is typically their position over time, which can also be understood as the object's trajectory. In addition, attributes describing the object's size are often recognized, where, as a simplifying approximation, a fixed shape of the object is frequently assumed (“bounding box”).

[0084] For the concrete implementation of recognition on data of the first modality, one obvious possibility is to use a "single-frame" DNN to detect objects. This means that sensor data is not first accumulated over a defined time period; instead, only the data of a single frame (e.g., a single lidar scan) is provided to the DNN as input. The objects detected in this way can then be associated with previously detected objects (if any), and the trajectory over time can be determined using established object tracking methods (e.g., Kalman filtering).
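The tracking step can be illustrated with a constant-velocity Kalman filter for a single coordinate of a single object. This is a textbook sketch under stated assumptions, not the tracker used in the patent: the class name `KalmanTrack1D`, the noise variances, and the initial velocity uncertainty are illustrative, and data association between detections and tracks is assumed to have happened already.

```python
class KalmanTrack1D:
    """Constant-velocity Kalman filter for one coordinate of one tracked object."""

    def __init__(self, first_position, meas_var=0.04, process_var=1e-4):
        self.pos, self.vel = first_position, 0.0
        # 2x2 covariance stored entry by entry; the velocity is unknown at
        # the start, hence the large (assumed) initial variance p11.
        self.p00, self.p01, self.p10, self.p11 = meas_var, 0.0, 0.0, 10.0
        self.r, self.q = meas_var, process_var

    def predict(self, dt):
        """Propagate state and covariance by dt: x = F x, P = F P F^T + Q."""
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def update(self, measured_position):
        """Fuse one single-frame position detection (measurement matrix H = [1, 0])."""
        s = self.p00 + self.r                  # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s    # Kalman gain
        residual = measured_position - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        p00, p01 = (1 - k0) * self.p00, (1 - k0) * self.p01
        p10, p11 = self.p10 - k1 * self.p00, self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
```

Calling `predict(dt)` and then `update(z)` once per frame yields a position and velocity estimate, so the trajectory emerges from a sequence of independent single-frame detections.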

[0085] Alternatively, tracking can also be performed using learned methods. For example, a single-frame DNN can be coupled with a recurrent neural network (RNN), so that the network can also take information from the past into account when determining the state of an object at a given moment.

[0086] In the second phase, multimodal, real-world sensor data is recorded. Labels for this data are generated by a recognition model for the first modality, which was trained in the first phase using synthetic labeled samples. Although this recognition is performed on the first modality data, if the field of view of the second modality is a subset of the field of view of the first modality, the label can be transferred to, or applied to, the second modality data.
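A simplified sketch of the label transfer follows. It assumes, for illustration only, that labels from the first modality are 2D positions in a common platform frame and that the second sensor is described by a mounting position, a yaw angle, a symmetric horizontal field of view, and a maximum range. The function names and dictionary keys are hypothetical.

```python
import math


def to_sensor_frame(point_xy, sensor_xy, sensor_yaw):
    """Express a platform-frame point in the frame of a sensor mounted at
    `sensor_xy` and rotated by `sensor_yaw` (radians, counter-clockwise)."""
    dx, dy = point_xy[0] - sensor_xy[0], point_xy[1] - sensor_xy[1]
    c, s = math.cos(-sensor_yaw), math.sin(-sensor_yaw)
    return (c * dx - s * dy, s * dx + c * dy)


def transfer_labels(labels, sensor_xy, sensor_yaw, half_fov_rad, max_range_m):
    """Keep only labels inside the second sensor's field of view and
    re-express their positions in that sensor's frame."""
    transferred = []
    for label in labels:
        x, y = to_sensor_frame(label["position"], sensor_xy, sensor_yaw)
        in_range = math.hypot(x, y) <= max_range_m
        in_fov = abs(math.atan2(y, x)) <= half_fov_rad
        if in_range and in_fov:
            transferred.append({**label, "position": (x, y)})
    return transferred
```

Labels outside the second sensor's field of view are dropped rather than transferred, since that sensor has no data for them.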

[0087] If the sensors of the different modalities measure at different frequencies, or if the sensors are not synchronized during recording, temporal interpolation of the labels is required for this transfer.
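As a minimal sketch of such temporal interpolation, the following function linearly interpolates a label position between two timestamps of the first modality to the measurement time of the second modality. Linear interpolation is an assumption; the patent does not prescribe the interpolation scheme. The name `interpolate_label` and the keys `t` and `position` are hypothetical.

```python
def interpolate_label(t, label_before, label_after):
    """Linearly interpolate an object position to time `t`.

    Each label is a dict with a timestamp 't' (seconds) and a 'position'
    tuple; `t` must lie between the two label timestamps.
    """
    t0, t1 = label_before["t"], label_after["t"]
    if t0 == t1 or not (t0 <= t <= t1):
        raise ValueError("t must lie within a non-empty label interval")
    weight = (t - t0) / (t1 - t0)
    position = tuple(a + weight * (b - a)
                     for a, b in zip(label_before["position"], label_after["position"]))
    return {"t": t, "position": position}
```

For example, a radar measurement taken midway between two lidar scans receives the midpoint of the two lidar-derived object positions.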

[0088] A typical application of the labeled samples generated in phase two is training a recognition model, such as a deep neural network, that takes sensor data of the second modality as input and recognizes static and dynamic objects, that is, outputs estimates of the relevant object attributes. As with the recognition model in phase one described above, this model could be a “single-frame” DNN with Kalman tracking performed separately. Alternatively, again as in phase one, the entire recognition process, including tracking, could be performed using a learned method.

[0089] Another application of these labeled samples is to evaluate the recognition accuracy of software modules that perform environmental perception based on sensor data of modality two, for example when the module does not itself use any method trained on samples. If the module to be evaluated was itself trained using labeled samples generated by the method proposed here, the evaluation is still meaningful provided it can be shown that the labels generated by the stage-one recognition model are of higher quality, in terms of the relevant metrics, than the results of the stage-two recognition model. In this case, the sensor of the first modality together with the stage-one recognition model can be regarded as a reference system.

[0090] In summary, a method for generating synthetic labeled samples in a first stage and an application of the resulting recognition model in a second stage are proposed. The method can be implemented on a device that first generates labeled samples of synthetic data and then, through training, generates a recognition model for the first modality. Using this device or a separate device, real data can then be recorded, and sensor data for the second modality can be labeled using the recognition model for modality one.

[0091] Finally, it should be noted that terms such as "having" or "comprising" do not exclude other elements or steps, and terms such as "a" or "one" do not exclude multiple. Reference numerals in the claims should not be considered limiting.

Claims

1. A method for generating training data (126) for a recognition model (124), said recognition model being for recognizing objects (120) in sensor data (108) of a sensor (104) of a robot that is at least partially autonomous, wherein, in additional sensor data (114) of an additional sensor (110) of the robot mapping at least one overlapping region (116), an object (120) and object attributes (122) are recognized using a trained additional recognition model (118), and the object attributes (122) of the object (120) recognized in the overlapping region (116) are transferred to the sensor data (108) mapping at least the overlapping region (116) in order to generate the training data (126), wherein the sensor (104) detects and maps a detection region (106) in the environment surrounding the robot, and the additional sensor (110) detects and maps an additional detection region (112) in the environment surrounding the robot; the detection region (106) of the sensor (104) and the additional detection region (112) of the additional sensor (110) overlap in the overlapping region (116); the additional recognition model (118) has an algorithm that has been trained to recognize objects (120) in the additional sensor data (114); the training is performed using labeled samples, in which regions containing recognized objects are labeled and the corresponding object attributes of the objects are additionally stored; the additional recognition model (118) has learned to recognize the objects (120) on the basis of the labeled samples; and, prior to recognizing the object attributes (122), the additional recognition model (118) is trained using simulated training data (210), wherein the simulated training data (210) is generated using a generation model (300), wherein the generation model (300) includes a sensor model (302) of the additional sensor (110), a propagation model (304) and an object model (306) of at least one virtual object (310), wherein the generation model (300) is a highly realistic mapping of the actual object.

2. The method according to claim 1, wherein the object (120) and the object attributes (122) are synchronized with the sensor data (108).

3. The method according to claim 2, wherein, for the synchronization, the sensor motion of the sensor (104) between the additional detection time at which the object (120) is detected by the additional sensor (110) and the detection time at which the object (120) is detected by the sensor (104) is compensated using sensor motion information.

4. The method according to any one of claims 2 to 3, wherein, for the synchronization, the object motion of the object (120) between the additional detection time at which the object (120) is detected by the additional sensor (110) and the detection time at which the object (120) is detected by the sensor (104) is compensated using object motion attributes.

5. The method according to any one of claims 1 to 3, wherein the sensor data (108) and the additional sensor data (114) are detected by a common sensor platform.

6. The method according to any one of claims 1 to 3, wherein the sensor data (108) is read by a sensor (104) of a first mode and the additional sensor data (114) is read by an additional sensor (110) of a second mode.

7. The method according to any one of claims 1 to 3, wherein, using the sensor model (302), the wave emission (308) of the additional sensor (110) is simulated; using the propagation model (304), the transmission of the wave emission (308) through the virtual surrounding environment to the object (310) is simulated as an arriving wave (309); using the object model (306), the reflection (312) of the transmitted wave emission (308) at the object (310) is simulated; using the propagation model (304), the transmission of the reflection (312) through the virtual surrounding environment to the additional sensor (110) is simulated as an arriving wave (313); and, using the sensor model (302), the detection of the transmitted reflection (312) by the additional sensor (110) is simulated, wherein at least one object attribute (122) of the virtual object (310) provided by the object model (306) is assigned to the detected reflection (312) in order to generate the simulated training data (210).

8. The method according to any one of claims 1 to 3, wherein the sensor data (108) and/or the additional sensor data (114) are detected using a camera, radar, lidar and/or ultrasonic sensor.

9. A method for training a recognition model (124), characterized in that the recognition model (124) is trained using training data (126) generated by the method according to any one of claims 1 to 8.

10. A method for maneuvering an autonomous robot, characterized in that the robot is maneuvered using the output of the recognition model (124) trained according to claim 9.

11. The method according to claim 10, wherein the autonomous robot is designed as a vehicle (100) that is at least partially automated.

12. An apparatus (102), wherein the apparatus (102) is configured to carry out, implement and/or control the method according to any one of claims 1 to 11 in corresponding units.

13. A machine-readable storage medium storing a computer program product thereon, the computer program product being configured to carry out, implement and/or control the method according to any one of claims 1 to 11.