Active machine learning for moving object control

By combining vehicle trajectory data and feature initiation data with active learning, and selecting mispredicted image data for labeling and training, the technique reduces training resource consumption and improves the accuracy of machine learning models and the reliability of autonomous vehicle operation.

CN122242623A · Pending · Publication Date: 2026-06-19 · FORD GLOBAL TECH LLC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
FORD GLOBAL TECH LLC
Filing Date
2025-12-01
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies require a large amount of computing resources and image data when training machine learning models, and it is difficult to efficiently identify and correct errors in model predictions, which affects the accuracy of autonomous vehicle operation.

Method used

Using active learning, the output of a machine learning model is combined with vehicle trajectory data and feature initiation data, and mispredicted image data is selected for labeling and training, thereby reducing computing resource consumption and improving model performance.

Benefits of technology

It improves the accuracy and efficiency of machine learning models, reduces the size of the required training dataset, and enhances the safety and reliability of autonomous vehicle operation.

✦ Generated by Eureka AI based on patent content.

Figure CN122242623A_ABST

Abstract

This disclosure provides "Active Machine Learning for Controlling Moving Objects". A computer may include a processor and a memory. The memory may include instructions executable by the processor to determine, using a machine learning model trained on a first training dataset, a depth map based on a received image. The computer may determine an object trajectory of an object included in the image. The image may be selected based on determining interference between the object and the depth map according to the object trajectory. A second training dataset may be determined by adding the selected image to the first training dataset. The machine learning model may be trained using the second training dataset.

Description

Technical Field

[0001] This disclosure relates to active machine learning for moving object control.

Background

[0002] Computers can operate systems and devices including vehicles, robots, drones, and/or object tracking systems. Data, including images, can be acquired by sensors and processed by the computer to determine the trajectory of the system relative to its environment and to objects within that environment. The computer can use these trajectories to operate the system or system components within the environment.

Summary of the Invention

[0003] Systems that move and/or have moving parts (including vehicles, robots, drones, mobile phones, etc.) can determine the identity and location of objects in their surrounding environment by acquiring and processing sensor data about that environment. The determined identity and location data can be processed to determine how to operate the system or a part of the system. For example, a robot can determine the location of the arm of another nearby robot and use that location to determine a path along which to move its gripper to grasp a workpiece without contacting the other robot's arm. In another example, a vehicle can determine its location relative to its environment and the locations of objects such as roads and other vehicles in the environment. The vehicle can use its determined location and the determined identities and locations of the objects to determine a path on which to operate while maintaining a predetermined relationship with the objects. In the description below, vehicle operation is used as a non-limiting example of object identity and location determination.

[0004] Machine learning models can be trained on a server computer and then installed in a computing device within the vehicle to receive data from sensors included in the vehicle. The machine learning model can output predictions about the received sensor data to assist in operating the vehicle. For example, a machine learning model can be trained to receive images from a camera and determine the positions of objects in the environment surrounding the vehicle. The predicted state output by the machine learning model can include the predicted position and orientation of an object relative to the vehicle, including the distance and angle between the vehicle and the object. The computing device included in the vehicle can use the object prediction data to determine a trajectory that the vehicle can travel to reach a predicted future position. The computing device can then control the vehicle to travel along the trajectory by issuing commands to controllers that operate vehicle components such as propulsion, steering, and braking, as described below in relation to Figure 1.

[0005] The performance of a machine learning model can be determined by comparing the identities and locations it predicts for objects (such as vehicles, roads, curbs, buildings, trees, traffic signs, traffic barriers, etc.) appearing in an input image with ground truth data (or simply "ground truth"). Ground truth includes object identities and locations obtained from sources other than the machine learning model being tested; examples include other, previously trained machine learning models or humans. Performance can be measured as the percentage of objects that the model correctly identifies and locates, within a user-defined threshold, relative to the identities and locations included in the ground truth.
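The percentage-correct measure described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Detection` record, the greedy nearest-match rule, the Euclidean distance test, and the 0.5 m default threshold are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str  # object identity, e.g. "vehicle" or "tree"
    x: float    # location in meters (hypothetical vehicle-relative frame)
    y: float


def performance(predictions: List[Detection], ground_truth: List[Detection],
                max_dist: float = 0.5) -> float:
    """Percentage of ground-truth objects matched by a prediction with the
    same label whose location lies within max_dist (user-defined threshold)."""
    if not ground_truth:
        return 100.0  # design choice: nothing to find counts as perfect
    unused = list(predictions)
    correct = 0
    for gt in ground_truth:
        best, best_d = None, max_dist
        for p in unused:
            d = math.hypot(p.x - gt.x, p.y - gt.y)
            if p.label == gt.label and d <= best_d:
                best, best_d = p, d
        if best is not None:
            unused.remove(best)  # each prediction may match only one object
            correct += 1
    return 100.0 * correct / len(ground_truth)
```

Greedy matching keeps the sketch short; an optimal assignment (e.g. Hungarian matching) would avoid order-dependent results when objects are close together.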

[0006] The performance of a machine learning model depends on how accurately the images in its training dataset represent the types of images the model will encounter when deployed in the field. A brute-force approach to building a representative training dataset is simply to acquire a very large number of images, determine the ground truth for all of them, and use the entire dataset to train and test the machine learning model. Determining ground truth (also referred to herein as labeling) involves identifying and locating the objects in each image. Labeling and training consume computing resources proportional to the number of images in the training dataset. The techniques described herein use active learning to train high-performance machine learning models while minimizing the number of images required in the training dataset.

[0007] Active learning is a technique for identifying which input data will most enhance the performance of a machine learning model before computing resources are spent determining ground truth for that data and training the model on it. The model is first trained on a labeled training dataset that includes a subset of the available data. Unlabeled data is then selected from the remaining data and processed by the model, and the results are evaluated to determine whether each sample should be labeled and added to the training dataset. Data that produces inaccurate results is selected for labeling and further training, on the assumption that further training on such data will yield the greatest gain in the model's performance.
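The loop above can be sketched as one round of pool-based active learning. The callables `train`, `is_inaccurate`, and `label` and the per-round `budget` are hypothetical placeholders: in the technique described herein, `is_inaccurate` would compare model output against vehicle trajectory and feature initiation data, and `label` could be a human or a second machine learning model.

```python
from typing import Any, Callable, List, Sequence, Tuple

Model = Callable[[Any], Any]


def active_learning_round(
    labeled: List[Any],
    unlabeled: List[Any],
    train: Callable[[Sequence[Any]], Model],
    is_inaccurate: Callable[[Any, Any], bool],
    label: Callable[[Any], Any],
    budget: int,
) -> Tuple[Model, List[Any], List[Any]]:
    """One round of pool-based active learning (illustrative sketch)."""
    model = train(labeled)  # model trained on the current labeled subset
    selected, remaining = [], []
    for sample in unlabeled:
        # Run the model on unlabeled data and keep samples with inaccurate
        # results, up to the labeling budget; the rest stay in the pool.
        if len(selected) < budget and is_inaccurate(sample, model(sample)):
            selected.append(sample)
        else:
            remaining.append(sample)
    # Only the selected samples incur labeling cost.
    new_labeled = list(labeled) + [label(s) for s in selected]
    return train(new_labeled), new_labeled, remaining
```

Repeating the round until the budget or the pool is exhausted gives the usual iterative form.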

[0008] Several approaches exist for determining the accuracy of results during active learning. Some rely on confidence values output by the machine learning model to measure the uncertainty of a result, or to determine a confidence margin by comparing the possible results in the model's output. Other techniques measure the entropy (average information content) of the model's output. Still others train multiple machine learning models on different training datasets and compare the results across models. All of these techniques rely on machine learning models specially built to output confidence or entropy values, and they require computing resources beyond those needed to run the machine learning model itself. The active learning technique described herein does not rely on confidence or entropy values and therefore does not require additional computing resources to determine them. It can be applied both to classification tasks (such as object recognition) and to regression tasks (such as object localization).
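For contrast, the confidence-, margin-, and entropy-based scores mentioned above can be sketched as below. Each assumes the model outputs a class probability vector, which is exactly the requirement the technique described herein avoids. The function names are illustrative.

```python
import math
from typing import Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """1 - max class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; smaller means more
    uncertain. Requires at least two classes."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```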

[0009] The active machine learning technique described herein uses available vehicle data to test the accuracy of predictions output by a machine learning model by determining one-dimensional features, such as contact between a moving object and objects in the environment. If the available vehicle data indicate that the predicted object identities and locations are inaccurate, the image data used to generate the predictions can be labeled to generate ground truth, and the selected images and corresponding ground truth can be added to the training dataset for further training of the machine learning model. By selecting image data that produce incorrect identity and/or location predictions, the technique improves the trained model's performance while minimizing the computing resources used to label images and train the model.

[0010] The active machine learning techniques described herein include acquiring image data from a vehicle and sensor data describing the vehicle's pose, i.e., its position and orientation, over multiple time steps following acquisition of the image data. Feature initiation data may also be acquired from the vehicle. Feature initiation refers to the activation of vehicle features (such as contact, airbag deployment, and computer-controlled stopping and/or steering) commanded by the vehicle's computing device in response to an object in the vehicle's environment. Feature initiations may be indicated by data that the computer stores to register or log them. A feature may be initiated in response to, or to prevent, interference, where interference herein refers to contact with an object in the environment surrounding the vehicle. Possible interference may also be detected from sudden changes in the vehicle's trajectory, which may or may not be accompanied by a feature initiation.
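A sketch of how logged pose and feature initiation data might be turned into an "observed interference" flag. The `PoseSample` record, the feature names, and the deceleration and yaw-rate limits are hypothetical; the source does not specify how a sudden trajectory change is detected.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class PoseSample:
    t: float        # time stamp, seconds
    speed: float    # m/s
    heading: float  # radians


# Hypothetical names for logged feature initiations that imply interference.
INTERFERENCE_FEATURES = {"contact", "airbag_deployment", "auto_stop", "auto_steer"}


def sudden_change(poses: Sequence[PoseSample], max_decel: float = 6.0,
                  max_yaw_rate: float = 0.8) -> bool:
    """True if any step decelerates or turns faster than the assumed limits."""
    for a, b in zip(poses, poses[1:]):
        dt = b.t - a.t
        if dt <= 0.0:
            continue  # skip duplicate or out-of-order samples
        decel = (a.speed - b.speed) / dt
        # Wrap the heading change into [-pi, pi) before computing yaw rate.
        dh = (b.heading - a.heading + math.pi) % (2.0 * math.pi) - math.pi
        if decel > max_decel or abs(dh) / dt > max_yaw_rate:
            return True
    return False


def observed_interference(poses: Sequence[PoseSample],
                          initiated: Iterable[str]) -> bool:
    """Possible interference from feature initiations or the trajectory itself."""
    return bool(INTERFERENCE_FEATURES & set(initiated)) or sudden_change(poses)
```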

[0011] As disclosed herein, active machine learning can select data for labeling and training by examining the output of a machine learning system for the image data and comparing it with vehicle trajectory data and feature initiation data. If the machine learning output combined with the vehicle trajectory data indicates possible interference, but the vehicle trajectory data and/or feature initiation data indicate no contact between the vehicle and the object, labeling that image data and training the machine learning system on it enhances the system's performance. Similarly, if the machine learning output combined with the vehicle trajectory data indicates no interference, but the vehicle trajectory data and/or feature initiation data indicate actual interference, labeling the image data and training the machine learning system on it will enhance the system's performance.
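The two selection cases above reduce to a disagreement test between the interference predicted from the model output and the interference observed in vehicle data. A minimal sketch with a hypothetical record layout and tag strings; images where the two agree are left unlabeled in this sketch.

```python
from typing import Iterable, List, Tuple


def select_images(
    records: Iterable[Tuple[str, bool, bool]],
) -> List[Tuple[str, str]]:
    """Select images whose model-based interference prediction disagrees
    with the interference observed in vehicle data.

    Each record is (image_id, predicted_interference, observed_interference).
    """
    selected = []
    for image_id, predicted, observed in records:
        if predicted and not observed:
            # Model output implied contact that the vehicle data rule out.
            selected.append((image_id, "predicted_only"))
        elif observed and not predicted:
            # Vehicle data show interference that the model output missed.
            selected.append((image_id, "observed_only"))
    return selected
```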

[0012] This document discloses a method comprising: determining a depth map based on an image received by a machine learning model trained on a first training dataset; and determining an object trajectory of an object included in the image. The image may be selected based on determining interference between the object and the depth map according to the object trajectory. A second training dataset may be determined by adding the selected image to the first training dataset. The machine learning model may be trained using the second training dataset. The machine learning model can determine the locations of three-dimensional features in a depth map of the environment surrounding the object. The depth map may be formatted as a grid of cells, wherein one or more of the cells are occupied cells, i.e., cells occupied by one or more three-dimensional features whose height relative to the ground plane exceeds a user-defined threshold. Interference may be determined by overlap between the object and one or more occupied cells at a location predicted based on the trajectory.
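The grid-based interference test above can be sketched as follows, assuming the depth map has already been rasterized into a 2D array of per-cell heights (in centimeters), the ground height is known (e.g. from the road cells), and the object footprint is an axis-aligned rectangle of cells that ignores heading. All names are illustrative.

```python
from typing import Iterable, List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) in the BEV grid


def occupied_cells(height_map: List[List[float]], ground_height: float,
                   threshold: float) -> Set[Cell]:
    """Cells whose feature height above the ground plane exceeds the threshold."""
    return {(r, c)
            for r, row in enumerate(height_map)
            for c, h in enumerate(row)
            if h - ground_height > threshold}


def footprint(center: Cell, half_len: int, half_wid: int) -> Set[Cell]:
    """Axis-aligned object footprint in cells (simplification: heading ignored)."""
    r0, c0 = center
    return {(r, c)
            for r in range(r0 - half_len, r0 + half_len + 1)
            for c in range(c0 - half_wid, c0 + half_wid + 1)}


def predicts_interference(occupied: Set[Cell], trajectory: Iterable[Cell],
                          half_len: int, half_wid: int) -> bool:
    """True if the footprint overlaps an occupied cell at any predicted location."""
    return any(footprint(p, half_len, half_wid) & occupied for p in trajectory)
```

For example, with a single 20 cm curb cell and a 10 cm threshold, only that cell is occupied, and interference is predicted only once the object's footprint along the trajectory reaches it.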

[0013] The trajectory of the object can be determined based on data from sensors included in the object. Adding the selected image to the first training dataset can include determining a label for the object. A second machine learning model can determine the label for the object. The machine learning model can be a generative adversarial network including an encoder, a decoder, and a discriminator. The depth map can include portions of one or more of roads, curbs, buildings, and trees. The object can be a vehicle. A second computer can be included in a second vehicle, wherein the trained machine learning model is included in the second computer, and the second computer is programmed to operate the second vehicle by determining a vehicle trajectory based on output from the machine learning model. The second computer can be programmed to operate the second vehicle on the vehicle trajectory by actuating vehicle components via commands to controllers. Adding the selected image to the first training dataset can include determining the location of the object. The second machine learning model can determine the location of the object.

[0014] A computer-readable medium is also disclosed, storing program instructions for performing some or all of the above method steps. A computer programmed to perform some or all of the above method steps is also disclosed, the computer including a computing device programmed to determine a depth map based on an image received by a machine learning model trained on a first training dataset, and to determine an object trajectory of an object included in the image. The image may be selected based on interference between the object and the depth map determined according to the object trajectory. A second training dataset may be determined by adding the selected image to the first training dataset. The machine learning model may be trained using the second training dataset. The machine learning model can determine the locations of three-dimensional features in a depth map of the environment surrounding the object. The depth map may be formatted as a grid of cells, wherein one or more of the cells are occupied cells, i.e., cells occupied by one or more three-dimensional features whose height relative to the ground plane exceeds a user-defined threshold. Interference may be determined by overlap between the object and one or more occupied cells at a location predicted based on the trajectory.

[0015] The instructions may include instructions to determine the trajectory of the object based on data from sensors included in the object. Adding the selected image to the first training dataset may include determining a label for the object. A second machine learning model may determine the label for the object. The machine learning model may be a generative adversarial network including an encoder, a decoder, and a discriminator. The depth map may include portions of one or more of roads, curbs, buildings, and trees. The object may be a vehicle. A second computer may be included in a second vehicle, wherein the trained machine learning model is included in the second computer, and the second computer is programmed to operate the second vehicle by determining a vehicle trajectory based on output from the machine learning model. The second computer may be programmed to operate the second vehicle on the vehicle trajectory by actuating vehicle components via commands to controllers. Adding the selected image to the first training dataset may include determining the location of the object. The second machine learning model may determine the location of the object.

Description of the Drawings

[0016] Figure 1 is a block diagram of an example image-based system.

[0017] Figure 2 is an illustration of an example vehicle equipped with sensors.

[0018] Figure 3 is a diagram of an example fisheye image.

[0019] Figure 4 is an illustration of an example bird's-eye view image.

[0020] Figure 5 is a diagram of an example machine learning model.

[0021] Figure 6 is an illustration of an example bird's-eye view image with an overlaid grid.

[0022] Figure 7 is an illustration of an example bird's-eye view image with an overlaid grid and occupied cells.

[0023] Figure 8 is a diagram of an example image showing occupied cells and vehicle trajectories.

[0024] Figure 9 is an illustration of another example image with occupied cells and vehicle trajectories.

[0025] Figure 10 is a diagram of an example vehicle outline.

[0026] Figure 11 is a diagram illustrating another example image with cells occupied by a vehicle.

[0027] Figure 12 is a diagram of an example system for active machine learning.

[0028] Figure 13 is a flowchart of a process for active machine learning.

[0029] Figure 14 is a flowchart of a process for operating a vehicle based on a machine learning model trained using active learning.

Detailed Description

[0030] Figure 1 is a diagram of an image-based system 100. In this example, system 100 includes a vehicle 110; however, in other examples, system 100 may include other devices that are mobile and/or have movable parts, such as robots, drones, or object tracking devices. In examples where system 100 includes a robot, drone, or object tracking device, controllers 112, 113, and 114 would be controllers for the components of the robot, drone, or object tracking device. In the example described herein, system 100 includes a vehicle 110, a computing device 115 included in the vehicle 110, and a server computer 120 located remotely from the vehicle 110. The computing device 115 of one or more vehicles 110 may receive data about the operation of the vehicle 110 from sensors 116. The computing device 115 may operate the vehicle 110 based on the data received from the sensors 116 and the data received from the remote server computer 120. The server computer 120 may communicate with the vehicle 110 via a network 130.

[0031] The computing device 115 includes, for example, a processor and memory as known. Furthermore, the memory includes one or more forms of computer-readable medium and stores instructions that can be executed by the processor to perform various operations, including those disclosed herein. For example, the computing device 115 may include programming to operate one or more of the following: vehicle braking, propulsion (i.e., controlling speed in vehicle 110 by controlling one or more of an internal combustion engine, electric motor, hybrid engine, etc.), steering, climate control, interior lights, and exterior lights, as well as determining whether and when the computing device 115 (rather than a human operator) controls such operations. The computing device 115 may also control the timing alignment of lighting with sensor acquisition to take into account the color effects of vehicle lights or exterior lights (i.e., lighting may be adjusted to facilitate image data collection by sensor 116, said adjustment occurring at a time determined for data acquisition by sensor 116).

[0032] The computing device 115 may include more than one computing device (e.g., controllers included in vehicle 110 for monitoring and controlling various vehicle components, such as propulsion controller 112, brake controller 113, steering controller 114, etc.), or be communicatively connected to such computing devices via a vehicle communication bus as further described below. The computing device 115 is typically arranged for communication over a vehicle communication network (e.g., a bus in vehicle 110 such as a controller area network (CAN)); the vehicle 110 network may additionally or alternatively include known wired or wireless communication mechanisms, such as Ethernet or other communication protocols.

[0033] The computing device 115 can transmit and receive messages from various devices in the vehicle 110 (i.e., controllers, actuators, sensors (including sensor 116), etc.) via a vehicle network. Alternatively or additionally, where the computing device 115 actually comprises multiple devices, the vehicle communication network can be used for communication between the devices represented herein as computing device 115. Furthermore, as mentioned below, various controllers or sensing elements (such as sensor 116) can provide data to the computing device 115 via the vehicle communication network.

[0034] Additionally, computing device 115 may be configured to communicate with remote server computer 120 (e.g., a cloud server) via network 130 through vehicle-to-infrastructure (V2I) interface 111, as described below. This interface includes hardware, firmware, and software that allow computing device 115 to communicate with remote server computer 120 via network 130, such as Wi-Fi® or cellular networks. The V2X (vehicle-to-everything) interface 111 may therefore include processors, memory, transceivers, etc., configured to utilize various wired and wireless networking technologies, e.g., cellular, BLUETOOTH®, Bluetooth Low Energy (BLE), ultra-wideband (UWB), peer-to-peer communication, UWB-based radar, IEEE 802.11, and other wired and wireless packet networks or technologies. The computing device 115 can be configured to communicate with other vehicles 110 via V2X interface 111 using a vehicle-to-vehicle (V2V) network (e.g., according to cellular V2X (C-V2X), dedicated short-range communication (DSRC), and the like) formed among nearby vehicles 110 as a mobile ad hoc network or through an infrastructure-based network. The computing device 115 also includes known non-volatile memory. The computing device 115 can record data by storing it in the non-volatile memory for later retrieval and transmission to server computer 120 or a user mobile device 160 via the vehicle communication network and V2I interface 111.

[0035] As already mentioned, programming for operating one or more vehicle 110 components (e.g., braking, steering, propulsion, etc.) without human operator intervention is typically included in instructions stored in memory and executable by the processor of computing device 115. Using data received in computing device 115 (e.g., sensor data from sensors 116, data from server computer 120, etc.), computing device 115 can make various determinations and control various vehicle 110 components and operations. For example, computing device 115 may include programming to regulate operating behaviors of vehicle 110 (i.e., physical manifestations of vehicle 110 operation), such as speed and steering, and strategic behaviors (i.e., control of operating behaviors in a manner generally intended to achieve efficient traversal of a route), such as distance and time between vehicles, lane changes, minimum clearance between vehicles, minimum left-turn-across-path, time to arrival at a particular location, and minimum time to arrival to cross an intersection (without a traffic signal).

[0036] As used herein, the term "controller" includes a computing device typically programmed to monitor and control a particular vehicle subsystem. Examples include a propulsion controller 112, a brake controller 113, and a steering controller 114. A controller can be, for example, a known electronic control unit (ECU), and may include additional programming as described herein. A controller may be communicatively connected to and receive instructions from a computing device 115 to actuate a subsystem according to those instructions. For example, a brake controller 113 may receive instructions from a computing device 115 to operate the brakes of a vehicle 110.

[0037] Like computing device 115, the one or more controllers 112, 113, 114 for vehicle 110 include a computer processor and may include electronic control units (ECUs) or the like, including, as non-limiting examples, one or more propulsion controllers 112, one or more brake controllers 113, and one or more steering controllers 114. Each of controllers 112, 113, 114 may include a respective processor, memory, and one or more actuators. Controllers 112, 113, 114 may be programmed and connected to a vehicle 110 communication bus, such as a controller area network (CAN) bus or a local interconnect network (LIN) bus, to receive instructions from computing device 115 and control the actuators based on the instructions.

[0038] Sensor 116 may include a variety of known devices to provide data via a vehicle communication bus. For example, a radar fixed to the front bumper (not shown) of vehicle 110 may provide the distance from vehicle 110 to the next vehicle in front of vehicle 110, or a Global Positioning System (GPS) sensor located in vehicle 110 may provide the geographic coordinates of vehicle 110. For example, the distances provided by radar and other sensors 116 and the geographic coordinates provided by GPS sensors may be used by computing device 115 to operate vehicle 110 autonomously or semi-autonomously.

[0039] Vehicle 110 is typically a ground-based vehicle that may be capable of autonomous and/or semi-autonomous operation and typically has three or more wheels (e.g., a passenger car, light truck, etc.). Vehicle 110 includes one or more sensors 116, a V2I interface 111, a computing device 115, and one or more controllers 112, 113, 114. Sensors 116 can collect data related to vehicle 110 and its operating environment. By way of example and not limitation, sensors 116 may include an altimeter, a camera, a lidar, a radar, an ultrasonic sensor, an infrared sensor, a pressure sensor, an accelerometer, a gyroscope, a temperature sensor, a Hall sensor, an optical sensor, a voltage sensor, a current sensor, a mechanical sensor (such as a switch), etc. Sensors 116 can be used to sense the operating environment of vehicle 110, i.e., sensors 116 can detect phenomena such as weather conditions (rain, outside temperature, etc.), road grade, road location (e.g., using road edges, lane markings, etc.), or the locations of target objects (such as adjacent vehicles 110). Sensors 116 can also be used to collect data, including dynamic vehicle 110 data related to the operation of vehicle 110, such as speed, yaw rate, steering angle, engine speed, brake pressure, oil pressure, power applied to controllers 112, 113, 114 in vehicle 110, connectivity between components, and accurate and timely performance of components of vehicle 110.

[0040] Server computer 120 typically shares features with the V2I interface 111 of vehicle 110 and computing device 115 (e.g., computer processors and memory and configurations that communicate via network 130), and therefore these features will not be described further to reduce redundancy. Server computer 120 can be used to develop and train machine learning models that can be transferred to computing device 115 in vehicle 110.

[0041] Figure 2 is an illustration of an example vehicle 110 that includes a front-facing camera 202, a rear-facing camera 204, and side-facing cameras 206 and 208, with fields of view 210, 212, 214, and 216, respectively. Cameras 202, 204, 206, and 208 may be fisheye cameras whose fields of view 210, 212, 214, and 216 together provide image data covering a 360-degree view of the environment surrounding vehicle 110. A fisheye camera includes an ultra-wide-angle (fisheye) lens that acquires images with an extremely wide field of view. Fisheye cameras are included in vehicle 110 because each can acquire image data from a field of view that would require two or more cameras with rectilinear lenses to cover.

[0042] Figure 3 illustrates four fisheye images 302, 304, 306, and 308 acquired by cameras 202, 204, 206, and 208, respectively. Although they cover a wide field of view, fisheye images 302, 304, 306, and 308 have the drawback of distorting objects within the field of view. The convex (barrel) distortion in fisheye images 302, 304, 306, and 308 can cause lines that are straight in the real world to appear curved. Furthermore, object distortion varies with the object's position within the field of view, making it difficult to identify and locate objects using machine learning models. In some examples, a fisheye-to-rectilinear transformation and image stitching can be used to transform fisheye images 302, 304, 306, and 308 into a rectilinear bird's-eye view (BEV) image. In the techniques described herein, a machine learning model performs the fisheye-to-rectilinear transformation and image stitching to transform fisheye images 302, 304, 306, and 308 into a BEV image, as described below in relation to Figure 5.
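The classical fisheye-to-rectilinear mapping mentioned above can be sketched for a single image under an assumed equidistant fisheye model (r = f * theta), one common lens model; the source does not name a lens model, and the techniques described herein replace this step with a machine learning model. For each output pixel, the function returns where to sample the fisheye image. The function name and the shared principal point are assumptions.

```python
import math
from typing import Tuple


def rectilinear_to_fisheye(u: float, v: float, cx: float, cy: float,
                           f_rect: float, f_fish: float) -> Tuple[float, float]:
    """Map an output (rectilinear) pixel (u, v) to the source pixel in an
    equidistant fisheye image. Assumes both images share principal point
    (cx, cy); focal lengths are in pixels."""
    dx, dy = u - cx, v - cy
    r_rect = math.hypot(dx, dy)
    if r_rect == 0.0:
        return cx, cy  # the optical axis maps to itself
    theta = math.atan(r_rect / f_rect)  # ray angle from the optical axis
    r_fish = f_fish * theta             # equidistant projection: r = f * theta
    scale = r_fish / r_rect             # same direction, different radius
    return cx + dx * scale, cy + dy * scale
```

Because theta < tan(theta) for 0 < theta < 90 degrees, a point at a given angle lands closer to the center in the fisheye image than in the rectilinear one when both use the same focal length; this radial compression is what bends straight lines.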

[0043] Figure 4 is a diagram illustrating how fisheye images 302, 304, 306, and 308 are transformed into a single BEV image 400 by inputting them into a machine learning model. BEV image 400 appears as if it were acquired by a camera with a normal, undistorted lens looking straight down at the scene.

[0044] BEV image 400 can also be referred to as a depth map because it includes the heights of the objects in it. A machine learning model can determine object heights based on the locations of pixels corresponding to the top edges of the objects' vertical surfaces. Once the model has determined the ground plane from the identified road pixels 402 and 404, the heights of objects such as curbs 406, 408, 410, and 412, buildings 414, 416, 418, 420, and 422, and trees 424 and 426 can be determined by assuming that the objects extend vertically from the ground plane.
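The vertical-extension assumption above can be illustrated with flat-ground pinhole geometry: if an object's base touches the ground plane, the image row of its base fixes its distance, and the row of its top edge then fixes its height. This is a hypothetical geometric sketch for a single level pinhole camera with a horizontal optical axis (horizon at row `cy`), not the disclosed machine learning model, which works on the stitched BEV image.

```python
def object_height(v_base: float, v_top: float, cy: float,
                  f: float, cam_height: float) -> float:
    """Height above the ground plane of an object's top edge.

    v_base, v_top: image rows (pixels, increasing downward) of the object's
    ground contact point and top edge; cy: horizon row; f: focal length in
    pixels; cam_height: camera height above the ground plane.
    """
    if v_base <= cy:
        raise ValueError("base row must lie below the horizon to meet the ground")
    # Similar triangles: the ray to the base drops cam_height over the distance.
    distance = f * cam_height / (v_base - cy)
    # The ray to the top edge drops (v_top - cy) / f per unit of distance.
    return cam_height - distance * (v_top - cy) / f
```

For example, with a camera 1.5 m above the road, f = 1000 px, and the horizon at row 500, a base at row 650 places the object 10 m away, and a top edge at row 350 gives a height of 3 m.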

[0045] BEV image 400 includes roads 402 and 404 bounded by curbs 406, 408, 410, and 412, as well as buildings 414, 416, 418, 420, and 422 and trees 424 and 426. BEV image 400 also includes an icon 428 indicating the location of vehicle 110, whose cameras 202, 204, 206, and 208 acquire the fisheye images 302, 304, 306, and 308 that are transformed and stitched together to form BEV image 400. The transformations that generate BEV image 400 may include height data for objects, including roads 402 and 404, curbs 406, 408, 410, and 412, buildings 414, 416, 418, 420, and 422, and trees 424 and 426. The height assigned to roads 402 and 404 can be taken as the ground plane, and the heights of other objects in BEV image 400 can be indicated relative to it. For example, locations can be indicated in global coordinates (i.e., X and Y or longitude and latitude geographic coordinates, or Cartesian coordinates relative to the vehicle) and heights in centimeters above the ground plane.

[0046] The techniques described herein can use a machine learning model to generate BEV image 400 from fisheye images 302, 304, 306, and 308. Generating BEV image 400 in this way requires fewer computational resources than generating it by mathematical methods. Furthermore, training the machine learning model with adaptive learning further reduces the computational resources required to generate the training dataset and to train the machine learning model.

[0047] Figure 5 is a diagram of a machine learning model 500 that includes a generative adversarial network (GAN) 516. GAN 516 is an example of a machine learning model 500 that can be used to generate a BEV image 510 from an input fisheye image 502. GAN 516 includes a generator network 504, which includes a decoder layer 506 and an encoder layer 508, followed by a discriminator network 512. Decoder 506 and encoder 508 process the input image data by convolving it with kernels whose weights are determined by training decoder 506 and encoder 508. Discriminator 512 is trained using ground truth BEV images 400 to determine whether an output BEV image 510 is "real" or "fake" based on whether it visually resembles a ground truth BEV image. During training, the output BEV image 510 from generator network 504 is passed to the trained discriminator network 512, which determines whether output image 510 is real or fake.

[0048] During training, the output from the discriminator (e.g., "real" or "fake") is combined with output image 510 to form a loss function 514, which is backpropagated to generator network 504 for training. GAN 516 is considered trained when discriminator network 512 accepts the output image 510 generated by generator network 504 as a real BEV image 400. At inference time, output image 510 from generator network 504 is used as the output, and the discriminator is not used. An overview of techniques for converting fisheye images into rectilinear images with networks such as GAN 516 is provided in "A Comprehensive Overview of Fisheye Camera Distortion Correction Methods" by Jian Xu, De-Wei Han, Kang Li, Jun-Jie Li, and Zhao-Yuan Ma, May 2024, available at https://arxiv.org/abs/2401.00442 as of the filing date of this application.
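Paragraph [0048] states only that the discriminator output and output image 510 are combined to form loss function 514. One widely used formulation for paired image-to-image GANs (the pix2pix formulation) adds a binary cross-entropy adversarial term to an L1 term measured against the ground truth image. The sketch below assumes that formulation rather than reproducing the actual loss of this disclosure. Images are flattened lists of pixel values, discriminator outputs are probabilities, and the default L1 weight of 100 follows pix2pix; all function names are illustrative.

```python
import math
from typing import Sequence


def bce(prob: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of one discriminator probability against a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Score ground truth BEV images as real (1) and generated ones as fake (0)."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake: float, generated: Sequence[float],
                   ground_truth: Sequence[float], l1_weight: float = 100.0) -> float:
    """Adversarial term (make the discriminator say 'real') plus weighted mean L1."""
    if len(generated) != len(ground_truth):
        raise ValueError("images must have the same number of pixels")
    l1 = sum(abs(g - t) for g, t in zip(generated, ground_truth)) / len(generated)
    return bce(d_fake, 1.0) + l1_weight * l1
```

In practice these terms would be computed on tensors in a deep learning framework so that the loss can be backpropagated to generator network 504. The plain-Python version only shows the arithmetic.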

[0049] Figure 6 is a diagram of a BEV image 600 formatted as cells arranged on a grid 628. BEV image 600 is generated by machine learning model 500 in response to input fisheye images 302, 304, 306, and 308, as described above with respect to Figures 3 to 5, and includes height data for objects 630 in BEV image 600. Objects 630 in BEV image 600 include roads 602 and 604, curbs 606, 608, 610, and 612, buildings 614, 616, 618, 620, and 622, and trees 624 and 626. The heights of objects 630 in BEV image 600 can be expressed in global coordinates, e.g., in meters, where global coordinates can be based on latitude, longitude, and altitude. Roads 602 and 604 can be assumed to form a local ground plane, and the heights of all other objects 630 in BEV image 600 can be measured relative to that ground plane.

[0050] Each cell of grid 628 may include a number indicating the height of the portion of BEV image 600 enclosed by the cell. The cell height can be determined from the heights of the three-dimensional features represented by the pixels of BEV image 600 that fall within the cell. Determining grid 628 cell heights in this way allows computations on BEV image 600 to be performed at a resolution much lower than its pixel resolution while preserving the height and position data of objects 630 at a useful resolution. Performing computations at the resolution of grid 628 supports the adaptive learning techniques by reducing the computational resources required to determine interference between the vehicle trajectory and the positions and heights of objects 630.

[0051] Figure 7 is a diagram illustrating the calculation of heights in a BEV image 700 based on a grid 728. BEV image 700 includes objects 730, including roads 702 and 704, curbs 706, 708, 710, and 712, buildings 714, 716, 718, 720, and 722, and trees 724 and 726. The height of a grid cell 728 can be determined as the maximum pixel height value within the cell. In other examples, the average pixel height value within the cell can be used instead. In either case, a single height value is assigned to each grid cell 728.
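The maximum and average reductions described above amount to block pooling of a per-pixel height map. A minimal pure-Python sketch follows. It assumes the BEV height map is a list of rows of heights above the ground plane and that each grid cell covers a square block of `cell_size` pixels; the function name and that layout are illustrative assumptions.

```python
from typing import List


def cell_heights(pixel_heights: List[List[float]], cell_size: int,
                 reduce: str = "max") -> List[List[float]]:
    """Collapse a per-pixel height map into one height per grid cell.

    pixel_heights: rows of per-pixel heights above the ground plane.
    cell_size: pixels along each side of a square cell (assumed layout).
    reduce: "max" (conservative) or "mean".
    Partial cells at the right and bottom edges are reduced over the
    pixels they actually contain.
    """
    rows, cols = len(pixel_heights), len(pixel_heights[0])
    grid: List[List[float]] = []
    for r0 in range(0, rows, cell_size):
        grid_row: List[float] = []
        for c0 in range(0, cols, cell_size):
            block = [pixel_heights[r][c]
                     for r in range(r0, min(r0 + cell_size, rows))
                     for c in range(c0, min(c0 + cell_size, cols))]
            if reduce == "max":
                grid_row.append(max(block))
            elif reduce == "mean":
                grid_row.append(sum(block) / len(block))
            else:
                raise ValueError(f"unknown reduction: {reduce}")
        grid.append(grid_row)
    return grid
```

The maximum is the conservative choice: a single tall pixel (for example, a tree trunk) marks the whole cell as tall, whereas the mean can dilute a narrow obstacle below the interference threshold.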

[0052] The height value of each grid cell 728 can be compared with a user-defined threshold. The threshold can be chosen based on which object 730 heights would cause interference with vehicle 110. Interference can be defined as an interaction between grid cells 728 and vehicle 110 that would damage vehicle 110, such as contact between the vehicle and an object whose height is greater than the threshold. For example, vehicle 110 may overlap grid cells 728 that include curbs 706, 708, 710, and 712 without interference, whereas overlapping grid cells 728 that include buildings 714, 716, 718, 720, and 722 or trees 724 and 726 would constitute interference. A threshold of 10 cm is one example of a threshold that can distinguish grid cells 728 indicating interference from grid cells 728 that do not. Based on the cell heights of grid 728, BEV image 700 has been divided into non-interfering cells 732 (without cross-hatching) and interfering cells 734 (cross-hatched). Non-interfering cells 732 contain no objects 730 exceeding the threshold, while interfering cells 734 contain objects 730 exceeding the threshold.
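Classifying cells as interfering or non-interfering is a simple threshold over the per-cell heights. The sketch below assumes heights in meters relative to the ground plane and uses the 10 cm example threshold; the function name and the (row, column) cell indexing are illustrative assumptions. A cell interferes only when its height is strictly greater than the threshold, matching "a height greater than the threshold".

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) index into the grid


def interfering_cells(cell_heights: List[List[float]],
                      threshold_m: float = 0.10) -> Set[Cell]:
    """Return the cells whose height exceeds the interference threshold."""
    return {(r, c)
            for r, row in enumerate(cell_heights)
            for c, h in enumerate(row)
            if h > threshold_m}
```

Every cell not in the returned set is a non-interfering cell, so storing only the interfering set is enough for the overlap tests that follow.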

[0053] Figure 8 is a diagram of a grid 800 based on a BEV image 700 output from machine learning model 500, the grid including non-interfering cells 802 and interfering cells 804 as described above with respect to Figure 7. Grid 800, including non-interfering cells 802 and interfering cells 804, can be combined with an object trajectory. In this example, vehicle trajectory 808 is determined from sensor data acquired together with the fisheye images 302, 304, 306, and 308 that are input to machine learning model 500 to form the BEV image 700 used to determine grid 800. These images can be used to improve the training of machine learning model 500 with the adaptive learning techniques described herein.

[0054] As described above, fisheye images 302, 304, 306, and 308 can be acquired by cameras 202, 204, 206, and 208 included in vehicle 110. While fisheye images 302, 304, 306, and 308 are being acquired, data regarding vehicle trajectory 808 and feature initiation can be obtained from sensor data generated by sensors 116 included in vehicle 110. For example, GPS sensors, speedometers, wheel rotation sensors, and accelerometers can acquire data regarding the position, orientation, and speed of vehicle 110 at the time fisheye images 302, 304, 306, and 308 are acquired, to determine the position and orientation of vehicle profile 806 at time t0. Computing device 115 can acquire data from controllers 112, 113, and 114 and from other vehicle components (such as the steering mechanism) to determine whether a feature initiation has occurred.

[0055] Sensors 116 can also acquire position, orientation, and velocity data over a time period after fisheye images 302, 304, 306, and 308 are acquired, to determine vehicle trajectory 808. Vehicle trajectory 808 can be used to determine the positions and orientations of vehicle profiles 810 and 812 (dashed lines) at times t1 and t2, respectively, after time t0. Examining grid 800 together with vehicle trajectory 808 indicates that an interference event should have occurred between interfering cell 814 and vehicle profile 812 at time t2. An interference event includes a change in vehicle trajectory 808 in response to the interference, or the initiation of a feature in response to the interference.
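Placing vehicle profiles 810 and 812 at times t1 and t2 requires the vehicle pose at those instants, which generally fall between sensor samples. The sketch below assumes that the trajectory is stored as time-sorted (x, y, heading) samples in a metric frame and interpolates linearly, turning the heading through the shorter arc. The record layout and the name `pose_at` are assumptions; a production system might instead query the vehicle's own state estimator.

```python
import bisect
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading in radians)


def pose_at(times: List[float], poses: List[Pose], t: float) -> Pose:
    """Linearly interpolate a pose at time t from time-sorted samples."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the recorded trajectory")
    j = bisect.bisect_left(times, t)  # first index with times[j] >= t
    if times[j] == t:
        return poses[j]
    t0, t1 = times[j - 1], times[j]
    (x0, y0, h0), (x1, y1, h1) = poses[j - 1], poses[j]
    a = (t - t0) / (t1 - t0)
    # Shortest signed turn from h0 to h1, so headings near +/-pi blend correctly.
    dh = math.atan2(math.sin(h1 - h0), math.cos(h1 - h0))
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0), h0 + a * dh)
```

The returned heading is not re-wrapped, so it can drift slightly outside (-pi, pi]; that is harmless when it is only passed to sin and cos to orient a vehicle profile.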

[0056] Data from vehicle sensors 116 can also be used to determine whether vehicle 110 actually experienced an interference event at time t2. If it did, vehicle sensors 116 will have recorded a change in speed or direction indicating contact between the vehicle and an object whose height is greater than the threshold. Similarly, a feature initiation detected by computing device 115 at time t2 indicates an interference event. If neither a change in vehicle trajectory 808 nor a feature initiation is detected at time t2, an error in BEV image 700 is indicated. Such an error may indicate that machine learning model 500 has incorrectly placed an object 730 in cell 814, which in turn may indicate that machine learning model 500 responded incorrectly to fisheye images 302, 304, 306, and 308. This suggests that labeling fisheye images 302, 304, 306, and 308 and training machine learning model 500 with the labeled images can help reduce the error rate of machine learning model 500.

[0057] The technique described herein, which selects data for labeling and training machine learning model 500 based on errors in the model's output, constitutes adaptive learning for machine learning model 500. By selecting the data most likely to reduce erroneous output when training machine learning model 500, the computational resources used for labeling and training are reduced while the accuracy of the results is improved.

[0058] Figure 9 is a diagram of a grid 900 based on a BEV image 700 output from machine learning model 500, the grid including non-interfering cells 902 and interfering cells 904 as described above with respect to Figure 7. Grid 900 is generated after the data have been labeled and machine learning model 500 has been trained based on the interference detected as described with respect to Figure 8. Grid 900 shows cells 914 that are correctly indicated as non-interfering, consistent with vehicle trajectory 908, along which vehicle 110 travels from the position indicated by profile 906 at time t0 to the positions indicated by profiles 910 and 912 at times t1 and t2, respectively.

[0059] An interference event is indicated if vehicle sensors 116 have recorded a change in speed or direction indicating an interference event between the vehicle and an object whose height is greater than the threshold. Similarly, a feature initiation detected by computing device 115 at time t2 indicates an interference event. If a change in vehicle trajectory 908 or a feature initiation is detected at time t2 while grid 900 indicates no interference, an error in BEV image 700 is indicated. Such an error may indicate that machine learning model 500 has failed to place an object 730 in cell 914. An object missing from cell 914 may indicate that machine learning model 500 responded incorrectly to fisheye images 302, 304, 306, and 308. This indicates that labeling fisheye images 302, 304, 306, and 308 and training machine learning model 500 with the labeled images can help reduce the error rate of machine learning model 500.
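The two error cases in paragraphs [0056] and [0059] reduce to a disagreement test between the interference predicted from the BEV grid and the event recorded by the vehicle. The minimal sketch below uses a hypothetical `Sample` record that stores both booleans alongside the image identifiers; the record layout and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One set of fisheye images plus the vehicle data recorded with it (hypothetical)."""
    image_ids: List[str]
    predicted_interference: bool  # from the BEV grid combined with the vehicle trajectory
    observed_event: bool          # trajectory change or feature initiation at t2


def error_type(sample: Sample) -> Optional[str]:
    """Classify disagreement between the BEV-based prediction and the vehicle data."""
    if sample.predicted_interference and not sample.observed_event:
        return "false_interference"   # e.g. an object wrongly placed in cell 814
    if sample.observed_event and not sample.predicted_interference:
        return "missed_interference"  # e.g. an object missing from cell 914
    return None                       # prediction and vehicle data agree


def select_for_labeling(samples: List[Sample]) -> List[Sample]:
    """Keep only samples whose BEV output disagrees with the vehicle data."""
    return [s for s in samples if error_type(s) is not None]
```

Samples on which the prediction and the vehicle data agree are skipped, which is where the savings in labeling effort come from.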

[0060] Figure 10 is a diagram illustrating vehicle profiles 1004 and 1008 used for vehicle pose calculation. To place vehicle profiles 1004 and 1008 correctly on grid 900, vehicle trajectory data 908 is applied to vehicle profiles 1004 and 1008. Figure 10 includes a reference frame 1002 that describes positions and orientations in global coordinates X_gps and Y_gps. Vehicle trajectory data 908, which includes vehicle position and orientation, is received from vehicle 110 in global coordinates. For each time t+i, the data include the vehicle's position in the direction parallel to longitude and its position in the direction parallel to latitude, both obtained from the vehicle's GPS sensors, and the vehicle's orientation relative to reference frame 1002.

[0061] Interference with the grid can be determined by transforming the vehicle profile from global coordinates to local coordinates. Local reference frames 1008 and 1014 are related to global reference frame 1002 by local translations 1006 and 1012 (dashed lines) and by a rotation of the global orientation. The local orientation and local position of the vehicle can be determined from the global orientation and global position for i = 0, 1, 2, ..., t_p.
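One conventional way to perform this global-to-local conversion is a 2D rigid-body (SE(2)) transform: translate by the reference position, then rotate by the negative reference heading. The sketch below assumes that form, which may differ from the exact equation used in this disclosure. It also assumes that GPS positions have already been projected to meters and that headings are measured counterclockwise from the +x axis. Function names are hypothetical, and a further fixed offset would be needed to map these local coordinates to grid cell indices.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading in radians, counterclockwise from +x)


def to_local(global_pose: Pose, origin: Pose) -> Pose:
    """Express a global pose in the frame of `origin` (e.g. the pose at t0).

    Assumes x and y are already metric (latitude/longitude projected to
    meters). In the local frame, +x points along the origin heading.
    """
    x, y, th = global_pose
    x0, y0, th0 = origin
    dx, dy = x - x0, y - y0              # translate to the origin
    c, s = math.cos(th0), math.sin(th0)
    local_x = c * dx + s * dy            # rotate by -th0
    local_y = -s * dx + c * dy
    dth = math.atan2(math.sin(th - th0), math.cos(th - th0))  # wrap to (-pi, pi]
    return (local_x, local_y, dth)


def local_trajectory(global_poses: List[Pose]) -> List[Pose]:
    """Local poses for i = 0, 1, ..., t_p relative to the first (t0) pose."""
    origin = global_poses[0]
    return [to_local(p, origin) for p in global_poses]
```

For example, a vehicle heading north (pi/2) that moves 3 m north ends up at local (3, 0): straight ahead in its own t0 frame.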

[0062] Figure 11 is a diagram of a grid 1100 illustrating the determination of the cells occupied by the translated and rotated vehicle profile 1102, as described above with respect to Figures 8 to 10. Vehicle profile 1102 is enclosed by a bounding box 1104, which is the smallest range of grid cells in x and y that completely encloses vehicle profile 1102. Any cells of grid 1100 that lie within bounding box 1104 and are within or in contact with vehicle profile 1102 are included as occupied cells 1106 (cross-hatched) of vehicle profile 1102.

[0063] Occupied cells 1106 can be determined by constructing a polytope that connects the sides of vehicle profile 1102 to each of the cells within bounding box 1104. A polytope is an n-dimensional generalization of a polyhedron. In this example, the polytope is a four-sided pyramid with vehicle profile 1102 as its base and a cell of bounding box 1104 as its apex. If any part of the apex falls within vehicle profile 1102, the cell forming the apex is included as an occupied cell 1106. Any occupied cell 1106 of vehicle profile 1102 that overlaps an interfering cell 804, as described with respect to Figure 8, generates an interference event.
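The pyramid construction above is described only in outline, so the sketch below swaps in a different, standard technique: the separating axis theorem (SAT), an exact overlap test for convex polygons. For each cell inside the bounding box, it checks whether the square cell and the rectangular vehicle profile overlap or touch, and it then intersects the occupied cells with the interfering cells. The sketch assumes a rectangular footprint, square cells of `cell_size` meters, and a local grid frame whose origin is the corner of cell (0, 0). All names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Cell = Tuple[int, int]       # (ix, iy) grid indices in the local grid frame
Point = Tuple[float, float]  # (x, y) in meters in the local grid frame


def footprint(x: float, y: float, heading: float,
              length: float, width: float) -> List[Point]:
    """Corners, in order, of a rectangular vehicle profile centred at (x, y)."""
    c, s = math.cos(heading), math.sin(heading)
    hl, hw = length / 2.0, width / 2.0
    return [(x + c * dx - s * dy, y + s * dx + c * dy)
            for dx, dy in ((hl, hw), (-hl, hw), (-hl, -hw), (hl, -hw))]


def _project(points: List[Point], axis: Point) -> Tuple[float, float]:
    """Interval covered by the points when projected onto an axis."""
    dots = [px * axis[0] + py * axis[1] for px, py in points]
    return min(dots), max(dots)


def occupied_cells(corners: List[Point], cell_size: float) -> Set[Cell]:
    """Grid cells that overlap or touch a convex vehicle profile (SAT test)."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    # Bounding box of the profile expressed as a range of cell indices.
    ix0, ix1 = math.floor(min(xs) / cell_size), math.floor(max(xs) / cell_size)
    iy0, iy1 = math.floor(min(ys) / cell_size), math.floor(max(ys) / cell_size)
    # Candidate separating axes: the grid axes plus each edge normal of the profile.
    axes: List[Point] = [(1.0, 0.0), (0.0, 1.0)]
    for (ax, ay), (bx, by) in zip(corners, corners[1:] + corners[:1]):
        axes.append((ay - by, bx - ax))
    occupied: Set[Cell] = set()
    for ix in range(ix0, ix1 + 1):
        for iy in range(iy0, iy1 + 1):
            x0, y0 = ix * cell_size, iy * cell_size
            cell = [(x0, y0), (x0 + cell_size, y0),
                    (x0 + cell_size, y0 + cell_size), (x0, y0 + cell_size)]
            overlaps = True
            for axis in axes:
                cmin, cmax = _project(cell, axis)
                vmin, vmax = _project(corners, axis)
                if cmax < vmin or vmax < cmin:  # gap found: a separating axis exists
                    overlaps = False
                    break
            if overlaps:  # intervals overlap or touch on every axis
                occupied.add((ix, iy))
    return occupied


def interferes(occupied: Set[Cell], interfering: Set[Cell]) -> bool:
    """An interference event is predicted if any occupied cell is an interfering cell."""
    return not occupied.isdisjoint(interfering)
```

Because the interval test treats touching intervals as overlapping, cells that merely touch the profile count as occupied, matching "within or in contact with". For a profile rotated 45 degrees, the corner cells of the bounding box are correctly excluded.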

[0064] Figure 12 is a diagram of an adaptive learning system 1200. Adaptive learning system 1200 includes data and software programs executing on server computer 120, under the control of a software program that manages the flow of data and results among the software programs included in adaptive learning system 1200. Adaptive learning system 1200 includes a machine learning model 1204, trained with a first training dataset, that receives fisheye images 302, 304, 306, and 308, including images of the environment surrounding the vehicle, from a training dataset 1202 and outputs a BEV image 600.

[0065] Output BEV image 600 is received by interference engine 1206, which determines interference grid 800 based on the received BEV image 600 and determines occupied cells 1106 based on vehicle sensor 116 data recorded by vehicle 110 when fisheye images 302, 304, 306, and 308 were acquired. Interference engine 1206 can determine interference between objects 630 included in BEV image 600 and vehicle profile 812, determined as described with respect to Figures 8, 9, 10, and 11. In examples where interference engine 1206 determines that there is interference involving objects 630 included in BEV image 600, the locations of the interference in the input fisheye images 302, 304, 306, and 308 and in BEV image 600 can be passed to labeling machine learning model 1208.

[0066] Labeling machine learning model 1208 receives the labeled BEV image 600 and data from interference engine 1206 indicating one or more mislabeled locations, and corrects the labels on BEV image 600. BEV image 600 can be relabeled using a machine learning model trained to label BEV images 600 correctly. In some examples, labeling can be performed offline by humans. Whether the labeling is performed by a machine learning model or by humans, the portions of BEV image 600 that produced an incorrect interference or non-interference result are recorded to ensure that the erroneous portions are correctly labeled.

[0067] After labeling by labeling machine learning model 1208, BEV image 600 with its ground truth labels, the input fisheye images 302, 304, 306, and 308, and the associated vehicle data are returned to training dataset 1202 to train machine learning model 1204, as described above with respect to Figure 5. In an example of adaptive machine learning training, multiple sets of fisheye images 302, 304, 306, and 308 with ground truth data can be combined into a second training dataset for training the machine learning model. Training machine learning model 1204 based on output errors detected with an independent data source, such as vehicle trajectories, improves training. Selecting the data that generate erroneous results identifies the datasets that will improve performance while reducing the computational resources devoted to labeling and training.

[0068] After training the machine learning model 1204, it can be transmitted from the server computer 120 to the computing device 115 included in the vehicle 110 via network 130. The machine learning model can be executed on the computing device 115 to receive image data from the sensors 116 included in the vehicle 110 and output a BEV image 600 that will be used by the computing device 115 to operate the vehicle 110.

[0069] Figure 13 is a flowchart of a process 1300 for adaptive training of a machine learning model 1204. Process 1300 can be implemented as hardware and software executing on server computer 120 to train machine learning model 1204. Process 1300 includes multiple blocks that can be executed in the order shown. Alternatively or additionally, process 1300 may include fewer blocks and may include blocks executed in a different order.

[0070] At block 1302, a first software program executing on server computer 120 selects unlabeled fisheye images 302, 304, 306, and 308 from a training dataset 1202 of fisheye images acquired from vehicle 110. Fisheye images 302, 304, 306, and 308 include vehicle information acquired from vehicle sensors 116 indicating multiple vehicle positions and orientations at multiple time steps. For example, the time steps may correspond to multiple sets of fisheye images 302, 304, 306, and 308 acquired over multiple video frame times. Fisheye images 302, 304, 306, and 308 include vehicle position and orientation data that can be used to determine vehicle trajectory 808, as well as vehicle feature initiation data, as described above.

[0071] At block 1304, machine learning model 1204, trained on the first labeled dataset, receives fisheye images 302, 304, 306, and 308 as input and outputs BEV image 600, as described above with respect to Figure 4.

[0072] At block 1306, interference engine 1206 applies grid 628 to BEV image 600 and determines non-interfering cells 732 and interfering cells 734 based on object heights exceeding the threshold. Vehicle position and orientation data included with fisheye images 302, 304, 306, and 308 are used to determine multiple vehicle profiles 1102, as described above with respect to Figures 10 and 11. Vehicle profiles 1102 and the feature initiation data are combined with non-interfering cells 732 and interfering cells 734 to determine whether the combination of vehicle profiles 1102, feature initiation data, and BEV image 600 indicates either an interference event when the vehicle data show that no interference event occurred, or no interference when the vehicle data show that an interference event occurred.

[0073] At block 1308, when BEV image 600 and the vehicle data agree on whether an interference event occurred, process 1300 loops back to block 1302 to select another set of fisheye images 302, 304, 306, and 308 and the corresponding vehicle trajectory and feature initiation data. When BEV image 600 and the vehicle data disagree on whether an interference event occurred, process 1300 proceeds to block 1310.

[0074] At block 1310, BEV image 600 is passed to labeling machine learning model 1208 for ground truth labeling of object identities and locations, including the corrections determined by interference engine 1206, as described above with respect to Figure 12. BEV image 600, together with the ground truth identities and locations of the objects, is then passed to training dataset 1202.

[0075] At block 1312, BEV image 600, the ground truth labels, the vehicle data, and the original fisheye images 302, 304, 306, and 308 are provided from training dataset 1202 to machine learning model 1204 for training. The machine learning model is trained to output a new BEV image 600 that includes the correct identities and locations of the objects it contains, as described above with respect to Figure 5. After block 1312, process 1300 ends.
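Blocks 1302 to 1312 can be summarized as a selection loop in which only samples whose BEV output disagrees with the vehicle data are labeled and added to the training data. The sketch below processes a whole pool in one pass, whereas process 1300 handles one set of images at a time. The disagreement test, labeler, and trainer are passed in as callables, and all names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # one set of fisheye images plus its vehicle data
L = TypeVar("L")  # one labeled training example


def active_learning_round(
    pool: Iterable[S],
    first_dataset: List[L],
    bev_disagrees_with_vehicle: Callable[[S], bool],  # blocks 1304 to 1308
    label: Callable[[S], L],                          # block 1310
    train: Callable[[List[L]], object],               # block 1312
) -> Tuple[List[L], object]:
    """One pass of a process-1300-style selection loop (hypothetical API).

    Samples whose BEV output disagrees with the recorded vehicle data are
    labeled and appended to a copy of the first training dataset, forming
    the second training dataset, which is then used for training.
    """
    selected = [label(s) for s in pool if bev_disagrees_with_vehicle(s)]
    second_dataset = list(first_dataset) + selected  # first dataset left unchanged
    model = train(second_dataset)
    return second_dataset, model
```

Keeping the model, labeler, and trainer as parameters separates the selection logic from any particular network or labeling tool.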

[0076] Figure 14 is a flowchart of a process 1400 for operating a vehicle 110 based on adaptive training of a machine learning model 1204. Process 1400 can be implemented as hardware and software included in server computer 120 to train machine learning model 1204, together with hardware and software included in computing device 115 of vehicle 110. Process 1400 includes multiple blocks that can be executed in the order shown. Alternatively or additionally, process 1400 may include fewer blocks and may include blocks executed in a different order.

[0077] Process 1400 begins at block 1402, where machine learning model 1204 is trained on server computer 120 based on adaptive learning, as described above with respect to Figures 12 and 13.

[0078] At block 1404, the trained machine learning model 1204 is transmitted to computing device 115 included in vehicle 110.

[0079] At block 1406, computing device 115 acquires fisheye images 302, 304, 306, and 308 from sensors 116 included in vehicle 110. Machine learning model 1204 receives fisheye images 302, 304, 306, and 308 and outputs BEV image 600. BEV image 600 can be received by a second machine learning model included in computing device 115 to determine objects 630, including roads 602 and 604, curbs 606, 608, 610, and 612, buildings 614, 616, 618, 620, and 622, trees 624 and 626, and other objects such as vehicles, traffic signs, and traffic barriers. Computing device 115 can determine a vehicle trajectory based on the objects 630 included in BEV image 600 and can operate vehicle 110 along that trajectory by transmitting commands to vehicle controllers 112, 113, and 114 to control vehicle components. After block 1406, process 1400 ends.

[0080] Any action taken by the vehicle or its user must comply with all rules and regulations specific to the vehicle's location (e.g., federal, state, national, city, etc.) and operation. Moreover, any actions disclosed herein are for illustrative purposes only. Depending on the context, situation, and applicable rules and regulations, certain actions may be modified or omitted. Furthermore, regardless of the action or determination, the user should exercise good judgment and common sense when operating the vehicle. That is, all actions (whether standard or "enhanced") should be followed only when it is appropriate to do so and in accordance with any rules and regulations specific to the vehicle's location and operation.

[0081] Computing devices such as those described herein each typically include instructions executable by one or more computing devices such as those identified above, for carrying out blocks or steps of the processes described above. For example, the process blocks described above can be embodied as computer-executable instructions.

[0082] Computer-executable instructions can be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Python, Julia, SCALA, Visual Basic, JavaScript, Perl, HTML, etc. In general, a processor (e.g., a microprocessor) receives instructions, e.g., from a memory or a computer-readable medium, and executes them, thereby performing one or more processes, including one or more of the processes described herein. Such instructions and other data can be stored in files and transmitted using a variety of computer-readable media. A file in a computing device is generally a collection of data stored on a computer-readable medium, such as a storage medium, a random access memory, etc.

[0083] A computer-readable medium (also referred to as a processor-readable medium) includes any non-transitory (e.g., tangible) medium that participates in providing data (e.g., instructions) that can be read by a computer (e.g., by a processor of a computer). Such a medium can take many forms, including, but not limited to, non-volatile media and volatile media. Instructions can be transmitted by one or more transmission media, including fiber optics, wires, and wireless communication, including the internals that comprise a system bus coupled to a processor of a computer. Common forms of computer-readable media include, for example, RAM, PROM, EPROM, FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.

[0084] All terms used in the claims are intended to be given their plain and ordinary meanings as understood by those skilled in the art unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as “a,” “the,” “said,” etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.

[0085] The term “exemplary” is used herein in the sense of signifying an example; e.g., a reference to an “exemplary widget” should be read as simply referring to an example of a widget.

[0086] The adverb "approximately" when modifying a value or result indicates that the shape, structure, measurement, value, determination, calculation, etc., may deviate from the exact description of the geometry, distance, measurement, value, determination, calculation, etc. due to defects in materials, machining, manufacturing, sensor measurement, calculation, processing time, communication time, etc.

[0087] In the accompanying drawings, the same reference numerals indicate the same elements. Regarding the media, processes, systems, methods, etc., described herein, it should be understood that although the steps or blocks of such processes, etc., are described as occurring in a specific sequence, such processes can be practiced by performing the described steps in an order other than that described herein. It should also be understood that some steps may be performed simultaneously, other steps may be added, or some steps described herein may be omitted. In other words, the description of processes herein is provided for the purpose of illustrating certain embodiments and should in no way be construed as limiting the claimed invention.

[0088] According to the present invention, a system is provided comprising: a computer including a processor and a memory, the memory including instructions executable by the processor to: determine a depth map based on an image received by a machine learning model trained on a first training dataset; determine an object trajectory of an object included in the image; select an image based on determining interference between the object and the depth map according to the object trajectory; determine a second training dataset by adding the selected image to the first training dataset; and train the machine learning model with the second training dataset.

[0089] According to an embodiment, the machine learning model determines the locations of three-dimensional features in the depth map of the environment surrounding the object.

[0090] According to an embodiment, the depth map is formatted as cells arranged on a grid, wherein one or more of the cells are occupied cells, the cells are occupied by one or more three-dimensional features, and the height of one or more of the three-dimensional features relative to the ground plane exceeds a user-defined threshold.

[0091] According to an embodiment, interference is determined by the overlap between the object and one or more occupied cells at a location predicted based on the trajectory.

[0092] According to an embodiment, the trajectory of an object is determined based on sensor data from sensors included in the object.

[0093] According to an embodiment, adding the selected image to the first training dataset includes determining the label of the object.

[0094] According to an embodiment, a second machine learning model determines the label of the object.

[0095] According to an embodiment, the machine learning model is a generative adversarial network that includes an encoder, a decoder, and a discriminator.

[0096] According to an embodiment, the depth map includes portions of one or more of roads, curbs, buildings, and trees.

[0097] According to an embodiment, the object is a vehicle.

[0098] According to an embodiment, the invention is further characterized by a second computer in a second vehicle, wherein the trained machine learning model is included in the second computer, and the second computer is programmed to operate the second vehicle by determining a vehicle trajectory based on output from the machine learning model.

[0099] According to an embodiment, the second computer is programmed to operate the second vehicle on the vehicle trajectory by commanding controllers to actuate vehicle components.

[0100] According to the present invention, a method includes: determining a depth map based on an image received by a machine learning model trained on a first training dataset; determining an object trajectory of an object included in the image; selecting an image based on determining interference between the object and the depth map according to the object trajectory; determining a second training dataset by adding the selected image to the first training dataset; and training the machine learning model with the second training dataset.

[0101] In one aspect of the invention, a machine learning model determines the location of three-dimensional features in a depth map of the environment surrounding the object.

[0102] In one aspect of the invention, the depth map is formatted as cells arranged on a grid, wherein one or more of the cells are occupied cells, the cells being occupied by one or more three-dimensional features whose height relative to the ground plane exceeds a user-defined threshold.

[0103] In one aspect of the invention, interference is determined by the overlap between the object and one or more occupied cells at a location predicted based on the trajectory.

[0104] In one aspect of the invention, the trajectory of an object is determined based on sensor data from sensors included in the object.

[0105] In one aspect of the invention, adding the selected image to the first training dataset includes determining the label of the object.

[0106] In one aspect of the invention, a second machine learning model determines the label of an object.

[0107] In one aspect of the invention, the machine learning model is a generative adversarial network comprising an encoder, a decoder, and a discriminator.

Claims

1. A method, comprising: determining a depth map based on an image received by a machine learning model trained on a first training dataset; determining an object trajectory of an object included in the image; selecting the image based on determining interference between the object and the depth map according to the object trajectory; determining a second training dataset by adding the selected image to the first training dataset; and training the machine learning model with the second training dataset.

2. The method of claim 1, wherein the machine learning model determines the location of the three-dimensional features in the depth map of the environment surrounding the object.

3. The method of claim 2, wherein the depth map is formatted as cells arranged on a grid, wherein one or more of the cells are occupied cells, the occupied cells being occupied by one or more of the three-dimensional features whose height relative to a ground plane exceeds a user-defined threshold.

4. The method of claim 3, wherein the interference is determined as an overlap between the object and the one or more occupied cells at a location predicted based on the trajectory.

5. The method of claim 1, wherein the trajectory of the object is determined based on sensor data from sensors included in the object.

6. The method of claim 1, wherein adding the selected image to the first training dataset includes determining the label of the object.

7. The method of claim 1, wherein a second machine learning model determines a label of the object.

8. The method of claim 1, wherein the machine learning model is a generative adversarial network comprising an encoder, a decoder, and a discriminator.

9. The method of claim 1, wherein the depth map comprises portions of one or more of roads, curbs, buildings, and trees.

10. The method of claim 1, wherein the object is a vehicle.

11. The method of claim 1, further comprising a second computer in a second vehicle, wherein the trained machine learning model is included in the second computer, and the second computer is programmed to operate the second vehicle by determining a vehicle trajectory based on output from the machine learning model.

12. The method of claim 11, wherein the second computer is programmed to operate the second vehicle on the vehicle trajectory by controlling vehicle components via a command controller.
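
Claims 11 and 12 do not say how the second computer turns a vehicle trajectory into actuator commands. As one illustration, pure-pursuit steering (a standard path-tracking method swapped in here, not taken from the source) computes a steering angle toward a lookahead point on the path under a kinematic bicycle model. `lookahead` and `wheelbase` are assumed parameters in metres, and the returned angle in radians (positive to the left) would then be sent to a steering controller.

```python
import math
from typing import Sequence, Tuple

Pose = Tuple[float, float, float]  # x, y (metres), yaw (radians)


def pure_pursuit_steering(
    pose: Pose,
    path: Sequence[Tuple[float, float]],
    lookahead: float,
    wheelbase: float,
) -> float:
    """Steering angle (radians) that arcs the vehicle toward a lookahead point."""
    x, y, yaw = pose
    # Pick the first path point at least `lookahead` away; fall back to the last point.
    target = path[-1]
    for px, py in path:
        if math.hypot(px - x, py - y) >= lookahead:
            target = (px, py)
            break
    dx, dy = target[0] - x, target[1] - y
    # Express the target in the vehicle frame (x forward, y left).
    local_x = math.cos(yaw) * dx + math.sin(yaw) * dy
    local_y = -math.sin(yaw) * dx + math.cos(yaw) * dy
    dist_sq = local_x ** 2 + local_y ** 2
    if dist_sq == 0:
        return 0.0
    # Curvature of the circular arc through the target, then bicycle-model steering.
    curvature = 2.0 * local_y / dist_sq
    return math.atan(wheelbase * curvature)
```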

13. The method of claim 1, wherein adding the selected image to the first training dataset includes determining the location of the object.

14. The method of claim 1, wherein a second machine learning model determines a location of the object.

15. A system comprising a computer programmed to perform the method of any one of claims 1 to 14.