Trajectory prediction model training method, trajectory prediction method, and electronic device
By introducing an auxiliary training module based on a vision-language model into the trajectory prediction model, the application addresses the low accuracy and insufficient risk identification of trajectory prediction models trained by end-to-end pure imitation learning in highway environments, achieving more accurate trajectory prediction and risk identification and improving the safety of the model.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date
- 2026-05-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing trajectory prediction models trained using a pure imitation learning approach have low accuracy in high-speed road environments and lack a deep understanding of causal risks, making it difficult to distinguish between critical risk objects and non-risk backgrounds, resulting in low safety.
An auxiliary training module based on a vision-language model is introduced. The first risk information output by the trajectory prediction module is supervised and constrained by the second risk information, and the trajectory prediction model is trained in combination with the predicted trajectory to improve the trajectory prediction accuracy and risk identification capability of the model.
This improved the trajectory prediction accuracy and risk identification capability of the trajectory prediction model, making the predicted trajectory more closely match the needs of actual scenarios and enhancing the overall prediction accuracy and reliability of the model.
Figure CN122244639A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a training method for a trajectory prediction model, a trajectory prediction method, and an electronic device.
Background Technology
[0002] Autonomous driving technology is an intelligent transportation technology that enables vehicles to automatically predict their trajectories without driver intervention, and then control the vehicle based on these predicted trajectories to achieve safe driving. In some scenarios, autonomous driving technology can be used to control vehicles operating in high-speed road environments.
[0003] In related technologies, servers can train a trajectory prediction model using an end-to-end pure imitation learning approach, minimizing the error between the predicted trajectory and a preset trajectory. This trained trajectory prediction model is then deployed on a vehicle, which uses it to obtain the predicted trajectory and control the vehicle for autonomous driving. However, the trajectory prediction accuracy of the model obtained through this training method is relatively low.
Summary of the Invention
[0004] This application provides a training method for a trajectory prediction model, a trajectory prediction method, and an electronic device to improve the trajectory prediction accuracy of the trajectory prediction model.
[0005] This application provides a training method for a trajectory prediction model. The trajectory prediction model includes a trajectory prediction module and an auxiliary training module, and the auxiliary training module is built based on a vision-language model. The method includes:
[0006] Determining multiple surround view images, status information, and driving commands of the sample vehicle at the sample time;
[0007] Inputting the multiple surround view images, status information, and driving commands into the trajectory prediction module to determine at least one first risk information and the predicted trajectory of the sample vehicle in the first time period; the first time period is a period that starts at the sample time and lasts for a preset duration;
[0008] Inputting the multiple surround view images into the auxiliary training module to determine at least one second risk information;
[0009] Training the trajectory prediction model based on at least one first risk information, at least one second risk information, and the predicted trajectory to obtain the trained trajectory prediction model.
[0010] This application provides a trajectory prediction method, including:
[0011] Determine multiple target surround view images, target status information, and target driving commands for the target vehicle at the current moment;
[0012] Multiple target surround view images, target status information, and target driving commands are input into the trajectory prediction model. The trajectory prediction model processes the multiple target surround view images, target status information, and target driving commands to obtain the target predicted trajectory of the target vehicle in the second time period. The trajectory prediction model is obtained by training the trajectory prediction model based on the training method and then removing the auxiliary training module. The second time period is a period with the current time as the starting time and a preset duration.
[0013] This application provides a training device for a trajectory prediction model. The trajectory prediction model includes a trajectory prediction module and an auxiliary training module, and the auxiliary training module is built based on a vision-language model. The device includes:
[0014] The first determining module is used to determine multiple surround view images, status information and driving instructions of the sample vehicle at the sample time.
[0015] The first processing module is used to input multiple surround view images, status information and driving instructions into the trajectory prediction module to determine at least one first risk information and the predicted trajectory of the sample vehicle in the first time period; the first time period is a time period with the sample time as the starting time and the duration as a preset duration.
[0016] The second processing module is used to input multiple surround view images into the auxiliary training module to determine at least one second risk information.
[0017] The training module is used to train the trajectory prediction model based on at least one first risk information, at least one second risk information, and the predicted trajectory to obtain the trained trajectory prediction model.
[0018] This application provides a trajectory prediction device, comprising:
[0019] The second determining module is used to determine multiple target surround view images, target status information and target driving instructions of the target vehicle at the current moment;
[0020] The prediction module is used to input multiple target surround view images, target status information and target driving commands into the trajectory prediction model. The trajectory prediction model processes the multiple target surround view images, target status information and target driving commands to obtain the target predicted trajectory of the target vehicle in the second time period. The trajectory prediction model is obtained by training the trajectory prediction model based on the training method and then removing the auxiliary training module. The second time period is a period with the current time as the starting time and a preset duration.
[0021] This application also provides an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the steps of the above-described training method for the trajectory prediction model and/or the above-described trajectory prediction method.
[0022] This application also provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above-described training method for the trajectory prediction model and/or the above-described trajectory prediction method.
[0023] This application also provides a computer program product, including a computer program, wherein the computer program, when executed by a processor, implements the steps of the above-described training method for the trajectory prediction model and/or the above-described trajectory prediction method.
[0024] The trajectory prediction model training method, trajectory prediction method, and electronic device provided in this application introduce an auxiliary training module based on a vision-language model. The second risk information is used to supervise and constrain the first risk information output by the trajectory prediction module, and the trajectory prediction model is trained together with the predicted trajectory. This not only improves the trajectory prediction accuracy of the trajectory prediction model, but also enables the trajectory prediction model to have a more accurate risk identification capability while predicting the trajectory. This makes the predicted trajectory more in line with the needs of actual scenarios and improves the overall prediction reliability and practicality of the trajectory prediction model.
Attached Figure Description
[0025] To more clearly illustrate the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0026] Figure 1 is a schematic diagram of an application scenario provided in an embodiment of this application;
[0027] Figure 2 is a flowchart of a training method for a trajectory prediction model provided in an embodiment of this application;
[0028] Figure 3 is a flowchart of a process for determining at least one first risk information and a predicted trajectory provided in an embodiment of this application;
[0029] Figure 4 is a flowchart of a process for determining at least one second risk information provided in an embodiment of this application;
[0030] Figure 5 is a flowchart of a process for determining a trained trajectory prediction model provided in an embodiment of this application;
[0031] Figure 6 is a schematic diagram of the training of a trajectory prediction model provided in an embodiment of this application;
[0032] Figure 7 is a schematic diagram of trajectory prediction provided in an embodiment of this application;
[0033] Figure 8 is a schematic diagram of the structure of a training device for a trajectory prediction model provided in an embodiment of this application;
[0034] Figure 9 is a schematic diagram of the structure of a trajectory prediction device provided in an embodiment of this application;
[0035] Figure 10 is a schematic diagram of the structure of the electronic device provided in this application.
Detailed Implementation
[0036] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the protection scope of this application.
[0037] It should be noted that, in the description of this application, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. The terms "first," "second," etc., in this application are used to distinguish similar objects and are not used to describe a specific order or sequence.
[0038] Autonomous driving technology is an intelligent transportation technology that enables vehicles to automatically predict their trajectories without driver intervention, and then control the vehicle based on these predicted trajectories to achieve safe driving. In some scenarios, autonomous driving technology can be used to control vehicles operating in high-speed road environments.
[0039] In related technologies, servers can train trajectory prediction models based on end-to-end pure imitation learning by minimizing the error between predicted trajectories and preset trajectories, thereby obtaining trained trajectory prediction models. This allows the trajectory prediction models to learn the mapping from state space to action space, reducing information transmission losses in traditional modular processes.
[0040] However, highway scenarios are challenging operating environments for autonomous driving, demanding high robustness from trajectory prediction models. Highway scenarios are characterized by high vehicle speeds, short decision windows, highly dynamic environments, and stringent requirements for prediction accuracy.
[0041] Specifically, the physical characteristics of vehicles traveling on highways determine the complexity of autonomous driving systems. On urban roads, vehicles travel at lower speeds, which leaves more time for decision-making and execution. On highways, however, vehicles typically travel at speeds between 80 km/h and 120 km/h. Braking distance is proportional to the square of the speed, so it grows quadratically rather than linearly as vehicle speed increases. For instance, at a speed of 100 km/h, the driver's reaction time and the vehicle's own braking delay compress the available decision-making time to an extreme degree.
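The quadratic relationship can be illustrated with a minimal sketch. It assumes an idealized constant-deceleration model, d = v^2 / (2 * mu * g), with a tire-road friction coefficient of 0.7 and no reaction time; the model, the coefficient, and the function name are illustrative assumptions and do not come from this application.

```python
G = 9.81  # gravitational acceleration in m/s^2


def braking_distance_m(speed_kmh: float, mu: float = 0.7) -> float:
    """Idealized braking distance d = v^2 / (2 * mu * g), ignoring reaction time.

    mu is an assumed tire-road friction coefficient (hypothetical default).
    """
    v = speed_kmh / 3.6  # convert km/h to m/s
    return v * v / (2.0 * mu * G)


if __name__ == "__main__":
    # Doubling the speed quadruples the idealized braking distance.
    for kmh in (60, 80, 100, 120):
        print(f"{kmh} km/h -> {braking_distance_m(kmh):.1f} m")
```

Under these assumptions, the distance at 120 km/h is four times the distance at 60 km/h, which is why the decision window shrinks so sharply on highways.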
[0042] Furthermore, vehicles traveling on highways need to effectively avoid long-tail risk events. Long-tail risk events refer to events that occur very infrequently but possess extremely high uncertainty and danger. In some embodiments, long-tail risk events may include, for example:
[0043] High-speed lane cutting: A vehicle in an adjacent lane suddenly changes lanes and cuts in at an extremely high relative speed, without signaling or at too short a distance. This behavior places millisecond-level reaction requirements on the target vehicle's acceleration and lateral control.
[0044] Chain reaction and ghosting: When the view ahead is blocked by a large vehicle and the vehicle in front brakes suddenly or an object falls onto the road, the target vehicle must make a complex risk assessment and braking decision in a very short time, based on the trend of environmental changes and common sense, even with insufficient perception information.
[0045] Extreme Weather and Low Visibility Decision Making: Dense fog, heavy rain, or nighttime glare can severely degrade sensor performance. The autonomous driving system of the target vehicle needs to maintain safety strategies based on its deep understanding of traffic rules and environmental common sense, even with imperfect perception inputs, which exceeds the generalization scope of traditional data-driven models.
[0046] Therefore, a vehicle's high-speed autonomous driving system must have advanced, risk-based predictive capabilities, rather than simple geometric obstacle avoidance, in order to complete the entire process from perception and risk assessment to trajectory planning within an extremely short decision window.
[0047] However, the essence of training a trajectory prediction model in the above way is to teach it the mapping from the state space to the action space. While the trained trajectory prediction model can directly map from sensors to driving behavior, it is highly dependent on the coverage of the training data. If, during actual operation, the vehicle's trajectory deviates from the training dataset due to minor errors in perception or control, the autonomous driving system enters a state "outside the data distribution." This deviation can lead to serious safety consequences, especially on highways. Specifically, the trained trajectory prediction model cannot accurately predict the trajectory for novel current states, and the resulting trajectory error will rapidly accumulate and amplify over time, leading to low accuracy in trajectory prediction.
[0048] For example, if the target vehicle is traveling at 120 km/h on a highway, a steering error of 1 degree can cause a serious deviation from the intended trajectory within seconds, and may even cause the vehicle to leave its lane or collide.
[0049] At the same time, because of the limitations of the training dataset, the autonomous driving system of the target vehicle cannot determine an accurate predicted trajectory for long-tail risk events that are rare on highways, such as accident debris or low-friction road surfaces in extreme weather.
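The scale of this deviation can be estimated with a rough sketch. It assumes the 1-degree error acts as a constant heading error on a straight lane, which is a simplification introduced here and not stated in this application; the function name is hypothetical.

```python
import math


def lateral_offset_m(speed_kmh: float, heading_error_deg: float, elapsed_s: float) -> float:
    """Lateral drift of a vehicle that holds a constant heading error on a straight lane."""
    v = speed_kmh / 3.6  # convert km/h to m/s
    return v * elapsed_s * math.sin(math.radians(heading_error_deg))


if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0):
        print(f"after {t:.0f} s: {lateral_offset_m(120.0, 1.0, t):.2f} m")
```

Under this assumption the vehicle drifts about 1.7 m sideways after 3 s at 120 km/h, which is on the order of half a lane width.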
[0050] Furthermore, the trajectory prediction model trained using the aforementioned pure imitation learning method lacks a deep understanding of causal risks. For example, the trained trajectory prediction model only knows "in state X, perform action Y," but does not understand "in state X, the purpose of performing action Y is to avoid risk Z." This makes it difficult for the trained trajectory prediction model to distinguish between key risk objects and non-risk backgrounds in high-risk scenarios.
[0051] For example, in a highway scenario, the trained trajectory prediction model has difficulty distinguishing the risk level of "a vehicle traveling at 110 km/h with its steering wheel locked" from that of "a stationary vehicle with its lights on."
[0052] Therefore, the trajectory prediction accuracy of the trajectory prediction model trained based on the end-to-end pure imitation learning method is low.
[0053] Given the superior capabilities of visual-language models in image understanding, commonsense reasoning, and zero-shot generalization, they can be combined to train trajectory prediction models. For example, visual-language models can generate qualitative analyses and causal assessments of the risk levels of key traffic participants in highway scenarios based on image context and language instructions. For instance, the output of a visual-language model could be: "This white car is accelerating and changing lanes, violating the safe distance rule, and posing a high risk."
[0054] This high-dimensional, common-sense reasoning ability is lacking in the aforementioned end-to-end pure imitation learning training methods. Therefore, in some related technologies, the server can train the trajectory prediction model using a hybrid approach of visual-language models and end-to-end models, or a hybrid approach of visual-language models and visual-action schemes, to obtain the trained trajectory prediction model.
[0055] In such a hybrid approach, the vision-language model provides high-level abstract instructions, and the end-to-end modular planner or the vision-action scheme uses these instructions to execute vehicle control.
[0056] However, directly integrating the visual-language model into the end-to-end control loop increases the computational load and inference latency of the target vehicle's autonomous driving system. Specifically, visual-language models typically require inference time in the hundreds of milliseconds or even seconds, while the decision-making time of autonomous driving technology during high-speed driving on highways is extremely short, generally less than 100 milliseconds. Therefore, the inference time of the visual-language model far exceeds the millisecond-level real-time inference requirements during high-speed driving.
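To make the latency gap concrete, the following sketch computes how far a vehicle travels while the planner waits for an inference result. The function name and the sample latencies (100 ms versus 1 s) are illustrative choices based on the ranges mentioned above, not measured values.

```python
def distance_during_latency_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance covered while the planner is still waiting for an inference result."""
    return speed_kmh / 3.6 * latency_ms / 1000.0


if __name__ == "__main__":
    for latency in (100.0, 500.0, 1000.0):
        d = distance_during_latency_m(120.0, latency)
        print(f"{latency:.0f} ms at 120 km/h -> {d:.1f} m")
```

At 120 km/h, a 100 ms budget corresponds to roughly 3.3 m of travel, while a 1 s inference corresponds to roughly 33 m.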
[0057] In this way, directly integrating the vision-language model into the end-to-end control loop will cause the autonomous driving system to be unable to respond to sudden high-risk events in a timely manner, resulting in lower safety of the target vehicle.
[0058] Furthermore, the visual-language model outputs high-level semantic guidance (e.g., "The current risk is high, please drive cautiously"), while the end-to-end planning module requires low-level dynamic control accurate to the millimeter level (e.g., acceleration curves and steering angles for the next 2 seconds). Therefore, in the trajectory prediction model trained using a hybrid approach based on the visual-language model and the end-to-end model, the end-to-end planning module cannot understand the semantic guidance output by the visual-language model, resulting in low trajectory prediction accuracy.
[0059] Furthermore, training a visual-language model requires extensive image-text alignment annotations, and even when using a trained visual-language model for zero-shot inference, complex prompts are needed to ensure that the risk assessments output by the visual-language model align with the safety standards for autonomous driving. Therefore, training a trajectory prediction model using the above methods is costly.
[0060] Based on this, this application provides a training method for a trajectory prediction model. By introducing an auxiliary training module based on a vision-language model, the first risk information output by the trajectory prediction module is supervised and constrained by the second risk information, and the trajectory prediction model is trained together with the predicted trajectory. This not only improves the trajectory prediction accuracy of the trajectory prediction model, but also enables the trajectory prediction model to have a more accurate risk identification capability while predicting the trajectory, making the predicted trajectory more in line with the needs of actual scenarios, and improving the overall prediction accuracy and reliability of the trajectory prediction model.
[0061] To enable those skilled in the art to better understand the present application, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0062] The specific application environment architecture or hardware architecture upon which the training method for the trajectory prediction model depends is described here. Referring to Figure 1, Figure 1 is a schematic diagram illustrating an application scenario provided in an embodiment of this application. As shown in Figure 1, the scenario includes a server 11 and a target vehicle 12, wherein the target vehicle 12 is equipped with an autonomous driving system.
[0063] In practical applications, server 11 can train the trajectory prediction model to obtain a trained trajectory prediction model, and then deploy the trained trajectory prediction model in the autonomous driving system of target vehicle 12. During the driving process, the autonomous driving system can predict the trajectory in the future based on the trained trajectory prediction model, and control the target vehicle 12 to achieve autonomous driving based on the trajectory in the future.
[0064] It should be noted that the execution subject in each embodiment of this application can be a processor, microprocessor, or a device integrating the aforementioned processor or microprocessor, such as a terminal device. The specific execution subject in each embodiment of this application is not limited and can be selected and set according to actual needs. In the following embodiments, a terminal device integrating the aforementioned processor or microprocessor is used as an example for description, which does not constitute a limitation on the actual execution subject.
[0065] It should be noted that Figure 1 merely illustrates one example application scenario and is not intended to limit the application scenario.
[0066] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.
[0067] Figure 2 is a flowchart illustrating a training method for a trajectory prediction model provided in an embodiment of this application. As shown in Figure 2, the method may include the following steps:
[0068] S201. Determine multiple surround view images, status information, and driving instructions of the sample vehicle at the sample time.
[0069] A sample vehicle refers to a specific vehicle used for training the trajectory prediction model, equipped with the same hardware configuration as a real autonomous vehicle. For example, the sample vehicle is equipped with at least one surround-view camera, speed sensor, and acceleration sensor. Various data generated during the sample vehicle's operation (surround-view images, status information, and driving commands, etc.) are recorded and stored as training samples.
[0070] A sample time point refers to a specific point in time within the training set of the trajectory prediction model. In some embodiments, the sample time point can be recorded with millisecond-level precision.
[0071] The surround-view images are acquired by multiple surround-view cameras deployed on the sample vehicle. These cameras are typically positioned at the front, rear, and side mirrors of the vehicle. The acquired surround-view images include key visual information about the sample vehicle in a highway scenario, such as lane lines, adjacent lane lines, obstacles ahead, road signs, and weather conditions. Together, the multiple surround-view images provide visual image data of the sample vehicle's 360-degree surrounding environment.
[0072] State information refers to the set of operating parameters of the sample vehicle at the sample time, reflecting the current motion state of the sample vehicle. State information may include parameters such as wheel speed, acceleration, and historical trajectory of the sample vehicle at the sample time. Specifically, the wheel speed can be collected by wheel speed sensors deployed on the sample vehicle, the acceleration can be collected by acceleration sensors deployed on the sample vehicle, and the historical trajectory can be determined by the motion control system of the sample vehicle.
[0073] Driving instructions are digital commands used to guide the driving intentions of a sample vehicle. For example, driving instructions can be sent by the driver via voice, keypad, central control screen, etc., such as commands to go straight, change lanes, overtake, and decelerate.
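The three inputs determined in S201 can be pictured as one training record. The sketch below is a hypothetical layout: all class names, field names, and units are assumptions made for illustration, and file paths stand in for the actual image data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class DrivingInstruction(Enum):
    # Instruction types named in the description; the enum itself is hypothetical.
    GO_STRAIGHT = "go_straight"
    CHANGE_LANE = "change_lane"
    OVERTAKE = "overtake"
    DECELERATE = "decelerate"


@dataclass
class StateInfo:
    wheel_speed_mps: float                  # from the wheel speed sensor
    acceleration_mps2: float                # from the acceleration sensor
    history_xy: List[Tuple[float, float]]   # past positions, oldest first


@dataclass
class TrainingSample:
    timestamp_ms: int            # sample time, millisecond precision
    surround_images: List[str]   # file paths stand in for image tensors here
    state: StateInfo
    instruction: DrivingInstruction

    def validate(self) -> None:
        """Reject records that cannot be used for training."""
        if not self.surround_images:
            raise ValueError("at least one surround view image is required")
        if self.timestamp_ms < 0:
            raise ValueError("timestamp must be non-negative")


if __name__ == "__main__":
    sample = TrainingSample(
        timestamp_ms=1_000,
        surround_images=["front.png", "rear.png", "left.png", "right.png"],
        state=StateInfo(30.0, 0.2, [(0.0, 0.0), (3.0, 0.0)]),
        instruction=DrivingInstruction.CHANGE_LANE,
    )
    sample.validate()
    print(sample.instruction.value)
```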
[0074] S202. Input multiple surround view images, status information and driving instructions into the trajectory prediction module to determine at least one first risk information and the predicted trajectory of the sample vehicle in the first time period; the first time period is a time period with the sample time as the starting time and the duration as the preset duration.
[0075] The trajectory prediction model is specifically designed for autonomous driving scenarios to achieve risk identification and trajectory prediction for sample vehicles. In some embodiments, the trajectory prediction model consists of two core units: a trajectory prediction module and an auxiliary training module.
[0076] The trajectory prediction module is the core execution module of the trajectory prediction model, used to identify risk events present in the sample vehicle at the sample time and predict the trajectory of the sample vehicle in future time periods. The auxiliary training module is the auxiliary training unit for the trajectory prediction model. In some embodiments, the auxiliary training module is built based on a visual-language model, which is a large-scale artificial intelligence model that combines visual image understanding and natural language reasoning capabilities, possessing powerful common-sense reasoning and zero-shot generalization abilities. It should be noted that the visual-language model used in this application is a pre-trained model and does not require fine-tuning for autonomous driving scenarios.
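The two-module structure can be sketched as follows. This is a simplified illustration, not the implementation of this application: the modules are represented as plain callables, the class and method names are hypothetical, and the removal of the auxiliary module before deployment follows the trajectory prediction method summarized earlier in this application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

RiskInfo = Dict[str, float]              # risk event -> risk score
Trajectory = List[Tuple[float, float]]   # (x, y) waypoints


@dataclass
class TrajectoryPredictionModel:
    # Trainable module: returns first risk information and a predicted trajectory.
    trajectory_module: Callable[[list, dict, str], Tuple[RiskInfo, Trajectory]]
    # Pre-trained VLM-based module: returns second risk information; training only.
    auxiliary_module: Optional[Callable[[list], RiskInfo]] = None

    def training_forward(self, images, state, instruction):
        """Run both modules on the same sample, as in steps S202 and S203."""
        if self.auxiliary_module is None:
            raise RuntimeError("auxiliary module is required during training")
        first_risk, trajectory = self.trajectory_module(images, state, instruction)
        second_risk = self.auxiliary_module(images)
        return first_risk, second_risk, trajectory

    def strip_for_deployment(self) -> "TrajectoryPredictionModel":
        """Return a copy without the auxiliary module for on-vehicle inference."""
        return TrajectoryPredictionModel(self.trajectory_module, None)

    def predict(self, images, state, instruction) -> Trajectory:
        """Inference path: only the trajectory prediction module is executed."""
        _, trajectory = self.trajectory_module(images, state, instruction)
        return trajectory
```

Because the vision-language model never runs in `predict`, its inference latency does not enter the on-vehicle control loop in this sketch.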
[0077] The first risk information is risk-related information generated by the trajectory prediction module based on the input data. In some embodiments, the first risk information includes risk events identified by the trajectory prediction module and the corresponding risk scores for those risk events.
[0078] The first time period is a specific time interval within which the trajectory prediction module plans and predicts trajectories for sample vehicles. In some embodiments, the first time period is a prediction time domain set for autonomous driving trajectory prediction on highways, and its time range is adapted to the decision window of highways.
[0079] The predicted trajectory is the driving trajectory planned by the trajectory prediction module for the sample vehicle within the first time period. The predicted trajectory may include specific driving parameters of the sample vehicle at multiple moments within the first time period, such as position, speed, steering angle, and acceleration.
[0080] In some embodiments, the server can input multiple surround view images, status information, and driving instructions into the trajectory prediction model, and process the multiple surround view images, status information, and driving instructions through the trajectory prediction model to obtain at least one first risk information and a predicted trajectory.
[0081] S203. Input multiple surround view images into the auxiliary training module to determine at least one second risk information.
[0082] The second risk information is structured risk information generated by an auxiliary training module constructed from a pre-trained vision-language model after feature processing of multiple surround view images of the sample vehicle at a sample time. In some embodiments, the second risk information includes risk events identified by the auxiliary training module and risk scores for those risk events.
[0083] In some embodiments, the server can input multiple surround view images into the auxiliary training module, and the auxiliary training module can process the multiple surround view images covering the 360-degree surrounding environment of the vehicle body to obtain at least one second risk information.
[0084] S204. Train the trajectory prediction model based on at least one first risk information, at least one second risk information, and the predicted trajectory to obtain the trained trajectory prediction model.
[0085] A trained trajectory prediction model refers to a model that has undergone constrained training and parameter convergence optimization using at least one first risk information, at least one second risk information, and a predicted trajectory. In some embodiments, the trained trajectory prediction model possesses stable risk assessment and trajectory prediction capabilities, and can accurately output corresponding risk information and reasonable predicted trajectories based on input data.
[0086] In some embodiments, the server can determine the risk prediction capability of the trajectory prediction model based on at least one first risk information and at least one second risk information, determine the trajectory prediction capability of the trajectory prediction model based on the predicted trajectory, and then adjust the parameters of the trajectory prediction model based on the risk prediction capability and the trajectory prediction capability to obtain the trained trajectory prediction model.
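One possible way to combine the two capabilities into a single training signal is sketched below. The loss forms are assumptions chosen for illustration: risk information is modeled as a mapping from risk event to risk score, the risk term is a mean squared gap between first and second risk scores, the trajectory term compares the predicted trajectory with a reference trajectory (assumed here to be the recorded trajectory of the sample vehicle), and `risk_weight` is a hypothetical hyperparameter. This application does not specify these details at this point.

```python
from typing import Dict, List, Tuple

RiskInfo = Dict[str, float]              # risk event -> risk score
Trajectory = List[Tuple[float, float]]   # (x, y) waypoints


def risk_consistency_loss(first: RiskInfo, second: RiskInfo) -> float:
    """Mean squared gap between risk scores over the union of risk events.

    An event missing from one side is treated as score 0.0 (assumption).
    """
    events = set(first) | set(second)
    if not events:
        return 0.0
    return sum((first.get(e, 0.0) - second.get(e, 0.0)) ** 2 for e in events) / len(events)


def trajectory_loss(predicted: Trajectory, reference: Trajectory) -> float:
    """Mean squared Euclidean distance between matching waypoints."""
    if len(predicted) != len(reference):
        raise ValueError("trajectories must have the same number of waypoints")
    if not predicted:
        return 0.0
    total = sum((px - rx) ** 2 + (py - ry) ** 2
                for (px, py), (rx, ry) in zip(predicted, reference))
    return total / len(predicted)


def total_loss(first: RiskInfo, second: RiskInfo,
               predicted: Trajectory, reference: Trajectory,
               risk_weight: float = 0.5) -> float:
    """Weighted sum of the trajectory term and the risk supervision term."""
    return trajectory_loss(predicted, reference) + risk_weight * risk_consistency_loss(first, second)
```

In a real training loop the total would be minimized with respect to the parameters of the trajectory prediction module only, since the vision-language model is pre-trained and used as a fixed source of supervision.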
[0087] exist Figure 2 In the embodiment shown, by introducing an auxiliary training module based on a vision-language model, the first risk information output by the trajectory prediction module is supervised and constrained by the second risk information, and the trajectory prediction model is trained together with the predicted trajectory. This not only improves the trajectory prediction accuracy of the trajectory prediction model, but also enables the trajectory prediction model to have more accurate risk identification capabilities while predicting the trajectory, making the predicted trajectory more in line with the needs of the actual scenario, and improving the overall prediction accuracy and reliability of the trajectory prediction model.
[0088] Based on the embodiment shown in Figure 2, the method for determining at least one first risk information and the predicted trajectory of the sample vehicle in the first time period is further explained below in conjunction with Figure 3.
[0089] Figure 3 is a schematic diagram illustrating a process for determining at least one first risk information and a predicted trajectory, provided as an embodiment of this application. As shown in Figure 3, the process may include the following steps:
[0090] S301. The trajectory prediction module processes multiple panoramic images to determine the panoramic image features of the sample vehicle at the sample time. The panoramic image features are used to indicate the environmental layout around the sample vehicle, the dynamic status of traffic participants, and risk association information at the sample time.
[0091] Panoramic image features are abstract feature data output by the trajectory prediction module after processing the surround-view image; they represent the overall environment surrounding the sample vehicle digitally. In some embodiments, panoramic image features include the environmental layout around the sample vehicle, the dynamic state of traffic participants, and risk-related information, providing a basis for subsequent risk assessment and trajectory prediction. Panoramic image features can be, for example, bird's-eye view feature vectors.
[0092] In some embodiments, the trajectory prediction module includes a feature encoding submodule. The feature encoding submodule is used to project and fuse features from multiple panoramic images into a unified bird's-eye view space to obtain panoramic image features.
[0093] In some embodiments, the trajectory prediction module can input multiple panoramic images into the feature encoding submodule, and the feature encoding submodule processes the multiple panoramic images to obtain panoramic image features.
[0094] S302. Based on the features of the panoramic image, determine at least one first risk information.
[0095] In some embodiments, the trajectory prediction module includes a risk prediction submodule. The risk prediction submodule can, for example, be built based on a Transformer model. The risk prediction submodule can utilize the feature extraction and correlation reasoning capabilities of the Transformer model to process the environmental layout around the sample vehicle, the dynamic state of traffic participants, and risk correlation information in the panoramic image features to determine at least one risk event. Then, for each of the at least one risk event, the risk prediction submodule determines a risk score for that risk event based on the panoramic image features.
[0096] In some embodiments, the server determines at least one first risk information in the following manner: inputting panoramic image features into a risk prediction submodule, processing the panoramic image features through the risk prediction submodule, and determining at least one first risk event and at least one first risk score; wherein, at least one first risk event and at least one first risk score are in one-to-one correspondence; performing structured processing on at least one first risk event and at least one first risk score to determine at least one first risk information.
[0097] The first risk event is a specific risk situation or risk target faced by the sample vehicle in the current scenario, identified and determined by the risk prediction submodule within the trajectory prediction module based on panoramic image features. It should be noted that in at least one first risk event, each first risk event corresponds to an independent risk situation in the actual driving environment.
[0098] The first risk score is a quantitative value calculated by the risk prediction submodule for each first risk event, combining environmental layout, traffic participant dynamics, and risk correlation information from the panoramic image features. It is used to measure the degree of risk of the corresponding first risk event. In some embodiments, the first risk score can be expressed as a percentage, and the larger the first risk score, the higher the degree of risk of the corresponding first risk event; the smaller the first risk score, the lower the degree of risk of the corresponding first risk event.
[0099] In some embodiments, after the server inputs panoramic image features into the risk prediction submodule, the risk prediction submodule processes the panoramic image features to determine at least one first risk event. Then, for each first risk event, the risk prediction submodule combines the environmental layout, traffic participant dynamics, and risk association information in the panoramic image features to determine a first risk score for that first risk event.
[0100] In some embodiments, to facilitate the unified processing, quantitative comparison, and loss calculation of at least one first risk event and at least one first risk score by the trajectory prediction model, at least one first risk event and at least one first risk score can be structured to determine at least one first risk information.
[0101] In this way, by setting a risk prediction submodule in the trajectory prediction module, the panoramic image features are processed to obtain a one-to-one corresponding first risk event and first risk score, and then structured to form first risk information. This enables the organic combination of risk event identification and risk degree quantitative assessment, making the expression of risk information more standardized and accurate.
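The structuring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the `RiskInfo` record, the `structure_risk_info` helper, the event labels, and the choice to store the score as a fraction in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RiskInfo:
    """Hypothetical structured record pairing one risk event with its score."""
    event: str    # label of the risk event (illustrative naming)
    score: float  # degree of risk as a fraction in [0, 1] (assumed scale)


def structure_risk_info(events: Sequence[str], scores: Sequence[float]) -> List[RiskInfo]:
    """Pair events and scores one-to-one and validate the score range."""
    if len(events) != len(scores):
        raise ValueError("events and scores must correspond one-to-one")
    infos = []
    for event, score in zip(events, scores):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {event!r} is outside [0, 1]")
        infos.append(RiskInfo(event=event, score=score))
    return infos


# Example with two made-up risk events and scores.
first_risk_info = structure_risk_info(
    ["vehicle_cutting_in", "slow_truck_ahead"], [0.82, 0.35]
)
```

Keeping each event and its score in one record makes the later comparison against the second risk information a simple lookup by event.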
[0102] S303. Determine the predicted trajectory based on the panoramic image features, at least one first risk information, status information, and driving instructions.
[0103] In some embodiments, the trajectory prediction module includes a trajectory planning submodule. The trajectory planning submodule is the functional unit within the trajectory prediction module responsible for generating the future driving trajectory of the sample vehicle. The trajectory planning submodule can receive inputs such as panoramic image features, at least one first risk information, state information, and driving instructions. Through comprehensive analysis and calculation of the environment, risks, vehicle's own state, and driving intentions, it outputs a predicted trajectory that conforms to safety constraints and driving objectives, providing the final trajectory decision result for the trajectory prediction model.
[0104] In some embodiments, the server determines the predicted trajectory as follows: panoramic image features, at least one first risk information, status information, and driving instructions are input into the trajectory planning submodule, and the trajectory planning submodule processes the panoramic image features, first risk information, status information, and driving instructions to determine the predicted trajectory.
[0105] In some embodiments, since panoramic image features can reflect the surrounding environment layout and traffic participant information of the sample vehicle, at least one first risk information can reflect the surrounding risk situation and risk level of the sample vehicle, state information can reflect the sample vehicle's own motion state, and driving instructions can reflect the sample vehicle's driving intention, the panoramic image features, at least one first risk information, state information, and driving instructions are input into the trajectory planning submodule. The trajectory planning submodule can combine environmental information, risk information, vehicle state, and driving intention to determine a predicted trajectory that meets the requirements of safe driving and the driving intention.
[0106] In this way, by setting a trajectory planning submodule in the trajectory prediction module, the panoramic image features, first risk information, status information and driving command multi-source information are input and fused together. This allows for the full integration of environmental perception, risk assessment, vehicle status and driving intention during the trajectory planning process, resulting in a safer predicted trajectory output that better meets actual driving needs and improving the rationality and reliability of trajectory prediction.
[0107] In the embodiment shown in Figure 3, the trajectory prediction module is divided into a feature encoding submodule, a risk prediction submodule, and a trajectory planning submodule. First, panoramic image features containing environmental, dynamic target, and risk information are obtained from the surround view images. Then, the risk prediction submodule outputs first risk events and first risk scores, which are structured into first risk information. Finally, the panoramic image features, first risk information, vehicle status information, and driving instructions are combined to generate a predicted trajectory. This allows the trajectory planning process to fully integrate environmental perception, risk identification, vehicle status, and driving intention, thereby improving the accuracy, safety, and reliability of trajectory prediction.
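The data flow of steps S301 to S303 can be sketched as a composition of three callables. This is only an outline of the ordering of the steps: the function name `run_trajectory_prediction_module`, the argument names, and the placeholder stand-ins in the usage lines are assumptions for illustration, and the real submodules are learned networks that are not reproduced here.

```python
from typing import Any, Callable, List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]  # assumed representation: (x, y) waypoints


def run_trajectory_prediction_module(
    images: Sequence[Any],
    state: Any,
    instruction: str,
    encode_features: Callable[[Sequence[Any]], Any],
    predict_risks: Callable[[Any], List[Any]],
    plan_trajectory: Callable[[Any, List[Any], Any, str], Trajectory],
) -> Tuple[List[Any], Trajectory]:
    """Chain the three submodules in the order S301 -> S302 -> S303."""
    features = encode_features(images)    # S301: panoramic image features
    risk_info = predict_risks(features)   # S302: first risk information
    trajectory = plan_trajectory(features, risk_info, state, instruction)  # S303
    return risk_info, trajectory


# Usage with trivial placeholder callables (not real models).
risks, path = run_trajectory_prediction_module(
    images=["front", "rear", "left", "right"],
    state={"speed_mps": 30.0},
    instruction="keep_lane",
    encode_features=lambda imgs: {"num_views": len(imgs)},
    predict_risks=lambda feats: [("slow_truck_ahead", 0.4)],
    plan_trajectory=lambda feats, r, s, i: [(0.0, 0.0), (30.0, 0.0)],
)
```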
[0108] Based on the above embodiments, the process of determining at least one second risk information in the embodiments of this application is further explained below in conjunction with Figure 4.
[0109] Figure 4 is a schematic diagram illustrating a process for determining at least one second risk information, provided as an embodiment of this application. As shown in Figure 4, the process may include the following steps:
[0110] S401. Determine the target prompt word corresponding to the sample vehicle at the sample time from multiple preset prompt words; the target prompt word is used to guide the auxiliary training module to determine the risk information of the sample vehicle at the sample time according to preset analysis rules, preset evaluation dimensions and preset output format.
[0111] Multiple preset prompts are natural language instructions designed and written in advance before model training. They are specifically used to guide the visual-language model to perform specific tasks such as risk identification, risk assessment, and output of structured results. Each prompt corresponds to a fixed analysis logic, evaluation criteria, and output format, and serves as a key instruction for the auxiliary training module's risk reasoning.
[0112] The target prompt is the core instruction ultimately selected from multiple preset prompts, specifically used to guide the auxiliary training module in risk identification at the sample time. In some embodiments, the target prompt instructs the auxiliary training module to determine the risk information of the sample vehicle at the sample time according to preset analysis rules, preset evaluation dimensions, and preset output format. Therefore, the target prompt determines how the auxiliary training module interprets the surround view image, how it assesses the risk, and in what format it outputs the results.
[0113] In some embodiments, the server stores multiple preset prompt words for different high-speed driving scenarios, risk types, and other situations. At the sample time, the trajectory prediction model can first perform scene recognition and risk type judgment on the surround view images or panoramic image features of the sample vehicle, and determine the target prompt word suitable for the current scenario from the multiple preset prompt words. In this way, the task instruction of the target prompt word is adapted to the scenario at the sample time, thereby guiding the auxiliary training module to output accurate risk information.
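Selecting the target prompt word from the preset prompt words can be sketched as a keyed lookup. The scene labels, the prompt texts, and the fallback to a default prompt are all assumptions made for this example; the application does not specify how the prompts are stored or worded.

```python
from typing import Dict

# Hypothetical preset prompts, keyed by an assumed scene label.
PRESET_PROMPTS: Dict[str, str] = {
    "lane_change": (
        "Inspect the surround-view images for vehicles in the target lane. "
        "List each risk event with a risk score from 0 to 100 as JSON."
    ),
    "car_following": (
        "Inspect the surround-view images for the vehicle ahead and its braking. "
        "List each risk event with a risk score from 0 to 100 as JSON."
    ),
    "default": (
        "Inspect the surround-view images for any object that may collide with "
        "the ego vehicle. List each risk event with a risk score from 0 to 100 as JSON."
    ),
}


def select_target_prompt(scene_type: str) -> str:
    """Return the preset prompt for the recognised scene, or the default one."""
    return PRESET_PROMPTS.get(scene_type, PRESET_PROMPTS["default"])


target_prompt = select_target_prompt("lane_change")
```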
[0114] S402. Input the target prompt and multiple panoramic images into the auxiliary training module to obtain at least one second risk information.
[0115] In some embodiments, the server obtains at least one second risk information in the following manner: feature processing is performed on multiple panoramic images using an auxiliary training module to determine image feature information of the multiple panoramic images; semantic processing is performed on target prompt words using an auxiliary training module to determine semantic information of the target prompt words; the image feature information is processed based on the semantic information using an auxiliary training module to determine at least one second risk event and at least one second risk score; wherein at least one second risk event and at least one second risk score are in one-to-one correspondence, and at least one second risk event includes at least one first risk event; the at least one second risk event and at least one second risk score are structured using an auxiliary training module to determine at least one second risk information.
[0116] Image feature information is the digital feature extracted from multiple panoramic images by the auxiliary training module. It is used to represent the visual feature information such as environment, target, position, and shape in multiple panoramic images.
[0117] The semantic information of the target prompt is the instruction information obtained by the auxiliary training module after semantic parsing the target prompt. It includes the analysis rules, evaluation dimensions, and output format specified by the prompt, and is used to guide the processing direction of image features. In some embodiments, the semantic information of the target prompt can be represented in the form of text tokens that the auxiliary training module can understand.
[0118] The second risk event is a risk situation or risk target identified by the auxiliary training module, and is a qualitative description of a dangerous situation existing in the current scenario. In some embodiments, the second risk event can be used as supervisory reference information during the training of the trajectory prediction model.
[0119] The second risk score is a quantitative value provided by the auxiliary training module for the second risk event. For each of the at least one second risk events, the second risk score indicates the degree of risk of the second risk event. In some embodiments, the second risk score may be expressed as a percentage, and a larger second risk score indicates a higher degree of risk for the second risk event; a smaller second risk score indicates a lower degree of risk for the second risk event.
[0120] In some embodiments, the auxiliary training module includes an image encoding submodule. The image encoding submodule performs adaptive encoding processing on multiple panoramic images, transforming pixel-level panoramic image data into high-dimensional image feature vectors that the auxiliary training module can recognize and process. Simultaneously, the image encoding submodule can remove redundant pixel noise from the panoramic images, thereby obtaining image feature information.
[0121] In some embodiments, the auxiliary training module can input multiple panoramic images into the image coding submodule, and the image coding submodule processes the multiple panoramic images to obtain image feature information.
[0122] In some embodiments, the auxiliary training module includes a text encoding submodule. The text encoding submodule is used to encode and convert the target prompt words, segmenting and vectorizing the natural language form of the target prompt words (such as preset analysis rules, preset evaluation dimensions, and preset output formats). Then, the text encoding submodule uses an encoding network to convert these into text units that the auxiliary training module can parse. These text units not only preserve the semantic logic and instruction requirements of the target prompt words, but also enable cross-modal feature fusion with the image feature information output by the image encoding submodule in the auxiliary training module.
[0123] In some embodiments, the auxiliary training module can input the target prompt word into the text encoding submodule, and process the target prompt word through the text encoding submodule to obtain the semantic information of the target prompt word.
[0124] In some embodiments, the auxiliary training module may further include a visual-language model submodule. The visual-language model submodule can leverage its pre-trained massive amounts of visual and linguistic knowledge to achieve cross-modal feature fusion and semantic reasoning. Specifically, the visual-language model submodule can comprehensively analyze the scene represented by the image feature information according to the semantic information of the target prompt, identify and label traffic participants, obstacles, and other targets in the scene that pose a collision risk to the sample vehicle, thereby determining at least one second risk event. Then, the visual-language model submodule combines the physical characteristics of the sample vehicle's operating scene (such as relative speed, distance to the vehicle, behavioral intent, and collision probability) to determine the second risk score for each of the at least one second risk event.
[0125] In some embodiments, to put the second risk event and the second risk score into a standardized, unified format that facilitates risk loss calculation and comparative learning by the trajectory prediction model, the auxiliary training module can perform structured processing on at least one second risk event and at least one second risk score to determine at least one second risk information.
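One way to picture this structuring step is to parse the text output of the vision-language model into records. The sketch below assumes, purely for illustration, that the preset output format is a JSON list of objects with `event` and `score` fields and that scores are percentages; the helper name `parse_second_risk_info` and the example output string are hypothetical.

```python
import json
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def parse_second_risk_info(raw_output: str) -> List[Record]:
    """Parse an assumed JSON output into records with scores rescaled to [0, 1]."""
    records = json.loads(raw_output)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of risk records")
    structured: List[Record] = []
    for record in records:
        event = str(record["event"])
        score = float(record["score"])
        if not 0.0 <= score <= 100.0:
            raise ValueError(f"score for {event!r} is outside 0-100")
        structured.append({"event": event, "score": score / 100.0})
    return structured


# Made-up model output used only to exercise the parser.
example_output = (
    '[{"event": "vehicle_cutting_in", "score": 80},'
    ' {"event": "slow_truck_ahead", "score": 30}]'
)
second_risk_info = parse_second_risk_info(example_output)
```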
[0126] In the embodiment shown in Figure 4, target prompt words that match the current scene are first selected to guide the auxiliary training module. Feature processing and semantic processing are then performed on the multiple surround view images and the target prompt words, respectively, to obtain second risk events and second risk scores in one-to-one correspondence, which are structured to form second risk information. In this way, the auxiliary training module can output standardized, accurate supervision information that can be matched against the risk identification results of the trajectory prediction module.
[0127] Based on the above embodiments, the process of training the trajectory prediction model to obtain the trained trajectory prediction model in the embodiments of this application is further explained below in conjunction with Figure 5.
[0128] Figure 5 is a schematic diagram illustrating a process for determining a trained trajectory prediction model, as provided in an embodiment of this application. As shown in Figure 5, the process may include the following steps:
[0129] S501. Determine the risk loss value of the trajectory prediction model based on at least one first risk information and at least one second risk information.
[0130] The risk loss value is a numerical value calculated from the difference between the first risk information output by the trajectory prediction model and the second risk information output by the auxiliary training module. It is used to measure the deviation between the risk prediction result of the trajectory prediction module and the second risk information, which serves as a supervision standard. This loss value can guide the trajectory prediction model to adjust its parameters during training, continuously reduce the risk judgment error, and improve the model's risk identification accuracy.
[0131] In some embodiments, the server determines the risk loss value of the trajectory prediction module as follows: Based on at least one first risk event and at least one second risk event, determine the event prediction loss of the trajectory prediction module; the event prediction loss is used to indicate the similarity between at least one first risk event and at least one second risk event; for each of the at least one first risk event, determine the loss value of the first risk event based on the first risk score of the first risk event and the second risk score of the second risk event corresponding to the first risk event; determine the risk loss value based on the event prediction loss and the respective loss values of the at least one first risk event.
[0132] Event prediction loss is a loss specifically calculated to determine whether risk events are accurately identified. It measures the similarity between the first risk event identified by the trajectory prediction module and the second risk event identified by the auxiliary training module. A larger event prediction loss indicates a lower similarity between at least one first risk event and at least one second risk event; conversely, a smaller event prediction loss indicates a higher similarity between at least one first risk event and at least one second risk event.
[0133] The loss value for a first risk event is an error value calculated based on the score difference of a single first risk event. It reflects the gap between the first risk score of the first risk event and the second risk score of the corresponding second risk event. Specifically, a larger loss value for a first risk event indicates a larger gap between the first risk score and the second risk score of the corresponding second risk event; conversely, a smaller loss value indicates a smaller gap.
[0134] In some embodiments, at least one first risk event output by the trajectory prediction module is aligned, matched, and compared one by one with at least one second risk event serving as the supervision standard, and the degree of difference between the two in terms of the category, quantity, location, or content of the risk events is calculated. Then, based on the mapping relationship between the degree of difference and the loss value, the event prediction loss is determined.
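As a hedged illustration of an event prediction loss, the sketch below measures the difference between the two sets of event labels with the Jaccard distance. The Jaccard distance is a stand-in chosen for this example; the application only requires some mapping from degree of difference to loss value, and the label strings are hypothetical.

```python
from typing import Iterable


def event_prediction_loss(first_events: Iterable[str], second_events: Iterable[str]) -> float:
    """Jaccard distance between event label sets: 0.0 if identical, 1.0 if disjoint."""
    first, second = set(first_events), set(second_events)
    union = first | second
    if not union:
        return 0.0  # neither side reported an event, so there is nothing to penalise
    return 1.0 - len(first & second) / len(union)


# The student found one of the two events reported by the auxiliary module.
loss = event_prediction_loss(
    ["vehicle_cutting_in"], ["vehicle_cutting_in", "slow_truck_ahead"]
)
```

With this choice a larger loss means lower similarity between the event sets, which is the direction described in the text above.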
[0135] In some embodiments, for each of the at least one first risk event, the absolute value of the difference between the first risk score of the first risk event and the second risk score of the second risk event corresponding to the first risk event can be determined as the loss value of the first risk event.
[0136] In some embodiments, the risk loss value used to supervise the training of the trajectory prediction model can be determined based on a weighted calculation of the event prediction loss and the loss value of at least one first risk event.
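The weighted combination can be sketched as follows, using the absolute score difference per matched event as the per-event loss. The weights, the averaging of per-event losses, and the decision to skip first risk events that have no counterpart among the second risk events are assumptions of this sketch rather than details given in the application.

```python
from typing import Dict


def risk_loss(
    event_loss: float,
    first_scores: Dict[str, float],
    second_scores: Dict[str, float],
    event_weight: float = 1.0,  # assumed weight
    score_weight: float = 1.0,  # assumed weight
) -> float:
    """Weighted sum of the event prediction loss and the mean per-event score loss."""
    # Per-event loss: |first score - second score| for events present on both sides.
    score_losses = [
        abs(score - second_scores[event])
        for event, score in first_scores.items()
        if event in second_scores
    ]
    mean_score_loss = sum(score_losses) / len(score_losses) if score_losses else 0.0
    return event_weight * event_loss + score_weight * mean_score_loss


value = risk_loss(
    event_loss=0.5,
    first_scores={"vehicle_cutting_in": 0.75},
    second_scores={"vehicle_cutting_in": 0.25, "slow_truck_ahead": 0.5},
)
```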
[0137] S502. Based on the predicted trajectory and the preset trajectory, determine the prediction loss value of the trajectory prediction model.
[0138] The prediction loss value measures the deviation between the predicted trajectory generated by the trajectory prediction model and the preset trajectory, and can intuitively reflect the accuracy of the trajectory prediction model in generating predicted trajectories in trajectory generation tasks. In some embodiments, a larger prediction loss value indicates that the trajectory prediction model generates predicted trajectories with lower accuracy; a smaller prediction loss value indicates that the trajectory prediction model generates predicted trajectories with higher accuracy.
[0139] In some embodiments, the predicted trajectory and the preset trajectory can be compared point by point in dimensions such as trajectory point position, driving direction, and path shape. The error value between the two can be calculated and determined as the prediction loss value.
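A point-by-point comparison can be sketched as the mean Euclidean distance between corresponding waypoints. This example compares positions only and assumes both trajectories are sampled at the same time steps; driving direction and path shape, which the text also mentions, are not modelled here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def prediction_loss(predicted: Sequence[Point], preset: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding (x, y) waypoints."""
    if not predicted or len(predicted) != len(preset):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, q) for p, q in zip(predicted, preset))
    return total / len(predicted)


# The second predicted waypoint is 5 m away from the preset one.
loss = prediction_loss([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```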
[0140] S503. Determine the total loss value of the trajectory prediction model based on the risk loss value and the prediction loss value.
[0141] The total loss value reflects the overall error of the trajectory prediction model in terms of both risk identification accuracy and trajectory prediction accuracy. A larger total loss value indicates a larger overall error in both aspects of the trajectory prediction model; conversely, a smaller total loss value indicates a smaller overall error in both aspects.
[0142] In some embodiments, the sum of the risk loss value and the prediction loss value can be determined as the total loss value.
[0143] S504. Adjust the model parameters of the trajectory prediction model according to the total loss value to obtain the trained trajectory prediction model.
[0144] In some embodiments, the model parameters inside the trajectory prediction model can be updated and corrected through optimization methods such as backpropagation based on the calculated total loss value, so as to continuously reduce the total loss value, thereby continuously reducing the error of the model in risk identification and trajectory prediction, and finally obtaining a trained trajectory prediction model with higher accuracy and more stable performance.
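The loop of computing a total loss and adjusting parameters to reduce it can be illustrated with a toy example. Everything here is an assumption for illustration: the two-parameter "model", the squared-error losses, the sample values, and the use of finite-difference gradients in place of backpropagation, which a real network would use.

```python
from typing import Callable, Dict, List

# Toy sample: one teacher risk score and a preset trajectory of (t, y) points.
SAMPLE: Dict[str, object] = {
    "feature": 1.0,
    "teacher_score": 0.8,
    "preset_traj": [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)],
}


def total_loss(params: List[float], sample: Dict[str, object]) -> float:
    """Total loss = risk loss + prediction loss for a two-parameter toy model."""
    w_risk, w_traj = params
    risk = (w_risk * sample["feature"] - sample["teacher_score"]) ** 2
    traj = sample["preset_traj"]
    pred = sum((w_traj * t - y) ** 2 for t, y in traj) / len(traj)
    return risk + pred


def numeric_grad(f: Callable[[List[float]], float], params: List[float], eps: float = 1e-6) -> List[float]:
    """Central finite-difference gradient (stand-in for backpropagation)."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def train(params: List[float], sample: Dict[str, object], lr: float = 0.05, steps: int = 200) -> List[float]:
    """Plain gradient descent on the total loss."""
    params = list(params)
    for _ in range(steps):
        grads = numeric_grad(lambda p: total_loss(p, sample), params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


trained = train([0.0, 0.0], SAMPLE)
```

Because both loss terms are summed before the update, a single optimisation step moves the risk-related and trajectory-related parameters together, which is the joint optimisation the text describes.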
[0145] In the embodiment shown in Figure 5, the risk loss value and the prediction loss value of the trajectory prediction model are calculated separately and fused into the total loss value, and the model parameters are adjusted based on the total loss value. The risk loss value itself combines the event prediction loss with the score loss of each risk event. This allows the model to be jointly optimized along two dimensions: risk identification accuracy and trajectory prediction precision. As a result, the trained trajectory prediction model has both more accurate risk judgment ability and more reliable trajectory generation ability, thereby improving its overall risk identification ability and trajectory prediction accuracy.
[0146] Based on the above embodiments, the training process of the trajectory prediction model provided in the embodiments of this application is further explained below in conjunction with Figure 6. Figure 6 is a schematic diagram illustrating the training of a trajectory prediction model provided in an embodiment of this application. As shown in Figure 6, the trajectory prediction model includes a trajectory prediction module and an auxiliary training module. The trajectory prediction module includes a feature encoding submodule, a risk prediction submodule, a latent space submodule, and a trajectory planning submodule. The auxiliary training module includes an image encoding submodule, a text encoding submodule, and a vision-language model submodule.
[0147] In some embodiments, the server can acquire multiple surround-view images using multi-directional surround-view cameras deployed on the vehicle body, and input these images into a feature encoding submodule and an image encoding submodule, respectively. The feature encoding submodule, relying on a feature fusion algorithm running on the onboard computing hardware, transforms the surround-view images into unified panoramic image features. Then, the server inputs these panoramic image features into a risk prediction submodule, which uses the Transformer's attention mechanism to identify at least one first risk event and its respective first risk score within the panoramic image features. The server then inputs the at least one first risk event and its corresponding first risk score into a latent space submodule, which performs structured processing on these elements to obtain at least one first risk information.
[0148] Then, the server determines the state information and driving instructions of the sample vehicle at the sample time, and inputs the state information, driving instructions, at least one first risk information, and panoramic image features into the trajectory planning submodule. The trajectory planning submodule uses the Transformer's attention mechanism to identify and generate a predicted trajectory under the driver's intention and safety constraints.
[0149] In addition, after inputting multiple panoramic images into the image encoding submodule, the image encoding submodule encodes the multiple panoramic images and converts them into image feature information that can be processed by the auxiliary training module.
[0150] The server determines the target prompt word from multiple preset prompt words and inputs it into the text encoding submodule. The text encoding submodule encodes the target prompt word, converting it into semantic information understandable by the auxiliary training module. Then, the image feature information and semantic information are input into the vision-language model submodule, which processes the image feature information and semantic information to obtain at least one second risk information.
[0151] The server determines the risk loss value based on at least one second risk information and at least one first risk information; determines the prediction loss value based on the predicted trajectory and the preset trajectory; and determines the total loss value based on the risk loss value and the prediction loss value.
[0152] Finally, the server adjusts the model parameters of the trajectory prediction model based on the total loss value to obtain the trained trajectory prediction model.
[0153] The above embodiments mainly describe the process of training a trajectory prediction model to obtain the trained trajectory prediction model. In some embodiments, the trained trajectory prediction model can be deployed in the autonomous driving system of the target vehicle, and the trajectory prediction model can be used to predict the target vehicle's trajectory in the second time period, and the target vehicle can be controlled to operate based on the predicted trajectory.
[0154] Specifically, multiple target surround view images, target status information, and target driving commands of the target vehicle at the current moment can be determined and input into a trajectory prediction model. The model processes the images, status information, and driving commands to obtain the target vehicle's predicted trajectory in the second time period. This trajectory prediction model is obtained by training with the training method described above and then removing the auxiliary training module. The second time period is a time period starting from the current moment and lasting for a preset duration.
[0155] The target vehicle refers to the vehicle whose future driving trajectory needs to be predicted, and it is the main object of the entire trajectory prediction process.
[0156] The target surround view image is acquired by multiple surround view cameras deployed on the target vehicle. These cameras are typically positioned at the front, rear, and side mirrors of the vehicle. The acquired image includes key visual information about the target vehicle in a highway scenario, such as lane lines, adjacent lane lines, obstacles ahead, road signs, and weather conditions. The multiple surround view images provide visual image data of the target vehicle's 360-degree surrounding environment.
[0157] Target state information refers to the set of operating parameters of the target vehicle at the current moment, reflecting the current motion state of the target vehicle. State information may include parameters such as the target vehicle's wheel speed, acceleration, and historical trajectory at the current moment. Specifically, the target vehicle's wheel speed can be collected by wheel speed sensors deployed on the target vehicle, acceleration can be collected by acceleration sensors deployed on the target vehicle, and the historical trajectory can be determined by the target vehicle's motion control system.
[0158] Target driving instructions are digital instructions used to guide the driving intentions of a target vehicle. Driving instructions can be given by the driver via voice, keypad, central control screen, etc., such as instructions to go straight, change lanes, overtake, and decelerate.
[0159] The target predicted trajectory is the output of the trajectory prediction model, representing the predicted driving path of the target vehicle in the second future time period.
[0160] The auxiliary training module is an auxiliary training unit for the trajectory prediction model. In some embodiments, the auxiliary training module is built based on a vision-language model, which is a large-scale artificial intelligence model that combines visual image understanding and natural language reasoning capabilities, and has powerful commonsense reasoning and zero-shot generalization capabilities.
[0161] In some embodiments, after the trajectory prediction model has been trained using the training method, the auxiliary training module can be removed from the trained trajectory prediction model to obtain the final trajectory prediction model. This final model can then be deployed on the target vehicle to achieve autonomous driving. Therefore, the trajectory prediction model includes a trajectory prediction module, which is the functional unit responsible for image feature processing, risk identification, and trajectory generation within the trajectory prediction model.
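The removal step can be sketched in Python as follows. This is a minimal illustration only: the container class `TrajectoryPredictionModel` and the function `strip_auxiliary_module` are hypothetical names, and the two modules are stood in for by plain callables rather than trained networks.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class TrajectoryPredictionModel:
    """Hypothetical container: the deployed module plus the training-only helper."""
    trajectory_prediction_module: Callable[..., object]
    auxiliary_training_module: Optional[Callable[..., object]] = None


def strip_auxiliary_module(model: TrajectoryPredictionModel) -> TrajectoryPredictionModel:
    """Return a copy of the model without the auxiliary training module."""
    return replace(model, auxiliary_training_module=None)


# Placeholder callables stand in for the trained modules.
trained = TrajectoryPredictionModel(
    trajectory_prediction_module=lambda *inputs: "predicted trajectory",
    auxiliary_training_module=lambda *images: "second risk information",
)
deployed = strip_auxiliary_module(trained)
```

Returning a copy leaves the trained model untouched, so training could be resumed later with the auxiliary module still attached.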
[0162] In some embodiments, since the trajectory prediction model is trained based on the training method of the trajectory prediction model shown above, the trajectory prediction module in the trajectory prediction model has the inherent multimodal visual-language joint understanding and reasoning features of the auxiliary training module (visual-language model). It can complete fine-grained visual feature extraction from multiple target surround view images, and realize scene perception, risk event understanding and risk quantification assessment under the guidance of prompt words. At the same time, it deeply integrates multimodal information with the trajectory prediction task, thereby outputting a safer predicted trajectory that is more in line with real driving logic.
[0163] Therefore, after inputting multiple target surround view images, target status information, and target driving commands into the trajectory prediction model, the trajectory prediction module can process these data. The trajectory prediction module includes a feature encoding submodule, a risk prediction submodule, and a trajectory planning submodule. Through processing the multiple target surround view images, target status information, and target driving commands, the trajectory prediction module obtains the target vehicle's predicted trajectory for the second time period.
[0164] Specifically, this can be understood with reference to Figure 7, which is a schematic diagram of trajectory prediction provided in an embodiment of this application. As shown in Figure 7, the trajectory prediction module includes a feature encoding submodule, a risk prediction submodule, a trajectory planning submodule, and a latent space submodule.
[0165] In some embodiments, the trajectory prediction module can input multiple target surround view images into a feature encoding submodule. The feature encoding submodule, relying on a feature fusion algorithm running on the onboard computing hardware, transforms the surround view images into unified target panoramic image features. Then, the trajectory prediction module inputs these target panoramic image features into a risk prediction submodule. The risk prediction submodule uses the Transformer's attention mechanism to identify at least one target risk event and its respective target risk score within the target panoramic image features. Finally, the trajectory prediction module inputs the at least one target risk event and at least one target risk score into a latent space submodule. The latent space submodule performs structured processing on the at least one target risk event and at least one target risk score to obtain at least one target risk information.
[0166] Then, the trajectory prediction module inputs the at least one target risk information, the target state information, the target driving command, and the target panoramic image features into the trajectory planning submodule. The trajectory planning submodule uses the Transformer's attention mechanism to generate the target predicted trajectory under the driver's intent and safety constraints.
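The data flow through the four submodules can be sketched in Python as below. This is an illustration under stated assumptions, not the implementation of this application: `predict_trajectory`, `RiskInformation`, and the stub functions `encode`, `predict_risk`, and `plan` are hypothetical, and the stubs use toy rules in place of the learned Transformer networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]


@dataclass
class RiskInformation:
    event: str
    score: float


def latent_space_submodule(events: Sequence[str], scores: Sequence[float]) -> List[RiskInformation]:
    """Structure one-to-one (event, score) pairs into risk information records."""
    if len(events) != len(scores):
        raise ValueError("risk events and risk scores must correspond one-to-one")
    return [RiskInformation(e, s) for e, s in zip(events, scores)]


def predict_trajectory(images, state, command,
                       encode: Callable, predict_risk: Callable, plan: Callable) -> Trajectory:
    features = encode(images)                           # feature encoding submodule
    events, scores = predict_risk(features)             # risk prediction submodule
    risk_info = latent_space_submodule(events, scores)  # latent space submodule
    return plan(features, risk_info, state, command)    # trajectory planning submodule


# Stub submodules standing in for the learned networks (toy rules only).
def encode(images: Sequence[bytes]) -> List[float]:
    return [float(len(images))]


def predict_risk(features: List[float]):
    return ["vehicle cutting in ahead"], [0.5]


def plan(features, risk_info: List[RiskInformation], state: Dict[str, float], command: str) -> Trajectory:
    # Toy rule: scale the speed down by the highest risk score.
    top_score = max((r.score for r in risk_info), default=0.0)
    speed = state["speed_mps"] * (1.0 - top_score)
    return [(speed * t, 0.0) for t in (1.0, 2.0)]


trajectory = predict_trajectory([b"front", b"rear"], {"speed_mps": 30.0}, "go_straight",
                                encode, predict_risk, plan)
```

Passing the submodules in as callables keeps the ordering of the pipeline visible while leaving each stage replaceable.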
[0167] In the above embodiments, multiple target surround view images, target state information, and target driving commands of the target vehicle at the current moment are acquired and input into the trajectory prediction model, which was trained under the supervision of the auxiliary training module as described above. Relying on the visual feature understanding, risk identification, and trajectory generation capabilities of its trajectory prediction module, the model outputs the target predicted trajectory of the vehicle in the second future time period. This approach carries the multimodal visual-language joint understanding and reasoning characteristics of the visual-language model into trajectory prediction, ensuring accuracy while avoiding the low prediction efficiency that the visual-language model's long inference time would otherwise cause.
[0168] Figure 8 is a schematic diagram of the structure of a training device for a trajectory prediction model provided in an embodiment of this application. As shown in Figure 8, an embodiment of this application also provides a training device 80 for a trajectory prediction model. The trajectory prediction model includes a trajectory prediction module and an auxiliary training module. The auxiliary training module is constructed based on a vision-language model. The training device 80 for the trajectory prediction model includes:
[0169] The first determining module 81 is used to determine multiple surround view images, status information and driving instructions of the sample vehicle at the sample time.
[0170] The first processing module 82 is used to input multiple surround view images, status information and driving instructions into the trajectory prediction module to determine at least one first risk information and the predicted trajectory of the sample vehicle in the first time period; the first time period is a time period with the sample time as the starting time and the duration as a preset duration.
[0171] The second processing module 83 is used to input the multiple surround view images into the auxiliary training module to determine at least one second risk information;
[0172] Training module 84 is used to train the trajectory prediction model based on at least one first risk information, at least one second risk information and the predicted trajectory, so as to obtain the trained trajectory prediction model.
[0173] In one possible implementation, the first processing module 82 is specifically used for:
[0174] The trajectory prediction module processes the multiple surround view images to determine the panoramic image features of the sample vehicle at the sample time. The panoramic image features are used to indicate the environmental layout around the sample vehicle, the dynamic status of traffic participants, and risk association information at the sample time.
[0175] Based on the panoramic image features, at least one first risk information is determined;
[0176] The predicted trajectory is determined based on the panoramic image features, the at least one first risk information, the status information, and the driving instructions.
[0177] In one possible implementation, the trajectory prediction module includes a risk prediction submodule, and the first processing module 82 is specifically used for:
[0178] The panoramic image features are input into the risk prediction submodule, which processes the panoramic image features to determine at least one first risk event and at least one first risk score; wherein, at least one first risk event and at least one first risk score are in one-to-one correspondence.
[0179] The at least one first risk event and the at least one first risk score are structured to determine at least one first risk information.
[0180] In one possible implementation, the trajectory prediction module includes a trajectory planning submodule, and the first processing module 82 is specifically used for:
[0181] The panoramic image features, at least one first risk information, status information, and driving instructions are input into the trajectory planning submodule. The trajectory planning submodule processes the panoramic image features, first risk information, status information, and driving instructions to determine the predicted trajectory.
[0182] In one possible implementation, the second processing module 83 is specifically used for:
[0183] The target prompt word corresponding to the sample vehicle at the sample time is determined from multiple preset prompt words; the target prompt word is used to guide the auxiliary training module to determine the risk information of the sample vehicle at the sample time according to preset analysis rules, preset evaluation dimensions and preset output format;
[0184] The target prompt word and the multiple surround view images are input into the auxiliary training module to obtain at least one second risk information.
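As an illustration of selecting a target prompt word from preset prompt words, the following Python sketch keys the presets by a scene tag. The selection criterion is not specified here, so the scene tag, the prompt texts, and the names `PRESET_PROMPT_WORDS` and `select_target_prompt_word` are all assumptions made for this example. Each preset carries the three parts named above: analysis rules, evaluation dimensions, and an output format.

```python
from typing import Dict

# Hypothetical output format shared by all preset prompt words.
OUTPUT_FORMAT = 'Output format: a JSON list of objects with keys "event" and "score" (0 to 1).'

# Hypothetical preset prompt words, keyed by an assumed scene tag.
PRESET_PROMPT_WORDS: Dict[str, str] = {
    "default": (
        "Analysis rules: inspect every surround view image of the highway scene. "
        "Evaluation dimensions: distance, relative speed, lane occupancy. "
        + OUTPUT_FORMAT
    ),
    "low_visibility": (
        "Analysis rules: inspect every surround view image and account for rain or fog. "
        "Evaluation dimensions: distance, relative speed, lane occupancy, visibility. "
        + OUTPUT_FORMAT
    ),
}


def select_target_prompt_word(scene_tag: str) -> str:
    """Pick the preset prompt word for the scene, falling back to the default."""
    return PRESET_PROMPT_WORDS.get(scene_tag, PRESET_PROMPT_WORDS["default"])


target_prompt_word = select_target_prompt_word("low_visibility")
```

Fixing the output format inside the prompt is what makes the later structured processing of the auxiliary module's answer straightforward.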
[0185] In one possible implementation, the second processing module 83 is specifically used for:
[0186] The auxiliary training module performs feature processing on the multiple surround view images to determine the image feature information of the multiple surround view images;
[0187] The auxiliary training module performs semantic processing on the target prompt word to determine the semantic information of the target prompt word;
[0188] The auxiliary training module processes the image feature information based on the semantic information to determine at least one second risk event and at least one second risk score; wherein the at least one second risk event and the at least one second risk score are in one-to-one correspondence, and the at least one second risk event includes the at least one first risk event;
[0189] The auxiliary training module performs structured processing on at least one second risk event and at least one second risk score to determine at least one second risk information.
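The structured processing step can be illustrated with a short Python sketch. It assumes, hypothetically, that the preset output format asks the vision-language model for a JSON list of objects with `event` and `score` keys and that scores lie in [0, 1]; neither the format nor the function name `structure_second_risk_information` is taken from this application.

```python
import json
from typing import Dict, List, Union

RiskInformation = Dict[str, Union[str, float]]


def structure_second_risk_information(raw_output: str) -> List[RiskInformation]:
    """Turn the auxiliary module's text output into structured risk information.

    Assumes a JSON list of {"event": ..., "score": ...} objects, so that each
    risk event carries exactly one risk score (one-to-one correspondence).
    """
    records = json.loads(raw_output)
    structured: List[RiskInformation] = []
    for record in records:
        score = float(record["score"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"risk score out of range: {score}")
        structured.append({"event": str(record["event"]), "score": score})
    return structured


raw = ('[{"event": "vehicle cutting in ahead", "score": 0.75}, '
       '{"event": "debris in lane", "score": 0.25}]')
second_risk_information = structure_second_risk_information(raw)
```

Rejecting out-of-range scores early keeps malformed model output from silently entering the supervision signal.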
[0190] In one possible implementation, training module 84 is specifically used for:
[0191] The risk loss value of the trajectory prediction model is determined based on at least one first risk information and at least one second risk information.
[0192] Based on the predicted trajectory and the preset trajectory, determine the prediction loss value of the trajectory prediction model;
[0193] The total loss value of the trajectory prediction model is determined based on the risk loss value and the prediction loss value.
[0194] The model parameters of the trajectory prediction model are adjusted based on the total loss value to obtain the trained trajectory prediction model.
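The four steps above can be sketched in Python as follows. The concrete formulas are assumptions made for illustration, since they are not fixed here: the prediction loss is taken as the mean squared distance between matching waypoints of the predicted and preset trajectories, and the total loss as a weighted sum with a hypothetical hyperparameter `risk_weight`.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def prediction_loss(predicted: Sequence[Point], preset: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between matching waypoints (assumed metric)."""
    if len(predicted) != len(preset) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [(px - qx) ** 2 + (py - qy) ** 2
               for (px, py), (qx, qy) in zip(predicted, preset)]
    return sum(squared) / len(squared)


def total_loss(risk_loss: float, pred_loss: float, risk_weight: float = 1.0) -> float:
    """Weighted sum of the two loss values; risk_weight is a hypothetical hyperparameter."""
    return pred_loss + risk_weight * risk_loss


pred_loss = prediction_loss([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 2.0)])
loss = total_loss(risk_loss=0.5, pred_loss=pred_loss, risk_weight=2.0)
```

In an actual training loop the total loss would be computed on differentiable tensors and backpropagated to adjust the model parameters; plain floats are used here only to show how the terms combine.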
[0195] In one possible implementation, training module 84 is specifically used for:
[0196] Based on at least one first risk event and at least one second risk event, determine the event prediction loss of the trajectory prediction module; the event prediction loss is used to indicate the similarity between at least one first risk event and at least one second risk event.
[0197] For each of the at least one first risk event, the loss value of the first risk event is determined based on the first risk score of the first risk event and the second risk score of the second risk event corresponding to the first risk event.
[0198] The risk loss value is determined based on the event prediction loss and the respective loss values of the at least one first risk event.
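One way to realize these three steps is sketched below in Python. The specific choices are assumptions for illustration: the event prediction loss is taken as one minus the Jaccard similarity of the two event sets, the loss value of each first risk event as the squared difference between its first and second risk scores, and the risk loss as the event prediction loss plus the mean per-event loss. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def event_prediction_loss(first_events: Iterable[str], second_events: Iterable[str]) -> float:
    """One minus Jaccard similarity of the event sets (assumed similarity measure)."""
    a, b = set(first_events), set(second_events)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def per_event_losses(first: Dict[str, float], second: Dict[str, float]) -> List[float]:
    """Squared score difference for each first risk event that has a matching second risk event."""
    return [(first[e] - second[e]) ** 2 for e in first if e in second]


def risk_loss(first: Dict[str, float], second: Dict[str, float]) -> float:
    """Event prediction loss plus the mean per-event score loss (assumed combination)."""
    event_loss = event_prediction_loss(first.keys(), second.keys())
    score_losses = per_event_losses(first, second)
    mean_score_loss = sum(score_losses) / len(score_losses) if score_losses else 0.0
    return event_loss + mean_score_loss


# Risk information expressed as event -> score mappings.
first_risk = {"vehicle cutting in ahead": 0.5}
second_risk = {"vehicle cutting in ahead": 1.0, "debris in lane": 0.25}
value = risk_loss(first_risk, second_risk)
```

In this example the first event set covers one of the two second risk events, so the event prediction loss is 0.5, and the single matched event contributes a score loss of 0.25.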
[0199] For a description of the features in the embodiment corresponding to the training device 80 of the trajectory prediction model, please refer to the relevant description of the embodiment corresponding to the training method of the trajectory prediction model, which will not be repeated here.
[0200] Figure 9 is a schematic diagram of the structure of a trajectory prediction device provided in an embodiment of this application. As shown in Figure 9, an embodiment of this application also provides a trajectory prediction device 90, which includes:
[0201] The second determining module 91 is used to determine multiple target surround view images, target status information and target driving instructions of the target vehicle at the current moment;
[0202] The prediction module 92 is used to input multiple target surround view images, target status information and target driving instructions into the trajectory prediction model. The trajectory prediction model processes the multiple target surround view images, target status information and target driving instructions to obtain the target predicted trajectory of the target vehicle in the second time period. The trajectory prediction model is obtained by training the trajectory prediction model based on the training method and then removing the auxiliary training module. The second time period is a period with the current time as the starting time and a preset duration.
[0203] For a description of the features in the embodiment corresponding to the trajectory prediction device 90, please refer to the relevant descriptions in the embodiment corresponding to the trajectory prediction method, which will not be repeated here.
[0204] Figure 10 is a schematic diagram of the structure of the electronic device provided in this application. As shown in Figure 10, the electronic device 100 provided in this embodiment includes at least one processor 1001 and a memory 1002. Optionally, the electronic device 100 further includes a communication component 1003. The processor 1001, memory 1002, and communication component 1003 are connected via a bus.
[0205] In a specific implementation, the at least one processor 1001 executes computer-executable instructions stored in the memory 1002, causing the at least one processor 1001 to perform the above-described embodiments of the training method for the trajectory prediction model and/or the trajectory prediction method.
[0206] For the specific process by which the processor 1001 executes the training method for the trajectory prediction model and the trajectory prediction method, refer to the above method embodiments. The implementation principles and technical effects are similar and will not be repeated here.
[0207] In the above embodiments, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in the application can be directly manifested as being executed by a hardware processor, or executed by a combination of hardware and software modules within the processor.
[0208] The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device.
[0209] The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, the buses shown in the accompanying drawings are not limited to a single bus or a single type of bus.
[0210] Embodiments of this application also provide a computer-readable storage medium storing a computer program, wherein the computer program is configured to, when run, execute the steps in any of the above-described embodiments of the training method for the trajectory prediction model and/or the trajectory prediction method.
[0211] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard disk, magnetic disk, or optical disk.
[0212] Embodiments of this application also provide a computer program product, which includes a computer program that, when executed by a processor, implements the steps in any of the above-described embodiments of the training method for the trajectory prediction model and/or the trajectory prediction method.
[0213] Embodiments of this application also provide another computer program product, including a non-volatile computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps in any of the above-described embodiments of the training method for the trajectory prediction model and/or the trajectory prediction method.
[0214] Any of the components, modules, units, parts, methods, and operations described herein can be implemented using software, firmware, hardware (e.g., fixed logic circuitry), manual processing, or any combination thereof. Alternatively or additionally, any functionality described herein can be executed at least in part by one or more hardware logic components, such as, but not limited to, a central processing unit (CPU), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system-on-a-chip (SoC), a complex programmable logic device (CPLD), a microcontroller unit (MCU), etc. The terms "system," "computing device," or "apparatus" as used herein encompass various means, devices, and machines for processing data, including, for example, one or more programmable processors, computers, SoCs, or combinations thereof. The apparatus may also include code that creates an execution environment for the computer program in question, such as code constituting processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or one or more combinations thereof. The aforementioned computer program (also known as a program, software, software application, app, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and can be deployed in any form, including as a standalone program or as a module, component, subroutine, object, or other unit suitable for a computing environment.
[0215] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0216] The training method, trajectory prediction method, and electronic device for a trajectory prediction model provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are only for the purpose of helping to understand the method and core ideas of this application. It should be noted that those skilled in the art can make several improvements and modifications to this application without departing from the principles of this application, and these improvements and modifications also fall within the protection scope of the claims of this application.
Claims
1. A training method for a trajectory prediction model, characterized in that the trajectory prediction model includes a trajectory prediction module and an auxiliary training module, the auxiliary training module being built based on a vision-language model, and the method includes: determining multiple surround view images, status information, and driving instructions of a sample vehicle at a sample time; inputting the multiple surround view images, the status information, and the driving instructions into the trajectory prediction module to determine at least one first risk information and a predicted trajectory of the sample vehicle in a first time period, the first time period being a period with the sample time as the starting time and a preset duration; inputting the multiple surround view images into the auxiliary training module to determine at least one second risk information; and training the trajectory prediction model based on the at least one first risk information, the at least one second risk information, and the predicted trajectory to obtain the trained trajectory prediction model.
2. The method according to claim 1, characterized in that inputting the multiple surround view images, the status information, and the driving instructions into the trajectory prediction module to determine the at least one first risk information and the predicted trajectory of the sample vehicle in the first time period includes: processing the multiple surround view images through the trajectory prediction module to determine panoramic image features of the sample vehicle at the sample time, the panoramic image features being used to indicate the environmental layout around the sample vehicle, the dynamic state of traffic participants, and risk association information at the sample time; determining the at least one first risk information based on the panoramic image features; and determining the predicted trajectory based on the panoramic image features, the at least one first risk information, the status information, and the driving instructions.
3. The method according to claim 2, characterized in that the trajectory prediction module includes a risk prediction submodule, and determining the at least one first risk information based on the panoramic image features includes: inputting the panoramic image features into the risk prediction submodule, which processes the panoramic image features to determine at least one first risk event and at least one first risk score, wherein the at least one first risk event and the at least one first risk score are in one-to-one correspondence; and structuring the at least one first risk event and the at least one first risk score to determine the at least one first risk information.
4. The method according to claim 2 or 3, characterized in that the trajectory prediction module includes a trajectory planning submodule, and determining the predicted trajectory based on the panoramic image features, the at least one first risk information, the status information, and the driving instructions includes: inputting the panoramic image features, the at least one first risk information, the status information, and the driving instructions into the trajectory planning submodule, which processes the panoramic image features, the at least one first risk information, the status information, and the driving instructions to determine the predicted trajectory.
5. The method according to claim 3, characterized in that inputting the multiple surround view images into the auxiliary training module to determine the at least one second risk information includes: determining, from multiple preset prompt words, a target prompt word corresponding to the sample vehicle at the sample time, the target prompt word being used to guide the auxiliary training module to determine the risk information of the sample vehicle at the sample time according to preset analysis rules, preset evaluation dimensions, and a preset output format; and inputting the target prompt word and the multiple surround view images into the auxiliary training module to obtain the at least one second risk information.
6. The method according to claim 5, characterized in that inputting the target prompt word and the multiple surround view images into the auxiliary training module to obtain the at least one second risk information includes: performing feature processing on the multiple surround view images through the auxiliary training module to determine image feature information of the multiple surround view images; performing semantic processing on the target prompt word through the auxiliary training module to determine semantic information of the target prompt word; processing the image feature information based on the semantic information through the auxiliary training module to determine at least one second risk event and at least one second risk score, wherein the at least one second risk event and the at least one second risk score are in one-to-one correspondence, and the at least one second risk event includes the at least one first risk event; and performing structured processing on the at least one second risk event and the at least one second risk score through the auxiliary training module to determine the at least one second risk information.
7. The method according to claim 6, characterized in that training the trajectory prediction model based on the at least one first risk information, the at least one second risk information, and the predicted trajectory to obtain the trained trajectory prediction model includes: determining a risk loss value of the trajectory prediction model based on the at least one first risk information and the at least one second risk information; determining a prediction loss value of the trajectory prediction model based on the predicted trajectory and a preset trajectory; determining a total loss value of the trajectory prediction model based on the risk loss value and the prediction loss value; and adjusting the model parameters of the trajectory prediction model based on the total loss value to obtain the trained trajectory prediction model.
8. The method according to claim 7, characterized in that determining the risk loss value of the trajectory prediction model based on the at least one first risk information and the at least one second risk information includes: determining an event prediction loss of the trajectory prediction module based on the at least one first risk event and the at least one second risk event, the event prediction loss being used to indicate the degree of similarity between the at least one first risk event and the at least one second risk event; for each of the at least one first risk event, determining a loss value of the first risk event based on the first risk score of the first risk event and the second risk score of the second risk event corresponding to the first risk event; and determining the risk loss value based on the event prediction loss and the respective loss values of the at least one first risk event.
9. A trajectory prediction method, characterized in that the method includes: determining multiple target surround view images, target status information, and target driving instructions of a target vehicle at the current moment; and inputting the multiple target surround view images, the target status information, and the target driving instructions into a trajectory prediction model, which processes the multiple target surround view images, the target status information, and the target driving instructions to obtain a target predicted trajectory of the target vehicle in a second time period; wherein the trajectory prediction model is obtained by training based on the training method of any one of claims 1 to 8 and then removing the auxiliary training module, and the second time period is a period with the current moment as the starting time and a preset duration.
10. An electronic device, characterized in that it includes: a memory, used to store a computer program; and a processor, configured to implement, when executing the computer program, the steps of the training method for the trajectory prediction model as claimed in any one of claims 1 to 8 and/or the steps of the trajectory prediction method as claimed in claim 9.