Vehicle control method, trajectory prediction model training method and device, and vehicle
By collecting and processing information on vehicles, maps, and traffic participants, and using a pre-trained trajectory prediction model to generate high-quality predicted trajectories, the accuracy and safety issues of vehicle navigation in complex traffic scenarios are solved, thereby improving intelligent driving.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIAOMI EV TECH CO LTD
- Filing Date
- 2026-04-24
- Publication Date
- 2026-06-19
AI Technical Summary
In complex traffic scenarios, how can we better predict the movement trajectories of surrounding objects to achieve safe and efficient vehicle navigation?
The system collects vehicle information, map information, and traffic participant information for the target vehicle and inputs them into a pre-trained trajectory prediction model. Through feature extraction, multi-head interactive attention processing, and related techniques, the model generates high-quality predicted trajectory information for traffic participants, which is used for intelligent driving control of the vehicle.
It improves the accuracy and reliability of trajectory prediction for traffic participants, ensures vehicle driving safety, and enhances driving comfort and adaptability to complex scenarios.
Figure CN122232644A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of intelligent driving technology, and more specifically, to a vehicle control method, a trajectory prediction model training method, an apparatus, a vehicle, an electronic device, a non-transitory computer-readable storage medium, and a computer program product.
Background Technology
[0002] With the continuous development of intelligent driving technology, vehicle intelligent driving technology has also moved from simple fixed road application scenarios to more complex open road application scenarios.
[0003] In application scenarios where traffic conditions are more complex, such as open roads, safer and more efficient vehicle navigation requires not only determining the relevant data of traffic participants around the vehicle, but also predicting the possible trajectories of those traffic participants based on this data.
[0004] Therefore, how to better predict the motion trajectory of surrounding objects has become an urgent technical problem to be solved.
Summary of the Invention
[0005] To overcome the problems existing in related technologies, this disclosure provides a vehicle control method, a trajectory prediction model training method, an apparatus, a vehicle, an electronic device, a non-transitory computer-readable storage medium, and a computer program product.
[0006] According to a first aspect of the present disclosure, a vehicle control method is provided, the method comprising: Collect vehicle information of the target vehicle, map information of the area where the target vehicle is located, and information of traffic participants within a first range from the target vehicle; The vehicle information, map information, and traffic participant information are input into the trajectory prediction model to obtain the predicted trajectory information of the traffic participants; wherein, the predicted trajectory information is obtained by the trajectory prediction model after processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features, and then performing trajectory prediction. The intelligent driving of the target vehicle is controlled based on the predicted trajectory information of the traffic participants.
[0007] In one possible implementation, the vehicle information, map information, and traffic participant information are respectively input into the trajectory prediction model to obtain the predicted trajectory information of the traffic participants, including: Feature extraction is performed on the vehicle information, map information, and traffic participant information respectively to obtain vehicle features, map features, and traffic participant features; Based on the vehicle features, map features, traffic participant features, traffic participant information, and map information, motion interaction fusion features are obtained; Based on the motion interaction fusion features, the predicted trajectory information of traffic participants is obtained.
[0008] In one possible implementation, obtaining the motion interaction fusion features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information includes: obtaining enhanced features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information; and obtaining the motion interaction fusion features based on the enhanced features, the motion features of the traffic participants, and the map features.
In one possible implementation, obtaining the enhanced features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information includes: determining temporal features based on the traffic participant features and the vehicle features; determining spatial features based on the traffic participant features, the traffic participant information, and the map information; weighting the temporal features to determine weighted temporal features, and weighting the spatial features to determine weighted spatial features; and performing multi-head interactive attention processing on the weighted temporal features, the weighted spatial features, and the map features to obtain the enhanced features.
[0009] In one possible implementation, determining the temporal features based on the traffic participant features and the vehicle features includes: concatenating the traffic participant features and the vehicle features to obtain fused features; and encoding the fused features with at least two time-scale encoders to obtain at least two encoding results, and concatenating the at least two encoding results along the feature dimension to determine the temporal features.
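The multi-scale temporal encoding described in this implementation can be sketched roughly as follows. This is only an illustrative sketch: the disclosure does not specify what the time-scale encoders are, so a causal moving average with a hypothetical window length stands in for each encoder, and the names `encode_at_scale`, `temporal_features`, and `windows` are assumptions made for this example.

```python
import numpy as np

def encode_at_scale(fused, window):
    """Hypothetical time-scale encoder: causal moving average over `window` steps.

    fused: array of shape (T, D). Returns an array of shape (T, D).
    """
    out = np.empty(fused.shape, dtype=float)
    for t in range(fused.shape[0]):
        start = max(0, t - window + 1)
        out[t] = fused[start:t + 1].mean(axis=0)
    return out

def temporal_features(participant_feat, vehicle_feat, windows=(2, 8)):
    # Concatenate participant and vehicle features to obtain fused features.
    fused = np.concatenate([participant_feat, vehicle_feat], axis=-1)
    # Encode the fused features with at least two time-scale encoders.
    encoded = [encode_at_scale(fused, w) for w in windows]
    # Concatenate the encoding results along the feature dimension.
    return np.concatenate(encoded, axis=-1)

features = temporal_features(np.ones((10, 4)), np.ones((10, 3)))
print(features.shape)
```

With two encoders, the output feature width is twice the width of the fused features, since each encoder contributes one copy along the feature dimension.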
[0010] In one possible implementation, determining the spatial features based on the traffic participant features, the traffic participant information, and the map information includes: determining a filtering distance sequence based on the traffic participant information and the map information; and masking the traffic participant features according to different filtering distances in the filtering distance sequence to obtain at least two subspace features, and concatenating the at least two subspace features along the feature dimension to determine the spatial features.
[0011] In one possible implementation, masking the traffic participant features according to different filtering distances in the filtering distance sequence to obtain at least two subspace features, and concatenating the at least two subspace features along the feature dimension to determine the spatial features, includes: masking the traffic participant features according to a first filtering distance to obtain a local region; performing a convolution operation on the local region to obtain initial local spatial features; adjusting the mean and dimensions of the initial local spatial features to obtain local spatial features; masking the traffic participant features according to a second filtering distance to obtain a global region; performing a convolution operation on the global region to obtain initial global spatial features; adjusting the mean and dimensions of the initial global spatial features to obtain global spatial features; and concatenating the local spatial features and the global spatial features along the feature dimension to determine the spatial features.
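As a rough illustration of the local and global masking described above, the sketch below masks participant features by distance, applies a one-dimensional convolution along the participant axis, and averages the result. The function names, the kernel, the radii, and the choice of a plain mean as the "mean and dimension adjustment" step are all assumptions made for this example; the disclosure does not fix any of them.

```python
import numpy as np

def region_feature(feats, dists, radius, kernel):
    """Mask by distance, convolve each feature channel, then average.

    feats: (N, D) participant features; dists: (N,) distances to the target vehicle.
    Assumes N >= len(kernel) so that mode="same" keeps N rows.
    """
    mask = (dists <= radius).astype(float)[:, None]   # 1 inside the radius, 0 outside
    region = feats * mask                             # zero out participants beyond the radius
    conv = np.stack(
        [np.convolve(region[:, d], kernel, mode="same") for d in range(region.shape[1])],
        axis=1,
    )
    return conv.mean(axis=0)                          # (D,) pooled region feature

def spatial_features(feats, dists, local_radius, global_radius, kernel):
    local = region_feature(feats, dists, local_radius, kernel)     # first filtering distance
    global_ = region_feature(feats, dists, global_radius, kernel)  # second filtering distance
    # Concatenate local and global spatial features along the feature dimension.
    return np.concatenate([local, global_], axis=-1)

feats = np.ones((4, 2))
dists = np.array([1.0, 5.0, 20.0, 80.0])
print(spatial_features(feats, dists, 10.0, 50.0, np.array([1.0])))
```

With the identity kernel used here, the local part averages over the two participants within 10 m and the global part over the three within 50 m, so the two halves of the output differ only through the mask.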
[0012] In one possible implementation, weighting the temporal features to determine weighted temporal features, and weighting the spatial features to determine weighted spatial features, includes: determining a target density based on the map information, and determining a scene complexity based on the target density and the acceleration information of the target vehicle; concatenating the temporal features, the spatial features, and the scene complexity into a gating input, and processing the gating input through a fully connected network and an activation function to determine temporal weights corresponding to the temporal features and spatial weights corresponding to the spatial features; and weighting the temporal features based on the temporal weights to determine the weighted temporal features, and weighting the spatial features based on the spatial weights to determine the weighted spatial features.
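The gating step can be illustrated with the following minimal sketch. It assumes that the gate input is built from the features before weighting, that the scene complexity is a scalar, that the fully connected network is a single linear layer (`W`, `b`), and that the activation is a softmax over two outputs. The complexity formula, the `alpha` coefficient, and all names are hypothetical and are not taken from the disclosure.

```python
import numpy as np

def scene_complexity(num_targets, area_m2, ego_accel, alpha=0.1):
    # Hypothetical formula: target density plus a scaled acceleration magnitude.
    density = num_targets / area_m2
    return density + alpha * abs(ego_accel)

def gated_weighting(temporal, spatial, complexity, W, b):
    # Concatenate both feature vectors and the scene complexity into a gating input.
    gate_in = np.concatenate([temporal, spatial, [complexity]])
    logits = W @ gate_in + b                  # single fully connected layer, shape (2,)
    e = np.exp(logits - logits.max())         # softmax as the activation function
    w_t, w_s = e / e.sum()
    # Apply the temporal weight and the spatial weight to their respective features.
    return w_t * temporal, w_s * spatial, (w_t, w_s)

temporal = np.ones(3)
spatial = 2.0 * np.ones(2)
c = scene_complexity(num_targets=6, area_m2=6000.0, ego_accel=2.0)
W, b = np.zeros((2, 6)), np.zeros(2)
print(gated_weighting(temporal, spatial, c, W, b))
```

A softmax makes the two weights sum to one, which is one way to trade temporal against spatial evidence; an independent sigmoid per weight would be an equally plausible reading of "activation function".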
[0013] In one possible implementation, multi-head interactive attention processing is performed on the weighted temporal features, weighted spatial features, and map features to obtain the enhanced features, including: Using the weighted time features as the query sequence, the weighted spatial features as the key sequence, and the map features as the value sequence, multi-head interactive attention processing is performed on the query sequence and the key sequence to determine the attention weights. The enhanced features are obtained by weighting and summing the attention weights and the value sequence.
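The query, key, and value roles described above can be sketched with standard scaled dot-product multi-head attention. This sketch omits learned projection matrices and assumes that the key sequence (weighted spatial features) and the value sequence (map features) have the same length and the same feature width as the query sequence; these are simplifying assumptions, and the function name is hypothetical.

```python
import numpy as np

def multi_head_attention(q, k, v, num_heads):
    """q: (Lq, D) weighted temporal features, k: (Lk, D) weighted spatial features,
    v: (Lk, D) map features. D must be divisible by num_heads."""
    Lq, D = q.shape
    Lk = k.shape[0]
    dh = D // num_heads
    # Split the feature dimension into heads: (H, L, dh).
    qh = q.reshape(Lq, num_heads, dh).transpose(1, 0, 2)
    kh = k.reshape(Lk, num_heads, dh).transpose(1, 0, 2)
    vh = v.reshape(Lk, num_heads, dh).transpose(1, 0, 2)
    # Attention weights are computed from the query and key sequences only.
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(dh)          # (H, Lq, Lk)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Enhanced features: weighted sum of the value sequence, heads merged back.
    out = (weights @ vh).transpose(1, 0, 2).reshape(Lq, D)
    return out, weights

rng = np.random.default_rng(0)
enhanced, attn = multi_head_attention(
    rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4)), num_heads=2
)
print(enhanced.shape, attn.shape)
```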
[0014] In one possible implementation, motion interaction fusion features are obtained based on the enhanced features, the motion features of the traffic participants, and the map features, including: The enhanced features and motion features are concatenated along the feature dimension to obtain updated motion features, and the updated motion features are then subjected to self-attention processing to obtain self-attention motion features. The self-attention motion features and the map features are subjected to cross-attention processing to obtain the motion interaction fusion features.
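A minimal single-head sketch of this fusion step is given below. It assumes that the map features have already been projected to the same width as the concatenated motion features, and it leaves out learned projections, residual connections, and normalization that a real model would likely include. The function names are hypothetical.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention with a numerically stable softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def motion_interaction_fusion(enhanced, motion, map_feat):
    # Concatenate enhanced and motion features along the feature dimension.
    updated = np.concatenate([enhanced, motion], axis=-1)
    # Self-attention over the updated motion features.
    self_att = attention(updated, updated, updated)
    # Cross-attention: motion features query the map features.
    return attention(self_att, map_feat, map_feat)

rng = np.random.default_rng(0)
fused = motion_interaction_fusion(
    rng.normal(size=(3, 2)), rng.normal(size=(3, 2)), rng.normal(size=(5, 4))
)
print(fused.shape)
```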
[0015] According to a second aspect of the present disclosure, a method for training a trajectory prediction model is provided, the method comprising: obtaining training samples, which include at least traffic participant information, map information, vehicle information, and the labeled trajectories of traffic participants; inputting the training samples into the trajectory prediction model to be trained to obtain the predicted trajectory of the traffic participants, wherein the predicted trajectory of the traffic participants is obtained by the trajectory prediction model to be trained after processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction; and training the trajectory prediction model to be trained based on the predicted trajectory of the traffic participants and the labeled trajectory to obtain the trajectory prediction model.
[0016] In one possible implementation, the trajectory prediction model to be trained is trained based on the predicted trajectory of the traffic participant and the labeled trajectory to obtain the trajectory prediction model, including: The loss function is determined based on the predicted trajectory of the traffic participant and the labeled trajectory; The trajectory prediction model to be trained is trained according to the loss function to obtain the trajectory prediction model.
[0017] In one possible implementation, determining the loss function based on the predicted trajectory of the traffic participant and the labeled trajectory includes: determining, based on the predicted trajectory of the traffic participant and the labeled trajectory, at least one of the following loss sub-functions: a first loss sub-function, a second loss sub-function, a third loss sub-function, a fourth loss sub-function, and a fifth loss sub-function; and determining the loss function based on the first loss sub-function, the second loss sub-function, the third loss sub-function, the fourth loss sub-function, and the fifth loss sub-function. The first loss sub-function is used to quantify the error between the predicted trajectory and the labeled trajectory of the traffic participant; the second loss sub-function is used to optimize, through cross-entropy loss, the probability distribution over the modes output by the trajectory prediction model to be trained; the third loss sub-function is used to constrain the motion state parameters output by the trajectory prediction model to be trained by constructing multi-dimensional physical constraints, so that the motion state parameters meet preset physical characteristic requirements; the fourth loss sub-function is used to set constraints on higher-order changes of the motion state to suppress abrupt changes of the motion state; and the fifth loss sub-function is used to constrain the predicted trajectory to maintain a safe distance from obstacles.
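One way the five loss sub-functions could be combined is sketched below. Every concrete choice here is an assumption made for illustration: average displacement error for the first term, cross-entropy against the closest mode for the second, a speed-limit hinge as the physical constraint, squared jerk as the higher-order term, a distance hinge for obstacle clearance, and a weighted sum as the combination. The parameters `dt`, `v_max`, `d_safe`, and `weights` are hypothetical.

```python
import numpy as np

def trajectory_loss(pred, probs, label, obstacles, dt=0.1, v_max=30.0, d_safe=2.0,
                    weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """pred: (K, T, 2) trajectories for K modes, probs: (K,) mode probabilities,
    label: (T, 2) labeled trajectory, obstacles: (M, 2) positions. Requires T >= 4."""
    errs = np.linalg.norm(pred - label[None], axis=-1).mean(axis=-1)   # per-mode error
    best = int(np.argmin(errs))
    traj = pred[best]
    l1 = errs[best]                                          # 1: trajectory error
    l2 = -np.log(probs[best] + 1e-9)                         # 2: cross-entropy on the closest mode
    speed = np.linalg.norm(np.diff(traj, axis=0), axis=-1) / dt
    l3 = np.maximum(speed - v_max, 0.0).mean()               # 3: physical (speed) constraint
    jerk = np.diff(traj, n=3, axis=0) / dt ** 3
    l4 = (jerk ** 2).sum(axis=-1).mean()                     # 4: higher-order smoothness
    dist = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
    l5 = np.maximum(d_safe - dist, 0.0).mean()               # 5: safe distance from obstacles
    parts = np.array([l1, l2, l3, l4, l5])
    return float(np.dot(weights, parts)), parts

label = np.stack([np.arange(5) * 0.1, np.zeros(5)], axis=1)
pred = np.stack([label, label + 1.0])
total, parts = trajectory_loss(pred, np.array([1.0, 0.0]), label, np.array([[100.0, 100.0]]))
print(total, parts)
```

Keeping the sub-losses as a vector alongside the total makes it straightforward to log each term separately or to drop terms by setting their weight to zero.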
[0018] According to a third aspect of the present disclosure, a vehicle control device is provided, the device comprising: a data collection unit, used to collect vehicle information of the target vehicle, map information of the area where the target vehicle is located, and information of traffic participants within a first range from the target vehicle; an obtaining unit, used to input the vehicle information, map information, and traffic participant information into the trajectory prediction model to obtain the predicted trajectory information of the traffic participants, wherein the predicted trajectory information is obtained by the trajectory prediction model after processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction; and a control unit, used to control the intelligent driving of the target vehicle based on the predicted trajectory information of the traffic participants.
[0019] In one possible implementation, the obtaining unit is configured to: Feature extraction is performed on the vehicle information, map information, and traffic participant information respectively to obtain vehicle features, map features, and traffic participant features; Based on the vehicle features, map features, traffic participant features, traffic participant information, and map information, motion interaction fusion features are obtained; Based on the motion interaction fusion features, the predicted trajectory information of traffic participants is obtained.
[0020] In one possible implementation, the obtaining unit is configured to: Enhanced features are obtained based on the vehicle features, map features, traffic participant features, traffic participant information, and map information; Based on the enhanced features, the motion features of the traffic participants, and the map features, motion interaction fusion features are obtained.
[0021] In one possible implementation, the obtaining unit is configured to: determine temporal features based on the traffic participant features and the vehicle features; determine spatial features based on the traffic participant features, the traffic participant information, and the map information; weight the temporal features to determine weighted temporal features, and weight the spatial features to determine weighted spatial features; and perform multi-head interactive attention processing on the weighted temporal features, the weighted spatial features, and the map features to obtain the enhanced features.
[0022] In one possible implementation, the obtaining unit is configured to: The traffic participant features and the vehicle features are concatenated to obtain fused features; The fused features are encoded by at least two time-scale encoders to obtain at least two encoding results, and the at least two encoding results are concatenated along the feature dimension to determine the time features.
[0023] In one possible implementation, the obtaining unit is configured to: determine a filtering distance sequence based on the traffic participant information and the map information; and mask the traffic participant features according to different filtering distances in the filtering distance sequence to obtain at least two subspace features, and concatenate the at least two subspace features along the feature dimension to determine the spatial features.
[0024] In one possible implementation, the obtaining unit is configured to: mask the traffic participant features according to a first filtering distance to obtain a local region; perform a convolution operation on the local region to obtain initial local spatial features; adjust the mean and dimensions of the initial local spatial features to obtain local spatial features; mask the traffic participant features according to a second filtering distance to obtain a global region; perform a convolution operation on the global region to obtain initial global spatial features; adjust the mean and dimensions of the initial global spatial features to obtain global spatial features; and concatenate the local spatial features and the global spatial features along the feature dimension to determine the spatial features.
[0025] In one possible implementation, the obtaining unit is configured to: determine a target density based on the map information, and determine a scene complexity based on the target density and the acceleration information of the target vehicle; concatenate the temporal features, the spatial features, and the scene complexity into a gating input, and process the gating input through a fully connected network and an activation function to determine temporal weights corresponding to the temporal features and spatial weights corresponding to the spatial features; and weight the temporal features based on the temporal weights to determine the weighted temporal features, and weight the spatial features based on the spatial weights to determine the weighted spatial features.
[0026] In one possible implementation, the obtaining unit is configured to: Using the weighted time features as the query sequence, the weighted spatial features as the key sequence, and the map features as the value sequence, multi-head interactive attention processing is performed on the query sequence and the key sequence to determine the attention weights. The enhanced features are obtained by weighting and summing the attention weights and the value sequence.
[0027] In one possible implementation, the obtaining unit is configured to: The enhanced features and motion features are concatenated along the feature dimension to obtain updated motion features, and the updated motion features are then subjected to self-attention processing to obtain self-attention motion features. The self-attention motion features and the map features are subjected to cross-attention processing to obtain the motion interaction fusion features.
[0028] According to a fourth aspect of the present disclosure, a trajectory prediction model training apparatus is provided, the apparatus comprising: The sample acquisition unit is used to acquire training samples, which include at least: traffic participant information, map information, vehicle information, and the labeled trajectory of the traffic participants; The trajectory prediction unit is used to input the training samples into the trajectory prediction model to be trained to obtain the predicted trajectory of the traffic participants; wherein, the predicted trajectory of the traffic participants is obtained by the trajectory prediction model to be trained after processing the vehicle information, map information and traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction. The model training unit is used to train the trajectory prediction model to be trained based on the predicted trajectory of the traffic participant and the labeled trajectory, so as to obtain the trajectory prediction model.
[0029] In one possible implementation, the model training unit is used for: The loss function is determined based on the predicted trajectory of the traffic participant and the labeled trajectory; The trajectory prediction model to be trained is trained according to the loss function to obtain the trajectory prediction model.
[0030] In one possible implementation, the model training unit is used to: determine, based on the predicted trajectory of the traffic participant and the labeled trajectory, at least one of the following loss sub-functions: a first loss sub-function, a second loss sub-function, a third loss sub-function, a fourth loss sub-function, and a fifth loss sub-function; and determine the loss function based on the first loss sub-function, the second loss sub-function, the third loss sub-function, the fourth loss sub-function, and the fifth loss sub-function. The first loss sub-function is used to quantify the error between the predicted trajectory and the labeled trajectory of the traffic participant; the second loss sub-function is used to optimize, through cross-entropy loss, the probability distribution over the modes output by the trajectory prediction model to be trained; the third loss sub-function is used to constrain the motion state parameters output by the trajectory prediction model to be trained by constructing multi-dimensional physical constraints, so that the motion state parameters meet preset physical characteristic requirements; the fourth loss sub-function is used to set constraints on higher-order changes of the motion state to suppress abrupt changes of the motion state; and the fifth loss sub-function is used to constrain the predicted trajectory to maintain a safe distance from obstacles.
[0031] According to a fifth aspect of the present disclosure, a vehicle is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the vehicle control method according to any implementation of the first aspect.
[0032] According to a sixth aspect of the present disclosure, an electronic device is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the vehicle control method according to any implementation of the first aspect, or to implement the trajectory prediction model training method according to any implementation of the second aspect.
[0033] According to a seventh aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the vehicle control method according to any implementation of the first aspect, or implements the trajectory prediction model training method according to any implementation of the second aspect.
[0034] According to an eighth aspect of the present disclosure, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the vehicle control method according to any implementation of the first aspect, or implements the trajectory prediction model training method according to any implementation of the second aspect.
[0035] The technical solutions provided by the embodiments of this disclosure may include the following beneficial effects: This disclosure provides a vehicle control method. The method collects vehicle information of a target vehicle, map information of the area where the target vehicle is located, and information of traffic participants within a first range from the target vehicle. This collected information is then input into a trajectory prediction model, which is a pre-trained model capable of learning to adapt to dynamic and complex scenarios. By inputting the collected information into the trajectory prediction model, high-quality predicted trajectories of various possible traffic participants can be obtained. This trajectory prediction information is obtained by processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features, and then performing trajectory prediction. This improves the accuracy and reliability of trajectory prediction for traffic participants. Based on this predicted trajectory information of traffic participants, intelligent driving of the vehicle can be controlled, ensuring vehicle driving safety.
[0036] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
Attached Figure Description
[0037] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.
[0038] Figure 1 is a flowchart illustrating a vehicle control method according to an exemplary embodiment of the present disclosure.
[0039] Figure 2 is a schematic diagram illustrating a trajectory prediction model according to an exemplary embodiment of the present disclosure.
[0040] Figure 3 is a flowchart illustrating the process of obtaining predicted trajectory information of traffic participants according to an exemplary embodiment of this disclosure.
[0041] Figure 4 is a flowchart illustrating a feature extraction process according to an exemplary embodiment of the present disclosure.
[0042] Figure 5 is a flowchart illustrating a process for obtaining motion interaction fusion features according to an embodiment of the present disclosure.
[0043] Figure 6 is a flowchart illustrating a process for obtaining enhanced features according to an exemplary embodiment of the present disclosure.
[0044] Figure 7 is a first flowchart illustrating a process for obtaining temporal features according to an exemplary embodiment of the present disclosure.
[0045] Figure 8 is a second flowchart illustrating a process for obtaining temporal features according to an exemplary embodiment of the present disclosure.
[0046] Figure 9 is a first flowchart illustrating a process for obtaining spatial features according to an exemplary embodiment of the present disclosure.
[0047] Figure 10 is a second flowchart illustrating a process for obtaining spatial features according to an exemplary embodiment of the present disclosure.
[0048] Figure 11 is a flowchart illustrating a process for obtaining weighted spatial features and weighted temporal features according to an exemplary embodiment of the present disclosure.
[0049] Figure 12 is a flowchart illustrating a process for obtaining scene complexity according to an embodiment of the present disclosure.
[0050] Figure 13 is a second flowchart illustrating a process for obtaining enhanced features according to an exemplary embodiment of the present disclosure.
[0051] Figure 14 is a flowchart illustrating a process for obtaining motion interaction fusion features according to an embodiment of the present disclosure.
[0052] Figure 15 is a schematic diagram illustrating the obtaining of predicted trajectory information of traffic participants according to an exemplary embodiment of the present disclosure.
[0053] Figure 16 is a schematic diagram illustrating predicted trajectory information according to an exemplary embodiment of the present disclosure.
[0054] Figure 17 is a flowchart illustrating a trajectory prediction model training method according to an exemplary embodiment of the present disclosure.
[0055] Figure 18 is a schematic diagram illustrating the obtaining of predicted trajectory information of traffic participants according to an exemplary embodiment of the present disclosure.
[0056] Figure 19 is a schematic diagram illustrating an example of obtaining a trajectory prediction model according to an exemplary embodiment of the present disclosure.
[0057] Figure 20 is a block diagram illustrating a vehicle control device according to an exemplary embodiment of the present disclosure.
[0058] Figure 21 is a block diagram illustrating a trajectory prediction model training apparatus according to an exemplary embodiment of the present disclosure.
[0059] Figure 22 is a functional block diagram of a vehicle according to an exemplary embodiment of the present disclosure.
[0060] Figure 23 is a functional block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Implementation
[0061] Exemplary embodiments of this disclosure will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. Various changes, modifications, and equivalents of the methods, apparatus, and / or systems described herein will become apparent upon understanding this disclosure. For example, the order of operations described herein is merely illustrative and is not limited to those orders set forth herein, but can be changed as will become apparent upon understanding this disclosure, except for operations that must be performed in a particular order. Furthermore, for clarity and brevity, descriptions of features known in the art may be omitted.
[0062] The embodiments described below, which are examples of some of the embodiments of this disclosure, do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.
[0063] In application scenarios with more complex traffic conditions, such as open roads, achieving safer and more efficient vehicle navigation requires not only identifying relevant data on traffic participants around the vehicle but also predicting their potential trajectories. Therefore, how to better predict the motion trajectories of surrounding objects and intelligently control the vehicle has become a pressing technical problem.
[0064] In view of one or more of the above-mentioned problems, this disclosure provides a vehicle control method. This method collects vehicle information of a target vehicle, map information of the area where the target vehicle is located, and information of traffic participants within a first range from the target vehicle. The collected information is then input into a trajectory prediction model, which is a pre-trained model capable of learning to adapt to dynamic and complex scenarios. By inputting the collected information into the trajectory prediction model, high-quality predicted trajectory information for multiple possible traffic participants can be obtained. This trajectory prediction information is obtained by the trajectory prediction model after processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features, thereby improving the accuracy and reliability of trajectory prediction for traffic participants. Based on this predicted trajectory information of traffic participants, intelligent driving of the vehicle can be controlled, ensuring vehicle driving safety.
[0065] The steps of each method in the exemplary embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings and examples.
[0066] Figure 1 is a flowchart illustrating a vehicle control method according to an exemplary embodiment of the present disclosure. The method of this embodiment can be applied to vehicles.
[0067] As shown in Figure 1, in some embodiments, the vehicle control method of this disclosure includes the following. In step S110, vehicle information of the target vehicle, map information of the area where the target vehicle is located, and information of traffic participants within a first range from the target vehicle are collected.
[0068] In this embodiment, various sensors and communication devices can be installed on the target vehicle, for example cameras, lidar, millimeter-wave radar, and inertial measurement units. Through these sensors and communication devices, relevant data about the target vehicle can be sensed and acquired, including traffic participant information, map information, and vehicle information.
[0069] In this embodiment of the disclosure, traffic participants are objects within a first range from the target vehicle. These traffic participants include, for example, pedestrians, non-motorized vehicles, other vehicles, and road obstacles. For instance, based on the position of vehicle A, a preset distance is defined forward, backward, left, and right, such as a rectangular area of 50 meters forward and backward and 30 meters left and right. This rectangular area is the first range. Any pedestrians, non-motorized vehicles, adjacent motorized vehicles, two-wheeled vehicles, handcarts, suitcases, etc., that are within this first range in real time are considered traffic participants.
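The rectangular first range in the example above can be sketched as a simple filter in the target vehicle's own coordinate frame. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the coordinate convention (x forward, y left, in meters relative to the target vehicle), and the sample detections are all hypothetical.

```python
from typing import Dict, List, Tuple

# Half-extents of the rectangular first range, taken from the example in the text.
LONGITUDINAL_RANGE_M = 50.0  # forward and backward
LATERAL_RANGE_M = 30.0       # left and right


def in_first_range(rel_xy: Tuple[float, float]) -> bool:
    """Return True if a point given relative to the target vehicle
    (x forward, y left, meters) lies inside the rectangular first range."""
    x, y = rel_xy
    return abs(x) <= LONGITUDINAL_RANGE_M and abs(y) <= LATERAL_RANGE_M


def select_traffic_participants(objects: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep the identifiers of the objects that fall inside the first range."""
    return [name for name, rel_xy in objects.items() if in_first_range(rel_xy)]


# Hypothetical detections around vehicle A.
detections = {
    "pedestrian_1": (12.0, -4.5),
    "truck_7": (65.0, 0.0),      # beyond 50 m ahead, excluded
    "cyclist_3": (-20.0, 29.0),
    "cart_2": (10.0, 31.0),      # beyond 30 m to the side, excluded
}
participants = select_traffic_participants(detections)
```

Only objects inside the rectangle are treated as traffic participants; the check would be re-evaluated as positions update in real time.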
[0070] In this embodiment of the disclosure, the traffic participant information includes traffic participant classification information and traffic participant historical trajectory information. The traffic participant classification information includes, for example, pedestrians, vehicles, bicycles, electric vehicles, motorcycles, etc. The traffic participant historical trajectory information includes feature dimension information, such as location, instantaneous speed, acceleration, heading angle (such as the driving direction of vehicles and the walking direction of pedestrians), pitch angle, roll angle, etc.
[0071] In this embodiment of the disclosure, the map information includes static road topology and structure information, traffic facilities and signage information, static environmental obstacle information, and real-time dynamic supplementary information.
[0072] In this embodiment, the static road topology and structure information includes basic road attribute information, road connection relationship information, and road boundary and extent information. The basic road attribute information includes road type (e.g., highway, urban road, rural road), number of lanes, lane line type (e.g., solid line, dashed line, double yellow line), and location coordinates. Road connection relationship information includes, for example, intersection type (e.g., crossroad, roundabout, ramp), turning rules (e.g., allowed left turn / right turn / U-turn), and lane merging / exit logic. Road boundary and extent information includes the three-dimensional position and dimensions of curbs, guardrails, and medians.
[0073] In this embodiment of the disclosure, the traffic facilities and signage information includes traffic sign information, traffic signal information, and road marking information. Specifically, traffic sign information includes the location, type, and semantic information of speed limit signs, no-overtaking signs, and intersection warning signs. Traffic signal information includes the installation location, orientation, and status of red lights, green lights, yellow lights, and pedestrian crossing lights. Road marking information includes the location and semantic meaning of pedestrian crossings, stop lines, directional arrows, and speed bump markings.
[0074] In this embodiment of the disclosure, the static environmental obstacle information includes fixed obstacle information and road ancillary facility information. The fixed obstacle information includes the three-dimensional position, size, and outline information of bridges, tunnels, streetlights, traffic monitoring poles, roadside buildings, trees, etc. The road ancillary facility information includes the location and attribute labels of manhole covers, drainage outlets, bollards, construction barriers (temporary static obstacles), etc.
[0075] In this embodiment, the real-time dynamic supplementary information includes temporary road change information, map matching calibration information, and road surface condition information. The temporary road change information includes real-time updates of the static environment, such as construction areas, road maintenance barriers, and temporary traffic control signs. The map matching calibration information includes deviation correction data after aligning real-time sensor data with prior information from a high-definition map. The road surface condition information includes real-time perceived road surface materials (such as asphalt, cement, etc.) and road surface conditions (such as water accumulation, icing, potholes, etc.).
[0076] In this embodiment of the disclosure, the vehicle information includes the vehicle's historical trajectory information. This historical trajectory information includes driving speed, acceleration, jerk, driving direction, position, and angular velocity, among other things.
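The quantities listed above are related by differentiation: acceleration is the rate of change of speed, and jerk is the rate of change of acceleration. The disclosure does not state how these values are obtained; the sketch below simply assumes that a uniformly sampled speed history is available and approximates the other two by backward differences. The function name, sampling interval, and sample values are hypothetical.

```python
from typing import List


def finite_difference(values: List[float], dt: float) -> List[float]:
    """First-order backward difference of a uniformly sampled signal."""
    return [(values[i] - values[i - 1]) / dt for i in range(1, len(values))]


# Hypothetical speed history of the target vehicle, sampled every 0.5 s (m/s).
dt = 0.5
speed = [10.0, 11.0, 13.0, 16.0]

acceleration = finite_difference(speed, dt)  # m/s^2, one value fewer than speed
jerk = finite_difference(acceleration, dt)   # m/s^3, one value fewer again
```

Each differencing step shortens the sequence by one sample, so a history of at least three speed samples is needed before a jerk value exists.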
[0077] In step S120, vehicle information, map information, and traffic participant information are input into the trajectory prediction model to obtain the predicted trajectory information of the traffic participants; wherein, the predicted trajectory information is obtained by the trajectory prediction model after processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction.
[0078] In the embodiments disclosed herein, the structure of the trajectory prediction model is introduced first; the training of the trajectory prediction model is described later and is not elaborated here.
[0079] Figure 2 is a schematic diagram illustrating a trajectory prediction model according to an exemplary embodiment of the present disclosure.
[0080] As shown in Figure 2, in some embodiments, the trajectory prediction model includes a feature extraction module, a spatiotemporal attention module, a motion interaction fusion module, and a trajectory prediction module.
[0081] In this exemplary embodiment, the feature extraction module includes multiple encoders, such as a first encoder for processing traffic participant information, a second encoder for processing vehicle information, and a third encoder for processing map information.
[0082] In this exemplary embodiment, the spatiotemporal attention module includes an encoder for performing multi-scale temporal feature extraction, an encoder for performing multi-scale spatial feature extraction, a dynamic weight network for determining the weights corresponding to temporal features and the weights corresponding to spatial features, and a multi-head cross-attention mechanism submodule.
[0083] In this exemplary embodiment, the motion interaction fusion module includes a self-attention mechanism submodule and a cross-attention mechanism submodule.
[0084] In this exemplary embodiment, the trajectory prediction module includes a multimodal trajectory generation submodule, a probability multiplier module, and a trajectory regression optimization submodule.
[0085] In this embodiment of the disclosure, once the trained trajectory prediction model is determined, vehicle information, map information, and traffic participant information can be input into the trajectory prediction model to obtain the predicted trajectory information of the traffic participants.
[0086] In step S130, the intelligent driving of the target vehicle is controlled based on the predicted trajectory information of the traffic participants.
[0087] In this embodiment of the disclosure, after obtaining the predicted trajectory information of the traffic participants, the intelligent driving of the target vehicle can be controlled based on the predicted trajectory information of the traffic participants.
[0088] In this way, by relying on the high-precision predicted trajectory information of traffic participants, the future movement trends and driving behaviors of various traffic participants in the surrounding area can be accurately predicted in advance. This breaks the limitations of relying solely on real-time instantaneous perception. Based on this accurate predicted trajectory, driving decisions and dynamic control of the target vehicle can be made, significantly improving the predictability and foresight of vehicle control, avoiding potential collision risks in advance, reducing emergency braking and sudden avoidance operations, making vehicle driving more stable and smooth, effectively improving driving comfort, adapting to complex traffic conditions, optimizing road traffic efficiency, and comprehensively enhancing the driving safety and adaptability of the target vehicle in complex scenarios.
[0089] Figure 3 is a schematic diagram illustrating the acquisition of predicted trajectory information of traffic participants according to an exemplary embodiment of this disclosure.
[0090] As shown in Figure 3, in some embodiments, based on the vehicle control method shown in Figure 1, step S120 shown in Figure 1 may include the following steps.
[0091] In step S310, feature extraction is performed on vehicle information, map information, and traffic participant information to obtain vehicle features, map features, and traffic participant features.
[0092] In this exemplary embodiment, different encoders can be used to extract features from vehicle information, map information, and traffic participant information respectively, to obtain vehicle features, map features, and traffic participant features.
[0093] In step S320, motion interaction fusion features are obtained based on vehicle features, map features, traffic participant features, traffic participant information, and map information.
[0094] In this exemplary embodiment, vehicle features, map features, traffic participant features, traffic participant information, and map information can be processed to obtain motion interaction fusion features.
[0095] In step S330, the predicted trajectory information of traffic participants is obtained based on the motion interaction fusion features.
[0096] In this exemplary embodiment, after obtaining the motion interaction fusion features, the predicted trajectory information of the traffic participants can be obtained based on the motion interaction fusion features.
[0097] Figure 4 is a schematic diagram illustrating a feature extraction process according to an exemplary embodiment of the present disclosure.
[0098] As shown in Figure 4, in some embodiments, based on the acquisition of predicted trajectory information of traffic participants shown in Figure 3, step S310 shown in Figure 3 may include the following steps.
[0099] In step S410, the traffic participant information is input into the first feature extraction module to obtain the traffic participant features.
[0100] In an exemplary embodiment, traffic participant information can be input into the first feature extraction module (e.g., the first encoder shown in Figure 2, such as a Long Short-Term Memory (LSTM) encoder) to obtain traffic participant features.
[0101] In step S420, the vehicle information is input into the second feature extraction module to obtain vehicle features.
[0102] In an exemplary embodiment, vehicle information can be input into the second feature extraction module (e.g., the second encoder shown in Figure 2) to obtain vehicle features.
[0103] In step S430, the spatial coordinates of map elements in the map information are input into the third feature extraction module to obtain geometric structure features; and the map type and semantic category information in the map information are fused and encoded to obtain type features.
[0104] In step S440, the geometric structure features and type features are fused and encoded to obtain map features.
[0105] In an exemplary embodiment, the spatial coordinates of map elements in the Bird's Eye View (BEV) coordinate system have dimensions [B, P, pts, D_m], where P is the number of map elements, pts is the number of sampling points for each map element, and D_m is the feature dimension of the sampling points.
[0106] In an exemplary embodiment, a third feature extraction module (e.g., the third encoder shown in Figure 2, such as a local_map_encoder) can be used to encode the spatial coordinates of map elements and extract their geometric structure features. The type of each map element (such as lane line, guardrail, or traffic sign) and its semantic category information (such as solid line or dashed line, speed limit sign or no-turn sign) are then fused and encoded to obtain type features. Further, the geometric structure features and type features are fused to generate a comprehensive map feature (map_feat) with dimensions [B, P, D].
[0107] As can be seen, the final map features are no longer simple coordinate data or semantic labels, but rather composite features that conform to both geometric constraints and semantic rules. This not only avoids the problems of not understanding the rules when using only geometric features or lacking spatial concepts when using only semantic features, but also improves the model's feature discrimination and anti-interference ability, and enhances the physical rationality and semantic compliance of the trajectories predicted by the subsequent model.
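The shape change from per-point coordinates to one composite feature per map element can be illustrated with a toy example for a single sample (batch dimension omitted). This is only a sketch: the disclosure uses learned encoders, whereas here the geometric encoder is replaced by a hand-written summary (centroid plus bounding-box extent), the type encoder by one-hot vectors, and fusion by plain concatenation. All of these choices, and the vocabularies and names below, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical vocabularies for element type and semantic category.
ELEMENT_TYPES = ["lane_line", "guardrail", "traffic_sign"]
SEMANTIC_CATEGORIES = ["solid", "dashed", "speed_limit", "no_turn", "none"]


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode a label as a one-hot vector over the given vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def geometry_feature(points: Sequence[Point]) -> List[float]:
    """Stand-in geometric encoder: centroid plus bounding-box extent."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [sum(xs) / len(xs), sum(ys) / len(ys), max(xs) - min(xs), max(ys) - min(ys)]


def type_feature(element_type: str, semantic: str) -> List[float]:
    """Fuse element type and semantic category by concatenating one-hot codes."""
    return one_hot(element_type, ELEMENT_TYPES) + one_hot(semantic, SEMANTIC_CATEGORIES)


def map_feature(points: Sequence[Point], element_type: str, semantic: str) -> List[float]:
    """Fuse geometric and type features into one vector per map element."""
    return geometry_feature(points) + type_feature(element_type, semantic)


# P = 2 map elements, pts = 3 sampling points each, D_m = 2 (x, y in BEV).
elements = [
    ([(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)], "lane_line", "dashed"),
    ([(0.0, 3.0), (5.0, 3.0), (10.0, 4.0)], "guardrail", "none"),
]
map_feat = [map_feature(*element) for element in elements]  # shape [P, D], D = 12
```

The point dimension (pts) disappears because each element is summarized into a single vector, which matches the change from [B, P, pts, D_m] to [B, P, D] described above.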
[0108] Figure 5 is a schematic diagram illustrating the acquisition of motion interaction fusion features according to an exemplary embodiment of the present disclosure.
[0109] As shown in Figure 5, in some embodiments, based on the acquisition of predicted trajectory information of traffic participants shown in Figure 3, step S320 shown in Figure 3 may include the following steps.
[0110] In step S510, enhanced features are obtained based on vehicle features, map features, traffic participant features, traffic participant information, and map information.
[0111] In this exemplary embodiment, the spatiotemporal attention module of the trajectory prediction model can process vehicle features, map features, traffic participant features, traffic participant information, and map information to obtain enhanced features.
[0112] In step S520, motion interaction fusion features are obtained based on the enhanced features, the motion features of traffic participants, and map features.
[0113] In this exemplary embodiment, the motion interaction fusion module, which includes a self-attention mechanism submodule and a cross-attention mechanism submodule, can process the enhanced features, the motion features of traffic participants, and map features to obtain motion interaction fusion features.
[0114] Figure 6 is a schematic diagram illustrating the acquisition of enhanced features according to an exemplary embodiment of the present disclosure.
[0115] As shown in Figure 6, in some embodiments, based on the acquisition of motion interaction fusion features shown in Figure 5, step S510 shown in Figure 5 may include the following steps.
[0116] In step S610, temporal features are determined based on the traffic participant features and vehicle features.
[0117] In an exemplary embodiment, the traffic participant features and vehicle features can be encoded by an encoder to capture temporal statistical patterns, dynamic trends, and spatiotemporal interaction patterns, transforming static or discrete features into structured features with a time dimension, thereby obtaining temporal features.
[0118] In step S620, spatial features are determined based on traffic participant features, traffic participant information, and map information.
[0119] In an exemplary embodiment, traffic participant features, traffic participant information, and map information can be encoded by an encoder, thereby extracting the traffic participant's own spatial attributes, the spatial constraints between the traffic participant and the environment, and the spatial guidance of the map for the traffic participant. This generates structured spatial features that reflect "where the traffic participant is, how it interacts with the surrounding environment, and how it is constrained by the map," i.e., the spatial features.
[0120] In step S630, the temporal features are weighted to determine weighted temporal features, and the spatial features are weighted to determine weighted spatial features. In an exemplary embodiment, the weights applied to the temporal and spatial features can be adjusted to meet the needs of the actual scenario. In other words, targeted weight allocation can be used to strengthen key information, suppress redundant or noisy information, and improve the accuracy of subsequent fusion and trajectory prediction.
[0121] In step S640, multi-head interactive attention processing is performed on the weighted temporal features, weighted spatial features, and map features to obtain enhanced features.
[0122] In this exemplary embodiment, multi-head interactive attention processing can be performed on weighted temporal features, weighted spatial features, and map features to obtain enhanced features.
[0123] In this embodiment of the disclosure, by multi-dimensional and multi-scale cross-feature association modeling, the enhanced features are simultaneously endowed with the three attributes of "temporal dynamism, spatial constraint, and map guidance", thereby greatly improving the expressive power of the features. This can improve the spatiotemporal consistency of trajectory prediction, strengthen the rationality distinction of multimodal trajectories, and adapt the feature to complex scenarios.
[0124] Figure 7 is a schematic diagram illustrating the acquisition of time features according to an exemplary embodiment of the present disclosure.
[0125] As shown in Figure 7, in some embodiments, based on the acquisition of enhanced features shown in Figure 6, step S610 shown in Figure 6 may include the following steps.
[0126] In step S710, the traffic participant features and vehicle features are concatenated to obtain fused features.
[0127] In this embodiment of the disclosure, traffic participant features and vehicle features can be concatenated along the feature dimension to obtain fused features.
[0128] In step S720, the fused features are encoded by at least two time-scale encoders to obtain at least two encoding results, and the at least two encoding results are concatenated along the feature dimension to determine the time features.
[0129] In this embodiment of the disclosure, at least two time-scale encoders, such as a first time-scale encoder, a second time-scale encoder, and a third time-scale encoder, can be determined based on actual scenario requirements. Then, the fused features are encoded based on the first time-scale encoder, the second time-scale encoder, and the third time-scale encoder, respectively, to obtain a first encoding result, a second encoding result, and a third encoding result. Furthermore, the first encoding result, the second encoding result, and the third encoding result can be concatenated along the feature dimension to determine the time features.
[0130] In this embodiment, at least two time-scale encoders process the fused features in parallel across different temporal coverage spans, such as short-window, medium-window, and long-window scales. This allows for simultaneous coverage of instantaneous changes, stage trends, and global patterns, avoiding the omission of key temporal information. Therefore, the generated temporal features are no longer flat temporal sequences but rather hierarchical features rich in "details-trends-patterns".
[0131] Figure 8 is a schematic diagram illustrating the acquisition of time features according to an exemplary embodiment of the present disclosure.
[0132] As shown in Figure 8, in some embodiments, based on the acquisition of time features shown in Figure 7, step S720 shown in Figure 7 may include the following steps.
[0133] In step S810, the fused features within a short time window are encoded by an encoder to obtain short-term time features.
[0134] In this exemplary embodiment, the fused features (agent_feats) within a short time window can be encoded by an encoder to obtain short-term time features (short_term_feat), with dimensions [B, A, D / 2]. The short time window is a local temporal interval covering a small number of consecutive adjacent frames, used to characterize subtle changes in the scene over a short period. The encoder is, for example, an LSTM encoder, i.e., short_term_feat = LSTM_short(agent_feats). This allows sudden actions of short duration (e.g., within 3 seconds), such as braking or lane changing, to be captured.
[0135] In step S820, the fused features within a long time window are encoded by an encoder to obtain long-term time features.
[0136] In this exemplary embodiment, the fused features (agent_feats) within a long time window can be encoded by an encoder to obtain long-term time features (long_term_feat). The long time window is a global temporal interval covering a large range of consecutive frames, used to characterize the overall evolution of the scene over a long period. The encoder is, for example, an LSTM encoder, i.e., long_term_feat = LSTM_long(agent_feats). This allows the movement trends of traffic participants over a long period (e.g., 10 seconds), such as constant-speed straight travel or turning intentions at intersections, to be captured, thereby reflecting the overall movement pattern.
[0137] In step S830, the short-term time features and long-term time features are concatenated along the feature dimension to determine the time features.
[0138] In this exemplary embodiment, short-term and long-term time features can be concatenated along the feature dimension to form a complete time feature (time_feat), i.e., time_feat = Concat(short_term_feat, long_term_feat). This integrates instantaneous changes with long-term trends, avoiding modeling biases of complex motions using a single time scale.
[0139] In this exemplary embodiment, the temporal features obtained by two time-scale encoders can simultaneously capture short-term fine dynamics and long-term trends. For example, they can simultaneously capture instantaneous changes (such as a pedestrian suddenly stopping) and overall trends (such as a pedestrian eventually walking towards an intersection). This avoids the perspective limitations of a single-scale encoder, generates more hierarchical and comprehensive temporal features, and enables the temporal features to adapt to various complex motion scenarios, thereby improving the generalization of trajectory prediction.
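The two-scale scheme described above (encode a short window, encode a long window, concatenate along the feature dimension) can be sketched for a single agent as follows. The disclosure names LSTM encoders; to keep the example self-contained, the encoder here is replaced by a per-feature average over the window, which is purely a placeholder. The window lengths in frames, the feature layout, and the function names are assumptions.

```python
from typing import List

Frame = List[float]


def encode_window(sequence: List[Frame], window: int) -> List[float]:
    """Placeholder encoder: average each feature over the last `window` frames.
    An LSTM encoder would take this role in the described embodiment."""
    recent = sequence[-window:]
    width = len(recent[0])
    return [sum(frame[i] for frame in recent) / len(recent) for i in range(width)]


def temporal_feature(sequence: List[Frame], short_window: int, long_window: int) -> List[float]:
    """time_feat = Concat(short_term_feat, long_term_feat) along the feature dimension."""
    short_term_feat = encode_window(sequence, short_window)
    long_term_feat = encode_window(sequence, long_window)
    return short_term_feat + long_term_feat


# Hypothetical fused features for one agent, [speed, heading] per frame:
# steady travel followed by sudden braking in the last three frames.
agent_feats = [[8.0, 0.0], [8.0, 0.0], [8.0, 0.0], [6.0, 0.0], [4.0, 0.0], [2.0, 0.0]]
time_feat = temporal_feature(agent_feats, short_window=3, long_window=6)
```

In this toy input the short-window half reflects the recent braking, while the long-window half still reflects the higher average speed over the whole history, which is the complementary behavior the two scales are meant to provide.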
[0140] Figure 9 is a schematic diagram illustrating the acquisition of spatial features according to an exemplary embodiment of the present disclosure.
[0141] As shown in Figure 9, in some embodiments, based on the acquisition of enhanced features shown in Figure 6, step S620 shown in Figure 6 may include the following steps.
[0142] In step S910, a filtering distance sequence is determined based on traffic participant information and map information.
[0143] In this exemplary embodiment, the distance between the latest location of a traffic participant in the traffic participant information and the center of each map element in the map information can be calculated using the Euclidean distance function to obtain the filtering distance sequence. The filtering distance sequence is a one-dimensional array, where each element corresponds to the Euclidean distance between a traffic participant and the center of a map element, quantifying the spatial correlation between the traffic participant and different map elements. The smaller the distance, the stronger the spatial correlation.
[0144] In step S920, the traffic participant features are masked according to different filtering distances in the filtering distance sequence to obtain at least two subspace features, and the at least two subspace features are concatenated along the feature dimension to determine the spatial features.
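The distance computation described above can be sketched as follows. The sketch assumes 2D BEV coordinates and takes the center of a map element to be the mean of its sampling points; the disclosure does not define how the center is computed, so that choice, along with the function names and sample data, is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def element_center(points: Sequence[Point]) -> Point:
    """Center of a map element, assumed here to be the mean of its sampling points."""
    return (
        sum(p[0] for p in points) / len(points),
        sum(p[1] for p in points) / len(points),
    )


def distance_sequence(latest_position: Point, map_elements: Sequence[Sequence[Point]]) -> List[float]:
    """One Euclidean distance per map element, in the order the elements are given."""
    return [math.dist(latest_position, element_center(points)) for points in map_elements]


# Hypothetical traffic participant at the origin and three map elements.
latest_position = (0.0, 0.0)
map_elements = [
    [(2.0, 4.0), (4.0, 4.0)],    # center (3, 4)
    [(0.0, 10.0), (0.0, 14.0)],  # center (0, 12)
    [(-6.0, 7.0), (-6.0, 9.0)],  # center (-6, 8)
]
distances = distance_sequence(latest_position, map_elements)
```

The result is a one-dimensional list with one entry per map element, so smaller entries identify the elements most strongly related to the participant.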
[0145] In this exemplary embodiment, based on the numerical distribution of the filtering distance sequence and the need to filter spatial features at different scales, multiple mask distance thresholds are selected from the filtering distance sequence, corresponding to the filtering range of spatial features at different scales. For example, the near-scale mask distance threshold is 5m, which filters map elements within 5 meters of the traffic participant (such as lane lines adjacent to the traffic participant, and guardrails nearby), corresponding to fine-grained spatial features that focus on the local tight spatial constraints of the target; the medium-scale mask distance threshold is 15m, which filters map elements within 5 to 15 meters of the traffic participant (such as the lane ahead, the edge of the intersection), corresponding to medium-grained spatial features that focus on the spatial constraints of the target's medium-term movement; and the far-scale mask distance threshold is 30m, which filters map elements within 15 to 30 meters of the traffic participant (such as distant road boundaries, distant intersections), corresponding to coarse-grained spatial features that focus on the spatial guidance of the traffic participant's long-term movement.
[0146] In this exemplary embodiment, the spatial association between traffic participants and map elements is quantified by mask distance. This avoids the indiscriminate use of spatial features of all map elements. Instead, spatial features of different scales (near, medium, and far) are selected based on distance, thereby better reflecting the spatial influence range of traffic participants' movements.
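The three distance bands in the example (up to 5 m, 5 to 15 m, 15 to 30 m) can be turned into boolean masks over the map elements, one mask per scale. This is a sketch only: the treatment of values that fall exactly on a threshold (assigned to the nearer band here) and the names used are assumptions, and the sample distances are hypothetical.

```python
from typing import Dict, List

# Mask distance thresholds from the example in the text (meters), nearest first.
THRESHOLDS_M = [("near", 5.0), ("medium", 15.0), ("far", 30.0)]


def scale_masks(distances: List[float]) -> Dict[str, List[bool]]:
    """Build one boolean mask per scale. True marks a map element whose
    distance lies in that scale's band: lower bound exclusive, upper inclusive."""
    masks: Dict[str, List[bool]] = {}
    lower = float("-inf")  # lets a distance of exactly 0 fall into the near band
    for name, upper in THRESHOLDS_M:
        masks[name] = [lower < d <= upper for d in distances]
        lower = upper
    return masks


# Hypothetical distances from one traffic participant to six map elements.
distances = [2.0, 5.0, 9.5, 15.0, 22.0, 40.0]
masks = scale_masks(distances)
```

Each map element belongs to at most one band, and elements beyond the far threshold (the 40 m entry here) are excluded from every scale.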
[0147] Figure 10 is a schematic diagram illustrating the acquisition of spatial features according to an exemplary embodiment of the present disclosure.
[0148] As shown in Figure 10, in some embodiments, based on the acquisition of spatial features shown in Figure 9, step S920 shown in Figure 9 may include the following steps.
[0149] In step S1010, the traffic participant features are masked according to the first filtering distance to obtain a local region, and a convolution operation is performed on the local region to obtain initial local spatial features. A mean operation and a dimension adjustment are then applied to the initial local spatial features to obtain the local spatial features.
[0150] In this exemplary embodiment, the first filtering distance is, for example, 5 meters. That is, regions within 5 m are selected based on the first mask distance threshold (local_mask). Then, a 3x3 convolutional layer (Conv2D3x3) is used to extract the initial local spatial features, and a mean operation and a dimension permutation adjust them to the target dimensions to obtain the local spatial features (local_feat). That is: local_feat = Conv2D3x3(local_mask) → mean → permute → [B, A, D / 2]. In this way, the obtained local spatial features capture close-range interactions (such as avoidance relationships with vehicles ahead and collision risks with pedestrians).
[0151] In step S1020, the traffic participant features are masked according to the second filtering distance to obtain a global region, and a convolution operation is performed on the global region to obtain initial global spatial features. A mean operation and a dimension adjustment are then applied to the initial global spatial features to obtain the global spatial features.
[0152] In this exemplary embodiment, the second filtering distance is, for example, 20 meters. That is, regions within 20 m are selected based on the second mask distance threshold (global_mask). Then, a 7x7 convolutional layer (Conv2D7x7) is used to extract the initial global spatial features, and a mean operation and a dimension permutation adjust them to the target dimensions to obtain the global spatial features (global_feat). That is: global_feat = Conv2D7x7(global_mask) → mean → permute → [B, A, D / 2]. In this way, the obtained global spatial features capture the influence of the long-distance environment (such as the guidance of the path by the intersection layout).
[0153] In step S1030, the local spatial features and global spatial features are concatenated along the feature dimension to determine the spatial features.
[0154] In this exemplary embodiment, local spatial features and global spatial features can be concatenated into a complete spatial feature (space_feat), i.e., space_feat = Concat(local_feat, global_feat). In this way, the final spatial feature can integrate interactive information from different spatial ranges, avoiding local field-of-view limitations or global information redundancy.
[0155] Figure 11 is a schematic diagram illustrating the acquisition of weighted spatial features and weighted temporal features according to an exemplary embodiment of this disclosure.
[0156] As shown in Figure 11, in some embodiments, based on the acquisition of enhanced features shown in Figure 6, step S630 shown in Figure 6 may include the following steps.
[0157] In step S1110, the scene complexity is determined.
[0158] In this exemplary embodiment, the scenario complexity can be determined based on relevant information about the target vehicle and traffic participants.
[0159] In step S1120, the temporal features, spatial features, and scene complexity are concatenated into a gating input, and the gating input is processed through a fully connected network and an activation function to determine the temporal weights corresponding to the temporal features and the spatial weights corresponding to the spatial features.
[0160] In this exemplary embodiment, the temporal features, spatial features, and scene complexity can be concatenated into a gating input. This gating input is then processed through a fully connected network and an activation function to determine the temporal weights corresponding to the temporal features and the spatial weights corresponding to the spatial features. The fully connected network, for example, consists of two layers and processes the gating input as follows: Linear1 (the first fully connected layer, which performs feature dimension transformation) → an activation function layer introducing nonlinearity → Linear2 (the second fully connected layer, which calibrates and outputs the feature dimensions), thereby determining the temporal weights (wt) and spatial weights (ws).
[0161] In step S1130, the time features are weighted based on the time weight to determine the weighted time features; and the spatial features are weighted based on the spatial weight to determine the weighted spatial features.
[0162] In this exemplary embodiment, time weights (wt) and spatial weights (ws) can be used to perform element-wise weighting on time features and spatial features respectively, thereby obtaining weighted features, namely time-weighted features (weighted_time_feat) and spatial-weighted features (weighted_space_feat).
[0163] In this exemplary embodiment, by adaptively allocating time and space weights based on scenario complexity, the core needs of different scenarios can be met, thereby improving the rationality of vehicle control.
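The gating step (concatenate, Linear1, activation, Linear2, then weight the features) can be sketched with plain Python lists. Several details are assumptions made for illustration and are not stated in the disclosure: the gating input is built from the features before weighting, the hidden activation is ReLU, a sigmoid maps the two outputs into (0, 1), each weight is a single scalar applied to every element of its feature vector, and the parameter values are hand-picked rather than trained.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gate_weights(time_feat: Vector, space_feat: Vector, scene_complexity: float,
                 params: Dict[str, list]) -> Tuple[float, float]:
    """Concatenate the inputs, then apply Linear1 -> ReLU -> Linear2 -> sigmoid."""
    gate_input = time_feat + space_feat + [scene_complexity]
    hidden = relu(linear(gate_input, params["w1"], params["b1"]))
    wt_logit, ws_logit = linear(hidden, params["w2"], params["b2"])
    return sigmoid(wt_logit), sigmoid(ws_logit)


def apply_weight(feat: Vector, weight: float) -> Vector:
    """Element-wise weighting with a scalar weight."""
    return [weight * v for v in feat]


# Hypothetical, untrained parameters: input size 5 -> hidden size 2 -> 2 weights.
params = {
    "w1": [[0.1, 0.1, 0.0, 0.0, 1.0], [0.0, 0.0, 0.1, 0.1, -1.0]],
    "b1": [0.0, 0.0],
    "w2": [[1.0, 0.0], [0.0, 1.0]],
    "b2": [0.0, 0.0],
}
time_feat = [1.0, 2.0]
space_feat = [3.0, 4.0]
wt, ws = gate_weights(time_feat, space_feat, 0.5, params)
weighted_time_feat = apply_weight(time_feat, wt)
weighted_space_feat = apply_weight(space_feat, ws)
```

With these hand-picked parameters a higher scene complexity raises the temporal weight and lowers the spatial weight; in a trained network that relationship would be learned from data rather than fixed.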
[0164] Figure 12 is a schematic diagram illustrating the acquisition of scene complexity according to an exemplary embodiment of this disclosure.
[0165] As shown in Figure 12, in some embodiments, based on the acquisition of weighted temporal features and weighted spatial features shown in Figure 11, step S1110 shown in Figure 11 may include the following steps.
[0166] In step S1210, the target density is determined based on the map information.
[0167] In this exemplary embodiment, the total number of targets (including traffic participants such as vehicles, pedestrians, and cyclists) and the available area within the effective driving space of the map can be determined based on map information, and the target density, i.e. the density of the number of targets within the effective space, can be determined accordingly.
[0168] For example, if the available area of the main lane within 30 meters of the target vehicle is 200 square meters, and there are 8 vehicles traveling in it, then the target density can be determined as 8 / 200 = 0.04 vehicles / square meter.
[0169] In step S1220, the scene complexity is determined based on the target density and the acceleration information of the target vehicle.
[0170] In this exemplary embodiment, the scene complexity can be obtained by summing and expanding the target vehicle's acceleration (ego_acc) and target density (target_density). This not only quantifies the scene's dynamics (e.g., high complexity at congested intersections and low complexity at open highways) but also provides a basis for weight adjustment.
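Steps S1210 and S1220 can be sketched in a few lines of Python. The function names are hypothetical, and "expanding" is read here as wrapping the summed scalar into a length-1 feature vector so it can be concatenated with other features; that reading is an assumption.

```python
def target_density(num_targets, available_area_m2):
    """Number of targets per square meter of available driving area."""
    if available_area_m2 <= 0:
        raise ValueError("available area must be positive")
    return num_targets / available_area_m2

def scene_complexity(ego_acc, density):
    """Sum acceleration and density, then expand to a 1-element vector.

    The expansion to a vector is an assumed interpretation of the text.
    """
    return [ego_acc + density]

# Worked example from the text: 8 vehicles in 200 square meters.
density = target_density(8, 200.0)
# The ego acceleration of 1.5 m/s^2 is a made-up value for illustration.
complexity = scene_complexity(1.5, density)
```

With the figures from the example above, the density evaluates to 0.04 targets per square meter.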
[0171] Figure 13 is a schematic diagram illustrating the acquisition of enhanced features according to an exemplary embodiment of the present disclosure.
[0172] As shown in Figure 13, in some embodiments, on the basis of obtaining the enhanced features as shown in Figure 6, step S640 shown in Figure 6 may include the following steps.
[0173] In step S1310, weighted time features are used as the query sequence, weighted spatial features as the key sequence, and map features as the value sequence. Multi-head interactive attention processing is performed on the query sequence and key sequence to determine the attention weights.
[0174] In step S1320, the attention weights and the value sequence are weighted and summed to obtain the enhanced features.
[0175] In this exemplary embodiment, a multi-head cross-attention mechanism is used, with the temporal features as the query and the spatial features as the key, to fuse the map information, and an enhanced feature (enhanced_od_feat) is finally output through a residual connection.
[0176] As can be seen, the weighted temporal features are used as the query (Q), the weighted spatial features as the key (K), and the map features adapted to the number of targets as the value (V), and the attention output (attn_output) is calculated through multi-head attention (MultiHeadAttn). This establishes a correlation between temporal trends and spatial interactions, such as matching a "long-term straight-ahead trend" with "lane line guidance ahead," while also incorporating map constraints (such as lane boundary restrictions on the trajectory). Furthermore, through a residual connection, the attention output is added to the original fused features (agent_feats) to obtain the final enhanced features. This preserves the basic information of the original fused features and keeps the attention mechanism from overfitting to noise, improving feature robustness.
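The attention step of S1310 and S1320 and the residual connection can be sketched with NumPy as follows. This is a simplified illustration, not the disclosed implementation: the learned projection matrices of a full multi-head attention layer are omitted, the scaled dot-product form is assumed, and all shapes and the random inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(x, num_heads):
    """[B, N, D] -> [B, heads, N, D / heads]."""
    b, n, d = x.shape
    return x.reshape(b, n, num_heads, d // num_heads).transpose(0, 2, 1, 3)

def multi_head_cross_attention(query, key, value, num_heads):
    b, n, d = query.shape
    q, k, v = (split_heads(t, num_heads) for t in (query, key, value))
    # Attention weights come from the query and key sequences only.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d // num_heads)
    attn_weights = softmax(scores, axis=-1)          # [B, heads, N, N]
    # Weighted sum of the value sequence.
    out = attn_weights @ v                           # [B, heads, N, D / heads]
    attn_output = out.transpose(0, 2, 1, 3).reshape(b, n, d)
    return attn_output, attn_weights

rng = np.random.default_rng(0)
B, A, D, HEADS = 2, 5, 8, 2  # hypothetical sizes
weighted_time_feat = rng.normal(size=(B, A, D))   # query
weighted_space_feat = rng.normal(size=(B, A, D))  # key
map_feat = rng.normal(size=(B, A, D))             # value, aligned to the A targets
agent_feats = rng.normal(size=(B, A, D))          # original fused features

attn_output, attn_weights = multi_head_cross_attention(
    weighted_time_feat, weighted_space_feat, map_feat, HEADS)
# Residual connection: keep the original fused features alongside the attention output.
enhanced_od_feat = agent_feats + attn_output
```

The value sequence must have the same length as the key sequence, which is why the map features are assumed to be adapted to the number of targets before this step.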
[0177] Figure 14 is a schematic diagram illustrating the acquisition of motion interaction fusion features according to an exemplary embodiment of the present disclosure.
[0178] As shown in Figure 14, in some embodiments, on the basis of obtaining the motion interaction fusion features as shown in Figure 5, step S520 shown in Figure 5 may include the following steps.
[0179] In step S1410, the enhanced features and motion features are concatenated along the feature dimension to obtain updated motion features, and self-attention processing is applied to the updated motion features to obtain self-attention motion features.
[0180] In this exemplary embodiment, the enhanced features (enhanced_od_feat), obtained through the spatiotemporal attention mechanism with multi-scale scene perception, are concatenated with the motion features (mode_query), which are the motion features of the traffic participants, to generate the updated motion features: motion_query = concat(enhanced_od_feat, mode_query). Feature fusion is then performed using a self-attention mechanism. In this calculation, Q, K, and V are the query, key, and value matrices generated from the updated motion features (motion_query), and softmax denotes the normalized exponential function. The output is the encoded result that fuses the internal dependencies of the motion features, i.e., the self-attention motion features (motion_query_self), with dimensions [B, A×M, D].
[0181] In step S1420, cross-attention processing is performed on the self-attention motion features and map features to obtain motion interaction fusion features.
[0182] In this exemplary embodiment, the input self-attention motion features (motion_query_self) and the map features (map_query) are fused using a cross-attention mechanism. In this calculation, Q is the query matrix generated from the self-attention motion features (motion_query_self), and K and V are the key and value matrices generated from the map features (map_query). The output is the encoded result that fuses the motion and map features, i.e., the motion interaction fusion features (ca_motion_query_flat), with dimensions [B, A×M, D].
[0183] In this exemplary embodiment, the progressive design, in which self-attention first encodes the internal dependencies of the motion features and cross-attention then fuses the association between motion and map, is repeated for 3 layers to achieve deep fusion of the multi-dimensional features, providing accurate feature support for subsequent trajectory prediction.
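The three-layer self-attention and cross-attention stack of steps S1410 and S1420 can be sketched with NumPy. The sketch assumes the common scaled dot-product form softmax(QK^T / sqrt(d))V, uses a single head without learned projections, and repeats each target's enhanced feature once per mode so that it can be concatenated with mode_query; these choices and all shapes are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention (assumed form)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def fuse_motion_and_map(enhanced_od_feat, mode_query, map_query,
                        num_modes, num_layers=3):
    # Repeat each target's enhanced feature for every mode: [B, A, De] -> [B, A*M, De].
    repeated = np.repeat(enhanced_od_feat, num_modes, axis=1)
    # motion_query = concat(enhanced_od_feat, mode_query) along the feature dimension.
    motion_query = np.concatenate([repeated, mode_query], axis=-1)
    x = motion_query
    for _ in range(num_layers):
        motion_query_self = attention(x, x, x)                     # self-attention
        x = attention(motion_query_self, map_query, map_query)     # cross-attention
    return x  # ca_motion_query_flat, [B, A*M, D]

rng = np.random.default_rng(0)
B, A, M, DE, DM, L = 2, 3, 4, 6, 2, 7  # hypothetical sizes, D = DE + DM
enhanced_od_feat = rng.normal(size=(B, A, DE))
mode_query = rng.normal(size=(B, A * M, DM))
map_query = rng.normal(size=(B, L, DE + DM))

ca_motion_query_flat = fuse_motion_and_map(
    enhanced_od_feat, mode_query, map_query, num_modes=M)
```

In each layer the queries come from the motion side and the keys and values from the map side, so the output keeps the [B, A×M, D] layout of the motion features.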
[0184] Figure 15 is a schematic diagram illustrating a trajectory prediction method according to an exemplary embodiment of the present disclosure.
[0185] As shown in Figure 15, in some implementations, multimodal trajectory information of traffic participants can be obtained based on the following process.
[0186] In this exemplary embodiment, the target vehicle information and the traffic participant information can be input into an LSTM encoder to obtain fused features; then, the fused features can be input into two encoders at different time scales to obtain time features.
[0187] In this exemplary embodiment, a filtering distance can be determined based on traffic participant information and map information, and then local spatial features and global spatial features can be determined based on the filtering distance to obtain the final spatial features.
[0188] In this exemplary embodiment, scene complexity can also be determined based on map information, and then spatial features, temporal features, and scene complexity can be input into a dynamic weight network to determine temporal weights and spatial weights.
[0189] In this exemplary embodiment, weighted temporal features can be determined based on time weights and time characteristics. Similarly, weighted spatial features can be determined based on spatial weights and spatial characteristics. Furthermore, enhanced features can be determined based on weighted temporal features, weighted spatial features, and map features determined based on map information.
[0190] In this exemplary embodiment, updated motion features can be determined based on enhanced features and motion features, and then self-attention processing can be applied to the updated motion features to obtain self-attention motion features. Furthermore, cross-attention processing can be applied to the self-attention motion features and map features to obtain motion interaction fusion features.
[0191] In this exemplary embodiment, after the motion interaction fusion features are obtained, trajectory prediction can be performed on them to obtain trajectories of multiple modalities and probability scoring information for each trajectory. For example, as shown in Figure 16, the predicted trajectory information of traffic participants includes the multimodal trajectories of vehicle A, vehicle D, two-wheeled vehicle B, and pedestrian C.
[0192] Figure 17 is a flowchart illustrating the training of a trajectory prediction model according to an exemplary embodiment of the present disclosure. The method of this embodiment can be applied to electronic devices.
[0193] As shown in Figure 17, in some embodiments, the trajectory prediction model training method of this disclosure includes the following steps. In step S1710, training samples are obtained. The training samples include at least: traffic participant information, map information, vehicle information, and traffic participant labeled trajectories.
[0194] In this embodiment, various sensors and communication devices can be installed on the vehicle. For example, cameras, lidar, millimeter-wave radar, inertial measurement units, etc., can be installed on the vehicle. Through these sensors and communication devices, relevant vehicle data can be sensed and acquired. This vehicle data includes traffic participant information, map information, and vehicle information, thereby enabling the construction of training samples based on the acquired vehicle data.
[0195] In this embodiment, relevant data for multiple vehicles can be obtained from a database. These vehicles may have the same or different models; this exemplary embodiment does not impose any limitations. The database may include internal databases, external databases, etc. An external database can be understood as a database associated with a vehicle, while an internal database can be understood as data from the system corresponding to the vehicle.
[0196] In this exemplary embodiment, after the relevant data for multiple vehicles is obtained, the following operations can be performed on the relevant data for each vehicle. The relevant data is divided according to preset time periods. The preset time periods can be a certain number of periods per day, weekdays and weekends, or weekdays and holidays; this exemplary embodiment does not limit this. Then, for the relevant data within each time period, a division time is determined. The relevant data before the division time is treated as historical data, and the data after the division time is treated as future data. The historical data is divided into vehicle information, traffic participant information, and map information. The true values of the future trajectory of each traffic participant are determined from the future data, thereby determining the labeled trajectories of the traffic participants and ultimately the training samples.
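The history/future split described above can be sketched in plain Python. The record layout (keys "t", "ego", "participants", "map") and the function name are hypothetical, and assigning a record that falls exactly on the division time to the history is an assumption, since the text only distinguishes "before" and "after".

```python
def build_training_sample(records, split_time):
    """Split time-ordered records into model inputs and labeled trajectories."""
    history = [r for r in records if r["t"] <= split_time]  # boundary goes to history (assumed)
    future = [r for r in records if r["t"] > split_time]
    sample = {
        "vehicle_info": [r["ego"] for r in history],
        "participant_info": [r["participants"] for r in history],
        "map_info": [r["map"] for r in history],
        "labeled_trajectories": {},
    }
    # The true future positions of each participant become its labeled trajectory.
    for r in future:
        for pid, pos in r["participants"].items():
            sample["labeled_trajectories"].setdefault(pid, []).append(pos)
    return sample

# Six made-up time steps with one traffic participant "A" driving ahead of the ego vehicle.
records = [
    {"t": t, "ego": (float(t), 0.0), "participants": {"A": (t + 5.0, 1.0)}, "map": "lane_1"}
    for t in range(6)
]
sample = build_training_sample(records, split_time=2)
```

Here the first three records form the model input and the last three supply the ground-truth trajectory for participant "A".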
[0197] In step S1720, training samples are input into the trajectory prediction model to be trained to obtain the predicted trajectories of traffic participants. The predicted trajectories of traffic participants are obtained by the trajectory prediction model to be trained after processing vehicle information, map information, and traffic participant information to obtain motion interaction fusion features, and then performing trajectory prediction.
[0198] In this embodiment of the disclosure, traffic participant information, map information, and vehicle information can be input into the feature extraction module of the trajectory prediction model to be trained, respectively. For example, traffic participant information is input into the first encoder to obtain traffic participant features, vehicle information is input into the second encoder to obtain vehicle features, and map information is input into the third encoder to obtain map features.
[0199] In this exemplary embodiment, traffic participant features, vehicle features, traffic participant information, and map information can be input into the spatiotemporal attention module of the trajectory prediction model to be trained. Specifically, the traffic participant features, vehicle features, traffic participant information, and map information are input into encoders for multi-scale temporal feature extraction and multi-scale spatial feature extraction, thereby obtaining temporal and spatial features with multi-scale information. Further, these multi-scale temporal and spatial features can be input into a dynamic weight network for determining the weights corresponding to the temporal and spatial features, to obtain weighted temporal and weighted spatial features. Then, the weighted temporal and weighted spatial features can be input into a multi-head cross-attention mechanism submodule to obtain enhanced features.
[0200] In this exemplary embodiment, enhanced features, motion features, and map features can be input into the self-attention mechanism submodule and the cross-attention mechanism submodule for motion interaction fusion, thereby obtaining motion interaction fused features.
[0201] In this exemplary embodiment, motion interaction fusion features can be input into the multimodal trajectory generation submodule and the probability scoring submodule to obtain multiple predicted trajectories. Based on the probability scoring, the predicted trajectory of the traffic participant can be selected, for example, the predicted trajectory with the highest probability score.
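Selecting the predicted trajectory by probability score can be sketched as follows. The function name and the example trajectories and scores are made up for illustration.

```python
def select_trajectory(trajectories, scores):
    """Return the index and trajectory of the mode with the highest score."""
    if not trajectories or len(trajectories) != len(scores):
        raise ValueError("need one score per trajectory and at least one trajectory")
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx, trajectories[best_idx]

# Three hypothetical modes for one traffic participant: straight, left, right.
trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)],
    [(0.0, 0.0), (1.0, -0.5), (2.0, -1.5)],
]
scores = [0.2, 0.7, 0.1]

best_idx, best_trajectory = select_trajectory(trajectories, scores)
```

Downstream planning may still use all modes together with their scores; picking the top-scoring mode is only the example named in the text.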
[0202] In step S1730, the trajectory prediction model to be trained is trained based on the predicted trajectory and labeled trajectory of the traffic participants to obtain the trajectory prediction model.
[0203] In this embodiment of the disclosure, the predicted trajectory and the labeled trajectory of the traffic participant can be input into the trajectory regression optimization submodule to train the trajectory prediction model to be trained, thereby obtaining the trajectory prediction model.
[0204] In this embodiment of the disclosure, a trajectory prediction model to be trained is trained using training samples including traffic participant information, map information, vehicle information, and the labeled trajectories of traffic participants, thereby obtaining a trajectory prediction model that can learn to adapt to dynamic and complex scenarios.
[0205] Figure 18 is a schematic diagram illustrating a trajectory prediction model according to an exemplary embodiment of the present disclosure.
[0206] As shown in Figure 18, in some embodiments, on the basis of the trajectory prediction model training method shown in Figure 17, step S1730 shown in Figure 17 may include the following steps.
[0207] In step S1810, the loss function is determined based on the predicted trajectory and labeled trajectory of the traffic participants.
[0208] In step S1820, the trajectory prediction model to be trained is trained according to the loss function to obtain the trajectory prediction model.
[0209] In this embodiment of the disclosure, the trajectory prediction model to be trained is trained based on the loss function, which can quickly and efficiently obtain the target trajectory prediction model.
[0210] Figure 19 is a schematic diagram illustrating the acquisition of a loss function according to an exemplary embodiment of the present disclosure.
[0211] As shown in Figure 19, in some embodiments, on the basis of the trajectory prediction model shown in Figure 18, step S1810 shown in Figure 18 may include the following steps.
[0212] In step S1910, at least one of the following is determined based on the predicted trajectory and labeled trajectory of the traffic participants: first loss sub-function, second loss sub-function, third loss sub-function, fourth loss sub-function, and fifth loss sub-function.
[0213] In step S1920, a loss function is determined based on the first loss sub-function, the second loss sub-function, the third loss sub-function, the fourth loss sub-function, and the fifth loss sub-function. The first loss sub-function quantifies the error between the predicted trajectory and the labeled trajectory of the traffic participant. The second loss sub-function optimizes the probability distribution of each modality output by the trajectory prediction model to be trained using cross-entropy loss. The third loss sub-function constrains the motion state parameters output by the trajectory prediction model to be trained by constructing multi-dimensional physical constraints, ensuring that the motion state parameters meet preset physical characteristic requirements. The fourth loss sub-function sets constraints on higher-order changes in the motion state to suppress abrupt changes in the motion state. The fifth loss sub-function constrains the predicted trajectory to maintain a safe distance from obstacles.
[0214] In an exemplary embodiment, the loss function is shown in Formula 1 below: Formula 1.
[0215] Here, the first loss sub-function, also known as the regression loss function, computes the error between the true state and the predicted trajectory state of the best mode, i.e., the mode closest to the true value. The second loss sub-function, also known as the classification loss function, encourages the model to assign a higher probability to the best mode through cross-entropy loss. The third loss sub-function, also known as the physical feasibility loss function, constrains acceleration and jerk separately to ensure that the trajectory conforms to the laws of dynamics. The fourth loss sub-function, also known as the trajectory smoothing loss function, is the square of a second-order difference, which imposes a heavier penalty on larger changes in angular acceleration and prevents unnatural turning behavior in the predicted trajectory. The fifth loss sub-function, also known as the collision avoidance loss function, penalizes only violations of the safe distance, encouraging predicted trajectories to keep a sufficient safe distance and preventing collisions in the predicted trajectories. In the formula, P is the predicted multimodal trajectory and G is the true trajectory; the predicted trajectory of the optimal mode is indexed by the time step t; a score is output for each modality, M is the number of modalities, and k is the trajectory index corresponding to the optimal mode; the heading angle at time t is also used; P_i and P_j are the position vectors of different targets, and d_safe is the safe distance threshold; the remaining symbols are the curvature, the change in arc length Δs, and the curvature weight.
[0216] In this exemplary embodiment, by introducing a spatiotemporal consistency loss function, including physical feasibility loss, trajectory smoothing loss, and collision avoidance loss, the consistency of the trajectory in the spatiotemporal dimension is constrained, the generation of physically unreasonable trajectories is reduced, the trajectory prediction results are made more consistent with the actual situation, and the reliability of the prediction results is improved.
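The five loss sub-functions can be illustrated with a NumPy sketch. Formula 1 is not reproduced in this text, so the exact form of each term, the acceleration and jerk limits, the time step, the safe distance, and the combination weights below are all assumptions chosen only to match the verbal descriptions; the curvature term mentioned among the symbol definitions is not modeled.

```python
import numpy as np

def total_loss(pred, scores, gt, obstacles, dt=0.1, a_max=4.0, j_max=8.0,
               d_safe=2.0, weights=(1.0, 1.0, 0.1, 0.1, 1.0)):
    """pred: [M, T, 2] mode trajectories, scores: [M] logits,
    gt: [T, 2] labeled trajectory, obstacles: [N, T, 2] other targets."""
    # Best mode = the mode whose trajectory is closest to the labeled one.
    dists = np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)      # [M]
    k = int(np.argmin(dists))
    best = pred[k]
    # 1) Regression: error between the best mode and the labeled trajectory.
    l_reg = dists[k]
    # 2) Classification: cross-entropy pushing probability toward the best mode.
    log_probs = scores - scores.max()
    log_probs = log_probs - np.log(np.exp(log_probs).sum())
    l_cls = -log_probs[k]
    # 3) Physical feasibility: penalize acceleration and jerk above assumed limits.
    vel = np.diff(best, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    l_phys = (np.maximum(np.linalg.norm(acc, axis=-1) - a_max, 0.0).mean()
              + np.maximum(np.linalg.norm(jerk, axis=-1) - j_max, 0.0).mean())
    # 4) Smoothing: squared second-order difference of the heading angle.
    heading = np.unwrap(np.arctan2(vel[:, 1], vel[:, 0]))
    l_smooth = (np.diff(heading, n=2) ** 2).mean()
    # 5) Collision avoidance: only distances below d_safe are penalized.
    gaps = np.linalg.norm(obstacles - best, axis=-1)              # [N, T]
    l_col = np.maximum(d_safe - gaps, 0.0).mean()
    terms = np.array([l_reg, l_cls, l_phys, l_smooth, l_col])
    return float(np.dot(weights, terms)), terms

# Made-up example: straight-line ground truth, two modes, one distant obstacle.
T = 6
gt = np.stack([np.arange(T, dtype=float), np.zeros(T)], axis=-1)
pred = np.stack([gt, gt + np.array([0.0, 1.0])])
scores = np.array([2.0, 0.0])
obstacles = (gt + np.array([0.0, 10.0]))[None]

total, terms = total_loss(pred, scores, gt, obstacles)
```

In this example the first mode matches the labeled trajectory exactly, moves at constant velocity, and stays far from the obstacle, so only the classification term contributes to the total.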
[0217] Figure 20 is a block diagram illustrating a vehicle control device according to an exemplary embodiment of the present disclosure. The device of this embodiment can be applied to vehicles and / or electronic devices.
[0218] As shown in Figure 20, the vehicle control device 2000 may include: an acquisition unit 2001, an obtaining unit 2002, and a control unit 2003.
[0219] The acquisition unit 2001 is used to acquire vehicle information of the target vehicle, map information of the area where the target vehicle is located, and information of traffic participants within a first range from the target vehicle. The obtaining unit 2002 is used to input the vehicle information, map information, and traffic participant information into the trajectory prediction model to obtain the predicted trajectory information of the traffic participants, wherein the predicted trajectory information is obtained by the trajectory prediction model after processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction. The control unit 2003 is used to control the intelligent driving of the target vehicle based on the predicted trajectory information of the traffic participants.
[0220] In one possible implementation, the obtaining unit 2002 is used for: Feature extraction is performed on the vehicle information, map information, and traffic participant information respectively to obtain vehicle features, map features, and traffic participant features; Based on the vehicle features, map features, traffic participant features, traffic participant information, and map information, motion interaction fusion features are obtained; Based on the motion interaction fusion features, the predicted trajectory information of traffic participants is obtained.
[0221] In one possible implementation, the obtaining unit 2002 is used for: Enhanced features are obtained based on the vehicle features, map features, traffic participant features, traffic participant information, and map information; Based on the enhanced features, the motion features of the traffic participants, and the map features, motion interaction fusion features are obtained.
[0222] In one possible implementation, the obtaining unit 2002 is used for: Based on the characteristics of the traffic participants and the characteristics of the vehicles, the time characteristics are determined; Based on the characteristics of traffic participants, traffic participant information, and map information, spatial features are determined; The time features are weighted to determine weighted time features, and the spatial features are weighted to determine weighted spatial features; The weighted temporal features, weighted spatial features, and map features are subjected to multi-head interactive attention processing to obtain the enhanced features.
[0223] In one possible implementation, the obtaining unit 2002 is used for: The traffic participant features and the vehicle features are concatenated to obtain fused features; The fused features are encoded by at least two time-scale encoders to obtain at least two encoding results, and the at least two encoding results are concatenated along the feature dimension to determine the time features.
[0224] In one possible implementation, the obtaining unit 2002 is used for: Based on the traffic participant information and the map information, a filtering distance sequence is determined; The traffic participant features are masked according to the different filtering distances in the filtering distance sequence to obtain at least two subspace features, and the at least two subspace features are concatenated along the feature dimension to determine the spatial features.
[0225] In one possible implementation, the obtaining unit 2002 is used for: The traffic participant features are masked based on the first filtering distance to obtain a local region. A convolution operation is then performed on the local region to obtain initial local spatial features. Mean processing and dimension adaptation are then applied to the initial local spatial features to obtain the local spatial features. The traffic participant features are masked according to the second filtering distance to obtain a global region. A convolution operation is then performed on the global region to obtain initial global spatial features. Mean processing and dimension adaptation are then applied to the initial global spatial features to obtain the global spatial features. The local spatial features and the global spatial features are concatenated along the feature dimension to determine the spatial features.
[0226] In one possible implementation, the obtaining unit 2002 is used for: Based on the map information, the target density is determined, and based on the target density and the acceleration information of the target vehicle, the scene complexity is determined. The weighted temporal features, the weighted spatial features, and the scene complexity are concatenated into a gating input, and the gating input is processed through a fully connected network and an activation function to determine the temporal weights corresponding to the temporal features and the spatial weights corresponding to the spatial features. The time features are weighted based on the time weights to determine weighted time features; and the spatial features are weighted based on the spatial weights to determine weighted spatial features.
[0227] In one possible implementation, the obtaining unit 2002 is used for: The weighted time features are used as the query sequence, the weighted spatial features as the key sequence, and the map features as the value sequence. Multi-head interactive attention processing is performed on the query sequence and the key sequence to determine the attention weight. The enhanced features are obtained by weighting and summing the attention weights and the value sequence.
[0228] In one possible implementation, the obtaining unit 2002 is used for: The enhanced features and motion features are concatenated along the feature dimension to obtain updated motion features, and the updated motion features are then subjected to self-attention processing to obtain self-attention motion features. The self-attention motion features and the map features are subjected to cross-attention processing to obtain the motion interaction fusion features.
[0229] Regarding the apparatus in the above embodiments, the specific manner in which each unit performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0230] Figure 21 is a block diagram illustrating a trajectory prediction model training apparatus according to an exemplary embodiment of the present disclosure. The apparatus of this embodiment can be applied to electronic devices.
[0231] As shown in Figure 21, the trajectory prediction model training device 2100 may include: a sample acquisition unit 2101, a trajectory prediction unit 2102, and a model training unit 2103.
[0232] The sample acquisition unit 2101 is used to acquire training samples, which include at least: traffic participant information, map information, vehicle information, and the labeled trajectories of traffic participants. The trajectory prediction unit 2102 is used to input the training samples into the trajectory prediction model to be trained to obtain the predicted trajectory of the traffic participant, wherein the predicted trajectory of the traffic participant is obtained by the trajectory prediction model to be trained after processing the vehicle information, map information, and traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction. The model training unit 2103 is used to train the trajectory prediction model to be trained based on the predicted trajectory of the traffic participant and the labeled trajectory, so as to obtain the trajectory prediction model.
[0233] In one possible implementation, the model training unit 2103 is used for: The loss function is determined based on the predicted trajectory of the traffic participant and the labeled trajectory; The trajectory prediction model to be trained is trained according to the loss function to obtain the trajectory prediction model.
[0234] In one possible implementation, the model training unit 2103 is used for: Based on the predicted trajectory of the traffic participant and the labeled trajectory, at least one of the following loss sub-functions is determined: first loss sub-function, second loss sub-function, third loss sub-function, fourth loss sub-function, and fifth loss sub-function. The loss function is determined based on the first loss sub-function, the second loss sub-function, the third loss sub-function, the fourth loss sub-function, and the fifth loss sub-function; The first loss sub-function is used to quantify the error between the predicted trajectory and the labeled trajectory of the traffic participant; the second loss sub-function is used to optimize the probability distribution of each mode output by the trajectory prediction model to be trained through cross-entropy loss; the third loss sub-function is used to constrain the motion state parameters output by the trajectory prediction model to be trained by constructing multi-dimensional physical constraints, so that the motion state parameters meet the preset physical characteristic requirements; the fourth loss sub-function is used to set constraints on the higher-order changes of the motion state to suppress abrupt changes of the motion state; and the fifth loss sub-function is used to constrain the predicted trajectory to maintain a safe distance from obstacles.
[0235] As described above, the device in this embodiment obtains multiple possible predicted trajectories of traffic participants around the vehicle based on the trajectory prediction model obtained above, thereby improving the accuracy and reliability of trajectory prediction of traffic participants around the vehicle and ensuring the driving safety of the target vehicle.
[0236] Regarding the apparatus in the above embodiments, the specific manner in which each unit performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0237] Figure 22 is a functional block diagram of a vehicle according to an exemplary embodiment of the present disclosure. For example, vehicle 2200 can be a hybrid vehicle, a non-hybrid vehicle, an electric vehicle, a fuel cell vehicle, or other types of vehicles. Vehicle 2200 can be an intelligent driving vehicle, a semi-intelligent driving vehicle, or a non-intelligent driving vehicle.
[0238] Referring to Figure 22, the vehicle 2200 may include various subsystems, such as an infotainment system 2210, a perception system 2220, a decision control system 2230, a drive system 2240, and a computing platform 2250. The vehicle 2200 may also include more or fewer subsystems, and each subsystem may include multiple components. Furthermore, each subsystem and component of the vehicle 2200 can be interconnected via wired or wireless means.
[0239] In some embodiments, the infotainment system 2210 may include a communication system, an entertainment system, and a navigation system, etc.
[0240] The perception system 2220 may include several types of sensors for sensing information about the environment surrounding the vehicle 2200. For example, the perception system 2220 may include a global positioning system (which may be a GPS system, a BeiDou system, or another positioning system), an inertial measurement unit (IMU), lidar, millimeter-wave radar, ultrasonic radar, and a camera device.
[0241] The decision control system 2230 may include a computing system, a vehicle controller, a steering system, a throttle, and a braking system.
[0242] The drive system 2240 may include components that provide powered motion to the vehicle 2200. In one embodiment, the drive system 2240 may include an engine, an energy source, a transmission system, and wheels. The engine may be one or a combination of internal combustion engines, electric motors, and compressed air engines. The engine is capable of converting energy provided by the energy source into mechanical energy.
[0243] Some or all of the functions of vehicle 2200 are controlled by computing platform 2250. Computing platform 2250 may include at least one processor 2251 and memory 2252, and processor 2251 may execute instructions 2253 stored in memory 2252.
[0244] Processor 2251 can be any conventional processor, and may also include graphics processing units (GPUs), field programmable gate arrays (FPGAs), systems on chips (SOCs), application-specific integrated circuits (ASICs), or combinations thereof.
[0245] The memory 2252 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk or optical disk.
[0246] In addition to instructions 2253, memory 2252 can also store data, such as road maps, route information, and the position, direction, and speed of the vehicle. The data stored in memory 2252 can be used by computing platform 2250.
[0247] In this embodiment of the disclosure, the processor 2251 may execute instructions 2253 to complete all or part of the steps of the above-described vehicle control method and trajectory prediction model training method.
[0248] Referring to Figure 23, the electronic device 2300 may include one or more of the following components: a processing component 2302, a memory 2304, a power supply component 2306, a multimedia component 2308, an audio component 2310, an input/output (I/O) interface 2312, a sensor component 2314, and a communication component 2316.
[0249] Processing component 2302 typically controls the overall operation of electronic device 2300, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 2302 may include one or more processors 2320 to execute instructions to complete all or part of the steps of the vehicle control method and trajectory prediction model training method described above. Furthermore, processing component 2302 may include one or more modules to facilitate interaction between processing component 2302 and other components. For example, processing component 2302 may include a multimedia module to facilitate interaction between multimedia component 2308 and processing component 2302.
[0250] Memory 2304 is configured to store various types of data to support the operation of electronic device 2300. Memory 2304 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0251] Power supply component 2306 provides power to various components of electronic device 2300. Power supply component 2306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 2300.
[0252] Multimedia component 2308 includes a screen that provides an output interface between the electronic device 2300 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). In some embodiments, multimedia component 2308 includes a front-facing camera and/or a rear-facing camera. When the electronic device 2300 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and/or the rear-facing camera can receive external multimedia data. Each of the front-facing camera and the rear-facing camera may be a fixed optical lens system or may have variable focal length and optical zoom capability.
[0253] Audio component 2310 is configured to output and / or input audio signals. For example, audio component 2310 includes a microphone (MIC) configured to receive external audio signals when electronic device 2300 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 2304 or transmitted via communication component 2316. In some embodiments, audio component 2310 also includes a speaker for outputting audio signals.
[0254] I / O interface 2312 provides an interface between processing component 2302 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.
[0255] Sensor component 2314 includes one or more sensors for providing state assessments of various aspects of electronic device 2300. For example, sensor component 2314 may detect the on/off state of electronic device 2300, the relative positioning of components such as the display and keypad of electronic device 2300, changes in position of electronic device 2300 or a component of electronic device 2300, the presence or absence of user contact with electronic device 2300, orientation or acceleration/deceleration of electronic device 2300, and temperature changes of electronic device 2300. Sensor component 2314 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 2314 may also include a light sensor for use in imaging applications. In some embodiments, sensor component 2314 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.
[0256] Communication component 2316 is configured to facilitate wired or wireless communication between electronic device 2300 and other devices. Electronic device 2300 can access wireless networks based on communication standards, such as WiFi, 3G, 4G, 5G, other communication standards, or combinations thereof. In some embodiments of this disclosure, communication component 2316 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In some embodiments of this disclosure, communication component 2316 further includes a near-field communication (NFC) module to facilitate short-range communication.
[0257] In some embodiments of this disclosure, the electronic device 2300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.
[0258] In some embodiments of this disclosure, a non-transitory computer-readable storage medium stores a computer program, which, when executed by a processor, implements all or part of the steps of the above-described vehicle control method and trajectory prediction model training method.
[0259] In some embodiments of this disclosure, a computer program is provided, which, when executed by a processor, implements all or part of the steps of the above-described vehicle control method and trajectory prediction model training method.
[0260] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the appended claims.
[0261] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.
Claims
1. A vehicle control method, characterized in that the method includes: collecting vehicle information of a target vehicle, map information of the area where the target vehicle is located, and traffic participant information of traffic participants within a first range of the target vehicle; inputting the vehicle information, the map information, and the traffic participant information into a trajectory prediction model to obtain predicted trajectory information of the traffic participants, wherein the predicted trajectory information is obtained by the trajectory prediction model processing the vehicle information, the map information, and the traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction; and controlling intelligent driving of the target vehicle based on the predicted trajectory information of the traffic participants.
2. The method according to claim 1, characterized in that inputting the vehicle information, the map information, and the traffic participant information into the trajectory prediction model to obtain the predicted trajectory information of the traffic participants includes: performing feature extraction on the vehicle information, the map information, and the traffic participant information respectively to obtain vehicle features, map features, and traffic participant features; obtaining the motion interaction fusion features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information; and obtaining the predicted trajectory information of the traffic participants based on the motion interaction fusion features.
3. The method according to claim 2, characterized in that obtaining the motion interaction fusion features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information includes: obtaining enhanced features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information; and obtaining the motion interaction fusion features based on the enhanced features, motion features of the traffic participants, and the map features.
4. The method according to claim 3, characterized in that obtaining the enhanced features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information includes: determining time features based on the traffic participant features and the vehicle features; determining spatial features based on the traffic participant features, the traffic participant information, and the map information; weighting the time features to determine weighted time features, and weighting the spatial features to determine weighted spatial features; and performing multi-head interactive attention processing on the weighted time features, the weighted spatial features, and the map features to obtain the enhanced features.
5. The method according to claim 4, characterized in that determining the time features based on the traffic participant features and the vehicle features includes: concatenating the traffic participant features and the vehicle features to obtain fused features; and encoding the fused features with at least two time-scale encoders to obtain at least two encoding results, and concatenating the at least two encoding results along the feature dimension to determine the time features.
6. The method according to claim 4, characterized in that determining the spatial features based on the traffic participant features, the traffic participant information, and the map information includes: determining a filtering distance sequence based on the traffic participant information and the map information; and masking the traffic participant features according to different filtering distances in the filtering distance sequence to obtain at least two subspace features, and concatenating the at least two subspace features along the feature dimension to determine the spatial features.
7. The method according to claim 4, characterized in that weighting the time features to determine the weighted time features and weighting the spatial features to determine the weighted spatial features includes: determining a target density based on the map information, and determining a scene complexity based on the target density and acceleration information of the target vehicle; concatenating the time features, the spatial features, and the scene complexity into a gating input, and processing the gating input through a fully connected network and an activation function to determine time weights corresponding to the time features and spatial weights corresponding to the spatial features; and weighting the time features based on the time weights to determine the weighted time features, and weighting the spatial features based on the spatial weights to determine the weighted spatial features.
8. The method according to claim 4, characterized in that performing multi-head interactive attention processing on the weighted time features, the weighted spatial features, and the map features to obtain the enhanced features includes: using the weighted time features as a query sequence, the weighted spatial features as a key sequence, and the map features as a value sequence, performing multi-head interactive attention processing on the query sequence and the key sequence to determine attention weights; and obtaining the enhanced features as a weighted sum of the value sequence using the attention weights.
9. The method according to any one of claims 3 to 8, characterized in that obtaining the motion interaction fusion features based on the enhanced features, the motion features of the traffic participants, and the map features includes: concatenating the enhanced features and the motion features along the feature dimension to obtain updated motion features, and performing self-attention processing on the updated motion features to obtain self-attention motion features; and performing cross-attention processing on the self-attention motion features and the map features to obtain the motion interaction fusion features.
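The attention step of claim 8 can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention with the heads formed by slicing the feature dimension; the learned projection matrices that a real model would use are omitted, and the function names (`softmax`, `split_heads`, `multi_head_attention`) are hypothetical rather than taken from this disclosure. The sketch also assumes that the key sequence and the value sequence have the same length, which the weighted sum requires.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(seq, num_heads):
    """Slice every feature vector into num_heads equal parts; result[h][t] is one slice."""
    dim = len(seq[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    return [[vec[h * head_dim:(h + 1) * head_dim] for vec in seq]
            for h in range(num_heads)]


def multi_head_attention(query, key, value, num_heads):
    """Scaled dot-product attention per head; head outputs are concatenated.

    query: weighted time features, one vector per query position
    key:   weighted spatial features, one vector per key position
    value: map features, one vector per key position
    Learned projections are omitted (an assumption made for brevity).
    """
    if len(key) != len(value):
        raise ValueError("key and value sequences must have the same length")
    q_heads = split_heads(query, num_heads)
    k_heads = split_heads(key, num_heads)
    v_heads = split_heads(value, num_heads)
    scale = math.sqrt(len(q_heads[0][0]))
    output = [[] for _ in query]
    for q_h, k_h, v_h in zip(q_heads, k_heads, v_heads):
        for i, q in enumerate(q_h):
            # Attention weights come from the query/key similarity only.
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in k_h]
            weights = softmax(scores)
            # Enhanced features are the weighted sum of the value sequence.
            mixed = [sum(w * v[d] for w, v in zip(weights, v_h))
                     for d in range(len(v_h[0]))]
            output[i].extend(mixed)
    return output


# Toy inputs: 2 query positions, 3 key/value positions, feature dimension 4.
weighted_time = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
weighted_spatial = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
map_features = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
enhanced = multi_head_attention(weighted_time, weighted_spatial, map_features, num_heads=2)
```

Each row of `enhanced` has the same dimension as a map feature vector, because every head mixes its own slice of the value sequence and the slices are concatenated again. A production model would more likely use a framework attention layer with trainable projections.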
10. A method for training a trajectory prediction model, characterized in that the method includes: obtaining training samples, which include at least traffic participant information, map information, vehicle information, and labeled trajectories of traffic participants; inputting the training samples into a trajectory prediction model to be trained to obtain predicted trajectories of the traffic participants, wherein the predicted trajectories of the traffic participants are obtained by the trajectory prediction model to be trained processing the vehicle information, the map information, and the traffic participant information to obtain motion interaction fusion features and then performing trajectory prediction; and training the trajectory prediction model to be trained based on the predicted trajectories of the traffic participants and the labeled trajectories to obtain the trajectory prediction model.
11. The method according to claim 10, characterized in that training the trajectory prediction model to be trained based on the predicted trajectories of the traffic participants and the labeled trajectories to obtain the trajectory prediction model includes: determining a loss function based on the predicted trajectories of the traffic participants and the labeled trajectories; and training the trajectory prediction model to be trained according to the loss function to obtain the trajectory prediction model.
12. The method according to claim 11, characterized in that determining the loss function based on the predicted trajectories of the traffic participants and the labeled trajectories includes: determining, based on the predicted trajectories of the traffic participants and the labeled trajectories, at least one of the following loss sub-functions: a first loss sub-function, a second loss sub-function, a third loss sub-function, a fourth loss sub-function, and a fifth loss sub-function; and determining the loss function based on the first loss sub-function, the second loss sub-function, the third loss sub-function, the fourth loss sub-function, and the fifth loss sub-function; wherein the first loss sub-function is used to quantify the error between the predicted trajectory and the labeled trajectory of the traffic participant; the second loss sub-function is used to optimize, through cross-entropy loss, the probability distribution of each mode output by the trajectory prediction model to be trained; the third loss sub-function is used to constrain the motion state parameters output by the trajectory prediction model to be trained by constructing multi-dimensional physical constraints, so that the motion state parameters meet preset physical characteristic requirements; the fourth loss sub-function is used to constrain higher-order changes of the motion state so as to suppress abrupt changes of the motion state; and the fifth loss sub-function is used to constrain the predicted trajectory to maintain a safe distance from obstacles.
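As a rough illustration of how the five loss sub-functions of claim 12 might be combined, the sketch below uses a weighted sum. The claims do not fix the form of any sub-function or the way they are combined, so every concrete choice here is an assumption: mean Euclidean error for the first term, a speed-limit hinge for the physical constraint, a squared second difference for the higher-order term, a distance hinge for the safety term, and all names, default thresholds, and weights.

```python
import math


def displacement_loss(pred, label):
    """First sub-loss: mean Euclidean error between predicted and labeled points."""
    return sum(math.dist(p, q) for p, q in zip(pred, label)) / len(pred)


def mode_cross_entropy(mode_probs, best_mode):
    """Second sub-loss: cross-entropy against the index of the best-matching mode."""
    return -math.log(max(mode_probs[best_mode], 1e-12))


def speed_limit_loss(pred, dt, max_speed):
    """Third sub-loss (assumed form): hinge penalty on speeds above max_speed."""
    speeds = [math.dist(a, b) / dt for a, b in zip(pred, pred[1:])]
    return sum(max(0.0, s - max_speed) for s in speeds) / len(speeds)


def smoothness_loss(pred):
    """Fourth sub-loss (assumed form): mean squared second difference of positions."""
    terms = [(a[0] - 2 * b[0] + c[0]) ** 2 + (a[1] - 2 * b[1] + c[1]) ** 2
             for a, b, c in zip(pred, pred[1:], pred[2:])]
    return sum(terms) / len(terms)


def safety_loss(pred, obstacles, safe_distance):
    """Fifth sub-loss (assumed form): hinge penalty for points closer than safe_distance."""
    gaps = [safe_distance - math.dist(p, o) for p in pred for o in obstacles]
    return sum(max(0.0, g) for g in gaps) / len(gaps)


def total_loss(pred, label, mode_probs, best_mode, obstacles,
               dt=0.1, max_speed=40.0, safe_distance=2.0,
               weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the five sub-losses; the weights are hypothetical."""
    parts = (
        displacement_loss(pred, label),
        mode_cross_entropy(mode_probs, best_mode),
        speed_limit_loss(pred, dt, max_speed),
        smoothness_loss(pred),
        safety_loss(pred, obstacles, safe_distance),
    )
    return sum(w * p for w, p in zip(weights, parts))


# Toy example: a straight predicted trajectory, a label offset by 1 m,
# two equally likely modes, and one obstacle far from the path.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
labeled = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
loss = total_loss(predicted, labeled, [0.5, 0.5], 0, [(0.0, 10.0)])
```

In this toy example only the first two terms are non-zero, because the predicted trajectory moves at constant speed in a straight line and stays far from the obstacle. In practice the relative weights would be tuned, since the terms have different units and magnitudes.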
13. A vehicle control device, characterized in that the device includes: a data collection unit, used to collect vehicle information of a target vehicle, map information of the area where the target vehicle is located, and traffic participant information of traffic participants within a preset range of the target vehicle; an acquisition unit, used to input the vehicle information, the map information, and the traffic participant information into a trajectory prediction model to obtain predicted trajectory information of the traffic participants, wherein the predicted trajectory information is obtained by the trajectory prediction model performing deep fusion of the vehicle information, the map information, and the traffic participant information to determine motion interaction fusion features and then performing trajectory prediction; and a control unit, used to control intelligent driving of the target vehicle based on the predicted trajectory information of the traffic participants.
14. The apparatus according to claim 13, characterized in that the acquisition unit is used for: performing feature extraction on the vehicle information, the map information, and the traffic participant information respectively to obtain vehicle features, map features, and traffic participant features; obtaining the motion interaction fusion features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information; and obtaining the predicted trajectory information of the traffic participants based on the motion interaction fusion features.
15. The apparatus according to claim 14, characterized in that the acquisition unit is used for: obtaining enhanced features based on the vehicle features, the map features, the traffic participant features, the traffic participant information, and the map information; and obtaining the motion interaction fusion features based on the enhanced features, motion features of the traffic participants, and the map features.
16. The apparatus according to claim 15, characterized in that the acquisition unit is used for: concatenating the enhanced features and the motion features along the feature dimension to obtain updated motion features, and performing self-attention processing on the updated motion features to obtain self-attention motion features; and performing cross-attention processing on the self-attention motion features and the map features to obtain the motion interaction fusion features.
17. A trajectory prediction model training device, characterized in that the device includes: a sample acquisition unit, used to acquire training samples, which include at least traffic participant information, map information, vehicle information, and labeled trajectories of traffic participants; a trajectory prediction unit, used to input the training samples into a trajectory prediction model to be trained to obtain predicted trajectories of the traffic participants; and a model training unit, used to train the trajectory prediction model to be trained based on the predicted trajectories of the traffic participants and the labeled trajectories, so as to obtain the trajectory prediction model.
18. The apparatus according to claim 17, characterized in that the model training unit is used for: determining a loss function based on the predicted trajectories of the traffic participants and the labeled trajectories; and training the trajectory prediction model to be trained according to the loss function to obtain the trajectory prediction model.
19. The apparatus according to claim 18, characterized in that the model training unit is used for: determining, based on the predicted trajectories of the traffic participants and the labeled trajectories, at least one of the following loss sub-functions: a first loss sub-function, a second loss sub-function, a third loss sub-function, a fourth loss sub-function, and a fifth loss sub-function; and determining the loss function based on the first loss sub-function, the second loss sub-function, the third loss sub-function, the fourth loss sub-function, and the fifth loss sub-function; wherein the first loss sub-function is used to quantify the error between the predicted trajectory and the labeled trajectory of the traffic participant; the second loss sub-function is used to optimize, through cross-entropy loss, the probability distribution of each mode output by the trajectory prediction model to be trained; the third loss sub-function is used to constrain the motion state parameters output by the trajectory prediction model to be trained by constructing multi-dimensional physical constraints, so that the motion state parameters meet preset physical characteristic requirements; the fourth loss sub-function is used to constrain higher-order changes of the motion state so as to suppress abrupt changes of the motion state; and the fifth loss sub-function is used to constrain the predicted trajectory to maintain a safe distance from obstacles.
20. A vehicle, characterized in that the vehicle includes: a processor; and a memory used to store processor-executable instructions; wherein the processor is configured to implement the vehicle control method according to any one of claims 1 to 9.
21. An electronic device, characterized in that the electronic device includes: a processor; and a memory used to store processor-executable instructions; wherein the processor is configured to implement the vehicle control method according to any one of claims 1 to 9, or to implement the trajectory prediction model training method according to any one of claims 10 to 12.
22. A non-transitory computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the vehicle control method according to any one of claims 1 to 9, or implements the trajectory prediction model training method according to any one of claims 10 to 12.
23. A computer program product, characterized in that the computer program product includes a computer program which, when executed by a processor, implements the vehicle control method according to any one of claims 1 to 9, or implements the trajectory prediction model training method according to any one of claims 10 to 12.