A goal-driven trajectory prediction method
By combining Bi-LSTM and attention mechanisms, the system predicts the future intentions of pedestrians and the impact of vehicle speed, solving the problem of inaccurate pedestrian trajectory prediction in existing technologies and improving the accuracy and safety of pedestrian prediction for autonomous vehicles.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date
- 2024-04-09
- Publication Date
- 2026-06-19
AI Technical Summary
Existing physics-based pedestrian trajectory prediction methods lack sufficient accuracy in autonomous driving, and deep learning algorithms fail to fully consider pedestrian intentions and scene influences, resulting in inaccurate pedestrian trajectory predictions.
Bi-LSTM is used to extract pedestrian trajectory features in both chronological and reverse chronological directions. The attention mechanism and LSTM are combined to predict the future intention of pedestrians. The influence of vehicle speed on pedestrian trajectory is also considered. The future trajectory of pedestrians is predicted from concatenated hidden features and a temporal attention mechanism.
This improves the accuracy of predicting future pedestrian trajectories, reduces the safety risks posed by autonomous vehicles to pedestrians, and ensures the accuracy and safety of pedestrian prediction.
Figure CN118132992B_ABST
Description
Technical Field
[0001] This invention belongs to the field of autonomous driving technology and relates to a pedestrian trajectory prediction method driven by targets predicted with a Bi-LSTM.
Background Technology
[0002] The rise of autonomous driving technology is constantly changing our perception and expectations of transportation systems. As a cutting-edge interdisciplinary technology integrating artificial intelligence, machine learning, and the automotive industry, autonomous vehicles will bring unprecedented experiences to our lives. The Society of Automotive Engineers (SAE) classifies autonomous vehicles into six levels, from L0 (no automation) to L5 (fully automated driving). When a vehicle's automation level is L2 or below, the driver needs to constantly monitor road conditions and intervene immediately in emergencies. At higher automation levels (L3, L4), the driver can engage in non-driving activities, while the vehicle's driving status is monitored by the autonomous driving system, which can request takeover in emergencies. An L5 vehicle can perform all driving tasks completely autonomously in any geographical location and under any road conditions, without the need for human driver intervention. Only then does the vehicle truly achieve autonomous driving.
[0003] However, to achieve true autonomous driving, simply giving vehicles perception and decision-making capabilities is far from sufficient. Vehicles also need a deep understanding of their surroundings, including roads, pedestrians, other vehicles, road routes, and traffic signs. Complex environments (such as insufficient lighting, inclement weather, and complex traffic conditions) can affect the decisions of autonomous vehicles, thus threatening the reliability of the autonomous driving system. In autonomous driving scenarios, pedestrians are the most vulnerable participants; once involved in a traffic accident, pedestrians often suffer the most severe injuries. Therefore, to ensure the safety of autonomous vehicles, it is necessary to predict the future movement trajectories of pedestrians around the vehicle.
[0004] Traditional physics-based research methods are generally guided by the dynamics and kinematics of Newton's laws. Coelingh E et al., in "Collision warning with full auto brake and pedestrian detection—a practical example of automatic emergency braking" [13th International IEEE Conference on Intelligent Transportation Systems. IEEE, 2010: 155-160.], utilized constant acceleration for a vehicle's collision warning and automatic braking system. This system helps vehicles avoid traffic accidents by providing warnings and, when necessary, emergency braking. Elnagar A, in "Prediction of moving objects in dynamic environments using Kalman filters" [Proceedings 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation (Cat. No. 01EX515). IEEE, 2001: 414-419.], used Kalman filter technology to design a framework for predicting the future position and orientation of moving obstacles in changing environments. The advantage of this method is that it can start prediction from the first time step without needing to observe several steps before beginning the prediction process. In "Agent-based modeling for predicting pedestrian trajectories around an autonomous vehicle" [Journal of Artificial Intelligence Research, 2022, 73: 1385-1433.], Prédhumeau M et al. proposed a hybrid pedestrian trajectory prediction model utilizing a social force model for predicting pedestrian-vehicle interactions. This model integrates different pedestrian behaviors and social group behaviors observed in various interaction scenarios with vehicles. The predictive ability of the model was then validated through qualitative and quantitative comparisons with ground-truth trajectories. The results show that the model can be well applied to predicting pedestrian trajectories in novel scenarios. Zhang L et al. proposed a multi-person collision risk assessment framework in "Pedestrian collision risk assessment based on state estimation and motion prediction" [IEEE Transactions on Vehicular Technology, 2021, 71(1): 98-111.], which includes a motion prediction module, a collision detection module, and a collision risk assessment module. The framework first predicts the motion of the vehicle through a constant acceleration model and a constant velocity model, and then uses a Kalman filter method based on the constant velocity model and the constant acceleration model to estimate the pedestrian's trajectory.
[0005] However, while physics-based pedestrian trajectory prediction methods have a strong advantage in model interpretability, their prediction accuracy is still insufficient to meet the requirements of autonomous driving technology for pedestrian trajectory prediction. Therefore, using deep learning to complete pedestrian trajectory prediction tasks has gradually become the mainstream approach.
[0006] Due to the high compatibility of Recurrent Neural Networks (RNNs) with time series tasks, RNNs are widely used in trajectory prediction. Dai S et al. proposed a trajectory prediction model based on spatiotemporal LSTM in "Modeling vehicle interactions via modified LSTM models for trajectory prediction" [IEEE Access, 2019, 7: 38287-38296.]. This method embeds spatial interactions into the LSTM, thereby measuring the interaction between pedestrians and vehicles in trajectory prediction tasks. Zhao T et al., in "Multi-agent tensor fusion for contextual trajectory prediction" [Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 12126-12134.], encoded the past motion trajectories and scene context of multiple agents into a multi-agent tensor, and then used convolutional fusion to capture multi-agent interactions while preserving the spatial structure and scene context information of the agents. This model uses adversarial loss to learn stochastic predictions, repeatedly decoding the future motion trajectories of multiple agents. In "Modeling social interaction and intention for pedestrian trajectory prediction" [Physica A: Statistical Mechanics and its Applications, 2021, 570: 125790.], Chen K et al. combined the pedestrian's surrounding environment, facial key points, and relative position to predict the future trajectory of pedestrians. This method unifies rich visual features about category, interaction, and facial key points into a multi-channel tensor and uses this tensor to build an end-to-end fully convolutional encoder-decoder attention model based on convolutional LSTM. In "Social LSTM: Human trajectory prediction in crowded spaces" [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 961-971.], Alahi A et al. proposed an LSTM model that performs joint prediction among multiple individuals in a scene. Unlike traditional LSTM, this method shares information among multiple LSTMs through a new pooling layer. The pooling layer aggregates the LSTM hidden representations corresponding to each trajectory, which can be used to generate pedestrian interaction information.
[0007] However, pedestrian movement trajectories are random, and the deep learning algorithms mentioned above neither consider the pedestrian's intentions regarding their future movement nor fully consider the influence of the pedestrian's surroundings on that movement. These algorithms may therefore make significant errors, resulting in insufficient prediction accuracy. This is unacceptable in the field of autonomous driving.
Summary of the Invention
[0008] In view of this, the purpose of the present invention is to provide a target-driven trajectory prediction method to predict the future movement trajectory of pedestrians around an autonomous vehicle, providing prior information for the future trajectory planning of the autonomous vehicle, thereby avoiding traffic accidents.
[0009] To achieve the above objectives, the present invention provides the following technical solution:
[0010] A target-driven trajectory prediction method, comprising the following steps:
[0011] S1. By using Bi-LSTM to extract features from the pedestrian trajectory sequence in both chronological and reverse chronological directions, the position distribution of pedestrians at future times is generated, thereby achieving pedestrian target estimation.
[0012] S2. Use attention mechanisms and LSTM to predict the future speed of vehicles around pedestrians;
[0013] S3. Combining the pedestrian's future intentions, the future speed of vehicles around the pedestrian, and the pedestrian's historical movement trajectory, the system predicts the pedestrian's future movement trajectory.
[0014] Furthermore, step S1 specifically involves: using a forward LSTM with the pedestrian's historical motion trajectory as input, propagating from time t+1 to t+δ, and calculating the forward hidden features at each time step; using a backward LSTM with the future final goal as input, propagating from time t+δ to t+1, and calculating the backward hidden features at each time step. The forward and backward hidden features at the same time point are then concatenated to predict the target at that time point, and the hidden features for the pedestrian's future position are calculated. By concatenating the hidden features at each time step, the pedestrian position at each time step is predicted. This completes the prediction of the pedestrian's future movement intentions.
[0015] Furthermore, step S2 includes:
[0016] S21. The historical vehicle speed sequence $S_{t-m} = \{s_{t-m}, s_{t-m+1}, \ldots, s_t\}$ is used as input to the temporal attention mechanism, which analyzes the influence weight of the sequence element at each time point on the future motion of the vehicle. The velocity of each frame in $S_{t-m}$ is assigned a corresponding weight, and the influence weights at each time point are then fused with the original sequence to obtain a vehicle velocity sequence with attention weights.
[0017] S22. Using LSTM as the encoder, calculate the hidden feature $h_{s,t+1}$ of the vehicle speed at each time step. Then, LSTM is used as a decoder to predict the vehicle's speed at each future time step.
[0018] Furthermore, step S3 includes:
[0019] S31. Use the pedestrian historical trajectory data as input to a temporal attention mechanism, which assigns a corresponding weight to each time step of the pedestrian historical data; apply these weights to each time step to obtain a pedestrian historical motion sequence with attention weights.
[0020] S32. Encode the pedestrian historical motion sequence with attention weights using LSTM to obtain the hidden feature $h_{l,t}$;
[0021] S33. Concatenate the hidden features, at the same time step, of the pedestrian target estimation, the vehicle speed estimation, and the attention-weighted pedestrian historical motion sequence:
[0022]
[0023] In the formula, the pedestrian-target term denotes the hidden features of the pedestrian target estimation, and $h_{s,t}$ denotes the hidden features of the vehicle speed estimation;
[0024] S34. Input the cascaded hidden features into the LSTM for decoding, thereby predicting the pedestrian's position at each time step.
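The concatenation formula of step S33 is not reproduced in this text. One possible way to write it is sketched below, assuming the symbol $h_{g,t}$ for the pedestrian-target hidden feature (borrowed from the feature vector named in the embodiment) and a hypothetical name $h_{c,t}$ for the fused feature; neither the symbols nor the order of the terms is confirmed by the source.

```latex
h_{c,t} = \left[\, h_{g,t} \;;\; h_{s,t} \;;\; h_{l,t} \,\right]
```

Here $[\,\cdot\,;\,\cdot\,]$ denotes vector concatenation, so the dimension of $h_{c,t}$ would be the sum of the dimensions of the three hidden features.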
[0025] The beneficial effects of this invention are as follows: This invention uses bidirectional LSTM to predict the pedestrian's intentions, and also considers the influence of surrounding vehicles on the pedestrian's future trajectory, which can accurately predict the pedestrian's future trajectory and avoid harm to pedestrians caused by autonomous vehicles.
[0026] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.
Description of the Drawings
[0027] To make the objectives, technical solutions, and advantages of the present invention clearer, the preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, wherein:
[0028] Figure 1 is a schematic diagram of the structure of the Bi-LSTM model proposed in this embodiment of the invention;
[0029] Figure 2 is a flowchart illustrating the pedestrian trajectory prediction method of the present invention;
[0030] Figure 3 is a schematic diagram of the overall framework of the model of the present invention.
Detailed Implementation
[0031] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Unless otherwise specified, the following embodiments and features can be combined with each other.
[0032] This invention proposes a pedestrian trajectory prediction method suitable for the first-person perspective to improve the accuracy of pedestrian trajectory prediction and reduce safety risks during autonomous vehicle operation. The method mainly involves using a bidirectional LSTM to independently encode the pedestrian's motion trajectory in both forward and backward directions, generating two future spatial distributions of the pedestrian. One distribution is encoded using only the pedestrian's historical motion trajectory, while the other also includes the pedestrian's future location information. The loss between the two distributions is then calculated, allowing the distribution containing only historical motion information to learn the features of the other distribution. After training, when predicting the pedestrian's short-term future target, sampling is performed from the distribution containing only the historical motion trajectory. In addition, this method considers the influence of the speeds of vehicles near the pedestrian on the pedestrian's future trajectory, employing a temporal attention mechanism to assign a corresponding weight to each frame and applying these weights to the vehicle speed sequence. An LSTM then predicts the vehicle's future speed from the weighted vehicle speed sequence. Finally, combining the short-term target and vehicle speed information, an LSTM completes the pedestrian trajectory prediction.
[0033] Based on the above, an embodiment of the present invention proposes a target-driven trajectory prediction method, shown in Figure 2, which specifically includes:
[0034] 1. Pedestrian target estimation
[0035] Historical trajectory data of the target pedestrian, $L_{t-m} = \{l_{t-m}, l_{t-m+1}, \ldots, l_t\}$, is obtained from the PIE dataset. A Bi-LSTM is then used to encode the pedestrian's historical movement trajectory, with two LSTMs encoding $L_{t-m}$ and the target $g_t$. The structure of the Bi-LSTM is shown in Figure 1. The forward LSTM encodes the pedestrian trajectory data $L_{t-m}$ to generate a feature vector $h_t$ that does not contain the target information $g_t$. The backward LSTM encodes the pedestrian's historical trajectory data to generate a feature vector $h_{g,t}$ that contains the target information $g_t$. The two feature vectors are then processed by two linear layers to obtain two position distributions of the pedestrian. Specifically, passing the feature vector $h_t$ through the linear layer H1 yields the first distribution, which is generated from feature vectors that do not contain pedestrian target information; passing the feature vector $h_{g,t}$ through the linear layer H2 yields the second distribution, which is generated from feature vectors containing pedestrian target information. The relative entropy loss between the two distributions is then calculated to learn the implicit relationship between $L_{t-m}$ and $g_t$.
[0036] Through training, the distribution generated from historical motion alone learns the pedestrian's future location information from the distribution that contains the target information, thereby gaining the ability to predict the pedestrian's future goals. During testing, a sample is drawn directly from the history-only distribution and concatenated with $h_t$ to produce the estimated target.
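The relative entropy loss between the two position distributions can be illustrated with a small sketch. It assumes, purely for illustration, that both distributions are diagonal Gaussians described by a mean and a log-variance per coordinate; the source does not state the distribution family, and the function name `kl_diag_gaussians` is hypothetical.

```python
import math


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Relative entropy KL(q || p) between two diagonal Gaussians.

    Assumed roles (not stated in the source): q is the distribution built
    from target-aware features, p the one built from history-only features.
    The result is summed over coordinates.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        var_q, var_p = math.exp(lq), math.exp(lp)
        # Closed form for one coordinate:
        # 0.5 * (log(var_p / var_q) + (var_q + (mq - mp)^2) / var_p - 1)
        total += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return total
```

Minimizing this quantity during training pulls the history-only distribution toward the target-aware one, which is what allows sampling from the history-only distribution at test time.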
[0037] To predict the pedestrian's movement over multiple consecutive future time steps, a Bi-LSTM is used to accomplish this task in both chronological and reverse chronological directions. The forward LSTM takes the pedestrian's historical movement trajectory as input, propagating from time t+1 to t+δ, and calculates the forward hidden features at each time step. The backward LSTM takes the future final goal as input and propagates from time t+δ to t+1, calculating the backward hidden features at each time step.
[0038] Then, the forward and backward hidden states at the same time point are concatenated to predict the target at that time point, and the hidden features of the pedestrian's future position are calculated. By concatenating the hidden states at each time step, the pedestrian position at each time step is predicted. This completes the prediction of the pedestrian's future movement intentions.
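The forward and backward passes and the per-step concatenation can be sketched as follows. This is a minimal illustration with untrained random weights and no bias terms, not the patented model: the class `LSTMCell`, the function `bidirectional_rollout`, the readout matrix `W_out`, and the choice to feed the history encoding and the goal as constant inputs at every step are all assumptions made for the example.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Minimal LSTM cell with random (untrained) weights; biases omitted."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix per gate: input (i), forget (f), output (o),
        # and candidate (g).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for k in "ifog"}

    def step(self, x, h, c):
        z = list(x) + list(h)
        i = [_sigmoid(v) for v in _matvec(self.W["i"], z)]
        f = [_sigmoid(v) for v in _matvec(self.W["f"], z)]
        o = [_sigmoid(v) for v in _matvec(self.W["o"], z)]
        g = [math.tanh(v) for v in _matvec(self.W["g"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


def bidirectional_rollout(h_hist, goal, steps, fwd, bwd, W_out):
    """Return one predicted (x, y) position per step t+1 .. t+steps."""
    size = fwd.hidden_size
    # Forward pass, driven by the history encoding, from t+1 to t+steps.
    h, c, fwd_states = [0.0] * size, [0.0] * size, []
    for _ in range(steps):
        h, c = fwd.step(h_hist, h, c)
        fwd_states.append(h)
    # Backward pass, driven by the final goal, from t+steps back to t+1.
    h, c, bwd_states = [0.0] * size, [0.0] * size, []
    for _ in range(steps):
        h, c = bwd.step(goal, h, c)
        bwd_states.append(h)
    bwd_states.reverse()  # index k now corresponds to time t+1+k
    # Concatenate both directions at each step and read out a position.
    return [_matvec(W_out, hf + hb) for hf, hb in zip(fwd_states, bwd_states)]


rng = random.Random(0)
hidden = 4
fwd = LSTMCell(input_size=hidden, hidden_size=hidden, rng=rng)
bwd = LSTMCell(input_size=2, hidden_size=hidden, rng=rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)] for _ in range(2)]
positions = bidirectional_rollout([0.1] * hidden, [5.0, 3.0], 3, fwd, bwd, W_out)
```

Reversing the backward states before zipping is the step that aligns the two directions, so that each concatenated vector combines the forward and backward features of the same future time point.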
[0039] 2. Vehicle speed estimation
[0040] Considering the game-theoretic relationship between pedestrians and vehicles in road traffic scenarios, the future speeds of vehicles surrounding pedestrians in the dataset were predicted.
[0041] The vehicle speed features $S_{t-m} = \{s_{t-m}, s_{t-m+1}, \ldots, s_t\}$ extracted from the PIE dataset are used as input to predict the vehicle's speed sequence over the next δ frames, $S_{t+\delta} = \{s_{t+1}, s_{t+2}, \ldots, s_{t+\delta}\}$. The process is as follows:
[0042] 1) First, use a temporal attention mechanism to analyze the influence weight of each sequence element at each time point on its future motion. Assign a corresponding weight to the velocity of each frame in the input velocity sequence. Then, merge the influence weight of each time point with the original sequence to obtain a vehicle velocity sequence with attention weights.
[0043] 2) Then, LSTM is used as both the encoder and decoder to process the vehicle speed sequence with weighted features. First, the encoder is used to calculate the hidden feature of the vehicle speed at each time step, $h_{s,t+1} = \mathrm{LSTM}(h_{s,t}, W_s)$. Then, the decoder is used to predict the vehicle's speed at each future time step. These operations are performed from t+1 to t+δ, and the predicted velocities form the sequence of future motion velocities.
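Step 1) of the procedure above, the temporal attention weighting, can be sketched as follows. The scoring function here is a toy linear function of the speed with hypothetical parameters `w` and `b`; the source does not specify how the attention scores are computed, so this only illustrates the softmax normalization and the fusion of weights with the original sequence.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def temporal_attention(speeds, w=1.0, b=0.0):
    """Return (weights, weighted_sequence) for a historical speed sequence.

    Each frame k gets the score w * s_k + b (an assumed stand-in for a
    learned scoring function); the normalized weights are then multiplied
    element-wise into the original sequence.
    """
    weights = softmax([w * s + b for s in speeds])
    weighted = [a * s for a, s in zip(weights, speeds)]
    return weights, weighted
```

The weighted sequence has the same length as the input, so it can be passed frame by frame to the LSTM encoder in step 2).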
[0044] 3. Pedestrian trajectory prediction
[0045] After predicting the pedestrian's future target and the vehicle's speed, LSTM is used to predict the pedestrian's future trajectory. The overall prediction framework is shown in Figure 3. To predict the pedestrian's trajectory, a temporal attention mechanism is also used to process the pedestrian's historical trajectory. Temporal attention assigns a corresponding weight to each time step of the sequence, and these weights are then applied at each time step to obtain a pedestrian historical trajectory sequence with attention features. Then, LSTM is used to encode the pedestrian motion features with attention weights.
[0046] Then, the hidden features of the pedestrian target estimation, the vehicle speed estimation, and the attention-weighted pedestrian motion sequence at the same time step are concatenated. The concatenated hidden-layer features are input into an LSTM for decoding to predict the pedestrian's position at each time step. The above operations are propagated from time t+δ to t+1 to predict the pedestrian's position at each time step. The predicted positions at each time step are then combined to obtain the predicted sequence of the pedestrian's future movement.
[0047] During model training, the model parameters are updated using a loss function, which consists of three parts: pedestrian target estimation, vehicle speed prediction, and pedestrian trajectory prediction. The loss function for the pedestrian target estimation part is the KLD loss. The vehicle speed estimation part uses the mean squared error loss function to calculate the loss between the actual vehicle speed and the predicted vehicle speed. The pedestrian trajectory prediction part also uses the mean squared error loss function to calculate the loss between the actual pedestrian trajectory and the predicted pedestrian trajectory. The total loss of the model is $\mathrm{Loss} = L_S + L_G + L_{KL}$. The model is trained by optimizing this loss, and its accuracy is quantified using ADE, CADE, and FADE.
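The loss combination and a displacement-based metric can be sketched as below. The function names are hypothetical, and two points are assumptions rather than statements from the source: that $L_G$ is the trajectory term, and that ADE is the mean Euclidean distance over all predicted steps. The exact definitions of CADE and FADE are not given in the text, so the sketch only adds a final-step displacement as a related quantity.

```python
import math


def mse(pred, true):
    """Mean squared error over every coordinate of two equal-length sequences."""
    sq = [(p - t) ** 2 for ps, ts in zip(pred, true) for p, t in zip(ps, ts)]
    return sum(sq) / len(sq)


def total_loss(speed_loss, trajectory_loss, kl_loss):
    """Unweighted sum of the three loss terms, mirroring Loss = L_S + L_G + L_KL."""
    return speed_loss + trajectory_loss + kl_loss


def ade(pred, true):
    """Average displacement error: mean Euclidean distance over all steps."""
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    return sum(dists) / len(dists)


def final_displacement(pred, true):
    """Euclidean distance between the last predicted and last true position."""
    return math.dist(pred[-1], true[-1])
```

For instance, with a two-step prediction `[(0, 0), (3, 4)]` against a ground truth of `[(0, 0), (0, 0)]`, the per-step distances are 0 and 5, so `ade` averages them and `final_displacement` returns the second one.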
[0048] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.
Claims
1. A goal-driven trajectory prediction method, characterized in that the method includes the following steps: S1. Use Bi-LSTM to extract features from the pedestrian trajectory sequence in both chronological and reverse chronological directions, generating the position distribution of pedestrians at future times, achieving pedestrian target estimation, and obtaining the future movement intention of pedestrians; S2. Use attention mechanisms and LSTM to predict the future speed of vehicles around pedestrians; S21. Take a historical vehicle speed sequence $S_{t-m}$ as the input of the temporal attention mechanism, analyze through the temporal attention mechanism the influence weight of the sequence element at each time point on the future motion of the vehicle, assign a corresponding weight to the speed of each frame in the sequence, and fuse the influence weight of each time point with the original sequence to obtain a vehicle speed sequence with attention weights; S22. Using LSTM as an encoder, calculate the hidden features $h_{s,t+1}$ of the vehicle speed at each time step, and using LSTM as a decoder, predict the movement speed of the vehicle at each future time step; S3. Combining the pedestrian's future movement intention, the future speed of vehicles around the pedestrian, and the pedestrian's historical movement trajectory, predict the pedestrian's future movement trajectory.
2. The trajectory prediction method according to claim 1, characterized in that step S1 specifically involves: using a forward LSTM with the pedestrian's historical movement trajectory as input, propagating from time t+1 to t+δ, and calculating the forward hidden features at each time step; using a backward LSTM with the future final target as input, propagating from time t+δ to t+1, and calculating the backward hidden features at each time step; then concatenating the forward and backward hidden features at the same time point to predict the target at that time point and calculating the hidden features of the pedestrian's future position; and, by concatenating the hidden features at each time step, predicting the pedestrian location at each time step, thereby completing the prediction of the pedestrian's future movement intentions.
3. The trajectory prediction method according to claim 1, characterized in that step S3 includes: S31. Use the pedestrian historical trajectory data as input to the temporal attention mechanism, assign a corresponding weight to each time step of the pedestrian historical data through the temporal attention mechanism, and apply the weights to each time step to obtain the pedestrian historical motion trajectory with attention weights; S32. Encode the pedestrian's historical motion trajectory with attention weights using LSTM to obtain the hidden features $h_{l,t}$; S33. Concatenate the hidden features, at the same time step, of the pedestrian's future movement intention, the vehicle's future speed, and the attention-weighted pedestrian historical movement trajectory, where the terms of the formula denote the hidden features of the pedestrian's future movement intention and the hidden features of the vehicle's future speed, respectively; S34. Input the concatenated hidden features into the LSTM for decoding, thereby predicting the pedestrian's position at each time step.