Trajectory prediction method and system based on block tokenization pre-training

By using a block-based pre-training method, the problem of poor adaptability of existing trajectory prediction models in unseen scenarios is solved, improving the accuracy and stability of trajectory prediction and ensuring the safety and stability of autonomous vehicles.

CN117763346B · Active · Publication Date: 2026-06-26 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SOUTHEAST UNIV
Filing Date: 2023-12-04
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing trajectory prediction methods cannot be effectively applied to unseen scenarios when trained on limited scenario samples, resulting in high risks in practical applications. Furthermore, channel-independent models neglect joint learning of different state sequences, leading to reduced prediction performance.

Method used

By employing a block-based pre-training method, a channel-independent feature extractor is designed to construct a trajectory block-level reconstruction task through random masked trajectory sequences. Combined with causal units and decoupling templates, the local information and long-term dependencies of trajectory points are captured, thereby improving the accuracy of trajectory prediction.

Benefits of technology

It improves the accuracy of lane keeping for autonomous vehicles, ensures the stability of vehicle driving, and enhances the generalization ability and prediction performance of trajectory prediction models.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a trajectory prediction method based on block tokenization pre-training. First, on the basis of randomly masking part of the trajectory, the temporal positions of the masked trajectory points are marked to represent the dependencies between trajectory points at different distances. Then, the trajectory is segmented into blocks to capture the local information of the trajectory points, and a block-level reconstruction task is designed for pre-training. During pre-training, a channel-independent feature extractor is established, in which weight-shared causal units extract the temporal features of the different state sequences in the trajectory sequence. Next, independent sequence reconstruction is carried out for the different state representations, a feature integrator based on causal units is established, and the reconstructed state sequences are combined to predict the target's future trajectory. The application effectively improves the accuracy of trajectory prediction for unmanned vehicles, improves driving safety, and has practical significance for promoting the development of unmanned driving technology.

Description

Technical Field

[0001] This invention belongs to the technical field of autonomous driving, specifically relating to a trajectory prediction method and system based on block-based pre-training.

Background Technology

[0002] Trajectory prediction is a key part of autonomous driving systems. Its purpose is to predict the possible future trajectory of a target by observing its historical trajectory, thereby improving the safety, stability, and comfort of vehicle operation.

[0003] A review of domestic and international trajectory prediction technologies shows that most existing methods are based on deep learning and build data-driven machine learning models for trajectory prediction. Examples include an autonomous driving trajectory prediction method, device, electronic device, and storage medium (Publication No.: CN117104275A), a ship trajectory prediction method based on LSTM and self-attention mechanisms (Publication No.: CN117114051A), and a trajectory prediction method based on spatiotemporal attention mechanisms and neural ordinary differential equations (Publication No.: CN117077727A). These methods are typically trained on limited scene samples and therefore generalize poorly to unseen scenarios, which creates a significant risk of accidents in practical applications. Recent research has explored pre-training techniques for trajectory prediction, learning generalizable and transferable trajectory features to improve model performance. Most of these methods rely on random masking; however, several consecutive trajectory points may be masked at once, leaving long gaps between the remaining trajectory points and weakening the model's ability to capture long-term dependencies. Meanwhile, related works have proposed channel-independent models to improve the adaptability and convergence speed of trajectory prediction models and to reduce the risk of overfitting. However, these channel-independent models focus only on feature representation learning along the time axis and neglect the joint learning of different state sequences, which reduces prediction performance. Therefore, this invention proposes a trajectory prediction method and system based on block-labeled pre-training.

Summary of the Invention

[0004] To address the aforementioned problems, this invention discloses a trajectory prediction method and system based on block-based pre-training. The method effectively extracts the dependencies between the individual state sequences. In addition, an efficient block-based pre-training method is designed to jointly capture the local information of masked trajectory points and the long-term dependencies between non-masked trajectory points. This improves the accuracy of lane keeping in autonomous vehicles and ensures vehicle stability, which is of practical significance for promoting the rapid development of autonomous driving technology.

[0005] To achieve the above objectives, the technical solution of the present invention is as follows:

[0006] The trajectory prediction method based on block-labeled pre-training includes the following steps:

[0007] S1: Randomly mask a portion of the trajectory points in the trajectory sequence and mark the temporal position of the masked trajectory points to characterize the dependency relationship between different distances between the trajectory points;

[0008] S2: For the pre-training process, construct a trajectory block-level reconstruction task to capture the local information of trajectory points. The reconstruction task refers to segmenting backtracking blocks and expectation blocks based on the masked trajectory sequence in step S1. Backtracking blocks are used as input to the pre-trained model, while expectation blocks are the training labels of the pre-trained model.

[0009] S3: During the pre-training process, a channel-independent feature extractor is established, and the feature extractor is trained using the backtracking blocks and expectation blocks divided in step S2; the feature extractor is a temporal neural network with the backtracking block sequence as input and the expectation block sequence as output.

[0010] S4: First, the target historical trajectory data is decoupled into a pseudo-state feature sequence with slots. Then, the slot part of the sequence is reconstructed using the feature extractor trained in step S3. The non-slot part of the pseudo-state sequence with slots contains time-series labels and state labels, while the slot part contains only time-series labels and the state labels are empty.

[0011] S5: Establish a feature integrator to predict the future trajectory of the target using the reconstructed slot portion sequence in step S4; the feature integrator is a temporal neural network with the input being the reconstructed slot portion sequence and the output being the future trajectory.

[0012] As an improvement of the present invention, in step S1, the timing position of the mask is explicitly marked as follows:

[0013]

[0014]

[0015] where $P_{1:L+H}$ denotes the positions of the mask in natural order, for example 1, 2, 3, ...; $\mathrm{Tok}(\cdot)$ denotes the labeling result of the mask; $i \in d$ denotes the feature index of the label; $d$ denotes the dimension of the model's hidden layer; and $L+H$ denotes the length of the sequence.
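The two marking formulas themselves are not reproduced in this text. The symbols that are defined (a natural-order position P, a feature index i, a hidden dimension d) are consistent with the standard sinusoidal positional encoding, so the sketch below assumes that form. This is an illustrative guess, not the patent's confirmed formula; the function name `tok`, the base 10000, and the toy sizes are all assumptions.

```python
import math

def tok(position: int, d: int, base: float = 10000.0) -> list[float]:
    """Encode one natural-order position as a d-dimensional vector.

    ASSUMPTION: standard sinusoidal encoding (sin on even feature indices,
    cos on odd ones); the source does not show the actual formula.
    """
    vec = []
    for i in range(d):
        # Each even/odd pair of feature indices shares one frequency.
        angle = position / base ** (2 * (i // 2) / d)
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

# Positions 1..L+H cover the backtracking block (L) and the expectation block (H).
L, H, d = 6, 4, 8
tokens = [tok(p, d) for p in range(1, L + H + 1)]
```

Under this assumption, nearby positions receive similar vectors and distant positions receive dissimilar ones, which is one way a temporal mark could express dependencies at different distances.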

[0016] As an improvement of the present invention, in step S2, the reconstruction task divides the original sequence of length N into a backtracking block of length L and an expectation block of length H, satisfying L+H=N.
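The block division can be illustrated with a minimal sketch. The helper name `split_blocks` and the toy sequence are hypothetical; only the constraint L+H=N comes from the text.

```python
def split_blocks(sequence, L, H):
    """Split a length-N sequence into a backtracking block (first L points)
    and an expectation block (last H points), requiring L + H = N."""
    if L + H != len(sequence):
        raise ValueError("L + H must equal the sequence length N")
    return sequence[:L], sequence[L:]

seq = list(range(10))  # toy masked sequence, N = 10
backtrack, expect = split_blocks(seq, L=6, H=4)
```

Here `backtrack` would serve as the model input and `expect` as the training label during pre-training.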

[0017] As an improvement of the present invention, the channel-independent feature extractor in step S3 uses a causal unit as its basic structure. A causal unit consists of a dense layer with ReLU activation, a dense layer with linear activation, a Dropout layer, and a normalization layer. The specific extraction process includes:

[0018] S31: Use causal units to map the mask labels $P_{1:L+H}$ of the backtracking block and the expectation block to a lower-dimensional space;

[0019] S32: Using the mask labels (time-series labels) and state labels of all backtracking blocks as input, perform tiling, stacking, and concatenation using causal units; then reshape the output vector into... The t-th column of this vector represents the decoded vector at time t;

[0020] S33: By designing a residual module with an output size of 1, connect the decoded vector to the mask label (time-series label) of the expectation block;

[0021] S34: By adding a global residual connection, map the backtracking block to a vector of the same size as the expectation block and add it to the output of the predictive causal unit, thus obtaining the predicted expectation block sequence.
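A single causal unit, as listed above (a fully connected layer with ReLU, a fully connected layer with linear activation, Dropout, and normalization), can be sketched in plain Python. This is a minimal forward pass only, not the full S31 to S34 pipeline. The layer sizes, the random initialization, the choice of layer normalization, and all function names are assumptions made for illustration.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def causal_unit(x, params, drop_rate=0.0, rng=None):
    """Dense + ReLU -> dense (linear) -> Dropout -> normalization."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, v) for v in dense(x, w1, b1)]  # ReLU activation
    out = dense(hidden, w2, b2)                       # linear activation
    if drop_rate > 0.0 and rng is not None:           # dropout only in training
        out = [0.0 if rng.random() < drop_rate else v / (1.0 - drop_rate)
               for v in out]
    return layer_norm(out)

def init_params(n_in, n_hidden, n_out, rng):
    """Hypothetical uniform initialization for one causal unit."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out

rng = random.Random(0)
shared = init_params(n_in=6, n_hidden=8, n_out=4, rng=rng)
# Channel independence: the SAME weights process each state sequence separately.
channels = {"vx": [0.1, 0.2, 0.3, 0.2, 0.1, 0.0],
            "vy": [0.0, 0.1, 0.0, -0.1, 0.0, 0.1]}
features = {name: causal_unit(seq, shared) for name, seq in channels.items()}
```

Applying one weight-shared unit per channel keeps the parameter count independent of the number of state sequences, which matches the weight-sharing idea stated in the abstract.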

[0022] As an improvement of the present invention, the decoupling method in step S4 is based on the following two templates: the Cartesian template $T_{cart}$ and the Agant template $T_{arg}$. According to the Cartesian template, the historical trajectory is decoupled into two pseudo-velocities: horizontal and vertical.

[0023] $T_{cart}: (x_{1:t},\ y_{1:t}) \rightarrow (x_{2:t} - x_{1:t-1},\ y_{2:t} - y_{1:t-1})$

[0024] where $(x_t, y_t)$ denotes the trajectory point at time t; $x_{2:t} - x_{1:t-1}$ and $y_{2:t} - y_{1:t-1}$ denote the pseudo-velocities in the horizontal and vertical directions, respectively.
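The Cartesian template amounts to taking first differences of the x and y coordinates. A minimal sketch follows; the function name `cartesian_template` and the sample coordinates are hypothetical.

```python
def cartesian_template(xs, ys):
    """T_cart: map positions (x_1..x_t, y_1..y_t) to first differences,
    i.e. horizontal and vertical pseudo-velocities of length t - 1."""
    vx = [b - a for a, b in zip(xs, xs[1:])]
    vy = [b - a for a, b in zip(ys, ys[1:])]
    return vx, vy

xs = [0.0, 1.0, 3.0, 6.0]
ys = [0.0, 0.5, 0.5, 1.5]
vx, vy = cartesian_template(xs, ys)
```

A trajectory of t points yields t - 1 pseudo-velocity samples per direction, since each sample is the displacement between two consecutive points.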

[0025] According to the Agant template, the historical trajectory is decoupled into two representations: the polar radius and the polar angle.

[0026]

[0027] $(\Delta x,\ \Delta y) = (x_{2:t} - x_{1:t-1},\ y_{2:t} - y_{1:t-1})$

[0028]

[0029] where the two output sequences of the template represent the polar radius and the polar angle, respectively.
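The formulas for the polar radius and polar angle are not reproduced in this text. The sketch below assumes the standard polar conversion of the displacement (Δx, Δy), namely the Euclidean norm for the radius and the arctangent for the angle. The function name `agant_template` is hypothetical, and the use of `atan2` (rather than a plain arctangent of Δy/Δx) is an assumption chosen to avoid division by zero.

```python
import math

def agant_template(xs, ys):
    """Map positions to (polar radius, polar angle) of each displacement.

    ASSUMPTION: radius = sqrt(dx^2 + dy^2), angle = atan2(dy, dx);
    the source does not show the exact formulas.
    """
    dx = [b - a for a, b in zip(xs, xs[1:])]
    dy = [b - a for a, b in zip(ys, ys[1:])]
    radius = [math.hypot(u, v) for u, v in zip(dx, dy)]
    angle = [math.atan2(v, u) for u, v in zip(dx, dy)]
    return radius, angle

radius, angle = agant_template([0.0, 3.0, 3.0], [0.0, 4.0, 5.0])
```

Together with the two Cartesian pseudo-velocities, these two sequences would give the four pseudo-state representations mentioned in the description.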

[0030] To achieve the above objectives, the present invention also adopts the following technical solution: a trajectory prediction system based on block tokenization pre-training, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of any of the methods described above.

[0031] The beneficial effects of this invention are as follows:

[0032] This invention proposes a trajectory prediction method based on block-based pre-training, which effectively extracts the dependencies between the individual state sequences. In addition, an efficient block-based tokenization method is designed to jointly capture the local information of masked trajectory points and the long-term dependencies between non-masked trajectory points. This improves the accuracy of lane keeping in autonomous vehicles, ensures vehicle stability, and has practical significance for promoting the rapid development of autonomous driving technology.

Description of the Drawings

[0033] Figure 1 is a flowchart of the steps of the method of the present invention;

[0034] Figure 2 shows the dense-layer-based causal unit of the method of the present invention;

[0035] Figure 3 shows the channel-independent neural network extractor of the method of the present invention.

Detailed Implementation

[0036] The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that the following specific embodiments are for illustrative purposes only and are not intended to limit the scope of the present invention.

[0037] During the pre-training phase, before the random masking operations are applied, the historical state sequence and the future state sequence are concatenated into a state sequence S. Three random masking operations are then used to augment S: sampling, gap filling, and segmentation. Sampling takes a subsequence from the original sequence at a given fixed interval; gap filling hides data at random positions in the sequence; and segmentation cuts out subsequences whose length is at least 30% of the original sequence length.
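The three masking operations can be sketched as follows. This is one possible reading of the paragraph above, not the patent's implementation: the function names, the use of `None` to mark a hidden point, the 25% hiding ratio, and the fixed random seed are all assumptions.

```python
import math
import random

def sample_mask(seq, interval):
    """Sampling: keep every `interval`-th point of the sequence."""
    return seq[::interval]

def gap_mask(seq, ratio, rng):
    """Gap filling: hide values at random positions (None marks a hidden point)."""
    hidden = set(rng.sample(range(len(seq)), int(len(seq) * ratio)))
    return [None if i in hidden else v for i, v in enumerate(seq)]

def segment_mask(seq, rng, min_ratio=0.3):
    """Segmentation: cut a contiguous subsequence that is at least
    `min_ratio` (30% by default) of the original length."""
    min_len = max(1, math.ceil(len(seq) * min_ratio))
    length = rng.randint(min_len, len(seq))
    start = rng.randint(0, len(seq) - length)
    return seq[start:start + length]

rng = random.Random(0)
history, future = list(range(10)), list(range(10, 20))
S = history + future                      # concatenated state sequence
sampled = sample_mask(S, interval=2)
gapped = gap_mask(S, ratio=0.25, rng=rng)
segment = segment_mask(S, rng=rng)
```

Each operation produces a differently degraded view of S, so the same recorded trajectory can supply several pre-training samples.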

[0038] Based on the random masking operation, the reconstruction task divides the N-length sequence after random masking into two parts by setting a backtracking block of length L and an expected block of length H, while ensuring that L+H=N.

[0039] At the same time, the temporal position of the random mask is marked as follows:

[0040]

[0041]

[0042] where $P_{1:L+H}$ denotes the positions of the mask in natural order, for example 1, 2, 3, ...; $\mathrm{Tok}(\cdot)$ denotes the labeling result of the mask; $i \in d$ denotes the feature index of the label; $d$ denotes the dimension of the model's hidden layer; and $L+H$ denotes the length of the sequence.

[0043] In the pre-training phase, a channel-independent extractor is first built using causal units as the basic structure. As shown in Figure 2, a causal unit consists of a dense layer with ReLU activation, a dense layer with linear activation, a Dropout layer, and a normalization layer. The specific extraction process, shown in Figure 3, includes:

[0044] S31: Use causal units to map the mask labels $P_{1:L+H}$ of the backtracking block and the expectation block to a lower-dimensional space;

[0045] S32: Using the mask labels and state labels of all backtracking blocks as input, perform tiling, stacking, and concatenation using causal units; then reshape the output vector into... The t-th column of this vector represents the decoded vector at time t;

[0046] S33: By designing a residual module with an output size of 1, connect the decoded vector to the mask label of the expectation block;

[0047] S34: By adding a global residual connection, map the backtracking block to a vector of the same size as the expectation block and add it to the output of the predictive causal unit, thus obtaining the predicted expectation block sequence.

[0048] In the main training and inference phases, the two decoupling templates, the Cartesian template $T_{cart}$ and the Agant template $T_{arg}$, are first used to decouple the historical trajectory data into four pseudo-state sequence representations, as follows:

[0049] According to the Cartesian template, the historical trajectory is decoupled into two pseudo-velocity representations: horizontal and vertical.

[0050] $T_{cart}: (x_{1:t},\ y_{1:t}) \rightarrow (x_{2:t} - x_{1:t-1},\ y_{2:t} - y_{1:t-1})$

[0051] where $(x_t, y_t)$ denotes the trajectory point at time t; $x_{2:t} - x_{1:t-1}$ and $y_{2:t} - y_{1:t-1}$ denote the pseudo-velocities in the horizontal and vertical directions, respectively.

[0052] According to the Agant template, the historical trajectory is decoupled into two representations: the polar radius and the polar angle.

[0053]

[0054] $(\Delta x,\ \Delta y) = (x_{2:t} - x_{1:t-1},\ y_{2:t} - y_{1:t-1})$

[0055]

[0056] where the two output sequences of the template represent the polar radius and the polar angle, respectively.

[0057] The decoupled pseudo-state sequence is stored in a specially designed container: a slotted sequence whose length equals the sum of the historical trajectory length and the future trajectory length. The container has two data attributes: a time-series label recording the sequence's temporal information and a state label recording the state information. The historical state sequence carries both labels, while the future state sequence to be predicted carries only the time-series label. This can be written as follows:

[0058] $\tau = \{(z_1, P_1), \ldots, (z_L, P_L), ([\cdot], P_{L+1}), \ldots, ([\cdot], P_{L+N})\}$

[0059] where $\tau$ denotes the state sequence container; $z_n$ denotes a state label; $P_n$ denotes a time-series label; and $[\cdot]$ denotes a slot.
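The slotted container τ can be sketched as a list of (state label, time-series label) pairs in which future entries have an empty state. The function name `build_container` and the use of `None` for an empty slot are assumptions made for illustration.

```python
def build_container(states, n_future):
    """Build tau = [(z_1, P_1), ..., (z_L, P_L), (slot, P_{L+1}), ..., (slot, P_{L+N})].

    Historical entries carry both a state label and a time-series label;
    future entries carry only the time-series label (None marks the slot).
    """
    L = len(states)
    filled = [(z, p) for p, z in enumerate(states, start=1)]
    slots = [(None, p) for p in range(L + 1, L + n_future + 1)]
    return filled + slots

tau = build_container([0.5, 0.7, 0.6], n_future=2)
```

The pre-trained extractor would then be asked to fill the `None` entries, one container per pseudo-state sequence.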

[0060] Then, the state sequence is reconstructed using the pre-trained extractor: based on the historical state sequence $(z_{1:L}, P_{1:L})$ and the future time-series labels $P_{L+1:L+N}$, it reconstructs the future state sequence and outputs $z_{L+1:L+N}$.

[0061] Finally, a neural network predictor is built, which has a two-layer structure. The first layer uses causal units to encode each future state sequence, and the second layer uses causal units to integrate the four future state codes to output the future trajectory prediction result.

[0062] This invention demonstrates superior performance compared to other methods in rotation tests on the INTERACTION dataset. The test metrics are Final Displacement Error (FDE) and Average Displacement Error (ADE), and the comparison models are Long Short-Term Memory Network (LSTM) and Transformer model.

[0063] Table 1. Comparison of FDE / ADE (unit: meters) in the INTERACTION dataset.

[0064]
Model / rotation   0            30           60           90           180
LSTM               4.37 / 1.33  6.75 / 2.43  8.26 / 2.94  6.71 / 2.43  5.35 / 1.86
Transformer        3.98 / 1.44  7.27 / 2.95  7.37 / 3.03  6.50 / 2.22  6.01 / 2.86
This invention     3.38 / 1.02  5.24 / 1.64  5.58 / 1.69  5.14 / 1.57  4.42 / 1.33
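For reference, the two metrics in Table 1 are conventionally computed as the mean (ADE) and the last-step (FDE) Euclidean distance between predicted and ground-truth points. The sketch below uses that standard definition; the function name `ade_fde` and the sample trajectories are hypothetical and unrelated to the reported results.

```python
import math

def ade_fde(pred, truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points.

    ADE: mean Euclidean distance over all predicted time steps.
    FDE: Euclidean distance at the final predicted time step.
    """
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = ade_fde(pred, truth)
```

In this toy case the per-step errors are 0, 1, and 2 metres, so the ADE is 1.0 and the FDE is 2.0.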

[0065] In summary, the trajectory prediction method based on block-labeled pre-training proposed in this invention can predict the future trajectory of a target by utilizing the target's historical trajectory. This effectively solves the problems of traditional trajectory prediction pre-training methods, such as poor performance in capturing long-distance dependencies and focusing on modeling the temporal relationship of trajectory sequences while ignoring multi-state joint learning. This improves the accuracy of trajectory prediction for autonomous vehicles and ensures the stability of vehicle operation.

[0066] It should be noted that the above content merely illustrates the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. For those skilled in the art, various improvements and modifications can be made without departing from the principle of the present invention, and all such improvements and modifications fall within the scope of protection of the claims of the present invention.

Claims

1. A trajectory prediction method based on block-labeled pre-training, characterized in that it includes the following steps:
S1: Randomly mask a portion of the trajectory points in the trajectory sequence and mark the temporal position of the masked trajectory points to characterize the dependence between trajectory points at different distances; the temporal position of the random mask is explicitly marked as: [formulas omitted in source]; where $P_{1:L+H}$ denotes the position of the mask in the natural sequence; $\mathrm{Tok}(\cdot)$ denotes the result of the mask marking; $i$ denotes the feature index of the marker; $d$ denotes the dimension of the model's hidden layer; and $L+H$ denotes the length of the sequence;
S2: For the pre-training process, construct a trajectory block-level reconstruction task to capture the local information of trajectory points. The reconstruction task refers to segmenting backtracking blocks and expectation blocks based on the masked trajectory sequence in step S1. Backtracking blocks are used as input to the pre-trained model, while expectation blocks are the training labels of the pre-trained model;
S3: During the pre-training process, a channel-independent feature extractor is established, and the feature extractor is trained using the backtracking blocks and expectation blocks divided in step S2; the feature extractor is a temporal neural network with the backtracking block sequence as input and the expectation block sequence as output;
S4: First, the target historical trajectory data is decoupled into a pseudo-state feature sequence with slots. Then, the slot part of the sequence is reconstructed using the feature extractor trained in step S3. The non-slot part of the pseudo-state sequence with slots contains time-series labels and state labels, while the slot part contains only time-series labels and the state labels are empty. The decoupling method is based on the following two templates: the Cartesian template $T_{cart}$ and the Agant template $T_{arg}$. According to the Cartesian template, the historical trajectory is decoupled into two pseudo-velocity representations, horizontal and vertical: $T_{cart}: (x_{1:t},\ y_{1:t}) \rightarrow (x_{2:t} - x_{1:t-1},\ y_{2:t} - y_{1:t-1})$; where $(x_t, y_t)$ denotes the trajectory point at time t, and $x_{2:t} - x_{1:t-1}$ and $y_{2:t} - y_{1:t-1}$ are the pseudo-velocities in the horizontal and vertical directions, respectively. According to the Agant template, the historical trajectory is decoupled into two representations, the polar radius and the polar angle: [formula omitted in source]; $(\Delta x,\ \Delta y) = (x_{2:t} - x_{1:t-1},\ y_{2:t} - y_{1:t-1})$; [formula omitted in source]; where the two output sequences denote the polar radius representation and the polar angle representation, respectively;
S5: Establish a feature integrator to predict the future trajectory of the target using the reconstructed slot portion sequence in step S4; the feature integrator is a temporal neural network with the input being the reconstructed slot portion sequence and the output being the future trajectory.

2. The trajectory prediction method based on block-labeled pre-training as described in claim 1, characterized in that: in step S2, the reconstruction task divides the original sequence of length N into a backtracking block of length L and an expectation block of length H, satisfying L+H=N.

3. The trajectory prediction method based on block-labeled pre-training as described in claim 1, characterized in that: the channel-independent feature extractor described in step S3 uses causal units as its basic structure. Each causal unit consists of a dense layer with ReLU activation, a dense layer with linear activation, a Dropout layer, and a normalization layer. The specific extraction process includes:
S31: Use causal units to map the mask labels $P_{1:L+H}$ of the backtracking blocks and expectation blocks to a lower-dimensional space;
S32: Take the mask labels and state labels of all backtracking blocks as input, and use causal units to tile, stack, and concatenate them; then reshape the output vector as... The t-th column of this vector represents the decoded vector at time t;
S33: By designing a residual module with an output size of 1, connect the decoded vector to the mask label of the expectation block;
S34: By adding a global residual connection, map the backtracking block to a vector of the same size as the expectation block and add it to the output of the predictive causal unit, thus obtaining the predicted expectation block sequence.

4. A trajectory prediction system based on block-labeled pre-training, comprising a computer program, characterized in that: When the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1-3.