Trajectory intent intelligent reasoning method and system integrating motion fingerprinting and scene adaptation
By employing an intelligent trajectory intent reasoning method that integrates multi-source data fusion and scene adaptation, motion fingerprint features are extracted and combined with high-precision maps and visual data to construct an adaptive reasoning engine. This addresses the reliance of existing technologies on single-source trajectory data and manual annotation, and achieves high-precision, interpretable trajectory intent recognition and automatic annotation in support of high-reliability applications in smart cities and autonomous driving.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- Chinese People's Liberation Army Cyberspace Force Information Engineering University
- Filing Date
- 2026-01-31
- Publication Date
- 2026-06-12
AI Technical Summary
Existing trajectory semanticization technologies suffer from problems such as single trajectory data sources, coarse-grained feature extraction, shallow scene semantic fusion, poor reasoning interpretability, and heavy reliance on manual annotation. These issues result in low accuracy and efficiency in behavioral intent recognition, making it difficult to meet the high reliability requirements of smart cities and autonomous driving.
A multi-source data fusion approach is adopted, which extracts motion fingerprint features through a neural network model, combines high-precision maps and visual data for scene semantic perception, and constructs a scene adaptive inference engine to realize intelligent inference and automatic annotation of trajectory intent. This includes multi-dimensional feature extraction, scene classification, adaptive feature weights, and dynamic loading of behavioral constraint sets. The hybrid inference model is used to output the probability distribution of trajectory intent.
It significantly improves the accuracy of trajectory semanticization and automatic annotation, solves the annotation problem in complex scenarios, enhances the robustness and adaptability of the system, supports stable output across regions and environments, and provides interpretable intent labels to support high-level autonomous driving decisions.
Figure CN122196622A_ABST
Description
Technical Field
[0001] This invention relates to the field of intelligent analysis and processing technology of geospatial information data, and in particular to a trajectory intent intelligent reasoning method and system that integrates motion fingerprinting and scene adaptation, applicable to high-level application scenarios such as smart cities, intelligent transportation systems, autonomous driving decision support, and public safety early warning.
Background Technology
[0002] Existing trajectory semanticization technologies mainly rely on the following process:
[0003] 1. Data preprocessing: Denoising, interpolating, and segmenting the original trajectory to form structured trajectory segments;
[0004] 2. Feature extraction: Extracting motion features such as velocity changes, steering angle, and acceleration fluctuations;
[0005] 3. Behavior classification: Using a rule engine or machine learning model to map features to simple labels (such as "accelerate" or "turn");
[0006] 4. Manual annotation: Performing supplementary annotation in key scenarios by manually watching videos or replaying trajectories.
[0007] Although some studies have attempted to introduce deep learning models (such as LSTM and GRU) for trajectory classification or combine them with map matching techniques to enhance scene understanding, significant bottlenecks still exist overall.
[0008] In recent years, some advanced methods have begun to explore fusing visual information (such as dashcam images) with trajectory data to improve annotation accuracy. Meanwhile, some studies have proposed attention-based models to capture behavioral features at key time steps. However, these methods generally lack the ability to model "behavioral uniqueness" (i.e., motion fingerprints) and fail to achieve true scene-adaptive reasoning. Existing trajectory semanticization technologies mainly suffer from the following shortcomings:
[0009] (1) The trajectory data comes from a single source, resulting in distorted reconstruction of driving behavior.
[0010] Traditional technologies primarily rely on GNSS positioning data or vehicle OBD interface information, which has limited data dimensionality. In areas with weak or interrupted GNSS signals, such as urban canyons, tunnels, and underground parking lots, tracks may exhibit breakpoints or drift, leading to misjudgments of behavior. For example, a vehicle slowing down in the rain could be due to slippery roads or to avoid pedestrians, but speed changes alone cannot distinguish between the two. Furthermore, the lack of visual or environmental contextual support causes the same track to be assigned the same label in different scenarios, resulting in semantic ambiguity.
[0011] (2) The feature extraction is coarse-grained and lacks "behavioral fingerprint" level representation.
[0012] Existing methods often focus on geometric features (such as curvature and velocity gradient) or statistical features (such as average speed and acceleration variance), failing to construct a distinctive "motion fingerprint." For example, two "right turn" actions may show significant differences due to different driving styles, road conditions, and intentions, but traditional models treat them as the same category, leading to bias in intention inference.
[0013] (3) The scene semantic fusion is shallow and the dynamic coupling is insufficient.
[0014] Most systems fail to effectively integrate high-precision environmental models, obstacle information, or dynamic environmental states, or merely attach them as static background information, failing to achieve dynamic and deep coupling reasoning with motion behavior. For example, the trajectory characteristics of stopping at a red light at an intersection are similar to those of temporary parking on a normal road section; without the support of traffic light status and map topology, it is difficult to accurately label their intentions.
[0015] (4) The model lacks behavioral understanding ability and the annotation results have poor interpretability.
[0016] Existing methods often employ end-to-end black-box modeling, which, while achieving a certain level of accuracy, cannot explain "why a particular label was used." For example, a model might classify a trajectory as "suspicious loitering," but it cannot explain whether this is due to abnormal movement rhythm, excessively long dwell time, or a mismatch with the surrounding environment. This lack of causal reasoning mechanisms makes it difficult to meet the needs of high-reliability scenarios such as urban governance and public safety.
[0017] (5) The annotation efficiency is low and there is a heavy reliance on manual labor.
[0018] The current mainstream approach still relies on human annotators to label trajectories using video playback, which is costly, time-consuming, and highly subjective. While semi-automatic tools can assist in recognizing common behaviors, they adapt poorly to complex interactive scenarios (such as game-theoretic interactions among multiple vehicles and crossings by non-motorized vehicles) and have a high error rate.
Summary of the Invention
[0019] To address the problems of single trajectory data sources, coarse-grained feature extraction, shallow scene semantic fusion, poor interpretability of reasoning, and heavy reliance on manual annotation in existing technologies, this invention proposes an intelligent trajectory intent reasoning method and system that integrates motion fingerprinting and scene adaptation. This method achieves high precision, strong adaptability, and fully automated annotation of trajectory intent reasoning in complex scenarios, significantly improving the accuracy and efficiency of semantic parsing and reducing reliance on manual annotation.
[0020] To achieve the above objectives, the technical solution adopted is:
[0021] This invention provides a trajectory intent intelligent reasoning method that integrates motion fingerprinting and scene adaptation, comprising the following steps:
[0022] S1: Multi-source data acquisition and preprocessing: Acquire multi-source trajectory data of the moving vehicle during its journey and preprocess it to form a trajectory object dataset;
[0023] S2: Motion fingerprint modeling and extraction: Extract the motion features of trajectory segments from the trajectory object dataset to construct multi-dimensional feature vectors, extract local temporal patterns through a neural network model, and apply a time attention mechanism to strengthen the weights of key behaviors, thereby obtaining the motion fingerprint;
[0024] S3: Scene semantic perception and classification: Based on visual data and high-precision map data, identify the scene type and dynamic state of the moving vehicle, and output scene labels and state parameters;
[0025] S4: Scene Adaptive Inference Engine Construction: Configure exclusive feature weight sets, inference rule sets, and behavior constraint sets for different scene types, and dynamically load corresponding model branches or adjust parameters based on scene recognition results;
[0026] S5: Intelligent reasoning of trajectory intent: Construct a hybrid reasoning model, taking motion fingerprint, scene label, state parameters and context object state as input, and outputting the probability distribution of trajectory intent;
[0027] S6: Automatic multi-label annotation output: Based on a hierarchical labeling system, it generates structured semantic labeling results that include behavior categories, scene semantics, and intent interpretations.
[0028] According to the trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation of the present invention, the multi-source trajectory data in step S1 further includes GNSS trajectory data, IMU data, OBD data, visual data, and high-precision map data; the preprocessing includes Kalman filter smoothing, completion of missing values by linear interpolation, displacement correction by integrating IMU data, alignment of image frames with trajectory timestamps, and data redundancy filtering; the trajectory object dataset includes a trajectory motion feature data set and a trajectory scene semantic data set.
[0029] According to the trajectory intent intelligent reasoning method that integrates motion fingerprinting and scene adaptation of the present invention, the motion features in step S2 further include spatial features, elevation features and velocity features of the trajectory; the neural network model is a 1D-CNN network, and the time attention mechanism is implemented through a GRU neural network.
[0030] According to the trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation of the present invention, the structure of the 1D-CNN network further includes an input layer, a convolutional layer, and a fully connected layer. The input layer receives a multi-dimensional trend feature vector matrix composed of the cumulative trend changes of velocity, acceleration, heading angle, and elevation. The convolutional layer uses multiple convolutional layers and does not set a pooling layer. After convolution, a bias is added and a ReLU activation function is introduced. The fully connected layer uses a softmax activation function and outputs the motion fingerprint label probability.
[0031] According to the trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation of the present invention, the GRU neural network further includes an input layer, a GRU layer and a fully connected layer. The input layer receives the temporal feature vector output by the 1D-CNN network. The GRU layer controls the retention ratio of the previous hidden state and the current candidate hidden state through the update gate, and determines the dependence of the current candidate hidden state on the previous hidden state through the reset gate. The fully connected layer adopts the softmax activation function and outputs the probability of behavioral features.
[0032] According to the present invention, the trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation is further described in step S3 as follows: the scene types include ordinary roads, indoor roads, expressways / highways, and special areas. Ordinary roads include ground roads, intersections, and underpasses; indoor roads include tunnels and underground parking garages; expressways / highways include closed roads, ramps, and service areas; and special areas include school zones, hospitals, construction zones, and congestion hotspots. The dynamic state refers to the real-time motion of the moving vehicle, including acceleration, deceleration, straight-line, parking, turning, uphill / downhill, and U-turn states.
[0033] According to the trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation of the present invention, in step S4, the feature weight set reflects the relative importance of each perception feature under different scenarios through quantitative numerical allocation, and is dynamically adjusted by Bayesian optimization based on historical accident data and expert experience; the reasoning rule is based on a structural causal model to construct a "cause-effect" logical expression, and the logical expression adopts an IF-THEN-WITH structure; the behavioral constraint set includes physical limit constraints, comfort constraints and regulatory constraints, which are determined through causal intervention analysis.
[0034] According to the trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation of the present invention, the scene adaptive reasoning engine dynamically loads the corresponding feature weight set, reasoning rule set, and behavior constraint set based on the currently identified scene type, and inputs the reasoning rule set and behavior constraint set as logical constraints into the hybrid reasoning model to guide its intent reasoning process; at the same time, the scene adaptive reasoning engine dynamically adjusts the parameter configuration of the hybrid reasoning model according to the current scene complexity; the hybrid reasoning model performs intent reasoning under the combined effect of logical constraints and dynamic parameter configuration, and outputs the intent result and its confidence level.
[0035] According to the trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation of the present invention, the hybrid reasoning model in step S5 is a hybrid model of "graph neural network + spatiotemporal attention mechanism"; the graph neural network adopts a 3-layer GCN structure, and the graph structure is dynamically updated according to the spatiotemporal coordinates of each frame; the spatiotemporal attention mechanism is implemented by spatiotemporal Transformer, which includes an encoder and a decoder, each containing 2 layers of spatiotemporal attention blocks.
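For orientation, one widely used form of a graph convolution layer is sketched below. The text above states only that a 3-layer GCN is applied to a per-frame graph, so the symmetric normalization, the added self-loops, the ReLU activation, and all symbol names here are assumptions rather than details taken from the invention.

```latex
H^{(l+1)}_t = \mathrm{ReLU}\!\left(\tilde{D}_t^{-1/2}\,\tilde{A}_t\,\tilde{D}_t^{-1/2}\,H^{(l)}_t\,W^{(l)}\right),
\qquad \tilde{A}_t = A_t + I, \qquad l = 0, 1, 2
```

Here $A_t$ would be the adjacency matrix rebuilt from the spatiotemporal coordinates of frame $t$, $\tilde{D}_t$ the degree matrix of $\tilde{A}_t$, $H^{(0)}_t$ the node features, and $W^{(l)}$ the trainable weights of layer $l$.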
[0036] Furthermore, the present invention also provides a trajectory intent intelligent reasoning system that integrates motion fingerprinting and scene adaptation, comprising:
[0037] Data acquisition unit: includes GNSS receiver, IMU, camera, and environmental sensors, used to collect multi-source trajectory data;
[0038] Data processing unit: at least one processor for executing a computer program stored in memory to implement the method described above;
[0039] Data storage unit: used to store multi-source data, motion fingerprint template library, scene rule library, training model parameters and annotation results;
[0040] Data communication unit: used for data transmission, including uploading annotation results and downloading updated models;
[0041] Visual interactive unit: used to display trajectory, annotation results, and receive human feedback;
[0042] Each unit achieves data synchronization through a unified timestamp alignment mechanism, supporting online learning and model iteration.
[0043] The beneficial effects achieved by adopting the above technical solution are:
[0044] Compared with the traditional trajectory labeling technologies described in the background, which rely on a single data source, extract only limited features, and depend on inefficient manual annotation, this invention has the following core advantages:
[0045] 1. Significantly improve the accuracy of trajectory semanticization: achieving a leap from "visible" to "understandable".
[0046] This invention introduces "motion fingerprint modeling" technology to extract dynamic features such as the speed, acceleration, and steering rhythm of the carrier. Combined with high-precision maps and traffic scene semantics, it achieves deep integration of motion patterns and environmental context, accurately identifies behaviors such as lane changes, turns, and parking, and significantly improves the recognition accuracy.
[0047] 2. Effectively solves the challenge of annotation in complex scenes: forms a highly consistent, low-noise semantic annotation system.
[0048] This invention effectively solves the problem of labeling complex scenes. In scenarios with severe interference such as narrow streets, dense dynamic obstacles, and rainy / foggy weather, it integrates multimodal perception data, introduces contextual reasoning and uncertainty modeling, intelligently completes occluded trajectories, corrects abnormal labels, improves the consistency of behavioral intent labeling, and significantly reduces the mislabeling rate.
[0049] 3. Enhance the robustness and adaptability of the reasoning process: Achieve stable generalization across regions and environments.
[0050] For complex environments such as rain, fog, mountainous areas, and new urban areas, this invention adopts adaptive transfer learning and inertial navigation compensation technology to ensure that the system maintains stable output under conditions such as weak GNSS and low visibility, and supports rapid migration and deployment across cities, with strong generalization ability.
[0051] 4. Supporting high-level autonomous driving decision-making: Building an explainable and reliable decision-making cognitive model.
[0052] This invention outputs interpretable intent labels, such as "preparing to change lanes" and "yielding to pedestrians," providing clear and reliable semantic input to the planning and control module. This enables the vehicle to anticipate and proactively avoid obstacles, making safer and more human-centered driving decisions. It not only enhances perception and understanding capabilities but also serves as a "cognitive bridge" connecting perception and decision-making, providing solid support for autonomous driving at L3 and above and advancing intelligent driving toward greater safety, intelligence, and reliability.
Attached Figure Description
[0053] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings of the embodiments of the present invention will be briefly described below. The drawings are merely illustrative of some embodiments of the present invention and are not intended to limit the scope of the present invention to all embodiments.
[0054] Figure 1 is a flowchart of the trajectory intent intelligent reasoning method that integrates motion fingerprinting and scene adaptation according to an embodiment of the present invention;
[0055] Figure 2 is a schematic diagram of motion fingerprint classification of motion trajectories according to an embodiment of the present invention;
[0056] Figure 3 is a schematic diagram of scene perception classification of motion trajectories according to an embodiment of the present invention;
[0057] Figure 4 is a schematic diagram of the structure of the 1D-CNN network according to an embodiment of the present invention;
[0058] Figure 5 is a schematic diagram of the structure of the GRU (Gated Recurrent Unit) neural network according to an embodiment of the present invention;
[0059] Figure 6 is a schematic diagram of the graph neural network structure according to an embodiment of the present invention;
[0060] Figure 7 is a schematic diagram of the spatiotemporal attention mechanism according to an embodiment of the present invention;
[0061] Figure 8 is a schematic diagram of the scene adaptation and hybrid reasoning model according to an embodiment of the present invention;
[0062] Figure 9 is a structural block diagram of the trajectory intent intelligent reasoning system that integrates motion fingerprinting and scene adaptation according to an embodiment of the present invention.
Detailed Implementation
[0063] The exemplary solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Unless otherwise defined, the technical or scientific terms used in this invention should have the ordinary meaning understood by one of ordinary skill in the art.
[0064] This invention discloses a trajectory intent intelligent reasoning method that integrates motion fingerprinting and scene adaptation. Through a complete process design of "multi-source data fusion → motion fingerprint modeling → scene semantic perception → adaptive reasoning → intelligent annotation output," it achieves high-precision, high-efficiency, and highly robust semantic parsing of trajectory data. As shown in Figure 1, the specific implementation steps of this method are as follows:
[0065] Step S1: Multi-source data acquisition: Acquire multi-source trajectory data generated by the moving vehicle during its operation, including GNSS trajectory data (latitude and longitude, elevation, speed, timestamp), IMU data (three-axis acceleration, angular velocity), OBD data (engine speed, brake signal, steering angle), visual data (forward / surround view camera images), and high-precision map data (road grade, lane lines, traffic light positions). Divide these data into two categories according to features and scenarios: motion feature information and scenario semantic information. Then, perform redundancy filtering on each type of information and store it in the database.
[0066] Step S2: Preprocess the multi-source trajectory data to form a trajectory object dataset: Apply Kalman filter smoothing, linear interpolation of missing values, displacement correction by integrating IMU data, alignment of image frames with trajectory timestamps, and data redundancy filtering (removing stationary and duplicate sampling points) to the original trajectory data, retaining valid trajectory segments under continuous driving conditions. The trajectory object dataset is the result of classifying the basic trajectory data segments and organizing them by the features of each trajectory object class. Each feature class corresponds to a set of data tables, including a trajectory motion feature data table set and a trajectory scene semantic data table set.
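As an illustration of two of the preprocessing operations named above (linear interpolation of missing values and removal of stationary or duplicate sampling points), a minimal Python sketch might look as follows. The function names, the `(t, x, y)` point layout in local metric coordinates, and the `min_move` threshold of 0.5 m are assumptions made for this example; Kalman smoothing, IMU correction, and timestamp alignment are not shown.

```python
from typing import List, Optional, Sequence, Tuple


def interpolate_missing(times: Sequence[float],
                        values: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between the nearest valid samples.

    Gaps at either end are filled with the nearest valid value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no valid samples to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (times[i] - times[left]) / (times[right] - times[left])
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def drop_stationary_and_duplicates(
        points: Sequence[Tuple[float, float, float]],
        min_move: float = 0.5) -> List[Tuple[float, float, float]]:
    """Keep a point only if it lies at least min_move metres from the last kept point.

    points are (t, x, y) tuples; min_move is a hypothetical threshold.
    """
    kept: List[Tuple[float, float, float]] = []
    for t, x, y in points:
        if kept:
            _, px, py = kept[-1]
            if ((x - px) ** 2 + (y - py) ** 2) ** 0.5 < min_move:
                continue  # stationary or duplicate sample
        kept.append((t, x, y))
    return kept
```

In practice the interpolation would be applied per channel (for example latitude, longitude, elevation, and speed) before the redundancy filter is run on the completed trajectory.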
[0067] Step S3: Motion fingerprint modeling and extraction: Based on the trajectory geometry and motion patterns, the motion features of each trajectory segment are extracted, which can be divided into three categories: spatial features, elevation features, and velocity features. First, a multi-dimensional feature vector is constructed, covering spatiotemporal parameters (latitude and longitude, elevation, velocity, heading angle) and dynamic behavior features (acceleration fluctuation, steering inertia, braking response delay). Local temporal patterns are extracted through a 1D-CNN network (one-dimensional convolutional neural network), and the weights of key behavioral segments are strengthened by combining a time attention mechanism to obtain the motion fingerprint.
[0068] (1) The structure of the 1D-CNN network is shown in Figure 4. The 1D-CNN network is a classification model that takes multi-dimensional motion feature vectors as input. Convolutional layers extract features from the trajectory segments, and fully connected layers then output the probability of each motion fingerprint label (such as "turning", "U-turn", ..., "other"). The fingerprint labels are shown in Figure 2. The structure of the 1D-CNN network is as follows:
[0069] a. Input layer: Input the motion trend feature vector into the CNN network and extract features from each motion trend feature vector.
[0070] The motion trend features use the cumulative changes of four attribute values (velocity v, acceleration a, heading angle h, and elevation z) within a given trajectory interval as the elements of the trend feature vector. Each element represents the cumulative change in trend within the interval relative to the previous interval. The calculation formula is:
[0071]
[0072]
[0073] where $h_i$, $v_i$, $a_i$, $z_i$ ($i = 1, 2, \dots, n$) in the matrix represent the heading angle, velocity, acceleration, and elevation of the i-th trajectory point in the trajectory segment, respectively. The trend feature vector matrix is defined as n-dimensional, where $n > 10$; the following explanation uses 50 dimensions as an example. The sign of the cumulative trend change indicates the trend direction relative to the vehicle's current state: positive indicates the same trend, and negative indicates a different trend. The trend feature vector matrix $T_r$ can be represented as:
[0074]
[0075] The matrix ultimately shows the trend distribution of a certain behavioral feature; the larger the value of a feature, the more pronounced the feature is within this range.
[0076] b. Convolutional layers: For the geometric, velocity, and elevation attributes defined in the motion features above, the 1D-CNN network employs multiple convolutional layers. To avoid losing crucial information, no pooling layers are added between the convolutional layers. After each convolution, a bias is added and the non-linear activation function ReLU is applied. With the bias denoted b, the result obtained after applying the activation function f is:
[0077]
[0078] c. Add a fully connected layer at the end for classification, with softmax as the activation function, and output the probability of the current motion feature vector.
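The layer structure described in a. to c. can be sketched in plain Python as a forward pass: stacked one-dimensional convolutions with bias and ReLU, no pooling in between, then a fully connected layer with softmax. This is only an illustrative sketch. The kernel size of 3, the 8 output channels, the two convolutional layers, the three output labels, and the random untrained weights are all assumptions, not values from the invention.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def conv1d(seq, kernels, biases):
    """Valid 1D convolution over time, followed by bias and ReLU.

    seq:     T time steps, each a list of C_in channel values
    kernels: C_out kernels, each shaped [K][C_in]
    biases:  C_out bias values
    Returns T-K+1 time steps with C_out channels each.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        step = []
        for kernel, b in zip(kernels, biases):
            s = b
            for dt in range(k):
                for c, w in enumerate(kernel[dt]):
                    s += w * seq[t + dt][c]
            step.append(relu(s))
        out.append(step)
    return out


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(vec, weights, biases):
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]


def random_kernels(rng, c_out, k, c_in):
    return [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)] for _ in range(k)]
            for _ in range(c_out)]


def fingerprint_probs(seq, n_labels=3, seed=0):
    """Forward pass with random (untrained) weights; returns label probabilities."""
    rng = random.Random(seed)
    c_in = len(seq[0])
    h = conv1d(seq, random_kernels(rng, 8, 3, c_in), [0.0] * 8)
    h = conv1d(h, random_kernels(rng, 8, 3, 8), [0.0] * 8)  # no pooling in between
    flat = [v for step in h for v in step]
    w = [[rng.uniform(-0.1, 0.1) for _ in flat] for _ in range(n_labels)]
    return softmax(dense(flat, w, [0.0] * n_labels))
```

With a 50-step, 4-channel trend matrix as input, the two unpadded convolutions leave 46 time steps, which are flattened before the softmax layer. A trained model would replace the random weights.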
[0079] (2) Neural network design with a time attention mechanism: on the basis of the temporal feature sequence output by the CNN, a time attention mechanism is introduced to adaptively weight key time steps.
[0080] The temporal attention mechanism is implemented using a GRU (Gated Recurrent Unit) neural network, which takes the temporal feature vectors as model input. As shown in Figure 5, its structure is as follows:
[0081] a. Input layer: The temporal feature vector $T_i$ is input to the GRU network.
[0082] b. GRU layer: The GRU network adds a linear dependency directly between the current network state $h_t$ and the network state $h_{t-1}$ at the previous moment, which mitigates the vanishing and exploding gradient problems.
[0083] In a GRU network, the update gate controls how much of the hidden state $h_{t-1}$ from the previous moment, and how much of the current candidate hidden state, is retained in the final output hidden state $h_t$ at the current time step. The update gate is calculated as follows:
[0084]
[0085] where the input vector at time t is the t-th component of the input sequence T; $W_{Tz}$ and $W_{hz}$ are the update gate weight matrices; and the remaining term is the update gate bias.
[0086] The update gate output $z_t$ and its complement $1 - z_t$ weight the hidden state $h_{t-1}$ from the previous time step and the current candidate hidden state through element-wise multiplication. The final output hidden state at time t is:
[0087]
[0088] The reset gate determines whether, and to what extent, the current candidate hidden state depends on the hidden state $h_{t-1}$ from the previous moment. The previous hidden state is first multiplied by the reset gate output $r_t$, and the result is then used to calculate the current candidate hidden state. The reset gate is calculated as follows:
[0089]
[0090] where $r_t$ is the reset gate output at time t, the activation is the Sigmoid function, and the remaining terms are the reset gate weight matrices and the reset gate bias.
[0091] The current candidate hidden state is then calculated. The candidate state combines the current input with the effect of the reset gate, and is calculated as follows:
[0092]
[0093] where the input is the input vector at time t; $\odot$ denotes element-wise multiplication; tanh is the hyperbolic tangent activation function; $W_h$ is the weight matrix of the candidate state; and the remaining term is the bias of the candidate state.
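For reference, one common GRU formulation that is consistent with the gate descriptions above can be written as follows. The input symbol $T_t$, the bias symbols $b_z$, $b_r$, $b_h$, the weight names $W_{Tr}$ and $W_{hr}$, the candidate symbol $\tilde{h}_t$, the concatenated form of the candidate state, and the placement of $1 - z_t$ on the candidate term are assumptions of this sketch rather than a reproduction of the original formulas.

```latex
\begin{aligned}
z_t &= \sigma\left(W_{Tz}\, T_t + W_{hz}\, h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_{Tr}\, T_t + W_{hr}\, h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h \left[\, r_t \odot h_{t-1},\; T_t \,\right] + b_h\right) \\
h_t &= z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t
\end{aligned}
```

Some references swap the roles of $z_t$ and $1 - z_t$ in the last line; the two conventions are equivalent up to relabeling the gate.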
[0094] c. Fully connected layer: Add a fully connected layer at the end for classification, with softmax as the activation function, and output the probability that the current time-series feature vector is a behavioral feature.
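The GRU layer and the softmax output layer described above can be sketched in plain Python as shown below. This is a hedged illustration: the parameter names (`W_Tz`, `W_hz`, `W_Tr`, `W_hr`, `W_Th`, `W_hh`, `b_z`, `b_r`, `b_h`) are hypothetical, the candidate-state weight is split into an input part and a hidden part (equivalent to one matrix applied to their concatenation), and the convention that `z` weights the previous hidden state is an assumption.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h_prev, p):
    """One GRU time step; p maps parameter names to weight matrices and biases."""
    # Update gate: balance between the previous state and the candidate state.
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_Tz"], x), matvec(p["W_hz"], h_prev), p["b_z"])]
    # Reset gate: how much the candidate depends on the previous state.
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_Tr"], x), matvec(p["W_hr"], h_prev), p["b_r"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["W_Th"], x), matvec(p["W_hh"], gated), p["b_h"])]
    # Assumed convention: z keeps the previous state, (1 - z) admits the candidate.
    return [zi * hp + (1.0 - zi) * ci for zi, hp, ci in zip(z, h_prev, cand)]


def behavior_probs(seq, p, w_out, b_out):
    """Run the GRU over a sequence, then apply a fully connected softmax layer."""
    h = [0.0] * len(p["b_z"])
    for x in seq:
        h = gru_step(x, h, p)
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(w_out, b_out)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate state to 0, so each step simply halves the previous hidden state. This gives a quick sanity check of the update rule.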
[0095] Step S4: Scene semantic perception and classification: Combining video frames, images and high-precision map data of moving vehicles, firstly, the YOLOv8 model is used to detect traffic signs, traffic lights and pedestrians. Then, a semantic segmentation network (such as DeepLabV3+) is used to parse the environmental structure (passable area, obstacle area). Finally, the scene type and dynamic state of the moving vehicle are determined by combining the map topology.
[0096] As shown in Figure 3, the scene types include ordinary roads, indoor roads, expressways / highways, and special areas. Ordinary roads include surface roads, intersections, and underpasses; indoor roads include tunnels and underground parking garages; expressways / highways include closed roads, ramps, and service areas; and special areas include school zones, hospitals, construction zones, and congestion hotspots.
[0097] The dynamic state refers to the real-time motion of the moving vehicle, specifically including: straight-line state (uniform speed straight-line, accelerating straight-line, decelerating straight-line), parking state (temporary parking, long-term parking), turning state (left turn, right turn), uphill / downhill state (uphill acceleration, uphill uniform speed, uphill deceleration, downhill acceleration, downhill uniform speed, downhill deceleration), U-turn state (normal U-turn, low-speed U-turn, fast U-turn), acceleration state (smooth acceleration, rapid acceleration), and deceleration state (smooth deceleration, rapid deceleration). The above dynamic states are determined by features such as speed changes, steering angle fluctuations, and elevation trends in the motion fingerprint.
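To make the mapping from speed changes, steering angle fluctuations, and elevation trends to a dynamic state more concrete, a simple rule-based sketch is given below. It is not the classifier of the invention: the function name, the segment-level inputs, the label strings, every threshold, and the sign convention (heading as a compass bearing, so a positive change is a right turn) are illustrative assumptions.

```python
def classify_dynamic_state(v_start, v_end, heading_change_deg,
                           elevation_change_m, duration_s):
    """Map segment-level summaries to one coarse dynamic-state label.

    v_start, v_end      speed at the start and end of the segment (m/s)
    heading_change_deg  net heading change over the segment (degrees)
    elevation_change_m  net elevation change over the segment (metres)
    duration_s          segment duration (seconds)
    All thresholds below are hypothetical placeholders.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if max(v_start, v_end) < 0.3:            # effectively stationary
        return "parking"
    if abs(heading_change_deg) >= 150.0:     # close to a full reversal
        return "U-turn"
    if abs(heading_change_deg) >= 45.0:
        return "right turn" if heading_change_deg > 0 else "left turn"
    accel = (v_end - v_start) / duration_s   # mean acceleration, m/s^2
    if accel > 0.5:
        trend = "acceleration"
    elif accel < -0.5:
        trend = "deceleration"
    else:
        trend = "uniform speed"
    if elevation_change_m > 2.0:
        return "uphill " + trend
    if elevation_change_m < -2.0:
        return "downhill " + trend
    return "straight-line " + trend
```

For example, a segment that gains 5 m of elevation while speeding up from 10 m/s to 15 m/s over 5 s is labeled "uphill acceleration" under these placeholder thresholds.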
[0098] Output scene labels Scene_Types and their state parameters P as input for adaptive inference.
[0099] Step S5: Build the scene-adaptive inference engine: For each scene type, there is a set of exclusive feature weights and inference rules. Predefine or learn the feature weight set W_S, inference rule set R_S, and behavior constraint set C_S for different scene types. Based on the structural causal model (SCM), use a lightweight scene classifier or rule engine to determine the current scene and state, and dynamically load the corresponding model branches or adjust the parameters.
[0100] a. Feature weight set W_S
[0101] The feature weight set reflects the relative importance of each perceived feature in different scenarios through quantified numerical allocation. In the SCM framework, the weight represents the strength of the causal path and directly affects the intervention effect. For example, in a school area, the pedestrian density weight is set to 0.9, reflecting the strong correlation of the causal chain "pedestrian appearance → emergency braking". The weight learning is based on historical accident data and expert experience, and is dynamically adjusted using Bayesian optimization. The weight value not only determines the feature priority, but also affects the rule triggering threshold and the strictness of constraints, forming a complete causal reasoning closed loop. An example of the feature weight set is shown in Table 1.
[0102]
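A minimal sketch of how a per-scene feature weight set could be stored and applied is given below. Only the school-area values (pedestrian density 0.9, speed 0.8) appear in this description; the expressway and default entries, the names `FEATURE_WEIGHTS`, `load_weights`, and `weighted_score`, and the normalized weighted sum are hypothetical.

```python
# Per-scene feature weights W_S. Only the "school_area" values come from the text;
# the other entries are hypothetical placeholders.
FEATURE_WEIGHTS = {
    "school_area": {"pedestrian_density": 0.9, "speed": 0.8},
    "expressway": {"pedestrian_density": 0.1, "speed": 0.9},  # hypothetical
}
DEFAULT_WEIGHTS = {"pedestrian_density": 0.5, "speed": 0.5}  # hypothetical


def load_weights(scene_type):
    """Return the weight set for a scene, falling back to the default set."""
    return FEATURE_WEIGHTS.get(scene_type, DEFAULT_WEIGHTS)


def weighted_score(features, weights):
    """Normalized weighted sum of feature values in [0, 1]; missing features count as 0."""
    total = sum(weights.values())
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total
```

For example, `weighted_score({"pedestrian_density": 1.0, "speed": 0.5}, load_weights("school_area"))` weighs pedestrian density more heavily than speed.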
[0103] b. Reasoning rule set R_S
[0104] The inference rule set is constructed based on a structural causal model, using explicit cause-effect logical expressions. Each rule contains a complete causal path description of antecedent conditions, mediating variables, and outcome behavior. The logical expressions adopt an IF-THEN-WITH structure, where the WITH part incorporates weighting factors when calculating confidence. Counterfactual reasoning is used to verify the rules during execution, ensuring the robustness of the decisions. Rules are connected through a causal graph, forming a hierarchical inference network that supports complex multi-factor decision-making scenarios.
[0105] Example of inference rules:
[0106] IF (Scene Conditions ∩ Trajectory State ∩ Causal Preconditions)
[0107] THEN (Behavioral Decision ∩ Constraint Activation)
[0108] WITH Confidence level = f(feature weight, strength of evidence)
[0109] Intersection turning rules:
[0110] R_cross_turn:
[0111] IF scene_type = "intersection"
[0112] ∩ signal_color = "green"
[0113] ∩ has_pedestrian = false
[0114] ∩ turn_intent = true
[0115] ∩ clearance_distance ≥ 5 m
[0116] THEN execute_turn(speed = 15 km/h, curvature = smooth)
[0117] WITH confidence = 0.8 × W_clearance + 0.2 × W_traffic
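The IF-THEN-WITH structure of R_cross_turn can be sketched as follows. This is one possible encoding under stated assumptions: the `Rule` dataclass, the `evaluate` function, and the state and weight key names (`clearance_distance_m`, `clearance`, `traffic`) are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    name: str
    conditions: List[Callable[[Dict[str, Any]], bool]]  # IF part, all must hold
    action: Dict[str, Any]                               # THEN part
    confidence: Callable[[Dict[str, float]], float]      # WITH part


def evaluate(rule: Rule, state: Dict[str, Any],
             weights: Dict[str, float]) -> Optional[Tuple[Dict[str, Any], float]]:
    """Return (action, confidence) if every condition holds, otherwise None."""
    if all(cond(state) for cond in rule.conditions):
        return rule.action, rule.confidence(weights)
    return None


R_CROSS_TURN = Rule(
    name="R_cross_turn",
    conditions=[
        lambda s: s["scene_type"] == "intersection",
        lambda s: s["signal_color"] == "green",
        lambda s: not s["has_pedestrian"],
        lambda s: s["turn_intent"],
        lambda s: s["clearance_distance_m"] >= 5.0,
    ],
    action={"behavior": "execute_turn", "speed_kmh": 15, "curvature": "smooth"},
    # confidence = 0.8 * W_clearance + 0.2 * W_traffic, as in the rule text
    confidence=lambda w: 0.8 * w["clearance"] + 0.2 * w["traffic"],
)
```

Keeping the conditions as separate predicates makes it possible to report which precondition failed, which may help with interpretability.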
[0118] c. Behavioral constraint set C_S
[0119] The behavioral constraint set defines the hard boundaries and soft limits for safe operation, serving as intervention conditions for the "do-operator" in SCM. Constraints are categorized into physical limit constraints (e.g., maximum deceleration), comfort constraints (e.g., rate of change of acceleration), and regulatory constraints (e.g., speed limits). Constraint conditions are determined through causal intervention analysis to ensure that no cascading risks are triggered under any circumstances. Constraint enforcement employs a progressive strategy, providing early warnings as constraints approach their limits.
[0120] Behavioral safety boundary constraints
[0121] C_speed_limit:
[0122] FOR ALL scenes: speed ≤ scene_speed_limit × 1.1
[0123] C_min_distance:
[0124] IF scene_type = "Expressway" THEN min_gap ≥ 2s
[0125] IF scene_type = "City Road" THEN min_gap ≥ 1.5s
[0126] C_emergency_stop:
[0127] WHEN obstacle_distance < emergency_threshold
[0128] THEN max_deceleration ≥ 0.8g WITHIN 0.5s
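The safety boundary constraints above could be checked along the following lines. This is a sketch only: the function names, the argument names, and the choice to return a list of violated constraint names are assumptions, and the progressive early-warning strategy is not modeled.

```python
# Minimum time gap (seconds) per scene type, as listed in C_min_distance.
MIN_GAP_S = {"Expressway": 2.0, "City Road": 1.5}


def check_constraints(scene_type, speed_kmh, scene_speed_limit_kmh, gap_s):
    """Return the names of violated constraints (empty list if none)."""
    violations = []
    # C_speed_limit: speed <= scene_speed_limit * 1.1 for all scenes.
    if speed_kmh > scene_speed_limit_kmh * 1.1:
        violations.append("C_speed_limit")
    # C_min_distance: only defined for the scene types listed above.
    min_gap = MIN_GAP_S.get(scene_type)
    if min_gap is not None and gap_s < min_gap:
        violations.append("C_min_distance")
    return violations


def emergency_stop_required(obstacle_distance_m, emergency_threshold_m):
    """C_emergency_stop trigger: obstacle closer than the emergency threshold."""
    return obstacle_distance_m < emergency_threshold_m
```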
[0129] Example of a complete reasoning process:
[0130] Scenario X: School district during school hours
[0131] Input:
[0132] Image features: zebra crossing, school sign, dense pedestrian traffic.
[0133] Trajectory location: within a 100 m radius of the school's coordinates
[0134] Timestamp: 07:30 AM
[0135] Weather conditions: Sunny
[0136] Classifier output: scene_type = "School area" (confidence 0.92)
[0137] Weight activation: W_S_school = [Pedestrian density: 0.9, Speed: 0.8, ...]
[0138] Rule matching:
[0139] R_school_slow: If the condition is met, deceleration will be applied.
[0140] R_pedestrian_yield: When a pedestrian approaches, yield and wait.
[0141] Constraint checks:
[0142] C_speed_limit: Current speed 30 km/h → Limit to 20 km/h
[0143] C_min_stop_distance: Maintain a safe distance of 3 m
[0144] Final behavioral intent: Smoothly reduce speed to 20 km/h and prepare to stop and yield to pedestrians.
[0145] When a new scene or a significant change in environmental state is detected, an incremental learning or meta-learning process is triggered, using a combined "MAML + LoRA" algorithm. Using a small amount of labeled data from the new scene or online feedback signals, local model parameters are quickly fine-tuned or the scene rule base is updated to improve system adaptability.
[0146] The MAML (Model-Agnostic Meta-Learning) algorithm provides a "meta-initial model" that enables rapid learning. When a change is detected, the LoRA (Low-Rank Adaptation) algorithm generates a dedicated lightweight adapter for the current scenario at very low computational cost.
[0147] For example, consider a completely new environment (e.g., dense fog encountered for the first time).
[0148] Triggering mechanism: The perception module detects that the image feature distribution diverges substantially from that of the training set.
[0149] Execution steps:
[0150] ① Meta-learning (MAML): Load the pre-trained "general initial model".
[0151] ② Incremental learning (LoRA): A LoRA adapter is trained using the currently collected foggy-weather data. Elastic Weight Consolidation (EWC) constraints ensure that the original vehicle detection capabilities are not degraded while fog features are learned.
[0152] Result: A set of exclusive fog feature weights W_S_fog was generated and stored in the scene library.
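The trigger and adaptation steps above can be sketched in simplified form as follows. The sketch makes several assumptions: KL divergence over feature histograms stands in for the unspecified divergence measure, the threshold value is arbitrary, and all function names are hypothetical. The LoRA update W = W0 + (alpha / r) · B · A and the quadratic EWC penalty follow their standard formulations; MAML pre-training and the adapter training loop are not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def is_new_scene(current_hist, training_hist, threshold=0.5):
    """Trigger adaptation when the feature distribution drifts too far (assumed threshold)."""
    return kl_divergence(current_hist, training_hist) > threshold


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w0, lora_b, lora_a, alpha, rank):
    """W = W0 + (alpha / rank) * B @ A; W0 stays frozen, only A and B are trained."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def ewc_penalty(params, old_params, fisher, lam):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(f * (p - o) ** 2 for p, o, f in zip(params, old_params, fisher))
```

Because B is conventionally initialized to zero, the adapted model starts out identical to the base model, and the scene-specific adapter can be stored or discarded independently of W0.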
[0153] The relationship between the scene-adaptive inference engine and the hybrid inference model described below is as follows: the scene-adaptive inference engine dynamically loads the corresponding feature weight set, inference rule set, and behavior constraint set based on the currently identified scene type, and inputs the inference rule set and behavior constraint set as logical constraints into the hybrid inference model to guide its intent inference process; at the same time, the scene-adaptive inference engine dynamically adjusts the parameter configuration of the hybrid inference model according to the current scene complexity; the hybrid inference model performs intent inference under the combined effect of logical constraints and dynamic parameter configuration, and outputs the intent result and its confidence level.
[0154] Step S6: Intelligent reasoning of trajectory intent: Construct a hybrid reasoning model, taking motion fingerprint, scene label, state parameters and context object state as input, and outputting the probability distribution of trajectory intent.
[0155] The hybrid inference model is a hybrid model of "graph neural network + spatiotemporal attention mechanism", in which the graph structure includes: nodes representing the state of the same moving vehicle in different time segments, and edges connecting these temporally adjacent or related nodes.
[0156] The hybrid inference model employs a Graph Convolutional Network (GCN) to aggregate neighbor node information for modeling interactions, and uses a spatiotemporal Transformer to capture long-term dependencies and contextual information at critical decision moments. Through an attention mechanism, it focuses on the features and time steps most critical to intent judgment. The model input consists of: motion fingerprint + scene label + state parameters + context object state; the model output is the intent probability distribution P, such as P(right turn), P(lane change), P(accelerate), P(parking wait), etc. The model supports combined intent outputs, such as "right turn + uphill," "lane change + U-turn," etc.
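As a small illustration of the output stage, the sketch below turns raw model scores into an intent probability distribution with a softmax. The intent list, the function names, and the use of a probability threshold to report combined intents are assumptions; the source does not state how combined intents are formed.

```python
import math

# Intent labels taken from the examples in the text; the ordering is arbitrary.
INTENTS = ["right turn", "lane change", "accelerate", "parking wait"]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intent_distribution(logits):
    """Map raw scores to P(intent) for each label in INTENTS."""
    return dict(zip(INTENTS, softmax(logits)))


def combined_intents(dist, threshold=0.3):
    """Labels whose probability meets the (assumed) threshold, highest first."""
    ranked = sorted(dist.items(), key=lambda kv: -kv[1])
    return [label for label, p in ranked if p >= threshold]
```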
[0157] The graph convolutional network models the dependencies, evolution, and propagation patterns of the state of a single entity (the moving vehicle) over time. The updated node features are new features for each time-segment node that aggregate information from its neighboring time segments (such as the preceding and succeeding segments). This ensures that the representation of each segment encodes both its historical context and its future trend.
[0158] a. Number of Graph Convolutional Network (GCN) Layers: As shown in Figure 6, the model adopts a 3-layer GCN structure. The first layer is used to extract first-order features of the local neighborhood and capture the influence of contextual trajectory motion features; the second layer extends to the second-order neighborhood and models indirect interactions; the third layer performs global feature aggregation to achieve the extraction of high-order semantic information. The 3-layer structure achieves the best balance between model expressive power and computational efficiency, avoiding the oversmoothing problem caused by deep GCNs.
[0159] b. Dynamic Graph Construction Mechanism: The graph structure is not static but dynamically updated based on the spatiotemporal coordinates of each frame. An adjacency matrix is constructed using the K-nearest neighbor algorithm or distance thresholding method, and learnable edge weights are introduced to enhance the model's ability to perceive the intensity of interactions.
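The 3-layer GCN over a dynamically built K-nearest-neighbor graph can be sketched in plain Python as follows. This is a simplified sketch under assumptions: it uses the common symmetric normalization D^(-1/2)(A + I)D^(-1/2), binary edges instead of learnable edge weights, and hypothetical function names; the weight matrices would be learned in practice.

```python
import math


def knn_adjacency(coords, k):
    """Symmetric 0/1 adjacency linking each node to its k nearest neighbors."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize(adj):
    """Add self-loops and apply D^(-1/2) (A + I) D^(-1/2)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def gcn_forward(coords, features, weights, k=2):
    """Rebuild the graph from the current coordinates, then stack one layer per weight matrix."""
    a_norm = normalize(knn_adjacency(coords, k))
    h = features
    for w in weights:  # three matrices give the 3-layer structure
        h = gcn_layer(a_norm, h, w)
    return h
```

Calling `gcn_forward` once per frame with that frame's (x, y, t) coordinates rebuilds the adjacency each time, which is one way to realize the dynamic update described above.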
[0160] The Spatiotemporal Transformer (STrans) is responsible for modeling the spatiotemporal evolution of trajectory segments. Traditional Transformers perform excellently in natural language processing, but require adaptation before they can be applied to trajectory data. Therefore, a spatiotemporal encoder-decoder structure is designed, as shown in Figure 7:
[0161] a. Encoder: Consists of two layers of spatiotemporal attention blocks, including:
[0162] Spatial self-attention mechanism: At each time step, the spatial correlation between the trajectory and the reference trajectory segment is calculated;
[0163] Temporal self-attention mechanism: performing temporal modeling of historical trajectories to capture motion trends;
[0164] Position encoding: Introducing a sine function and learnable position embeddings to preserve spatiotemporal order information.
[0165] b. Decoder: Also consists of 2 layers, with a structure similar to the encoder, but introduces an additional encoder-decoder based multi-head attention mechanism to refer to historical context when predicting future trajectories. The decoder generates trajectory points for the next T time steps in an autoregressive manner and combines them with an intent classification head to output the intent category (such as "turn left", "change lanes", "stop" etc.).
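Two building blocks of the encoder-decoder described above, sinusoidal position encoding and temporal self-attention, are sketched below in plain Python. The sketch is deliberately reduced: it uses a single head with no learned query, key, or value projections, omits the learnable position embeddings and the spatial attention branch, and the `causal` flag (masking future steps, as an autoregressive decoder would) and all function names are assumptions.

```python
import math


def sinusoidal_encoding(num_steps, d_model):
    """PE[pos][2i] = sin(pos / 10000^(2i/d)), PE[pos][2i+1] = cos(pos / 10000^(2i/d))."""
    pe = [[0.0] * d_model for _ in range(num_steps)]
    for pos in range(num_steps):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(x, causal=False):
    """Scaled dot-product attention across time steps, with Q = K = V = x."""
    d = len(x[0])
    out = []
    for t, query in enumerate(x):
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x]
        if causal:
            # Mask out future time steps so step t only attends to steps <= t.
            scores = [s if j <= t else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(weights[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out
```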
[0166] GNN-STrans forms a closed-loop feedback mechanism with the scene-adaptive inference engine in step S5 above. The specific relationship is as follows:
[0167] a. Input dependencies: As shown in Figure 8, the graph structure construction of GNN-STrans relies on the scene semantic labels (such as "intersection", "congested road section", "school area", etc.) output in step S5. The interaction rules of the agents differ across scenarios. For example, in a school area, vehicles are more likely to slow down and yield, so the graph edge weights should strengthen the connections between pedestrians and vehicles. That is, the scene-adaptive inference engine provides the GNN module with scene-aware prior knowledge that guides the dynamic construction of the graph structure, giving the model a progressive logic of "scene perception → interaction modeling → intent reasoning".
[0168] b. Adaptive Parameter Adjustment: Step S5 dynamically adjusts the model parameters of GNN-STrans based on the current scene complexity. For example, in high-density scenes, the receptive field of GCN is increased or the attention window of Transformer is expanded; in low-risk scenes, model complexity is reduced to save computing power. This can be achieved through a meta-controller, with scene features as input and model hyperparameter configuration as output, realizing true "scene adaptation".
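A rule-based stand-in for the meta-controller could look like the sketch below, which maps scene features to a hyperparameter configuration. Everything here is hypothetical: the input features (`agent_density`, `risk_level`), the thresholds, and the configuration fields and values; a learned controller could replace the fixed rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    knn_k: int             # neighbors per node; a larger k widens the GCN receptive field
    attention_window: int  # number of time steps visible to the Transformer
    gcn_layers: int


def meta_controller(agent_density, risk_level):
    """Map scene features to a model configuration.

    agent_density: agents per 100 m of road (assumed unit);
    risk_level: value in [0, 1] (assumed scale).
    """
    if agent_density >= 10 or risk_level >= 0.7:
        # High-density or high-risk scene: enlarge receptive field and attention window.
        return ModelConfig(knn_k=8, attention_window=32, gcn_layers=3)
    if agent_density <= 2 and risk_level <= 0.3:
        # Low-risk scene: reduce model complexity to save computing power.
        return ModelConfig(knn_k=3, attention_window=8, gcn_layers=2)
    return ModelConfig(knn_k=5, attention_window=16, gcn_layers=3)
```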
[0169] Step S7: Multi-label automatic annotation output: First, construct a hierarchical label system, including the first layer: behavior category (turning, changing lanes, accelerating, etc.), the second layer: scene semantics (intersection, tunnel, rainy day, etc.), and the third layer: intent interpretation (avoidance, detour, waiting for green light, etc.).
[0170] Example of annotation format:
[0171] json
[0172] {
[0173] "timestamp": "2025-11-30T10:00:00Z",
[0174] "location": {"lat": 34.12345, "lon": 117.65432, "alt": 50.5},
[0175] "behavior": {"primary": "lane change", "confidence": 0.95},
[0176] "scene": {"type": "intersection", "condition": "light rain"},
[0177] "intention": {"primary": "turn left after changing lanes", "confidence": 0.88}
}
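A record in the annotation format above could be produced as in the following sketch. The function name `build_annotation`, its parameter names, and the confidence range check are assumptions; the field names follow the example.

```python
import json


def build_annotation(timestamp, lat, lon, alt, behavior, behavior_conf,
                     scene_type, scene_condition, intention, intention_conf):
    """Serialize one three-layer annotation (behavior, scene, intention) to JSON."""
    for conf in (behavior_conf, intention_conf):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
    record = {
        "timestamp": timestamp,
        "location": {"lat": lat, "lon": lon, "alt": alt},
        "behavior": {"primary": behavior, "confidence": behavior_conf},
        "scene": {"type": scene_type, "condition": scene_condition},
        "intention": {"primary": intention, "confidence": intention_conf},
    }
    return json.dumps(record, ensure_ascii=False)
```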
[0178] This invention also discloses a trajectory intent intelligent reasoning system that integrates motion fingerprinting and scene adaptation. As shown in Figure 9, it includes:
[0179] Data acquisition unit: includes GNSS receiver, IMU, camera, and environmental sensors, used to collect multi-source trajectory data;
[0180] Data processing unit: at least one processor for executing a computer program stored in memory to implement the method described above;
[0181] Data storage unit: used to store multi-source data, motion fingerprint template library, scene rule library, training model parameters and annotation results;
[0182] Data communication unit: used for data transmission, including uploading annotation results and downloading updated models;
[0183] Visual interactive unit: used to display trajectory, annotation results, and receive human feedback;
[0184] Each unit achieves data synchronization through a unified timestamp alignment mechanism, supporting online learning and model iteration.
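The source does not describe the timestamp alignment mechanism in detail. One simple possibility, shown here purely as an assumption, is nearest-neighbor matching of trajectory samples to camera frames within a tolerance; the function name `align_nearest` and the 50 ms default tolerance are hypothetical.

```python
from bisect import bisect_left


def align_nearest(track_ts, frame_ts, max_gap_s=0.05):
    """For each trajectory timestamp, return the index of the nearest frame
    timestamp, or None if no frame lies within max_gap_s.

    frame_ts must be sorted in ascending order.
    """
    matches = []
    for t in track_ts:
        i = bisect_left(frame_ts, t)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(frame_ts)]
        best = min(candidates, key=lambda j: abs(frame_ts[j] - t), default=None)
        if best is not None and abs(frame_ts[best] - t) <= max_gap_s:
            matches.append(best)
        else:
            matches.append(None)
    return matches
```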
[0185] Finally, it should be noted that the above-described embodiments are merely specific implementations of the present invention, used to illustrate its technical solutions and not to limit them. The scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present invention, the technical solutions described in the foregoing embodiments can still be modified, changes to them can readily be conceived, or equivalent substitutions can be made for some of the technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation, characterized in that it includes the following steps:
S1: Multi-source data acquisition and preprocessing: Acquire multi-source trajectory data of the moving vehicle during its journey and preprocess it to form a trajectory object dataset;
S2: Motion fingerprint modeling and extraction: Extract motion features of trajectory segments from the trajectory object dataset to construct multi-dimensional feature vectors, extract local temporal patterns through a neural network model, and combine a time attention mechanism to strengthen the weights of key behaviors to obtain motion fingerprints;
S3: Scene semantic perception and classification: Based on visual data and high-precision map data, identify the scene type and dynamic state of the moving vehicle, and output scene labels and state parameters;
S4: Scene-adaptive inference engine construction: Configure exclusive feature weight sets, inference rule sets, and behavior constraint sets for different scene types, and dynamically load corresponding model branches or adjust parameters based on scene recognition results;
S5: Intelligent reasoning of trajectory intent: Construct a hybrid reasoning model, taking motion fingerprint, scene label, state parameters and context object state as input, and outputting the probability distribution of trajectory intent;
S6: Automatic multi-label annotation output: Based on a hierarchical labeling system, generate structured semantic labeling results that include behavior categories, scene semantics, and intent interpretations.
2. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 1, characterized in that the multi-source trajectory data mentioned in step S1 includes GNSS trajectory data, IMU data, OBD data, visual data, and high-precision map data; the preprocessing includes Kalman filtering smoothing, linear interpolation completion of missing values, IMU data integral correction of displacement, alignment of image frames and trajectory timestamps, and data redundancy filtering; the trajectory object dataset includes a trajectory motion feature data set and a trajectory scene semantic data set.
3. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 1, characterized in that the motion features mentioned in step S2 include spatial features, elevation features, and velocity features of the trajectory; the neural network model is a 1D-CNN network, and the time attention mechanism is implemented through a GRU neural network.
4. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 3, characterized in that the structure of the 1D-CNN network includes an input layer, a convolutional layer, and a fully connected layer. The input layer receives a multi-dimensional trend feature vector matrix composed of the cumulative changes in velocity, acceleration, heading angle, and elevation. The convolutional layer uses multiple convolutional layers without pooling layers. After convolution, a bias is added and a ReLU activation function is introduced. The fully connected layer uses a softmax activation function and outputs the probability of motion fingerprint labels.
5. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 4, characterized in that the GRU neural network includes an input layer, a GRU layer, and a fully connected layer. The input layer receives the temporal feature vector output by the 1D-CNN network. The GRU layer controls the retention ratio between the previous hidden state and the current candidate hidden state through the update gate, and determines the dependence of the current candidate hidden state on the previous hidden state through the reset gate. The fully connected layer uses the softmax activation function and outputs behavioral feature probabilities.
6. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 1, characterized in that the scene types mentioned in step S3 include ordinary roads, indoor roads, expressways / highways, and special areas. Ordinary roads include surface roads, intersections, and underpasses; indoor roads include tunnels and underground parking garages; expressways / highways include closed roads, ramps, and service areas; and special areas include school zones, hospitals, construction zones, and congestion hotspots. The dynamic state refers to the real-time movement of the vehicle, including acceleration, deceleration, straight-line movement, parking, turning, uphill / downhill movement, and U-turn.
7. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 1, characterized in that the feature weight set mentioned in step S4 reflects the relative importance of each perception feature under different scenarios through quantitative numerical allocation, and is dynamically adjusted using Bayesian optimization based on historical accident data and expert experience; the inference rule set constructs "cause-effect" logical expressions based on a structural causal model, and the logical expressions adopt an IF-THEN-WITH structure; the behavioral constraint set includes physical limit constraints, comfort constraints and regulatory constraints, which are determined through causal intervention analysis.
8. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 1, characterized in that the scene-adaptive inference engine dynamically loads the corresponding feature weight set, inference rule set, and behavior constraint set based on the currently identified scene type, and inputs the inference rule set and behavior constraint set as logical constraints into the hybrid inference model to guide its intent inference process; meanwhile, the scene-adaptive inference engine dynamically adjusts the parameter configuration of the hybrid inference model according to the complexity of the current scene; the hybrid inference model performs intent inference under the combined effect of logical constraints and dynamic parameter configuration, and outputs the intent result and its confidence level.
9. The trajectory intent intelligent reasoning method integrating motion fingerprinting and scene adaptation according to claim 1, characterized in that the hybrid inference model mentioned in step S5 is a hybrid model of "graph neural network + spatiotemporal attention mechanism"; the graph neural network adopts a 3-layer GCN structure, and the graph structure is dynamically updated according to the spatiotemporal coordinates of each frame; the spatiotemporal attention mechanism is implemented by a spatiotemporal Transformer, which includes an encoder and a decoder, each containing 2 layers of spatiotemporal attention blocks.
10. A trajectory intent intelligent reasoning system integrating motion fingerprinting and scene adaptation, characterized in that it includes:
Data acquisition unit: includes GNSS receiver, IMU, camera, and environmental sensors, used to collect multi-source trajectory data;
Data processing unit: at least one processor for executing a computer program stored in a memory to implement the method as described in any one of claims 1-9;
Data storage unit: used to store multi-source data, motion fingerprint template library, scene rule library, training model parameters and annotation results;
Data communication unit: used for data transmission, including uploading annotation results and downloading updated models;
Visual interactive unit: used to display trajectory, annotation results, and receive human feedback;
Each unit achieves data synchronization through a unified timestamp alignment mechanism, supporting online learning and model iteration.