Dynamic decision-making method, device, system and storage medium for autonomous vehicles on roads
By constructing a semantically enhanced state space and introducing a brain-like inhibitory control mechanism, the invention addresses decision-making rigidity and safety problems of autonomous driving systems in complex scenarios. It automatically identifies key risk vehicles and makes safe decisions, thereby improving the robustness and ride comfort of the system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG UNIV OF TECH
- Filing Date
- 2026-03-04
- Publication Date
- 2026-06-12
AI Technical Summary
Existing autonomous driving decision-making methods cannot enumerate all possibilities when faced with complex and ever-changing long-tail conditions. They consume significant computational resources, make poor use of perception information, cannot handle changes in the number of vehicles, adapt poorly to variable-length traffic flow sequences, and struggle to guarantee safety.
The method constructs a semantic state space that includes TTC and cut-in intent, uses multi-frame stacking to simulate brain-like short-term memory, extracts interaction features through Ego-GraphTransformer, and combines Dueling DQN with an action masking mechanism to generate safe decision instructions. It also introduces a brain-like inhibitory control mechanism to block impulsive behaviors that violate safety constraints.
The method enhances the ability to predict the movement trends of surrounding vehicles, automatically identifies key risk vehicles, ensures that decision outputs comply with physical safety constraints, improves the robustness and training efficiency of the system, and enhances interactive perception and ride comfort in complex dynamic scenarios.
Figure CN122186207A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of autonomous driving technology, specifically relating to a dynamic decision-making method, device, system, and storage medium for autonomous vehicles on roads.
Background Technology
[0002] Lane merging and lane changing in congested areas are highly dynamic and interactive, posing significant challenges to the deployment of autonomous driving technology. Traditional decision-making and planning methods primarily rely on finite state machines (FSMs) or rule bases. However, when faced with complex and ever-changing long-tail scenarios, rule bases struggle to enumerate all possibilities, leading to rigid decision-making. In recent years, data-driven methods based on deep reinforcement learning (DRL) have attracted considerable attention due to their self-evolutionary capabilities. However, in practical applications, they still face challenges such as low utilization of perception information, slow training convergence, and difficulties in ensuring safety.
[0003] Existing methods are mainly divided into two categories: end-to-end decision-making methods based on convolutional neural networks (CNNs) and vectorized decision-making methods based on fully connected networks (MLPs). The CNN-based end-to-end decision-making method uses the raw images acquired by sensors as input, extracts features using CNNs, and directly outputs control signals. The fully connected network-based vectorized decision-making method concatenates the states of surrounding vehicles into a fixed-length vector and inputs it into the MLP network.
[0004] Despite the significant achievements of existing methods, several areas still require improvement. First, end-to-end decision-making methods based on convolutional neural networks (CNNs) are computationally intensive, and the large amount of irrelevant background information in the images makes it difficult to accurately capture subtle kinematic changes between vehicles. Second, vectorized decision-making methods based on fully connected networks (MLPs) cannot adapt to variable-length traffic flow sequences, cannot handle a varying number of vehicles, and lack attention mechanisms, making it difficult to identify key risk targets among numerous traffic participants.
Summary of the Invention
[0005] To address the problems existing in the prior art, this invention provides a method, device, system, and storage medium for dynamic decision-making of autonomous vehicles on the road.
[0006] To achieve the above objectives, the present invention provides the following solution. A dynamic decision-making method for autonomous vehicles on roads includes:
constructing a semantic state space containing TTC and cut-in intent, and using multi-frame stacking to simulate brain-like short-term memory;
based on the brain-like short-term memory, extracting interaction features using Ego-GraphTransformer;
based on the extracted interaction features, using Dueling DQN, which simulates the brain's value-assessment framework, to evaluate state value and action advantage and to generate decision instructions;
based on the decision instructions, introducing a brain-like inhibitory control mechanism and an action mask to forcibly block impulsive behaviors that violate safety constraints.
[0007] As a preferred approach, the Ego-GraphTransformer is used to simulate a visual attention mechanism to dynamically calculate the interaction weights of surrounding vehicles in order to automatically lock onto key risk targets.
[0008] As a preferred approach, a decision-making mechanism combining Dueling DQN value decoupling and physical rule-based action masking is adopted; a safety check is introduced before the decision output layer to forcibly block illegal actions that violate physical constraints.
[0009] The present invention also provides a dynamic decision-making device for autonomous vehicles on roads, comprising: a first processing module, used to construct a semantic state space containing TTC and cut-in intent, and to simulate brain-like short-term memory by stacking multiple frames; a second processing module, used to extract interaction features with Ego-GraphTransformer based on the brain-like short-term memory; a third processing module, used to evaluate state value and action advantage based on the extracted interaction features, using Dueling DQN to simulate the brain's value-assessment framework, and to generate decision instructions; and a fourth processing module, used to introduce a brain-like inhibitory control mechanism and an action mask according to the decision instructions, and to forcibly block impulsive behaviors that violate safety constraints.
[0010] Preferably, the second processing module uses Ego-GraphTransformer to simulate a visual attention mechanism and dynamically calculates the interaction weights of surrounding vehicles in order to automatically lock onto key risk targets.
[0011] As a preferred option, the fourth processing module adopts a decision-making mechanism that combines Dueling DQN value decoupling with physical rule-based action masking; a safety check is introduced before the decision output layer to forcibly block illegal actions that violate physical constraints.
[0012] The present invention also provides a dynamic decision-making system for autonomous vehicles on the road, comprising: a memory and a processor, wherein the memory stores a computer program executed by the processor, and the computer program executes a dynamic decision-making method for autonomous vehicles on the road when executed by the processor.
[0013] The present invention also provides a storage medium storing a computer program, which executes a dynamic decision-making method for autonomous vehicles on the road when running.
[0014] Compared with the prior art, the beneficial effects of the present invention are as follows: 1. Construct a semantically enhanced spatiotemporal perception model: By introducing high-level semantic features such as TTC and cut-in intent, together with multi-frame stacking, the partial observability problem is addressed and the ability to predict the movement trends of surrounding vehicles is improved.
[0015] 2. Achieve dynamic locking of key targets: Utilize the Transformer's self-attention mechanism to automatically identify and lock onto the key risk vehicles that have the greatest impact on the ego vehicle's decision-making.
[0016] 3. Establish a hierarchical safety constraint mechanism: Through action masking and a hierarchical control architecture, ensure that decision outputs comply with physical safety constraints and improve the robustness of the system.
Attached Figure Description
[0017] To more clearly illustrate the technical solution of the present invention, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a flowchart of the dynamic decision-making method for autonomous vehicles on the road, according to an embodiment of the present invention.
Detailed Implementation
[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0020] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0021] Example 1: As shown in Figure 1, the present invention provides a dynamic decision-making method for autonomous vehicles on roads, comprising:
Step S1: Construct a semantically enhanced object-level state space. Unlike traditional image input, this invention constructs a vehicle feature matrix $S_t \in \mathbb{R}^{N \times F}$, which serves as the system input at time t. Here, N is the number of vehicles considered within the observation range (set to N=5, i.e., the ego vehicle plus the four nearest surrounding vehicles), and F is the feature dimension.
[0022] Each row of the matrix represents one vehicle i, and its feature vector includes basic kinematic features and high-level semantic features:
Basic kinematic features:
$x_i$, $y_i$: normalized longitudinal and lateral positions of vehicle i relative to the ego vehicle;
$\mathit{vx}_i$, $\mathit{vy}_i$: normalized longitudinal and lateral velocities of vehicle i relative to the ego vehicle;
vehicle presence marker: 0 if there is no vehicle at the location, 1 otherwise.
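The kinematic part of the state matrix can be sketched as follows. This is a minimal illustration, not the patented implementation: the input dictionaries, the normalization constants `X_RANGE`, `Y_RANGE` and `V_RANGE`, and the zero-padding of empty slots are assumptions that the source does not specify.

```python
import numpy as np

N = 5  # ego vehicle plus the four nearest surrounding vehicles (from the text)
# Assumed normalization constants; the source gives no values.
X_RANGE, Y_RANGE, V_RANGE = 100.0, 12.0, 30.0


def build_state_matrix(ego, others):
    """Return an (N, 5) matrix with rows [x, y, vx, vy, presence].

    `ego` and each entry of `others` are dicts with keys x, y, vx, vy
    (a hypothetical input format chosen for this sketch).
    """
    rows = [[0.0, 0.0, 0.0, 0.0, 1.0]]  # row 0: the ego vehicle itself
    # Keep only the N-1 vehicles closest to the ego vehicle.
    nearest = sorted(
        others,
        key=lambda o: (o["x"] - ego["x"]) ** 2 + (o["y"] - ego["y"]) ** 2,
    )[: N - 1]
    for o in nearest:
        rows.append([
            (o["x"] - ego["x"]) / X_RANGE,
            (o["y"] - ego["y"]) / Y_RANGE,
            (o["vx"] - ego["vx"]) / V_RANGE,
            (o["vy"] - ego["vy"]) / V_RANGE,
            1.0,  # presence marker: a vehicle occupies this slot
        ])
    # Pad unused slots with zeros; presence = 0 marks "no vehicle".
    while len(rows) < N:
        rows.append([0.0] * 5)
    return np.array(rows, dtype=np.float32)


ego = {"x": 0.0, "y": 0.0, "vx": 20.0, "vy": 0.0}
others = [
    {"x": 30.0, "y": 0.0, "vx": 15.0, "vy": 0.0},
    {"x": -10.0, "y": 3.5, "vx": 22.0, "vy": -0.5},
]
S_t = build_state_matrix(ego, others)
```

Padding to a fixed N keeps the matrix shape constant while the presence marker tells downstream layers which rows are real vehicles.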
[0023] High-level semantic features (the innovation of this invention):
(1) Inverse time to collision (inverse TTC): Used to quantify the risk of longitudinal rear-end collisions. It is the closing speed divided by the longitudinal gap:
$$\mathrm{TTC}^{-1}_i = \frac{\Delta v_i}{|\Delta x_i| + \epsilon}$$
where $\Delta v_i$ and $\Delta x_i$ are the closing speed and the longitudinal gap obtained from the vehicles' speeds $v$ and positions $x$, and $\epsilon$ is a tiny constant (with a value of 1e-5) that prevents a zero denominator. The formula is only evaluated when the vehicles are approaching each other; otherwise, the feature is set to zero.
[0024] (2) Cut-in Risk Index: Used to quantify the tendency of adjacent vehicles to intrude into the lane of the ego vehicle. The calculation formula is as follows:
$$\mathrm{Risk}_{\mathrm{cut}} = \alpha\,|\mathit{vy}_i| + \beta\left(1 - \frac{|y_i|}{W_{\mathrm{lane}}/2}\right)$$
where $W_{\mathrm{lane}}$ is the lane width, and $\alpha$ and $\beta$ are weighting factors. This formula indicates that the greater the lateral speed of the adjacent vehicle and the farther it is from its lane centerline (i.e., the closer it is to the lane divider), the higher the risk of it cutting in.
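The two semantic features can be sketched in plain Python. The inverse TTC is written here as closing speed over gap, which is the standard definition and an assumption about the exact form used in the source; the default values for `lane_width`, `alpha` and `beta` are likewise illustrative assumptions.

```python
EPS = 1e-5  # tiny constant in the denominator (value taken from the text)


def inverse_ttc(gap, closing_speed):
    """Inverse time to collision in 1/s.

    gap: longitudinal distance between the two vehicles (m).
    closing_speed: rate at which the gap shrinks (m/s); values <= 0 mean
    the vehicles are not approaching, so the feature is zero.
    """
    if closing_speed <= 0.0:
        return 0.0
    return closing_speed / (abs(gap) + EPS)


def cut_in_risk(vy_i, y_i, lane_width=3.5, alpha=0.5, beta=0.5):
    """Risk_cut = alpha*|vy_i| + beta*(1 - |y_i| / (W_lane / 2)).

    lane_width, alpha and beta are assumed example values.
    """
    return alpha * abs(vy_i) + beta * (1.0 - abs(y_i) / (lane_width / 2.0))


risk_near_divider = cut_in_risk(vy_i=-0.8, y_i=1.75)  # on the lane divider
risk_lane_centre = cut_in_risk(vy_i=-0.8, y_i=3.5)    # centred in the next lane
ttc_inv = inverse_ttc(gap=20.0, closing_speed=5.0)
```

With equal lateral speed, the vehicle sitting on the lane divider scores higher than the one centred in the adjacent lane, which matches the interpretation given in the text.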
[0025] Step S2: Temporal feature stacking (frame stacking). To address the partial observability issue caused by the inability to directly observe acceleration, this invention employs a sliding window mechanism to fuse time-series information. The sliding window size is defined as K (e.g., K=4). At time t, the system acquires the historical time sequence $\{S_{t-K+1}, \ldots, S_{t-1}, S_t\}$. The K frame matrices are concatenated along the feature channel dimension to generate an enhanced spatiotemporal state matrix $S'_t$, whose dimensions become $N \times (K \cdot F)$. Through this step, the neural network can implicitly infer the acceleration and steering intentions of surrounding vehicles from continuous changes in position and velocity.
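A sliding-window frame stacker of this kind might look like the sketch below. The class name `FrameStacker`, the feature width F=7, and the choice to fill the window with copies of the first frame at start-up are assumptions made for illustration.

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Keeps the last K state matrices and concatenates them along features."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def push(self, s_t):
        s_t = np.asarray(s_t, dtype=np.float32)
        if not self.frames:
            # Assumption: at start-up, repeat the first frame K times.
            self.frames.extend([s_t] * self.k)
        else:
            self.frames.append(s_t)
        # (N, F) x K  ->  (N, K*F), ordered oldest to newest.
        return np.concatenate(list(self.frames), axis=1)


stacker = FrameStacker(k=4)
for step in range(6):
    frame = np.full((5, 7), float(step))  # dummy frame, N=5, assumed F=7
    stacked = stacker.push(frame)
```

After six pushes the window holds frames 2 to 5, so the stacked matrix has shape (5, 28) and differences between adjacent feature blocks carry the velocity and acceleration trends.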
[0026] Step S3: Interaction feature extraction based on Ego-GraphTransformer. The spatiotemporal state matrix S'_t obtained in step S2 is input into the Ego-GraphTransformer encoder, which consists of two parts: linear embedding and multi-head self-attention.
Linear embedding: A fully connected layer maps the features of each vehicle node from dimension $K \cdot F$ to the hidden layer dimension $d$ (e.g., 128), to obtain the embedding matrix $E \in \mathbb{R}^{N \times d}$.
[0027] Attention weight calculation: The interaction relationships between vehicles are calculated using a multi-head self-attention mechanism. For the h-th attention head, the Query (Q), Key (K), and Value (V) matrices are calculated:
$$Q_h = E W_h^{Q}, \quad K_h = E W_h^{K}, \quad V_h = E W_h^{V}$$
Then, the attention weight matrix Attn is calculated:
$$\mathrm{Attn}_h = \mathrm{softmax}\!\left(\frac{Q_h K_h^{\top}}{\sqrt{d_k}}\right)$$
where $d_k$ is the per-head key dimension (the key matrix $K_h$ is unrelated to the window size K of step S2).
Key technical point: The first row of the matrix Attn (the ego vehicle's attention weights over the other vehicles) intuitively reflects the risk distribution in the current scenario. For example, if the weight of the vehicle to the left rear surges, this vehicle has a critical influence on the ego vehicle's decision-making and requires close attention.
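The attention-weight computation can be illustrated with a small NumPy sketch. The weights here are random and untrained, so the numbers only show shapes and normalization, not learned risk. The function name `ego_attention`, the head count, and the masking of empty vehicle slots are assumptions that the source does not describe.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ego_attention(s_stacked, presence, d_model=128, n_heads=4, seed=0):
    """Return attention weights of shape (n_heads, N, N).

    s_stacked: (N, K*F) stacked state matrix; presence: (N,) 0/1 flags.
    """
    rng = np.random.default_rng(seed)
    n, d_in = s_stacked.shape
    d_k = d_model // n_heads
    w_embed = rng.normal(0.0, 0.1, (d_in, d_model))
    e = s_stacked @ w_embed  # linear embedding, (N, d_model)
    heads = []
    for _ in range(n_heads):
        w_q = rng.normal(0.0, 0.1, (d_model, d_k))
        w_k = rng.normal(0.0, 0.1, (d_model, d_k))
        q, k = e @ w_q, e @ w_k
        scores = q @ k.T / np.sqrt(d_k)  # scaled dot-product scores
        # Assumption: empty slots are masked so they receive zero weight.
        scores = np.where(presence[None, :] > 0, scores, -1e9)
        heads.append(softmax(scores, axis=-1))
    return np.stack(heads)


rng = np.random.default_rng(1)
s_stacked = rng.normal(size=(5, 28))
presence = np.array([1, 1, 1, 0, 0])
attn = ego_attention(s_stacked, presence)
ego_weights = attn[:, 0, :].mean(axis=0)  # ego row, averaged over heads
```

Only the query and key projections are needed to obtain the weights; the value projection would be applied afterwards to aggregate features.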
[0028] Brain-like mechanism explanation: This step simulates the "selective attention" mechanism of the human visual system. When a human driver is driving, although the retina receives the entire image, the brain only allocates high attention to vehicles with potential threats (such as rapidly approaching or crossing the line), while ignoring irrelevant vehicles in the distance. This invention reproduces this biological cognitive process through the attention weight matrix Attn.
[0029] Step S4: Decision instruction generation based on Dueling DQN. The aggregated feature vector output from the Transformer is input into the Dueling network architecture. The network tail is decoupled into two independent branches:
Value stream: outputs the scalar $V(s)$, representing the overall value of the current traffic state s.
[0030] Advantage stream: outputs the vector $A(s, a)$, which represents the advantage of taking action a in state s compared to the average level.
[0031] The final Q-value synthesis formula adopts a mean-centered form:
$$Q(s, a) = V(s) + \left(A(s, a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s, a')\right)$$
The action space $\mathcal{A}$ contains five discrete commands: {left lane change, right lane change, hold, accelerate, decelerate}.
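The Q-value synthesis step can be checked numerically with a few lines of NumPy. The value and advantage numbers below are made up for illustration, and the helper name `dueling_q` is hypothetical.

```python
import numpy as np

ACTIONS = ["left lane change", "right lane change", "hold",
           "accelerate", "decelerate"]


def dueling_q(value, advantages):
    """Q(s, a) = V(s) + (A(s, a) - mean over actions of A(s, a'))."""
    advantages = np.asarray(advantages, dtype=np.float64)
    return value + (advantages - advantages.mean())


adv = np.array([0.5, -0.5, 1.0, 0.0, -1.0])  # example advantage outputs
q = dueling_q(2.0, adv)
best_action = ACTIONS[int(np.argmax(q))]

# Adding a constant to every advantage leaves Q unchanged, which is why
# subtracting the mean makes V and A identifiable.
q_shifted = dueling_q(2.0, adv + 10.0)
```

Because the mean advantage is removed, the average of the Q values equals V(s), so the value stream alone carries the state's overall worth.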
[0032] Step S5: Safety correction and layered execution based on the action mask. To prevent "suicidal" behavior of the neural network in the early stages of exploration, a hard constraint mechanism is introduced:
Safety rule verification and mask generation: Define a mask vector M. The system monitors the physical environment in real time. If action a violates a safety rule (e.g., the distance to the vehicle behind in the target lane is too small), then M[a]=0; otherwise M[a]=1.
[0033] Probability correction: When selecting an action, only the action with the largest Q value is selected from the set of legal actions where M[a]=1.
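Mask generation and masked action selection can be sketched as below. The specific rules in `safety_mask` (gap thresholds for lane changes and acceleration) and the `min_gap` value are hypothetical examples; the source only states that actions violating a safety rule receive M[a]=0.

```python
import math

ACTIONS = ["left lane change", "right lane change", "hold",
           "accelerate", "decelerate"]


def safety_mask(left_rear_gap, right_rear_gap, front_gap, min_gap=10.0):
    """Return M with M[a] = 1 for legal actions and 0 for blocked ones.

    The three rules and min_gap are illustrative assumptions.
    """
    mask = [1, 1, 1, 1, 1]
    if left_rear_gap < min_gap:
        mask[0] = 0  # block left lane change
    if right_rear_gap < min_gap:
        mask[1] = 0  # block right lane change
    if front_gap < min_gap:
        mask[3] = 0  # block acceleration
    return mask


def masked_argmax(q_values, mask):
    """Pick the highest-Q action among those with M[a] == 1."""
    best, best_q = None, -math.inf
    for a, (q, m) in enumerate(zip(q_values, mask)):
        if m == 1 and q > best_q:
            best, best_q = a, q
    if best is None:
        raise ValueError("no legal action available")
    return best


q_values = [3.2, 1.0, 2.5, 2.9, 0.4]
mask = safety_mask(left_rear_gap=4.0, right_rear_gap=25.0, front_gap=8.0)
chosen = ACTIONS[masked_argmax(q_values, mask)]
```

Here the unconstrained choice would be the left lane change (Q=3.2), but the mask blocks it and acceleration, so the agent falls back to "hold".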
[0034] Layered control execution: The corrected discrete commands from the upper layer are sent to the lower-layer controller. Laterally, a PD controller adjusts the steering wheel angle to track the target lane centerline. Longitudinally, a PID controller adjusts the throttle / brake to track the target speed.
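The lower control layer can be sketched with one small controller class used twice: with the integral gain set to zero it acts as the lateral PD controller, and with all terms it acts as the longitudinal PID controller. The gains, time step, lane indexing convention and `apply_command` mapping are all assumptions chosen for illustration.

```python
class PID:
    """Discrete PID controller; set ki=0 to obtain a PD controller."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first call
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


LANE_WIDTH = 3.5   # m, assumed
SPEED_STEP = 2.0   # m/s change per accelerate/decelerate command, assumed


def apply_command(action, target_lane, target_speed):
    """Map a discrete command to new targets (hypothetical convention:
    lane indices increase to the right)."""
    if action == "left lane change":
        target_lane -= 1
    elif action == "right lane change":
        target_lane += 1
    elif action == "accelerate":
        target_speed += SPEED_STEP
    elif action == "decelerate":
        target_speed -= SPEED_STEP
    return target_lane, target_speed


lateral = PID(kp=0.3, ki=0.0, kd=0.1, dt=0.1)       # PD: steering
longitudinal = PID(kp=0.5, ki=0.1, kd=0.0, dt=0.1)  # PI(D): throttle/brake

target_lane, target_speed = apply_command("accelerate", 1, 20.0)
steer = lateral.step(target_lane * LANE_WIDTH - 3.2)  # lateral offset error
throttle = longitudinal.step(target_speed - 20.0)     # speed error
```

Separating the discrete decision from continuous tracking means a single command changes only a target, and the controllers turn that change into gradual steering and throttle adjustments.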
[0035] Brain-like mechanism explanation: This step simulates the "inhibitory control" mechanism of the human brain. Similar to how a driver's cerebral cortex sends inhibitory signals to forcibly interrupt the current lane-changing intention when they perceive danger, this invention uses action masks to build a similar "safety red line" at the algorithm's underlying layer, ensuring that the system's decisions are always within the physical safety boundary.
[0036] Furthermore, Long Short-Term Memory (LSTM) networks or Gated Recurrent Units (GRUs) can be used to replace the frame stacking technique in this embodiment to process the temporal information of vehicle trajectories, which can also address the partial observability problem.
[0037] Furthermore, a graph neural network (GNN) or a graph convolutional network (GCN) can be used to replace the Ego-GraphTransformer in this embodiment. By constructing a topological graph between vehicles to aggregate the features of neighboring nodes, interactive feature extraction can also be achieved.
[0038] Furthermore, Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), or Double Deep Q-Network (Double DQN) can be used to replace the Dueling DQN algorithm in this embodiment, which can also achieve policy learning and optimization.
[0039] Furthermore, Model Predictive Control (MPC) or Pure Pursuit can be used to replace the PID controller in this embodiment to convert the upper-level discrete commands into specific vehicle control signals.
[0040] This invention has the following characteristics: 1. Significantly improves the safety and training efficiency of autonomous driving decision-making. This invention explicitly introduces high-level semantic features such as the inverse time to collision (inverse TTC) and cut-in intent into the state space, enabling the model to perceive potential collision risks in advance. Simultaneously, through a rule-based action masking mechanism, illegal actions that violate physical safety constraints are forcibly masked at the decision output layer. This combination of soft (learned) and hard (rule-based) constraints not only avoids the "suicidal" collision behavior of reinforcement learning agents in the early stages of exploration, ensuring baseline safety during training, but also significantly shortens the algorithm's convergence time.
[0041] 2. Enhanced interactive perception and environmental adaptability in complex dynamic scenarios. Unlike traditional fully connected network (MLP) methods that use static weights to process surrounding vehicles, this invention utilizes the Ego-GraphTransformer structure to dynamically calculate the importance weights of surrounding vehicles to the vehicle's decision-making through a multi-head self-attention mechanism. This enables the system to automatically filter and lock key risk targets (such as rapidly merging vehicles) in multi-vehicle game scenarios such as merging and lane changing in congestion, effectively solving the problem of processing variable-length traffic flow sequences. Furthermore, the object-list-based calculation method offers stronger real-time performance compared to image processing.
[0042] 3. This invention addresses the decision oscillation caused by partial observability (POMDP), improving ride comfort. It utilizes multi-frame state stacking to fuse historical time-series information, enabling the neural network to implicitly infer the acceleration and steering trends of surrounding vehicles. Combined with the value decoupling architecture of Dueling DQN and a hierarchical control strategy (upper-level discrete decision-making + lower-level PID smooth execution), it effectively suppresses vehicle weaving (the "snaking" phenomenon) caused by incomplete environmental perception or Q-value overestimation, achieving smoother, more human-like driving control.
[0043] 4. Excellent algorithm interpretability. By visualizing the attention weight heatmap of the Transformer module, this invention can intuitively show the focus of the agent at a specific decision moment (e.g., when changing lanes to the left, the heatmap shows that the agent pays more attention to the vehicle behind it on the left), providing an intuitive basis for algorithm logic verification and failure analysis, and overcoming the "black box" and uninterpretable defects of traditional deep reinforcement learning models.
[0044] Compared to existing state-of-the-art techniques (such as CNN-based end-to-end methods or traditional MLP decision-making methods), this invention significantly improves the algorithm's interactive perception capabilities and computational efficiency in complex dynamic scenarios by combining the Ego-GraphTransformer architecture with a semantically enhanced state space. Traditional CNN methods suffer from high computational redundancy and struggle to capture subtle kinematic changes, while this invention's object list-based input drastically reduces computational requirements, and its attention mechanism intuitively displays the agent's focus (e.g., paying close attention to the car behind when changing lanes), effectively solving the problem of the "black box" nature of deep learning models.
[0045] Furthermore, this invention offers enhanced adaptability and safety. To address the shortcomings of existing reinforcement learning methods, such as susceptibility to collisions and slow convergence in the early stages of exploration, this invention introduces action masking and high-level semantic warnings. This combination of soft (learned) and hard (rule-based) constraints embeds physical safety constraints at the algorithm's core, preventing "suicidal" exploration behavior. Simultaneously, the hierarchical control architecture (upper-level decision-making + lower-level PID) effectively smooths control commands, resolving the vehicle oscillation caused by end-to-end output and significantly improving ride comfort.
[0046] Example 2: The present invention also provides a dynamic decision-making device for autonomous vehicles on roads, comprising: a first processing module, used to construct a semantic state space containing TTC and cut-in intent, and to simulate brain-like short-term memory by stacking multiple frames; a second processing module, used to extract interaction features with Ego-GraphTransformer based on the brain-like short-term memory; a third processing module, used to evaluate state value and action advantage based on the extracted interaction features, using Dueling DQN to simulate the brain's value-assessment framework, and to generate decision instructions; and a fourth processing module, used to introduce a brain-like inhibitory control mechanism and an action mask according to the decision instructions, and to forcibly block impulsive behaviors that violate safety constraints.
[0047] As one embodiment of the present invention, the second processing module uses Ego-GraphTransformer to simulate a visual attention mechanism and dynamically calculates the interaction weights of surrounding vehicles in order to automatically lock onto key risk targets.
[0048] As one embodiment of the present invention, the fourth processing module adopts a decision mechanism that combines Dueling DQN value decoupling with physical rule-based action masking; a safety check is introduced before the decision output layer to forcibly block illegal actions that violate physical constraints.
[0049] Example 3 The present invention also provides a dynamic decision-making system for autonomous vehicles on the road, comprising: a memory and a processor, wherein the memory stores a computer program executed by the processor, and the computer program executes a dynamic decision-making method for autonomous vehicles on the road when executed by the processor.
[0050] Example 4 The present invention also provides a storage medium storing a computer program, which executes a dynamic decision-making method for autonomous vehicles on the road when running.
[0051] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.
Claims
1. A dynamic decision-making method for autonomous vehicles on roads, characterized in that it comprises: constructing a semantic state space containing TTC and cut-in intent, and using multi-frame stacking to simulate brain-like short-term memory; based on the brain-like short-term memory, extracting interaction features using Ego-GraphTransformer; based on the extracted interaction features, using Dueling DQN, which simulates the brain's value-assessment framework, to evaluate state value and action advantage and to generate decision instructions; and based on the decision instructions, introducing a brain-like inhibitory control mechanism and an action mask to forcibly block impulsive behaviors that violate safety constraints.
2. The dynamic decision-making method for automated driving vehicles on roads as described in claim 1, characterized in that, By using Ego-GraphTransformer to simulate a visual attention mechanism, the interaction weights of surrounding vehicles are dynamically calculated to automatically lock onto key risk targets.
3. The dynamic decision-making method for autonomous vehicles on roads as described in claim 2, characterized in that a decision-making mechanism combining Dueling DQN value decoupling and physical rule-based action masking is adopted; a safety check is introduced before the decision output layer to forcibly block illegal actions that violate physical constraints.
4. A dynamic decision-making device for autonomous vehicles on roads, characterized in that it comprises: a first processing module, used to construct a semantic state space containing TTC and cut-in intent, and to simulate brain-like short-term memory by stacking multiple frames; a second processing module, used to extract interaction features with Ego-GraphTransformer based on the brain-like short-term memory; a third processing module, used to evaluate state value and action advantage based on the extracted interaction features, using Dueling DQN to simulate the brain's value-assessment framework, and to generate decision instructions; and a fourth processing module, used to introduce a brain-like inhibitory control mechanism and an action mask according to the decision instructions, and to forcibly block impulsive behaviors that violate safety constraints.
5. The dynamic decision-making device for automated driving vehicles on the road as described in claim 4, characterized in that, The second processing module uses Ego-GraphTransformer to simulate a visual attention mechanism, dynamically calculating the interaction weights of surrounding vehicles to automatically lock onto key risk targets.
6. The dynamic decision-making device for automated driving vehicles on the road as described in claim 5, characterized in that the fourth processing module adopts a decision-making mechanism that combines Dueling DQN value decoupling with physical rule-based action masking; a safety check is introduced before the decision output layer to forcibly block illegal actions that violate physical constraints.
7. A dynamic decision-making system for autonomous vehicles on roads, characterized in that it comprises: a memory and a processor, wherein the memory stores a computer program executed by the processor, the computer program performing the dynamic decision-making method for automated driving vehicles on the road as described in any one of claims 1-3 when executed by the processor.
8. A storage medium, characterized in that, The storage medium stores a computer program that, when executed, performs the dynamic decision-making method for automated driving vehicles on the road as described in any one of claims 1-3.