A method and system for dynamic decision-making of unprotected left turn of an unmanned vehicle based on a brain-like neural network

By simulating the cognitive mechanism of the human brain through a brain-like neural network, a traffic scene graph is constructed and a structured left-turn action graph is generated. This addresses the decision-making instability and safety problems of unmanned vehicles at complex urban traffic intersections, and achieves safe, efficient decision-making and cross-scene adaptation.

CN122275918A · Pending · Publication Date: 2026-06-26 · GUANGDONG UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGDONG UNIV OF TECH
Filing Date
2026-04-01
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In complex urban traffic intersections, unprotected left-turn scenarios for existing autonomous vehicles suffer from an excessively large decision-making search space, unstable behavioral choices, difficulty in balancing safety and traffic efficiency, and insufficient cross-scenario generalization ability.

Method used

A brain-like neural network is used to simulate the cognitive mechanism of the human brain. By combining prefrontal cortex-like collaborative spatial reasoning and hippocampal-like episodic memory retrieval, a traffic scene graph is constructed and a structured left-turn action graph is generated. An action mask is generated through feasibility and risk assessment. The optimal driving behavior is selected using a proximal policy optimization algorithm and combined with a trajectory tracking controller to generate vehicle control commands.

Benefits of technology

It significantly improves the decision-making stability and safety of autonomous vehicles in complex intersection scenarios, achieves an adaptive balance between safety and efficiency, and enhances the generalization ability across scenarios.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention belongs to the field of intelligent transportation and autonomous driving technology, and discloses a dynamic decision-making method and system for unprotected left turns of unmanned vehicles based on brain-like neural networks. The method constructs a traffic scene graph and introduces a brain-like neural network for prefrontal cortex-like collaborative spatial reasoning to generate scene-level semantic representations. Based on this, a structured left-turn action graph is constructed, and feasibility and risk assessments are performed on candidate driving behaviors. An action mask is generated to dynamically compress the decision space. Within the framework of a proximal policy optimization algorithm, a risk budget mechanism is combined to achieve an adaptive balance between safety and efficiency. Simultaneously, a scene graph memory is constructed, and historically successful strategies are reused through similarity retrieval to improve cross-scene generalization ability. This invention addresses the excessively large decision space, unstable behavior, difficulty in balancing safety and efficiency, and insufficient generalization ability of existing unmanned vehicles in complex intersection scenarios, improving the stability, safety, and adaptability of decision-making.

Description

Technical Field

[0001] This invention belongs to the field of intelligent transportation and autonomous driving technology, specifically relating to a dynamic decision-making method and system for unprotected left turns of unmanned vehicles based on brain-like neural networks.

Background Technology

[0002] With the development of autonomous driving technology, the behavioral decision-making ability of unmanned vehicles in complex urban environments has become one of the key bottlenecks restricting their large-scale application. Existing decision-making and planning methods for unmanned vehicles fall into three main categories: rule-based methods, optimization- and search-based methods, and learning-based methods. Rule-based methods rely on manually designed traffic rules and decision logic, such as finite state machines or decision trees. They can ensure basic safety and interpretability in simple scenarios, but they struggle to fully characterize the dynamic interactions between traffic participants, and their adaptability drops markedly when scenario complexity increases or traffic behavior patterns change. Optimization- and search-based methods usually model decision-making and path planning jointly as a single optimization problem and generate feasible paths in a continuous space through search or sampling, for example with rapidly-exploring random trees (RRT) or model predictive control (MPC). However, in complex scenarios with multiple traffic participants and high uncertainty, such as urban intersections, the candidate decision space grows exponentially and search efficiency decreases significantly. These methods are also highly sensitive to errors in predicting traffic participant behavior, which can easily lead to unstable or overly conservative decisions. In recent years, decision-making methods for autonomous vehicles based on deep learning and reinforcement learning have gradually attracted attention. These methods model traffic scenarios with neural networks and directly output driving behaviors or trajectories, which improves the flexibility and adaptability of decision-making to a certain extent. However, existing learning-based methods mostly adopt an end-to-end decision structure that tightly couples environmental perception, spatial structure understanding, and behavioral decision-making. They lack explicit structured modeling of candidate driving behaviors, so the policy network must select actions directly in a high-dimensional, continuous, and uncertain decision space. This results in an excessively large decision space, insufficient training stability, and poor interpretability of the decision-making process.

[0003] In typical interaction-intensive scenarios such as unprotected left turns, autonomous vehicles not only need to understand the road topology and conflict zone distribution, but also need to comprehensively consider the motion states and uncertain behaviors of oncoming vehicles, pedestrians, and other traffic participants. Existing technologies often struggle to effectively constrain the risks inherent in different behavioral choices while ensuring spatial feasibility, easily leading to an imbalance between safety and traffic efficiency, manifesting as overly conservative or aggressive driving behaviors. Furthermore, current autonomous vehicle decision-making methods generally lack a systematic memory and reuse mechanism for complex historical traffic scenarios. When scenarios change or similar interactions reappear, they struggle to quickly adapt using past successful decision-making experience, resulting in limited cross-scenario generalization capabilities. Human drivers, when dealing with such complex traffic scenarios, can integrate spatial and interaction information through the co-reasoning ability of the prefrontal cortex, while simultaneously utilizing the episodic memory function of the hippocampus to recall past successful experiences, thereby making safe, efficient, and highly adaptable decisions.

[0004] Inspired by this, this invention introduces a brain-like neural network to simulate the cognitive mechanism of the human brain, combining prefrontal cortex-like collaborative spatial reasoning with hippocampal-like episodic memory retrieval, aiming to fundamentally solve the core problems faced by existing technologies.

Summary of the Invention

[0005] To address the problems of excessively large decision-making search space, unstable behavior selection, difficulty in dynamically balancing safety and traffic efficiency, and insufficient cross-scenario generalization ability in existing autonomous driving technologies at complex urban traffic intersections, especially in highly interactive and uncertain scenarios such as unprotected left turns, this invention provides a dynamic decision-making method and system for unprotected left turns of autonomous vehicles based on brain-like neural networks. This effectively solves the problems of excessively large decision-making space, unstable behavior, difficulty in balancing safety and efficiency, and insufficient generalization ability of existing autonomous vehicles in complex intersection scenarios, and significantly improves the stability, safety, and adaptability of decision-making.

[0006] To achieve the above objectives, the present invention provides the following solution: a dynamic decision-making method for unprotected left turns of autonomous vehicles based on brain-like neural networks, the method comprising:
acquiring multi-source traffic environment information for the autonomous vehicle and constructing a unified decision input representation;
based on the unified decision input representation, constructing a traffic scene graph and introducing a brain-like neural network to perform prefrontal cortex-like collaborative spatial reasoning to generate a scene-level semantic representation;
based on the scene-level semantic representation, constructing a structured left-turn action graph;
based on the trained brain-like neural network, performing feasibility and risk assessment of each candidate driving behavior in the structured left-turn action graph and generating an action mask;
based on the scene-level semantic representation and the action mask, within the framework of the proximal policy optimization algorithm, obtaining the conditional probability distribution of each candidate behavior and selecting the candidate behavior with the highest probability as the optimal driving behavior at the current moment;
based on the optimal driving behavior and its associated candidate paths, combined with the vehicle's current motion state, generating specific vehicle control commands through the trajectory tracking controller.

[0007] Preferably, the unified decision input representation is constructed as:
$$X_t = F\big(s_t^{ego}, \{s_t^{i}\}_{i=1}^{N}, \mathcal{M}, L_t, \tau_t\big)$$
where $F(\cdot)$ denotes the feature fusion function; $s_t^{ego}$ denotes the motion state of the ego vehicle at time t; $s_t^{i}$ denotes the state vector of traffic participant $i$; $N$ denotes the number of traffic participants; $\mathcal{M}$ represents road topology and lane geometry information; $L_t$ denotes the traffic light state; and $\tau_t$ denotes the remaining time of the corresponding signal phase.

[0008] Preferably, the traffic scene graph is constructed as follows. At time t, the traffic scene graph is $G_t = (V_t, E_t)$, where the node set $V_t$ is defined as:
$$V_t = \{v^{ego}\} \cup \{v_i\}_{i=1}^{N} \cup \{v_r\}$$
That is, the node set $V_t$ includes the ego vehicle node $v^{ego}$, the traffic participant nodes $v_i$, and the road structure nodes $v_r$. The graph edge set $E_t$ describes the spatial adjacency and potential conflict relationships between nodes. When a preset spatial distance constraint is met or there is a possibility of trajectory conflict, an edge is established between the corresponding nodes. The determination condition is expressed as:
$$(v_i, v_j) \in E_t \iff d(v_i, v_j) \le d_{th}(v_i, v_j) \ \lor\ c(v_i, v_j) = 1$$
where $d_{th}(\cdot)$ is a distance threshold function, $c(\cdot)$ determines whether there is a conflict between nodes, and $v_i$, $v_j$ are nodes such as traffic participant nodes.

[0009] Preferably, the brain-like neural network performs prefrontal cortex-like collaborative spatial reasoning and generates the scene-level semantic representation $z_t$ as:
$$z_t = \mathrm{READOUT}\big(\{h_v^{(L)} : v \in V_t\}\big)$$
where $L$ denotes the last layer of the graph neural network. In the $l$-th layer of the graph neural network, the update rule for node features is expressed as:
$$h_v^{(l+1)} = \phi\Big(h_v^{(l)}, \sum_{u \in \mathcal{N}(v)} \psi\big(h_v^{(l)}, h_u^{(l)}, e_{uv}\big)\Big)$$
where $\mathcal{N}(v)$ represents the neighborhood set of node $v$; $\phi$ and $\psi$ are learnable nonlinear mapping functions; $h_v^{(l)}$ is the feature representation of node $v$ in the $l$-th layer; $h_u^{(l)}$ is the feature representation of neighbor node $u$ in the $l$-th layer; and $e_{uv}$ describes the relationship between node $u$ and node $v$.

[0010] Preferably, the structured left-turn action graph includes multiple candidate driving behavior nodes, including at least: waiting behavior, forward movement behavior, entering the conflict zone behavior, and rapid passage behavior.

[0011] Preferably, the feasibility and risk assessment of candidate driving behaviors in the structured left-turn action graph is performed as follows. For any candidate path $p_m$, a feasibility evaluation function $F(p_m)$ is defined. Simultaneously, a path risk function is introduced to characterize the degree of risk of potential conflicts with surrounding traffic participants during path execution, defined as:
$$R(p_m) = \sum_{i=1}^{N} \int_{0}^{T} \mathbb{1}\big[d_i(\tau \mid p_m) < d_{safe}\big]\, d\tau$$
where $d_i(\tau \mid p_m)$ denotes the distance between the ego vehicle and the $i$-th traffic participant at predicted time $\tau$ while traveling along path $p_m$; $T$ is the length of the prediction horizon; $d_{safe}$ is the safe distance threshold; and $p_m$ denotes the $m$-th candidate path.

[0012] Preferably, based on the scene-level semantic representation and the action mask, and within the framework of the proximal policy optimization algorithm, the conditional probability distribution of each candidate behavior is obtained and the candidate behavior with the highest probability is selected as the optimal driving behavior at the current moment as follows:
$$p^{*} = \arg\max_{m \in \{1,\dots,M\}} \pi_{eff}(p_m \mid z_t)$$
$$\pi_{eff}(p_m \mid z_t) = \frac{\tilde{\pi}(p_m \mid z_t)}{\sum_{j=1}^{M} \tilde{\pi}(p_j \mid z_t)}, \qquad \tilde{\pi}(p_m \mid z_t) = \mu_m\, \pi_\theta(p_m \mid z_t)$$
where $p^{*}$ is the optimal path; $\pi_{eff}$ is the effective policy distribution; $M$ is the number of candidate paths; $\pi_\theta$ is the policy function with policy parameter $\theta$, which outputs the selection probability of each candidate path; $\tilde{\pi}$ is the policy distribution after mask constraint correction; and $\mu_m$ is the path mask.

[0013] Preferably, specific vehicle control commands are generated through the trajectory tracking controller, based on the optimal driving behavior and its associated candidate paths and combined with the vehicle's current motion state, as:
$$u_t = [\delta_t, a_t^{cmd}] = g(p^{*}, s_t)$$
where $u_t$ represents the control input for the vehicle at time t; $\delta_t$ and $a_t^{cmd}$ represent the steering control quantity and the longitudinal acceleration control quantity, respectively; $g(\cdot)$ is the mapping function from trajectory to control command; $p^{*}$ is the currently selected optimal path; and $s_t$ is the current state of the unmanned vehicle. The unmanned vehicle state $s_t$ is:
$$s_t = [x_t, y_t, \psi_t, v_t, a_t]$$
where $(x_t, y_t)$ are the vehicle's current position coordinates, $\psi_t$ is the current heading angle, $v_t$ is the current speed, and $a_t$ is the current acceleration. The optimal path $p^{*}$ is expressed as:
$$p^{*} = \{(x_k, y_k, v_k^{ref})\}_{k=1}^{n}$$
Each point contains a position $(x_k, y_k)$ and an expected speed $v_k^{ref}$, and $n$ is the number of path sampling points.

[0014] The present invention also provides a dynamic decision-making system for unprotected left turns of unmanned vehicles based on brain-like neural networks. The system is used to implement the aforementioned method. The system includes: a multi-source information fusion and unified representation construction module, a scene graph construction and brain-like spatial reasoning module, a structured action graph generation module, a behavior feasibility and risk assessment module, a proximal policy optimization and behavior decision-making module, and a trajectory tracking and vehicle control module. The multi-source information fusion and unified representation construction module is used to acquire multi-source information on the traffic environment of the unmanned vehicle and construct a unified decision input representation. The scene graph construction and brain-like spatial reasoning module is used to construct a traffic scene graph based on the unified decision input representation and introduce a brain-like neural network to perform prefrontal cortex-like collaborative spatial reasoning to generate a scene-level semantic representation. The structured action graph generation module is used to construct a structured left-turn action graph based on the scene-level semantic representation. The behavior feasibility and risk assessment module is used to assess the feasibility and risk of each candidate driving behavior in the structured left-turn action graph based on the trained brain-like neural network, and to generate an action mask. The proximal policy optimization and behavior decision-making module is used to obtain the conditional probability distribution of each candidate behavior based on the scene-level semantic representation and the action mask, within the framework of the proximal policy optimization algorithm, and to select the candidate behavior with the highest probability as the optimal driving behavior at the current moment. The trajectory tracking and vehicle control module is used to generate specific vehicle control commands through the trajectory tracking controller based on the optimal driving behavior and its associated candidate paths, combined with the current motion state of the vehicle.

[0015] Compared with the prior art, the beneficial effects of the present invention are as follows: (1) Improve decision-making stability and safety: By using structured action graphs and action masking mechanisms, combined with the spatial reasoning ability of brain-like neural networks, the decision-making space is effectively reduced, the instability caused by high-dimensional decision-making problems is reduced, and the driving safety of unmanned vehicles in complex intersection scenarios is guaranteed through explicit risk assessment and spatial feasibility checks.

[0016] (2) Adaptive balance between safety and efficiency: By introducing a risk budget constraint mechanism, the risk tolerance can be dynamically adjusted according to the uncertainty level of the current traffic environment, so as to achieve efficient passage under the premise of ensuring safety and avoid low traffic efficiency due to overly conservative or aggressive decisions.

[0017] (3) Rapid adaptation and experience reuse across scenarios: Through the scene graph memory and similarity retrieval mechanism, the present invention can use historically successful driving strategies to quickly adapt to new scenarios, significantly improve the generalization ability of the model in long-tail scenarios, and simulate hippocampal memory function.

Description of the Drawings

[0018] To more clearly illustrate the technical solution of the present invention, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a traffic scene graph and a left-turn action graph in an embodiment of the present invention;
Figure 2 is a schematic diagram of risk assessment and action masking in an embodiment of the present invention;
Figure 3 is a schematic diagram of policy decision-making and scene graph memory construction in an embodiment of the present invention;
Figure 4 is a flowchart of the unprotected left turn decision method for unmanned vehicles at complex intersections in an embodiment of the present invention.

Detailed Implementation

[0020] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0021] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0022] Example 1: As shown in Figures 1-4, this invention provides a dynamic decision-making method for unprotected left turns of autonomous vehicles based on brain-like neural networks, which is particularly suitable for behavioral decision-making in complex interactive scenarios such as unprotected left turns in urban areas. It includes the following steps: Step S1: Acquisition of Multi-Source Traffic Environment Information and Construction of Traffic Scene Graph. The autonomous vehicle acquires real-time information about the surrounding traffic environment through onboard perception systems (such as cameras, LiDAR, and millimeter-wave radar), and obtains beyond-line-of-sight traffic information, including traffic light status and remaining time, through a vehicle-to-infrastructure (V2I) communication system. The V2I channel compensates for the limitations of onboard perception, enables early perception of traffic conditions ahead, and improves autonomous driving decision-making capabilities. The traffic environment information includes at least: road topology (lane lines, intersection boundaries, conflict zone distribution), the vehicle's motion state (position, speed, acceleration, heading angle, yaw rate), and the position, speed, acceleration, heading angle, and category information of surrounding traffic participants (vehicles, pedestrians, non-motorized vehicles). Based on this traffic environment information, an intersection traffic scene graph is constructed as an abstract representation that serves as the unified decision input representation. Graph nodes represent lanes, conflict zones, or traffic participants, and graph edges represent spatial connections or potential conflict relationships.

[0023] Step S2: Traffic Scene Graph Construction and Prefrontal Cortex-like Collaborative Spatial Reasoning. Based on a unified decision input representation, an intersection traffic scene graph is constructed. This traffic scene graph explicitly models spatial entities and their interactions within the traffic environment in a graph structure. The graph node set includes: the vehicle node, nodes of each traffic participant, and road structure nodes (such as lane segments and conflict zones). The graph edge set describes spatial adjacency relationships (e.g., distance less than a threshold) or potential conflict relationships (e.g., intersecting driving trajectories) between nodes. Building upon this, a prefrontal cortex-like multi-region collaborative mechanism is introduced, utilizing a graph neural network to perform multi-layer information propagation and node feature updates on the traffic scene graph. Each layer of the graph neural network aggregates neighboring node information, gradually fusing spatial structure and interactive dynamics to ultimately generate a scene spatial representation. This representation comprehensively characterizes the spatial layout of the traffic environment, the motion states of each entity, and their interactions.

[0024] Step S3: Generation of a Structured Left-Turn Action Graph. Based on the scene-level graph representation generated in Step S2, a structured left-turn action graph (LTAG) for unprotected left-turn scenarios is constructed. The left-turn action graph includes multiple candidate driving behavior nodes, including at least: waiting behavior (waiting for a crossable gap before the stop line), forward movement behavior (slowly entering the central waiting area of the intersection to shorten the subsequent crossing time), entering the conflict zone behavior (starting a left turn and occupying the space of the oncoming lane), and rapid passage behavior (accelerating to complete the left turn to escape the conflict zone). The directed edges in the left-turn action graph are used to describe the legal switchable relationships between each candidate driving behavior and their corresponding safety prerequisites (e.g., the prerequisite for switching from "forward movement" to "entering the conflict zone" is the existence of a crossable gap).
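As a rough illustration of how such an action graph could be held in code, the sketch below stores each behavior's outgoing edges together with an optional prerequisite flag. Only the "forward movement" to "entering the conflict zone" prerequisite comes from the text; the other transitions, the identifier names, and the `waiting_area_clear` flag are assumptions made for the example.

```python
# Hypothetical left-turn action graph: behavior -> list of (next behavior, prerequisite).
# A prerequisite of None means the transition is always legal.
LTAG = {
    "wait": [("wait", None), ("move_forward", "waiting_area_clear")],
    "move_forward": [("move_forward", None), ("enter_conflict_zone", "gap_available")],
    "enter_conflict_zone": [("rapid_passage", None)],
    "rapid_passage": [("rapid_passage", None)],
}


def legal_next(current, conditions):
    """Return the behaviors reachable from `current` whose prerequisite holds.

    `conditions` maps prerequisite names to booleans; missing names count as False.
    """
    return [
        nxt
        for nxt, prerequisite in LTAG[current]
        if prerequisite is None or conditions.get(prerequisite, False)
    ]
```

With no crossable gap, `legal_next("move_forward", {"gap_available": False})` keeps the vehicle in the forward movement behavior; once the flag is true, entering the conflict zone becomes a legal successor.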

[0025] Step S4: Spatial Feasibility and Risk Assessment of Candidate Driving Behaviors. Based on the brain-like neural network trained in Step S2, each candidate driving behavior in the left-turn action graph is evaluated. The evaluation includes two aspects: spatial feasibility assessment and risk assessment. Spatial feasibility assessment determines whether the candidate path meets road boundary constraints, curvature constraints, vehicle dynamics limitations, and similar constraints, and outputs a feasibility score. Risk assessment quantifies the probability and severity of potential conflicts with surrounding traffic participants during the execution of the candidate path, and outputs a risk index. The risk index can be calculated by integrating, over the prediction horizon, the cumulative time during which the distance between the vehicle and each traffic participant is below the safety threshold.
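The risk index described above can be approximated on sampled trajectories. The following is a minimal sketch that assumes the ego path and every predicted participant trajectory are sampled at the same time steps with spacing `dt`; the function and parameter names are illustrative and do not come from the patent.

```python
import math


def risk_index(ego_path, participant_paths, d_safe, dt):
    """Discrete approximation of the cumulative time spent closer than d_safe.

    ego_path: list of (x, y) points of the candidate path, one per time step.
    participant_paths: list of predicted (x, y) trajectories, same sampling.
    d_safe: safe distance threshold; dt: time step of the sampling.
    """
    total = 0.0
    for other in participant_paths:
        for ego_pt, other_pt in zip(ego_path, other):
            if math.dist(ego_pt, other_pt) < d_safe:
                total += dt  # this step counts as time spent inside the unsafe range
    return total
```

A path that never comes within `d_safe` of any participant scores 0, and the score grows with both the number of participants approached and the duration of each approach.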

[0026] Step S5: Learnable Action Mask Generation. Based on the feasibility score and risk index output in Step S4, a learnable action mask corresponding to each candidate driving behavior is generated. The action mask is a binary vector. Candidate behaviors with a feasibility score higher than a preset threshold and a risk index lower than a preset threshold correspond to a mask value of 1 (retained); otherwise, it is 0 (masked). This mechanism dynamically removes high-risk or infeasible candidate behaviors from the decision space, ensuring that subsequent strategy decisions are made only within a safe and feasible subspace, effectively compressing the decision space and improving decision stability and safety. This mask generation process is learnable, and the threshold can be adaptively adjusted through reinforcement learning.
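The thresholding rule in this step is simple enough to sketch directly. The example below uses fixed thresholds passed in as arguments; the adaptive, learned thresholds mentioned in the text are not modeled, and the names are illustrative.

```python
def action_mask(feasibility, risk, feas_min, risk_max):
    """Binary mask per candidate behavior: 1 = retained, 0 = masked.

    A behavior is retained only if its feasibility score exceeds feas_min
    and its risk index is below risk_max.
    """
    return [
        1 if f > feas_min and r < risk_max else 0
        for f, r in zip(feasibility, risk)
    ]
```

For example, with scores for (wait, move forward, enter conflict zone, rapid passage), a behavior that is feasible but too risky is removed just like one that is infeasible.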

[0027] Step S6: Constrained Policy-Level Decision Making. A decision network is constructed using the Proximal Policy Optimization (PPO) algorithm. The policy network takes the scene-level graph representation generated in Step S2 and the action mask generated in Step S5 as input, and outputs the conditional probability distribution of each candidate behavior. Under the action mask, the probability of each masked behavior is set to zero, and the probabilities of the remaining behaviors are renormalized to obtain a constrained effective policy distribution. The policy network is optimized by maximizing the cumulative discounted reward. The reward function comprehensively considers traffic efficiency (e.g., travel time, waiting time), driving smoothness (e.g., rate of change of acceleration), and safety (e.g., risk penalty). Finally, the candidate behavior with the highest probability is selected as the optimal driving behavior at the current moment.
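The masking and renormalization described here amount to a masked softmax followed by an argmax. The sketch below assumes the policy network emits one raw score (logit) per candidate behavior; the PPO training loop itself is out of scope, and the function names are illustrative.

```python
import math


def masked_policy(logits, mask):
    """Softmax over the retained behaviors; masked behaviors get probability 0."""
    if not any(mask):
        raise ValueError("all candidate behaviors are masked")
    # Subtract the largest retained logit for numerical stability.
    peak = max(l for l, keep in zip(logits, mask) if keep)
    exps = [math.exp(l - peak) if keep else 0.0 for l, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, mask):
    """Index of the highest-probability behavior under the mask."""
    probs = masked_policy(logits, mask)
    return max(range(len(probs)), key=probs.__getitem__)
```

Note that a masked behavior can never be selected even if its raw score is the largest, which is the property the action mask is meant to guarantee.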

[0028] Step S7: Introduction of Risk Budget Constraint Mechanism. To prevent overly risky or overly conservative decisions in high-uncertainty environments, this invention introduces a risk budget constraint mechanism. First, based on the scenario-level semantic representation from Step S2, the uncertainty measure (e.g., feature variance) of the current traffic environment is calculated. Then, an acceptable risk budget is dynamically allocated according to the uncertainty level (the higher the uncertainty, the lower the risk budget). Within the prediction time domain of path planning, the cumulative risk of the selected behavioral path is monitored and constrained in real time. If the cumulative risk exceeds the current risk budget, the system will automatically reduce the probability of selecting high-risk paths or trigger replanning, thereby dynamically adjusting the aggressiveness of decisions based on environmental risks while ensuring a safety baseline, achieving an adaptive balance between safety and traffic efficiency.
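One way to picture the risk budget is as a quantity that shrinks as scene uncertainty grows. The sketch below uses the variance of the scene embedding as the uncertainty measure, as the text suggests; the inverse relationship `base_budget / (1 + sensitivity * uncertainty)` and all parameter names are assumptions chosen only to satisfy "higher uncertainty, lower budget".

```python
from statistics import pvariance


def risk_budget(scene_embedding, base_budget=1.0, sensitivity=1.0):
    """Allowed cumulative risk; decreases as feature variance increases."""
    uncertainty = pvariance(scene_embedding)
    return base_budget / (1.0 + sensitivity * uncertainty)


def within_budget(step_risks, budget):
    """True if the cumulative risk over the prediction horizon stays within budget."""
    return sum(step_risks) <= budget
```

When `within_budget` returns False, the text calls for lowering the selection probability of high-risk paths or triggering replanning; that reaction is left out of the sketch.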

[0029] Step S8: Scene Graph Memory Construction and Similarity Retrieval. To mimic the episodic memory and pattern completion capabilities of the hippocampus, a scene graph memory is constructed. This memory stores, in the form of a graph database, complex traffic scene graphs that were successfully navigated in the past together with their corresponding optimal driving strategies (i.e., the behavior sequences or paths output in Step S6). When the autonomous vehicle faces a new traffic scene, the system first calculates the similarity between the current scene graph and each historical scene graph in the memory. Similarity can be measured using methods such as cosine similarity over the node feature vectors produced by the graph neural network, or graph edit distance. After retrieving the most similar historical scenes, their corresponding successful strategies are used as prior knowledge and integrated into the decision-making process of Step S6 through attention mechanisms or policy initialization, enabling rapid reuse of historical experience and improving the model's adaptability and decision-making efficiency in long-tail scenarios.
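The retrieval step can be sketched with cosine similarity over scene embeddings. The example assumes the memory is a plain list of `(embedding, strategy)` pairs rather than a graph database, and it omits the later fusion of the retrieved strategy into the policy; both simplifications are choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory, query, k=1):
    """Return the k stored (embedding, strategy) pairs most similar to `query`."""
    ranked = sorted(memory, key=lambda item: cosine(item[0], query), reverse=True)
    return ranked[:k]
```

A linear scan like this is only reasonable for a small memory; a larger store would normally use an approximate nearest-neighbor index.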

[0030] Step S9: Driving Decision Generation and Iterative Update. Based on the optimal driving behavior output in Step S6 and its associated candidate paths, combined with the vehicle's current motion state, specific vehicle control commands are generated through a trajectory tracking controller (such as model predictive control or a pure pursuit controller), including steering wheel angle and desired acceleration/deceleration. After the vehicle executes the commands, the system enters the next time step and repeats Steps S1 to S9, forming a closed-loop, continuous dynamic decision-making and planning process.
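As one possible instance of the trajectory tracking controller, the sketch below implements a single pure pursuit steering step plus a proportional speed controller. The wheelbase value, the gain `k_speed`, and the choice of a single look-ahead target point `(x, y, v_ref)` taken from the selected path are all assumptions for the example.

```python
import math


def pure_pursuit_step(x, y, heading, v, target, wheelbase=2.7, k_speed=0.5):
    """Return (steering angle in rad, longitudinal acceleration command).

    target: (tx, ty, v_ref), a look-ahead point on the selected path with its
    expected speed. Steering follows delta = atan(2 * L * sin(alpha) / l_d).
    """
    tx, ty, v_ref = target
    alpha = math.atan2(ty - y, tx - x) - heading  # bearing of target relative to heading
    lookahead = math.hypot(tx - x, ty - y)        # distance to the target point
    steer = math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)
    accel = k_speed * (v_ref - v)                 # simple proportional speed tracking
    return steer, accel
```

A target to the left of the vehicle's heading yields a positive steering angle under the usual counter-clockwise convention; a real controller would additionally clamp both outputs to the vehicle's actuator limits.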

[0031] This invention is implemented in detail according to the following steps: Step 1: Acquire multi-source traffic environment information for autonomous vehicles and construct a unified decision input representation.

[0032] In this step, at time t, the unmanned vehicle acquires multi-source traffic environment information through the onboard perception system and the vehicle-road cooperative system. The multi-source traffic environment information includes at least road topology and lane geometry information, vehicle motion state information, surrounding traffic participant state information, and traffic light and traffic rule information.

[0033] Road topology and lane geometry information are used to describe the drivable space constraints of autonomous vehicles, and can be represented as a set of lanes: $\mathcal{M} = \{l_1, l_2, \dots, l_K\}$. Each lane $l_k$ includes lane centerline geometry information and corresponding lane attribute information, used to characterize road structure features.

[0034] The motion state of the ego vehicle at time t can be represented as a state vector:
$$s_t^{ego} = [x_t, y_t, v_t, a_t, \psi_t, \omega_t]$$
where $(x_t, y_t)$ indicates the position of the ego vehicle, $v_t$ indicates speed, $a_t$ indicates acceleration, $\psi_t$ indicates the heading angle, and $\omega_t$ represents the yaw rate.

[0035] The set of states of surrounding traffic participants at time t is represented as:
$$O_t = \{s_t^{i}\}_{i=1}^{N}$$
The state vector of each traffic participant $i$ can be represented as:
$$s_t^{i} = [c^{i}, x_t^{i}, y_t^{i}, v_t^{i}, a_t^{i}, \psi_t^{i}]$$
where $c^{i}$ indicates the category of the traffic participant, and $(x_t^{i}, y_t^{i})$, $v_t^{i}$, $a_t^{i}$, and $\psi_t^{i}$ are the position, velocity, acceleration, and heading angle of the $i$-th traffic participant.

[0036] Traffic light states can be represented as discrete variables:
$$L_t \in \{R, Y, G\}$$
where $R$, $Y$, and $G$ represent the red, yellow, and green light signals, respectively. The remaining time $\tau_t$ of the corresponding signal phase is also included, to capture how the traffic signal will change over time.

[0037] The aforementioned multi-source traffic environment information is fused after feature encoding and mapping to construct a unified decision input representation:
$$X_t = F\big(s_t^{ego}, \{s_t^{i}\}_{i=1}^{N}, \mathcal{M}, L_t, \tau_t\big)$$
where $X_t$ is the unified decision input representation and $F(\cdot)$ is the feature fusion function. $X_t$ serves as the foundational input for subsequent steps such as traffic scenario modeling, path evaluation, and prefrontal cortex-like multi-brain-region collaborative decision-making and path planning.
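To make the inputs of Step 1 concrete, the sketch below packs the ego state, participant states, lane features, and signal information into one flat feature list. Plain concatenation stands in for the feature fusion function, which the text does not specify; the class names, the integer category encoding, and the one-hot signal encoding are likewise assumptions.

```python
from dataclasses import dataclass

SIGNAL_INDEX = {"red": 0, "yellow": 1, "green": 2}


@dataclass
class EgoState:
    x: float
    y: float
    v: float
    a: float
    heading: float
    yaw_rate: float


@dataclass
class ParticipantState:
    category: int  # hypothetical encoding: 0 vehicle, 1 pedestrian, 2 non-motorized
    x: float
    y: float
    v: float
    a: float
    heading: float


def fuse_inputs(ego, participants, lane_features, signal, remaining_s):
    """Concatenate all sources into one feature list (a stand-in for the fusion function)."""
    features = [ego.x, ego.y, ego.v, ego.a, ego.heading, ego.yaw_rate]
    for p in participants:
        features += [float(p.category), p.x, p.y, p.v, p.a, p.heading]
    features += list(lane_features)
    one_hot = [0.0, 0.0, 0.0]
    one_hot[SIGNAL_INDEX[signal]] = 1.0  # traffic light state as a one-hot vector
    return features + one_hot + [remaining_s]
```

Because the number of participants varies between scenes, the resulting vector has variable length; that is one practical reason the later steps move to a graph representation instead of a fixed-size input.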

[0038] Step 2: Prefrontal cortex-like collaborative spatial reasoning based on graph neural networks.

[0039] Based on the construction of traffic scene interaction graphs, a prefrontal cortex-like multi-brain region collaboration mechanism is introduced. Graph neural networks are used to model and infer the interaction relationship between spatial structure and traffic participants in the traffic environment, thereby extracting high-level scene semantic representations to provide input support for subsequent path generation and risk assessment.

[0040] The node features in the traffic scene graph include motion state information of traffic participants (such as position, velocity, and acceleration) and road structure information (such as lane width and curvature). All input features are standardized before being fed into the graph neural network. The traffic scene interaction graph is constructed at time t as follows: Node set Defined as: Among them, the node set Including this vehicle node Traffic Participant Nodes and road structure nodes .

[0041] Graph edge set This is used to describe the spatial adjacency and potential conflict relationships between nodes. When a preset spatial distance constraint is met or there is a possibility of trajectory conflict, an edge connection is established between the corresponding nodes. The determination condition can be expressed as: in, This is a distance threshold function. Determine if there are conflicts between nodes. For traffic participants nodes.
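
A minimal sketch of the edge rule described above, assuming node positions are 2-D points in metres and that possible trajectory conflicts are supplied as a set of node pairs by some upstream predictor. The function name `build_edges` and its parameters are illustrative and do not come from the source.

```python
import math
from itertools import combinations

def build_edges(nodes, dist_threshold, conflicts=frozenset()):
    """Return undirected edges between nodes that are close or flagged as conflicting.

    nodes: dict mapping node name -> (x, y) position in metres.
    conflicts: set of frozenset({a, b}) pairs with a possible trajectory
               conflict (assumed to come from an upstream predictor).
    """
    edges = set()
    for a, b in combinations(sorted(nodes), 2):
        close = math.dist(nodes[a], nodes[b]) <= dist_threshold
        if close or frozenset((a, b)) in conflicts:
            edges.add((a, b))
    return edges

# Toy scene: one nearby car, one distant car whose trajectory may cross ours.
nodes = {"ego": (0.0, 0.0), "car_1": (6.0, 8.0), "car_2": (40.0, 0.0)}
edges = build_edges(nodes, dist_threshold=15.0,
                    conflicts={frozenset(("ego", "car_2"))})
```

In this toy scene `ego` and `car_1` are linked by proximity, while `ego` and `car_2` are linked only because a conflict was flagged.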

[0042] After the traffic scene interaction graph has been constructed, a graph neural network reasons over the spatial structure and interaction relationships in the traffic scene, jointly modeling multiple traffic participants and the road structure information. This process can be viewed as a prefrontal cortex-like multi-brain-region collaborative mechanism that achieves a holistic understanding of a complex traffic scene through multiple rounds of information propagation and fusion. In the l-th layer of the graph neural network, the feature of node v is updated from three inputs: the feature representation of node v in the l-th layer, the feature representations in the l-th layer of the neighbor nodes u in the neighborhood set of v, and the edge features that describe the relationship between node u and node v. The two mappings used in the update are learnable nonlinear functions.

[0043] After spatial information has been propagated through all layers of the graph neural network, the high-level features of all nodes are aggregated into a scene-level semantic embedding. The aggregation uses the node features of layer L, the last layer of the graph neural network.

[0044] The resulting scene-level semantic representation comprehensively captures the spatial structure of the traffic environment and the interactions among traffic participants. It reflects the overall scene cognition achieved under the prefrontal cortex-like collaborative mechanism and serves as the input to the subsequent path generation and risk assessment modules.
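
One round of neighbor aggregation followed by a readout over all nodes can be sketched as follows. The learnable mappings are replaced here by fixed scalar weights and a ReLU, edge features are omitted, and mean pooling is assumed for the readout; these are simplifications chosen for illustration, not the trained network of the invention.

```python
def message_passing_layer(h, neighbors, w_self=0.5, w_nbr=0.5):
    """One simplified GNN layer: mix each node's feature with its neighbors' mean.

    h: dict node -> feature vector (list of floats) at layer l.
    neighbors: dict node -> list of neighbor node names.
    w_self, w_nbr: fixed stand-ins for learnable weights (assumption).
    """
    new_h = {}
    for v, hv in h.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            agg = [sum(h[u][k] for u in nbrs) / len(nbrs) for k in range(len(hv))]
        else:
            agg = [0.0] * len(hv)
        # ReLU nonlinearity applied component-wise
        new_h[v] = [max(0.0, w_self * a + w_nbr * b) for a, b in zip(hv, agg)]
    return new_h

def readout(h):
    """Mean-pool the final-layer node features into one scene embedding."""
    n = len(h)
    dim = len(next(iter(h.values())))
    return [sum(vec[k] for vec in h.values()) / n for k in range(dim)]

h0 = {"ego": [1.0, 0.0], "car_1": [0.0, 2.0], "lane": [4.0, 4.0]}
neighbors = {"ego": ["car_1", "lane"], "car_1": ["ego"], "lane": ["ego"]}
h1 = message_passing_layer(h0, neighbors)
z = readout(h1)  # scene-level embedding
```

Stacking several such layers lets information from nodes that are not directly connected reach the ego node before the readout.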

[0045] Step 3: Generate a structured left-turn action graph.

[0046] Based on traffic scene graph representation, a structured left-turn action graph (LTAG) is constructed for unprotected left-turn scenarios. The left-turn action graph includes multiple candidate driving behavior nodes, including at least waiting behavior, forward movement behavior, entering the conflict zone behavior, and rapid passage behavior.

[0047] The candidate path set contains M candidate paths. Each candidate path is a sequence of discrete trajectory points, and each point is a triple: the first two components are the two-dimensional spatial coordinates of the m-th path at the n-th trajectory point, and the third component is the velocity of the m-th path at the n-th trajectory point. Each path has its own number of trajectory points.

[0048] Meanwhile, in order to achieve synergy between high-level decision-making and low-level path planning, candidate paths are mapped to a predefined set of behavioral semantics.

[0049] The elements of the behavioral semantics set are, respectively: maintaining the current driving state, deceleration behavior, yielding behavior, accelerating-through behavior, and state change behavior.

[0050] A candidate path is mapped to the j-th semantic behavior if and only if the path type discrimination function, which determines the type of a path, assigns the path to the behavior category of that j-th semantic behavior.

[0051] In this way, the path planning problem is transformed into a joint representation of finite candidate paths and behavioral semantics, which can be used for subsequent decision-making and risk assessment.

[0052] Step 4: Conduct a feasibility and risk assessment of the candidate paths.

[0053] Once the candidate path set has been obtained, a graph neural network evaluates each candidate driving behavior in the left-turn action graph and outputs its spatial feasibility and risk characteristics in the current traffic environment. The feasibility and safety risk of each candidate path under the current traffic scene are then evaluated so that paths that violate the constraints can be eliminated.

[0054] For any candidate path, a feasibility evaluation function is defined. The feasibility evaluation jointly considers lane boundary constraints, curvature constraints, and vehicle dynamics limits on speed and acceleration.

[0055] Lane boundary constraint: every trajectory point of the candidate path, given by its horizontal and vertical coordinates, must lie within the road area. Here n is the trajectory point index and m is the candidate path index. The road area is the space within which vehicles can travel, usually defined by lane boundaries or a road network map.

[0056] Curvature constraint: the curvature of the candidate path at the n-th trajectory point must not exceed the maximum permissible curvature of the vehicle, which is limited by its minimum turning radius.

[0057] Velocity and acceleration constraints: the velocity of the candidate path at the n-th trajectory point must not exceed the maximum speed allowed for the vehicle, and the acceleration at the n-th trajectory point must not exceed the maximum permissible acceleration of the vehicle.
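
The three constraint families above can be combined into a single pass/fail check, sketched below. Several details are assumptions made only for this illustration: the road area is supplied as a caller-provided predicate, curvature is estimated from three consecutive points (Menger curvature), and acceleration is estimated from consecutive speeds and the distance between points. The source defines a feasibility evaluation function but its exact form is not reproduced here.

```python
import math

def menger_curvature(p, q, r):
    """Curvature of the circle through three 2-D points (0 if collinear)."""
    twice_area = abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))
    d = math.dist(p, q) * math.dist(q, r) * math.dist(p, r)
    return 0.0 if d == 0 else 2.0 * twice_area / d

def is_feasible(path, in_road, kappa_max, v_max, a_max):
    """path: list of (x, y, v) tuples. True only if every constraint holds."""
    # Lane boundary and speed limits, checked point by point
    if any(not in_road(x, y) or v > v_max for x, y, v in path):
        return False
    # Curvature limit, checked on each consecutive triple of points
    for p, q, r in zip(path, path[1:], path[2:]):
        if menger_curvature(p[:2], q[:2], r[:2]) > kappa_max:
            return False
    # Acceleration limit, estimated as |v1^2 - v0^2| / (2 * ds) (assumption)
    for (x0, y0, v0), (x1, y1, v1) in zip(path, path[1:]):
        ds = math.dist((x0, y0), (x1, y1))
        if ds > 0 and abs(v1 ** 2 - v0 ** 2) / (2 * ds) > a_max:
            return False
    return True

# Hypothetical straight lane: 20 m long, 4 m wide
in_lane = lambda x, y: 0.0 <= x <= 20.0 and -2.0 <= y <= 2.0
straight = [(0.0, 0.0, 5.0), (5.0, 0.0, 5.0), (10.0, 0.0, 5.0)]
sharp = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (1.0, 1.0, 2.0)]
ok = is_feasible(straight, in_lane, kappa_max=0.2, v_max=15.0, a_max=3.0)
too_sharp = is_feasible(sharp, in_lane, kappa_max=0.2, v_max=15.0, a_max=3.0)
```

A graded feasibility score could replace the boolean if a threshold is to be applied later; the boolean form is used here only to keep the example short.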

[0058] In addition, a path risk function is introduced to characterize the risk of potential conflicts with surrounding traffic participants while a path is executed. For the m-th candidate path, it is defined in terms of the distance between the ego vehicle, traveling along that path, and the i-th traffic participant at each time within the prediction horizon, together with a safety distance threshold. If the distance is less than this threshold, a potential collision risk is considered to exist.
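
As an illustration of a distance-based risk measure of this kind, the sketch below scores a path by its worst-case proximity to any participant over the prediction horizon. The specific formula (one minus the ratio of distance to the safety distance, clipped at zero, maximized over time and participants) is an assumption for this example and is not taken from the source.

```python
import math

def path_risk(ego_traj, others_traj, d_safe):
    """Worst-case proximity risk in [0, 1] over the prediction horizon.

    ego_traj: list of (x, y) ego positions, one per time step along the path.
    others_traj: list of predicted trajectories, one per traffic participant,
                 each a list of (x, y) aligned with ego_traj.
    d_safe: safety distance threshold in metres.
    """
    risk = 0.0
    for other in others_traj:
        for ego_pos, other_pos in zip(ego_traj, other):
            d = math.dist(ego_pos, other_pos)
            # Zero when d >= d_safe, rising linearly to 1 as d approaches 0
            risk = max(risk, max(0.0, 1.0 - d / d_safe))
    return risk

ego = [(0.0, 0.0), (1.0, 0.0)]
oncoming = [[(10.0, 0.0), (3.0, 0.0)]]  # closes to 2 m at the second step
r = path_risk(ego, oncoming, d_safe=4.0)
```

Any participant that stays farther away than `d_safe` for the whole horizon contributes no risk under this definition.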

[0059] Based on the above evaluation, a corresponding feasibility score and risk quantification index are generated for each candidate path.

[0060] Step 5: Generate candidate path masks.

[0061] Based on the spatial feasibility and risk characteristics, a learnable action mask is generated for each candidate driving behavior and is used to dynamically mask high-risk or infeasible candidate driving behaviors.

[0062] For each candidate path, the corresponding path mask is defined by comparing the feasibility score of the path with a feasibility threshold and its risk value with a risk threshold.

[0063] The mask values of all candidate paths form the path mask vector at the current time t, with one mask value per candidate path, where M is the total number of candidate paths.

[0064] The path mask dynamically blocks high-risk or infeasible candidate paths from the decision space, ensuring that subsequent decisions are made only within the safe and feasible path subspace.
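
A thresholded mask of this kind can be sketched in a few lines. The convention used here (1 keeps a path, 0 blocks it, and a path is kept only when its feasibility score is at least the feasibility threshold and its risk is at most the risk threshold) is an assumption consistent with the description above; the names `tau_f` and `tau_r` are illustrative.

```python
def path_mask(feasibility, risk, tau_f, tau_r):
    """Return one 0/1 mask value per candidate path.

    feasibility, risk: per-path scores of equal length.
    tau_f: feasibility threshold; tau_r: risk threshold (illustrative names).
    """
    return [1 if f >= tau_f and r <= tau_r else 0
            for f, r in zip(feasibility, risk)]

# Three candidate paths: the second is infeasible, the third is too risky.
mask = path_mask([0.9, 0.4, 0.8], [0.1, 0.1, 0.7], tau_f=0.5, tau_r=0.3)
```

The source describes the mask as learnable; a fixed-threshold version is shown only to make the blocking behavior concrete.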

[0065] Step 6: Constrained policy-level decision-making.

[0066] Policy decision-making is based on proximal policy optimization (PPO). The state space includes the autonomous vehicle's state and the surrounding traffic information, and the action space includes candidate behaviors such as "wait," "probe," and "enter the conflict zone." The reward function is designed around the safety and risk of each behavior, and the policy is optimized with PPO to achieve efficient and safe decision-making. The policy function, parameterized by θ, outputs the selection probability of each candidate path.

[0067] The path mask is applied to the policy output as a constraint, which yields the effective policy distribution: the policy distribution after mask correction, which considers only safe and feasible candidate paths.

[0068] Finally, the optimal path is selected as the candidate path with the highest probability under the effective policy distribution. In this way the method jointly accounts for path efficiency, safety, and rule constraints in decision-making.
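
The masking and selection steps can be illustrated as follows, assuming the mask is applied by multiplying the policy probabilities element-wise and renormalizing, and that the highest-probability path is then chosen. The renormalization form and the fallback of returning `None` when every path is masked are assumptions of this sketch, and the probabilities are made-up numbers rather than outputs of a trained PPO policy.

```python
def masked_policy(probs, mask):
    """Zero out masked paths and renormalize; None if nothing remains."""
    weighted = [p * m for p, m in zip(probs, mask)]
    total = sum(weighted)
    if total == 0:
        return None  # no safe, feasible path (fallback is an assumption)
    return [w / total for w in weighted]

def select_path(probs, mask):
    """Index of the most probable path under the masked distribution."""
    dist = masked_policy(probs, mask)
    if dist is None:
        return None
    return max(range(len(dist)), key=dist.__getitem__)

probs = [0.5, 0.3, 0.2]   # raw policy output over three candidate paths
mask = [0, 1, 1]          # the first path has been blocked
effective = masked_policy(probs, mask)
best = select_path(probs, mask)
```

Although the raw policy prefers the first path, the mask removes it, so the second path is selected.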

[0069] Step 7: Introduce risk budget constraints to achieve an adaptive balance between safety and efficiency.

[0070] To prevent excessive risk-taking or over-conservatism in complex or highly uncertain traffic environments, a risk budget constraint mechanism is introduced in this step to limit the accumulation of risks during the route decision-making process.

[0071] First, an uncertainty measure of the current traffic environment is defined. An acceptable risk budget is then allocated according to the uncertainty level, scaled by a risk adjustment coefficient.

[0072] Risk accumulation is constrained over the prediction horizon of path planning: the risk values of a path, accumulated over the time steps from the current time to the end of the prediction horizon, must not exceed the current risk budget. Here the time step index runs over the prediction horizon used for risk assessment, and the risk value is that of the path at each time step.

[0073] When the risk of a route exceeds the risk budget, the system automatically reduces the probability of selecting high-risk routes or replans candidate routes, thereby achieving an adaptive balance between safety and traffic efficiency.
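
The budget mechanism can be sketched as below. The mapping from uncertainty to budget (a base budget divided by one plus the coefficient times the uncertainty) and the fixed penalty factor for over-budget paths are placeholders chosen for this example; the source states only that the budget depends on the uncertainty level through a risk adjustment coefficient and that over-budget paths become less likely to be selected.

```python
def risk_budget(uncertainty, base_budget=2.0, lam=1.0):
    """Smaller budget when the scene is more uncertain (assumed form)."""
    return base_budget / (1.0 + lam * uncertainty)

def within_budget(step_risks, budget):
    """True if risk accumulated over the prediction horizon fits the budget."""
    return sum(step_risks) <= budget

def apply_budget(probs, per_path_step_risks, budget, penalty=0.1):
    """Down-weight paths whose accumulated risk exceeds the budget, then renormalize."""
    weights = [p if within_budget(r, budget) else p * penalty
               for p, r in zip(probs, per_path_step_risks)]
    total = sum(weights)
    return [w / total for w in weights]

budget = risk_budget(uncertainty=1.0)            # 2.0 / (1 + 1.0)
risks = [[0.2, 0.3], [0.8, 0.9]]                 # per-step risk of two paths
adjusted = apply_budget([0.5, 0.5], risks, budget)
```

Replanning the candidate paths, the other response mentioned above, would replace the down-weighting step and is not shown.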

[0074] Step 8: Scenario graph memory construction and similarity retrieval.

[0075] To mimic the episodic memory and pattern completion capabilities of the hippocampus, a scenario graph memory bank is constructed to store historical traffic scenario graphs and their corresponding successful driving strategies. When a new scenario arrives, relevant experiences can be quickly recalled through similarity retrieval to provide a reference for current decision-making.

[0076] The scenario graph memory uses a graph database to store the structure and feature information of historical traffic scenario graphs. Each historical scenario graph consists of nodes (traffic participants, road structures, traffic lights, etc.) and edges (spatial adjacency relationships and potential conflict relationships). Each graph is encoded with a graph neural network to obtain a scenario embedding, which is written into the memory together with the successful strategy to form a set of historical experiences. Each entry in the set pairs a historical traffic scenario graph with its corresponding successful driving strategy.

[0077] When a current scene graph arrives, the system computes its similarity to each historical scene graph in the memory bank, retrieves the most similar historical scenes, and uses their successful strategies as a decision-making reference. Similarity can be measured with cosine similarity, for example computed from the node feature vectors of the current scene graph and of the historical scene graph, where n is the total number of nodes and the norm is the Euclidean norm.

[0078] This similarity retrieval mechanism enables rapid recall of historical "scenario-policy" experiences, thereby enhancing the adaptability of autonomous vehicles to new scenarios.
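
A small in-memory version of this retrieval is sketched below, assuming each stored experience is a pair of a scene embedding and a strategy label, and that similarity is the cosine between whole-scene embeddings. A graph database and the node-level variant described above are not modeled; the strategy labels are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def retrieve(query, memory):
    """Return the (embedding, strategy) entry most similar to the query."""
    if not memory:
        return None
    return max(memory, key=lambda item: cosine(query, item[0]))

# Hypothetical memory of past scenes and the strategies that succeeded in them
memory = [([1.0, 0.0], "wait"),
          ([0.0, 1.0], "probe"),
          ([1.0, 1.0], "enter_conflict_zone")]
match = retrieve([0.9, 0.1], memory)
```

Returning the top few entries instead of a single one would let the decision module weigh several past strategies.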

[0079] Step 9: Driving decisions are generated and updated cyclically.

[0080] Based on the target driving behavior output by the policy decision neural network, the corresponding vehicle control commands are generated and executed. At the same time, traffic environment information is continuously updated, and Steps 1 to 8 are executed iteratively to achieve continuous decision-making and planning for the autonomous vehicle in complex intersection scenarios. Once the optimal path has been determined, a continuous, executable trajectory is generated from it and the corresponding vehicle control commands are output.

[0081] The vehicle control input at time t consists of the steering control quantity and the longitudinal acceleration control quantity. It is produced by a mapping function from trajectory to control command, applied to the currently selected optimal path and the current state of the unmanned vehicle.

[0082] The state of the unmanned vehicle typically includes the vehicle's current position coordinates, current heading angle, current speed, and current acceleration.

[0083] The optimal path is expressed as a sequence of sampling points. Each point contains a position and an expected speed, and the length of the sequence is the number of path sampling points.

[0084] After the vehicle executes the control command, the system enters the next time step t+1, reacquires traffic environment information, and repeats steps one through nine, thereby realizing continuous path planning and decision-making for the unmanned vehicle in complex traffic environments.
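
The source does not name a specific trajectory tracking controller. As one possible instance of the mapping from path and vehicle state to a steering and acceleration command, the sketch below combines a pure-pursuit steering law with a proportional speed controller. The controller choice, the wheelbase, the lookahead distance, and the gains are all assumptions made for illustration.

```python
import math

def tracking_control(state, path, wheelbase=2.8, lookahead=5.0,
                     k_v=0.5, a_max=2.0):
    """Map (vehicle state, optimal path) to (steering angle, acceleration).

    state: (x, y, heading, v, a) of the vehicle; a is unused here.
    path: list of (x, y, v_ref) sampling points of the selected path.
    """
    x, y, heading, v, _ = state
    # First path point at least `lookahead` metres away, else the final point
    target = next((p for p in path
                   if math.dist((x, y), p[:2]) >= lookahead), path[-1])
    tx, ty, v_ref = target
    ld = math.dist((x, y), (tx, ty))
    alpha = math.atan2(ty - y, tx - x) - heading  # bearing error to target
    # Pure-pursuit steering law: tan(delta) = 2 * L * sin(alpha) / ld
    steer = math.atan2(2.0 * wheelbase * math.sin(alpha), ld) if ld > 0 else 0.0
    # Proportional speed tracking, clipped to the acceleration limit
    accel = max(-a_max, min(a_max, k_v * (v_ref - v)))
    return steer, accel

state = (0.0, 0.0, 0.0, 4.0, 0.0)
path = [(2.0, 0.0, 6.0), (6.0, 0.0, 6.0), (10.0, 0.0, 6.0)]
steer, accel = tracking_control(state, path)
```

Calling this function once per time step, after the path has been reselected from fresh sensor data, gives the closed loop described above.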

[0085] Embodiment 2 The present invention also provides a dynamic decision-making system for unprotected left turns of unmanned vehicles based on a brain-like neural network. The system is used to implement the method described in Embodiment 1. The system includes: a multi-source information fusion and unified representation construction module, a scene graph construction and brain-like spatial reasoning module, a structured action graph generation module, a behavior feasibility and risk assessment module, a proximal policy optimization and behavior decision-making module, and a trajectory tracking and vehicle control module. The multi-source information fusion and unified representation construction module is used to acquire multi-source traffic environment information of the unmanned vehicle and construct a unified decision input representation; the scene graph construction and brain-like spatial reasoning module is used to construct a traffic scene graph based on the unified decision input representation and to introduce a brain-like neural network that performs prefrontal cortex-like collaborative spatial reasoning to generate a scene-level semantic representation; the structured action graph generation module is used to construct a structured left-turn action graph based on the scene-level semantic representation; the behavior feasibility and risk assessment module is used to assess, based on the trained brain-like neural network, the feasibility and risk of each candidate driving behavior in the structured left-turn action graph and to generate an action mask; the proximal policy optimization and behavior decision-making module is used to obtain, within the proximal policy optimization algorithm framework, the conditional probability distribution of each candidate behavior based on the scene-level semantic representation and the action mask, and to select the candidate behavior with the highest probability as the optimal driving behavior at the current moment; and the trajectory tracking and vehicle control module is used to generate specific vehicle control commands through the trajectory tracking controller, based on the optimal driving behavior and its associated candidate paths, combined with the current motion state of the vehicle.

[0086] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims

1. A dynamic decision-making method for unprotected left turns of unmanned vehicles based on a brain-like neural network, characterized in that the method includes: acquiring multi-source traffic environment information of the unmanned vehicle and constructing a unified decision input representation; based on the unified decision input representation, constructing a traffic scene graph and introducing a brain-like neural network to perform prefrontal cortex-like collaborative spatial reasoning to generate a scene-level semantic representation; based on the scene-level semantic representation, constructing a structured left-turn action graph; based on the trained brain-like neural network, performing feasibility and risk assessment of each candidate driving behavior in the structured left-turn action graph and generating an action mask; based on the scene-level semantic representation and the action mask, within the proximal policy optimization algorithm framework, obtaining the conditional probability distribution of each candidate behavior and selecting the candidate behavior with the highest probability as the optimal driving behavior at the current moment; and, based on the optimal driving behavior and its associated candidate paths, combined with the vehicle's current motion state, generating specific vehicle control commands through the trajectory tracking controller.

2. The method according to claim 1, characterized in that the unified decision input representation is constructed by applying a feature fusion function to the following inputs: the motion state of the vehicle at time t; the state vector of each traffic participant, for the given number of traffic participants; the road topology and lane geometry information; the traffic light status; and the remaining time of the corresponding signal phase.

3. The method according to claim 2, characterized in that the method for constructing the traffic scene graph includes: at time t, the traffic scene graph is constructed from a node set and an edge set; the node set includes the ego vehicle node, the traffic participant nodes, and the road structure nodes; the edge set describes the spatial adjacency and potential conflict relationships between nodes; an edge is established between two nodes when a preset spatial distance constraint is met or when a trajectory conflict between them is possible; the determination condition uses a distance threshold function and a conflict test that determines whether two nodes conflict, applied to pairs of traffic participant nodes.

4. The method according to claim 3, characterized in that the method for introducing a brain-like neural network to perform prefrontal cortex-like collaborative spatial reasoning to generate the scene-level semantic representation includes: aggregating the node features of layer L, where L represents the last layer of the graph neural network, into the scene-level semantic representation; in the l-th layer of the graph neural network, the feature of node v is updated from the feature representation of node v in the l-th layer, the feature representations in the l-th layer of the neighbor nodes u in the neighborhood set of v, and the edge features that describe the relationship between node u and node v, where the two mappings used in the update are learnable nonlinear functions.

5. The method according to claim 4, characterized in that, The structured left-turn action graph includes multiple candidate driving behavior nodes, including at least: waiting behavior, forward movement behavior, entering the conflict zone behavior, and rapid passage behavior.

6. The method according to claim 5, characterized in that the method for conducting the feasibility and risk assessment of each candidate driving behavior in the structured left-turn action graph includes: for any candidate path, defining a feasibility evaluation function; and introducing a path risk function to characterize the risk of potential conflicts with surrounding traffic participants while the path is executed, the path risk function for the m-th candidate path being defined in terms of the distance between the ego vehicle, traveling along that path, and the i-th traffic participant at each time within the prediction horizon, the length of the prediction horizon, and a safety distance threshold.

7. The method according to claim 6, characterized in that the method for obtaining, based on the scene-level semantic representation and the action mask and within the proximal policy optimization algorithm framework, the conditional probability distribution of each candidate behavior and selecting the candidate behavior with the highest probability as the optimal driving behavior at the current moment includes: applying the path mask to the output of the policy function to obtain the effective policy distribution, and selecting the optimal path from the candidate paths according to the effective policy distribution; where the policy function is parameterized by θ and outputs the selection probability of each candidate path, and the effective policy distribution is the policy distribution after mask constraint correction.

8. The method according to claim 7, characterized in that the method for generating specific vehicle control commands through the trajectory tracking controller, based on the optimal driving behavior and its associated candidate paths, combined with the vehicle's current motion state, includes: applying a mapping function from trajectory to control command to the currently selected optimal path and the current state of the unmanned vehicle to obtain the control input of the vehicle at time t, which consists of the steering control quantity and the longitudinal acceleration control quantity; the state of the unmanned vehicle includes the vehicle's current position coordinates, current heading angle, current speed, and current acceleration; the optimal path is expressed as a sequence of sampling points, each of which contains a position and an expected speed, and the length of the sequence is the number of path sampling points.

9. A dynamic decision-making system for unprotected left turns of an unmanned vehicle based on a brain-like neural network, the system being used to implement the method described in any one of claims 1-8, characterized in that the system includes: a multi-source information fusion and unified representation construction module, a scene graph construction and brain-like spatial reasoning module, a structured action graph generation module, a behavior feasibility and risk assessment module, a proximal policy optimization and behavior decision-making module, and a trajectory tracking and vehicle control module. The multi-source information fusion and unified representation construction module is used to acquire multi-source traffic environment information of the unmanned vehicle and construct a unified decision input representation; the scene graph construction and brain-like spatial reasoning module is used to construct a traffic scene graph based on the unified decision input representation and to introduce a brain-like neural network that performs prefrontal cortex-like collaborative spatial reasoning to generate a scene-level semantic representation; the structured action graph generation module is used to construct a structured left-turn action graph based on the scene-level semantic representation; the behavior feasibility and risk assessment module is used to assess, based on the trained brain-like neural network, the feasibility and risk of each candidate driving behavior in the structured left-turn action graph and to generate an action mask; the proximal policy optimization and behavior decision-making module is used to obtain, within the proximal policy optimization algorithm framework, the conditional probability distribution of each candidate behavior based on the scene-level semantic representation and the action mask, and to select the candidate behavior with the highest probability as the optimal driving behavior at the current moment; and the trajectory tracking and vehicle control module is used to generate specific vehicle control commands through the trajectory tracking controller, based on the optimal driving behavior and its associated candidate paths, combined with the current motion state of the vehicle.