A traffic signal lamp multi-node cooperative control method
By combining graph attention networks and multi-agent reinforcement learning, the problems of multi-intersection coordination and decision transparency in traffic signal control systems are solved, achieving efficient and interpretable traffic signal control.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TONGJI UNIV
- Filing Date
- 2026-01-30
- Publication Date
- 2026-06-12
AI Technical Summary
Existing traffic signal control methods lack multi-intersection coordination mechanisms, making it difficult to cope with complex traffic flow distribution and tidal phenomena in urban road networks. Furthermore, the system decision-making transparency is insufficient, affecting practical applications.
A graph attention network is used for adaptive modeling of road network topology. Multi-agent reinforcement learning is combined to generate dynamic signal timing schemes. A decision interpreter is built through a large language model to provide transparent decision support.
It achieves efficient coordinated control of multiple traffic lights, can adapt to dynamic changes in road network topology, improves the reliability and transparency of the system and enhances the explainability of traffic management.
Figure CN122201019A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent transportation, and in particular to a multi-node collaborative control method for traffic lights.
Background Technology
[0002] Existing traffic signal control methods are mainly divided into three types: fixed timing, inductive (actuated) control, and adaptive control. Fixed timing methods pre-design the signal cycle and phase durations from historical traffic data. Although simple to implement, they cannot cope with real-time fluctuations in traffic flow and are prone to causing intersection congestion. Inductive control detects vehicle arrivals with installed sensors such as inductive loops and can make limited adjustments based on the current traffic flow, but it struggles to achieve effective coordination between multiple intersections.
[0003] In recent years, significant progress has been made in adaptive signal control methods based on reinforcement learning, which learn optimal timing strategies through interaction between agents and the traffic environment. However, existing methods often treat each traffic light as an independent decision-making unit, lacking an effective multi-intersection coordination mechanism. This makes them unable to cope with the complex traffic flow distribution and tidal phenomena in urban road networks, potentially leading to timing conflicts at adjacent intersections and creating new traffic bottlenecks. Furthermore, traditional methods lack adaptability and struggle to quickly adjust control strategies when the road network topology changes due to construction closures, temporary traffic controls, or other circumstances.
[0004] More critically, current intelligent traffic signal control systems generally suffer from a "black box" problem, making it difficult to explain the decision-making rationale and expected effects to traffic management personnel and the general public, thus reducing the system's credibility and acceptability. Traffic management departments need to understand why the system selects specific timing schemes during particular periods in order to assess their rationality and make necessary human interventions. This lack of decision-making transparency severely restricts the large-scale practical application of advanced traffic signal control technologies, urgently requiring an innovative control method that can achieve multi-node collaboration and provide decision explanations.
Summary of the Invention
[0005] The purpose of this invention is to overcome the shortcomings of the existing technology and provide a multi-node coordinated control method for traffic lights. This method has the advantages of high efficiency in multi-signal light coordination timing, strong adaptability to road conditions, and strong interpretability.
[0006] The objective of this invention can be achieved through the following technical solution: a multi-node collaborative control method for traffic lights, including:
S1. Collecting road network status data in real time;
S2. Taking the dynamic changes of the road network into account, using a graph attention network to adaptively model the road network topology;
S3. After receiving real-time traffic flow data, using a multi-agent reinforcement learning algorithm to coordinate the timing of multiple traffic lights and generate a dynamic signal timing scheme;
S4. Using a large language model to build a decision interpreter and generate natural language interpretation files for the traffic light phase decisions corresponding to the dynamic signal timing scheme;
S5. Sending the dynamic signal timing scheme to the intersection signal controller for execution, and at the same time pushing the natural language interpretation file of the traffic light phase decision to the traffic monitoring terminal, where it serves as the semantic basis for real-time rationality assessment of, and intervention in, the dynamic signal timing scheme.
[0007] Preferably, the road network status data in S1 includes: Traffic flow parameters: vehicle throughput of each lane, queue length, average vehicle speed, and vehicle type classification, wherein the vehicle type classification includes motor vehicles, non-motor vehicles, and pedestrians; Map elements: lane topology, variable lane markings, traffic light phase rules, and right-of-way markings, including bus lanes; Time dimension parameters: the traffic mode classification of the current time period and the holiday marker, wherein the traffic mode classification includes morning peak, evening peak and off-peak; Environmental perception data: weather conditions, visibility, and road surface slippage coefficient; Dynamic event markers: coordinates of temporary road closures, radius of impact of traffic accidents, and emergency vehicle passage requests.
[0008] Preferably, in step S2, based on the dynamically changing characteristics of the traffic network, a graph attention network is used to adaptively model the network topology, specifically including:
The adjacency matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$ characterizes the connection relationships between intersections in the road network, where $N$ is the number of intersection nodes; $\mathbf{A}_{ij} = 1$ when node $i$ is connected to node $j$, and $\mathbf{A}_{ij} = 0$ otherwise;
Based on the dynamic adjacency matrix update mechanism, when a road network change is detected, the adjacency matrix is automatically updated. The update expression is:
$$\mathbf{A}' = \mathbf{A} \odot \mathbf{M}$$
In the formula: $\mathbf{A}'$ is the adjusted adjacency matrix; $\odot$ denotes element-wise multiplication; $\mathbf{M}$ is the mask matrix, in which the elements corresponding to closed road connections are set to 0; for newly added connections, a new edge is added to the adjacency matrix.
[0009] Preferably, in the graph attention network, a multi-head attention mechanism is used, allowing each intersection to attend simultaneously to the traffic conditions of multiple related intersections from different angles, dynamically calculating attention weights and automatically adjusting the degree of attention paid to different neighboring intersections. The calculation expressions for the graph attention layer are as follows:
$$e_{ij} = \mathrm{LeakyReLU}\left(a^{T}\left[W h_i \,\Vert\, W h_j\right]\right)$$
$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$$
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$
In the formula: $\mathrm{LeakyReLU}$ is the leaky rectified linear unit; $h_i$ and $h_j$ are the feature vectors of nodes $i$ and $j$, respectively; $W$ is the weight matrix; $W^{k}$ is the weight matrix corresponding to the $k$-th attention head; $K$ is the number of heads; $a$ is the attention vector and $T$ indicates transpose; $e_{ij}$ is the attention coefficient between nodes $i$ and $j$; $\alpha_{ij}$ is the normalized attention coefficient between nodes $i$ and $j$; $\alpha_{ij}^{k}$ is the normalized attention coefficient between nodes $i$ and $j$ for the $k$-th attention head; $\mathrm{softmax}$ is the normalization function; $\sigma$ is an activation function; $\Vert$ is the vector concatenation operation; $\mathcal{N}_i$ is the set of neighboring nodes of node $i$; $h_i'$ is the updated feature vector of node $i$.
[0010] Preferably, in step S3, after receiving real-time traffic flow data, a multi-agent reinforcement learning algorithm is used to coordinate the timing of multiple traffic lights, generating a dynamic signal timing scheme, specifically including:
Building an Actor-Critic architecture consisting of a policy network $\pi_\theta$ and a value network $V_\phi$; the former outputs the probability distribution over actions, while the latter evaluates the value of states.
Initialization: model each traffic light as an independent agent and initialize the policy network $\pi_\theta$ and the value network $V_\phi$. Set the quintuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the reward function, and γ is the discount factor; also set the time step T, the number of agents N, and the maximum number of training rounds U, where:
State space S is defined as the joint set of the traffic flow feature vectors of all approach lanes at the current intersection and the current signal phase vector. At time step t, the state of the i-th traffic light agent is represented as $s_t^i = \{x_{t,1}^i, x_{t,2}^i, \ldots, x_{t,M}^i, p_t^i\}$, where $M$ is the total number of approach lanes at the intersection where the i-th traffic light agent is located; $p_t^i$ is the current phase state of the traffic light at time t, represented using one-hot encoding; $x_{t,l}^i$ is the micro-level mixed traffic flow feature vector of the $l$-th lane, which includes the vehicle throughput, queue length, average vehicle speed, and vehicle type classification of that lane;
Action space: $A = \{a_1, a_2, a_3, a_4\}$, where $a_1$ is the north-south straight phase, $a_2$ is the north-south left-turn phase, $a_3$ is the east-west straight phase, and $a_4$ is the east-west left-turn phase;
Action selection and execution: at time step t, each traffic light agent $i$ selects the optimal action $a_t^i$ according to the policy network $\pi_\theta$; the joint action $\mathbf{a}_t = (a_t^1, \ldots, a_t^N)$ is executed; the new state $s_{t+1}$, the reward $r_t$, and the termination flag $d_t$ are obtained; the experience $(s_t, \mathbf{a}_t, r_t, s_{t+1}, d_t)$ is stored in the buffer; and the state is updated as $s_t \leftarrow s_{t+1}$. The advantage function $\hat{A}_t$ and the cumulative return $G_t$ are calculated for each time step t;
In each update iteration, a batch of experience is sampled from the buffer, the policy loss and value loss are calculated, the gradients are computed to update the policy network $\pi_\theta$ and the value network $V_\phi$, and the buffer is cleared. When the current training round reaches the maximum training round U, the result is output and the dynamic signal timing scheme is obtained.
[0011] Preferably, the reward function R includes rewards for reduced waiting time, reduced queue length, improved traffic efficiency, improved global road network status, and traffic optimization target rewards for inter-intersection cooperation, which are represented by a weighted sum. The waiting time reduction reward is represented by subtracting the total waiting time at the current time from the total waiting time at the previous moment; The reward for reducing queue length is represented by subtracting the queue length at the current time from the queue length at the previous time step. The traffic efficiency improvement reward aims to maximize the traffic throughput at intersections per unit time. The global road network condition improvement reward is represented by the regional average normalized vehicle speed.
[0012] Preferably, a multi-agent proximal policy optimization algorithm is used to update the policy network parameters.
[0013] Preferably, the optimization objective adopted by the multi-agent proximal policy optimization algorithm includes a multi-agent proximal policy optimization loss term $L^{CLIP}(\theta)$. The expression is:
$$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid o_t)}{\pi_{\theta_{old}}(a_t \mid o_t)}$$
In the formula: $\mathbb{E}$ is the expected value; $r_t(\theta)$ is the probability ratio between the new and old policies, used to measure the magnitude of change of the updated policy relative to the original policy; $\pi_\theta(a_t \mid o_t)$ is the new policy probability; $\pi_{\theta_{old}}(a_t \mid o_t)$ is the old policy probability; $\theta$ denotes the parameters of the policy network to be optimized; $a_t$ is the action at time t, representing the phase selection of the traffic light; $o_t$ is the agent's local observation at time t, including the queue length, lane occupancy rate, and current traffic light phase state at this intersection; $s_t$ is the state at time t; $\hat{A}_t$ is the advantage function; $\mathrm{clip}$ is the clipping function; $\epsilon$ is the clipping parameter.
[0014] Preferably, the optimization objective $L(\theta, \phi)$ of the proximal policy optimization also includes the value function loss term $L^{VF}(\phi)$ and the entropy regularization term $L^{ENT}(\theta)$. The calculation expressions are:
$$L^{VF}(\phi) = \mathbb{E}_t\left[\left(V_\phi(s_t) - V_t^{target}\right)^2\right]$$
$$L^{ENT}(\theta) = \mathbb{E}_t\left[H\left(\pi_\theta(\cdot \mid o_t)\right)\right]$$
$$L(\theta, \phi) = L^{CLIP}(\theta) - c_1 L^{VF}(\phi) + c_2 L^{ENT}(\theta)$$
In the formula: $c_1$ is the value function loss coefficient, used to balance the weights of policy optimization and value estimation; $V_\phi(s_t)$ is the predicted value of the value network under parameters $\phi$, an assessment of the value of the current state based on the current local observations and the global state; $V_t^{target}$ is the target value, calculated from the real reward or the generalized advantage estimate and used as the supervision label for training the value network; $c_2$ is the entropy regularization coefficient, a constant used to adjust the exploration intensity; $\pi_\theta(\cdot \mid o_t)$ is the action probability distribution output by the policy network; $L^{ENT}(\theta)$ is the entropy regularization term; $H$ is the information entropy function, used to measure the uncertainty of the policy distribution.
[0015] Preferably, in step S4, a decision interpreter is constructed using a large language model to generate a natural language explanation file for the traffic light phase decision corresponding to the dynamic signal timing scheme, specifically including:
The decision interpreter collects key decision data of traffic lights in real time, including the waiting time, queue length, traffic flow, average speed, and congestion level of each lane, and the status of adjacent traffic lights;
Structured prompts are constructed by transforming numerical data and status information into descriptive text and adding professional traffic management terminology and background knowledge;
The prompts are sent to a large language model enhanced with traffic domain knowledge to generate decision explanations. These include operational-level explanations of the specific reasons for phase switching, strategy-level explanations of long-term traffic optimization goals, and coordination-level explanations that clarify the relationship with adjacent intersections.
[0016] Compared with the prior art, the present invention has the following advantages: (1) The present invention adopts a method combining graph attention network and multi-agent reinforcement learning to achieve efficient collaborative control of traffic lights and maintain stable performance under dynamic changes in road network topology.
[0017] (2) This invention innovatively introduces a large language model to construct a decision interpreter, which solves the "black box" problem of existing intelligent traffic control systems, improves the credibility of the system, and provides transparent decision support for traffic management.
[0018] (3) Considering that existing conventional graph attention networks usually pre-set fixed road network connections or only adjust based on traffic weights, they cannot effectively cope with temporary lane closures, tidal lane adjustments, traffic accidents or sudden physical blockages caused by construction. This application introduces element-wise multiplication of the mask matrix and the adjacency matrix, which can "cut off" invalid physical connections in real time without retraining the model, and can effectively adapt to sudden changes in the traffic network.
Attached Figure Description
[0019] Figure 1 is a flowchart of the method of the present invention;
Figure 2 is a flowchart illustrating the structured input of real-time traffic data in the embodiment;
Figure 3 is a diagram illustrating the adaptive road network topology calculation using the graph attention network in the embodiment;
Figure 4 is a flowchart of the multi-agent reinforcement learning algorithm for multi-node collaborative control of traffic lights in the embodiment;
Figure 5 is a flowchart illustrating the generation of explanations for traffic light decisions in the embodiment.
Detailed Implementation
[0020] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0021] This embodiment provides a multi-node collaborative control method for traffic lights. As shown in Figure 1, the method includes the following steps: S1. Real-time collection of road network status data to provide structured input for subsequent algorithms.
[0022] First, real-time road network status data is collected by combining multi-source traffic flow sensors with a high-precision map. This technology employs a layered data processing architecture, including a physical sensing layer, a data fusion layer, and a feature extraction layer. In the physical sensing layer, various sensor data sources are integrated, including induction loops buried in the road surface, video surveillance equipment on both sides of the road, vehicle-mounted GPS signals, and floating car data. For loop sensor data, a time-window smoothing algorithm is used to eliminate instantaneous fluctuations; for video data, a deep learning object detection algorithm is used to extract vehicle trajectory information; and for GPS and floating car data, map matching is used to map them onto the high-precision map. In the data fusion layer, a Kalman filter is used to fuse the multi-source data, and its state update equation is:
$$\hat{X}(k \mid k) = \hat{X}(k \mid k-1) + K(k)\left[Z(k) - H\,\hat{X}(k \mid k-1)\right]$$
In the formula: $\hat{X}(k \mid k)$ is the optimal state estimate at time k; $\hat{X}(k \mid k-1)$ is the state predicted for time k from time k-1; $K(k)$ is the Kalman gain; $Z(k)$ is the actual observation value of the sensor at time k; $H$ is the observation matrix.
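The fusion step can be illustrated with a minimal scalar sketch of the Kalman measurement update. It is not the patented implementation: the prior values, the noise variances, and the function name `kalman_update` are hypothetical, and a real deployment would use vector states and a matrix H.

```python
def kalman_update(x_prior, p_prior, z, h=1.0, r=1.0):
    """One scalar Kalman measurement update.

    x_prior, p_prior: predicted state and its variance
    z: sensor observation, h: observation coefficient, r: observation noise variance
    """
    k = p_prior * h / (h * p_prior * h + r)    # Kalman gain K(k)
    x_post = x_prior + k * (z - h * x_prior)   # corrected state estimate
    p_post = (1.0 - k * h) * p_prior           # corrected variance
    return x_post, p_post, k


# Fuse two readings of the same queue length (all numbers are made up).
x, p = 10.0, 4.0                                  # prior: 10 vehicles, variance 4
x, p, k_loop = kalman_update(x, p, z=12.0, r=4.0)   # induction loop reading
x, p, k_video = kalman_update(x, p, z=11.0, r=1.0)  # video reading, assumed less noisy
print(x, p)
```

Applying the update once per sensor treats the readings as independent observations of the same quantity, which is a simplifying assumption of this sketch.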
[0023] This method effectively reduces noise and uncertainty from a single data source, improving the accuracy of traffic state estimation. In the feature extraction layer, key features including vehicle queue length, average waiting time, travel time, traffic flow, average speed, and congestion index are calculated and standardized to form a structured road network state representation matrix $X \in \mathbb{R}^{N \times D}$. In the formula: $N$ represents the number of nodes in the road network; $D$ represents the feature dimension of each node.
[0024] In this embodiment, the road network status data includes: Traffic flow parameters: vehicle throughput of each lane, queue length, average vehicle speed, and vehicle type classification, wherein the vehicle type classification includes motor vehicles, non-motor vehicles, and pedestrians; Map elements: lane topology, variable lane markings, traffic light phase rules, and right-of-way markings, including bus lanes; Time dimension parameters: the traffic mode classification of the current time period and the holiday marker, wherein the traffic mode classification includes morning peak, evening peak and off-peak; Environmental perception data: weather conditions, visibility, and road surface slippage coefficient; Dynamic event markers: coordinates of temporary road closures, radius of impact of traffic accidents, and emergency vehicle passage requests.
[0025] Specifically, the "traffic flow parameters" (vehicle queue length, average vehicle speed, and vehicle type classification) are obtained by the simulator's real-time statistics interface (TraCI) and directly mapped to the state space observations and reward function calculation terms in the multi-agent reinforcement learning in step S3, guiding policy optimization.
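As an illustration of how per-lane measurements might be mapped to a state observation, the sketch below concatenates lane features with a one-hot phase vector. The field names, the `motor_share` encoding of vehicle type, and the phase labels are assumptions made for this example; in a SUMO setup the per-lane values would typically be read through TraCI lane queries such as `traci.lane.getLastStepHaltingNumber` and `traci.lane.getLastStepMeanSpeed`.

```python
PHASES = ["NS_straight", "NS_left", "EW_straight", "EW_left"]  # illustrative labels


def one_hot(phase):
    """One-hot encoding of the current signal phase."""
    return [1.0 if p == phase else 0.0 for p in PHASES]


def build_state(lane_features, phase):
    """Concatenate per-lane traffic features with the one-hot phase vector."""
    state = []
    for lane in lane_features:
        state.extend([
            lane["throughput"],   # vehicles passed in the last step
            lane["queue"],        # halted vehicles
            lane["speed"],        # mean speed in m/s
            lane["motor_share"],  # hypothetical vehicle-type feature
        ])
    state.extend(one_hot(phase))
    return state


lanes = [
    {"throughput": 12, "queue": 5, "speed": 8.3, "motor_share": 0.9},
    {"throughput": 7, "queue": 9, "speed": 3.1, "motor_share": 0.6},
]
state = build_state(lanes, "EW_straight")
print(state)
```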
[0026] The following data serves as the environment model, a key element of reinforcement learning. Through interaction with the traffic light agents' actions, it helps reinforcement learning converge to the optimal phase and timing strategy for the traffic lights: 1) Map elements (lane topology): in SUMO, these are provided as .net.xml files and are used directly to build the simulation environment and the base adjacency matrix of the graph attention network in step S2.
[0027] 2) Time dimension parameters: Extracted from the OD matrix of real traffic data, implemented in SUMO as a traffic flow generation file, driving the traffic flow generation module in the simulator to establish traffic flow benchmarks under different traffic modes.
[0028] 3) Environmental perception data: by adjusting the physical simulation parameters in SUMO (such as the road surface friction coefficient, to simulate slippery weather), the car-following behavior of vehicles is changed, and these environmental constraints are input into the algorithm as global state features.
[0029] 4) Dynamic event marking: Convert 'road closure / accident' information into a mask matrix in S2, and use physical blockage maps to dynamically constrain the control range.
[0030] S2. To address the dynamic changes in traffic networks, such as temporary closures and tidal lane adjustments, a graph attention network is used to adaptively model the network topology.
[0031] The adjacency matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$ characterizes the connection relationships between intersections in the road network, where $N$ is the number of intersection nodes; $\mathbf{A}_{ij} = 1$ when node $i$ is connected to node $j$, and $\mathbf{A}_{ij} = 0$ otherwise.
[0032] To handle changes in road network topology, a dynamic adjacency matrix update mechanism is designed. When a road network change is detected (such as a road closure or lane adjustment), the adjacency matrix is automatically updated to achieve real-time adjustment of the topology:
$$\mathbf{A}' = \mathbf{A} \odot \mathbf{M}$$
In the formula: $\mathbf{A}'$ is the adjusted adjacency matrix; $\odot$ denotes element-wise multiplication; $\mathbf{M}$ is the mask matrix, in which the elements corresponding to closed road connections are set to 0; for newly added connections, a new edge is added to the adjacency matrix.
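A minimal sketch of the masked update, written with plain nested lists. The function name `apply_mask`, the four-intersection network, and the choice to treat every closure as bidirectional are assumptions for illustration only.

```python
def apply_mask(adj, closed_edges, new_edges=()):
    """Element-wise product of adjacency and mask, then add new edges."""
    n = len(adj)
    mask = [[1] * n for _ in range(n)]
    for i, j in closed_edges:
        mask[i][j] = 0
        mask[j][i] = 0  # assumption: a closure blocks both directions
    updated = [[adj[i][j] * mask[i][j] for j in range(n)] for i in range(n)]
    for i, j in new_edges:
        updated[i][j] = 1
        updated[j][i] = 1
    return updated


# Four intersections in a line: 0 - 1 - 2 - 3
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Road 1-2 is closed for construction; a temporary link 0-3 is opened.
updated = apply_mask(adj, closed_edges=[(1, 2)], new_edges=[(0, 3)])
for row in updated:
    print(row)
```

Because only the mask changes, the learned attention parameters stay untouched; the closed link simply drops out of each node's neighbor set.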
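[0033] In the graph attention network, a multi-head attention mechanism is used to allow each intersection to simultaneously monitor the traffic conditions of multiple related intersections from different angles. Attention weights are dynamically calculated, and the degree of attention given to different neighboring intersections is automatically adjusted. The calculation expressions for the graph attention layer are as follows:
$$e_{ij} = \mathrm{LeakyReLU}\left(a^{T}\left[W h_i \,\Vert\, W h_j\right]\right)$$
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}, \qquad h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j\Big)$$
In the formula: $h_i$ and $h_j$ are the feature vectors of nodes $i$ and $j$; $W$ is the weight matrix; $a$ is the attention vector; $\Vert$ is vector concatenation; $\mathcal{N}_i$ is the set of neighboring nodes of node $i$; $\alpha_{ij}$ is the attention coefficient; $\sigma$ is an activation function.
[0034] To enhance the model's expressive power, a multi-head attention mechanism is employed, concatenating the outputs of K independent attention heads:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$
This technology enables automatic adaptation to changes in the road network structure without retraining the model.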
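The following is a small pure-Python sketch of a multi-head graph attention layer restricted to the current adjacency matrix. The weights are arbitrary numbers rather than trained parameters, and two choices are assumptions of this example rather than statements from the source: each node attends to itself (a self-loop), and ELU is used as the activation.

```python
import math


def matvec(w, h):
    return [sum(w_rc * h_c for w_rc, h_c in zip(row, h)) for row in w]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def gat_head(features, adj, w, a):
    """One attention head: returns updated features and attention weights."""
    n = len(features)
    wh = [matvec(w, h) for h in features]
    out, alphas = [], []
    for i in range(n):
        # Neighbors come from the (possibly masked) adjacency; self-loop assumed.
        nbrs = [j for j in range(n) if j == i or adj[i][j] == 1]
        scores = [leaky_relu(sum(a_k * v for a_k, v in zip(a, wh[i] + wh[j])))
                  for j in nbrs]
        m = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha = {j: e / z for j, e in zip(nbrs, exps)}
        agg = [sum(alpha[j] * wh[j][d] for j in nbrs) for d in range(len(wh[i]))]
        out.append([elu(v) for v in agg])
        alphas.append(alpha)
    return out, alphas


def gat_layer(features, adj, heads):
    """Concatenate the outputs of K independent heads for every node."""
    per_head = [gat_head(features, adj, w, a)[0] for w, a in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(features))]


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # node 2 is cut off by a closure
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5, 0.25, 0.25]),
    ([[0.5, 0.5], [-0.5, 0.5]], [1.0, 0.0, 0.0, 1.0]),
]
out = gat_layer(features, adj, heads)
_, alphas = gat_head(features, adj, *heads[0])
print(out[0], alphas[2])
```

With node 2 disconnected, its attention collapses onto itself, which is the behavior the masked adjacency is meant to produce.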
[0035] S3. After receiving real-time traffic flow data, a multi-agent reinforcement learning algorithm is used to coordinate the timing of multiple traffic lights and generate a dynamic signal timing scheme. The real-time traffic data received by the traffic signal agent specifically refers to the "traffic flow parameters" (vehicle throughput of each lane, queue length, average vehicle speed, and vehicle type classification) obtained in real time through the TraCI interface, as well as the map elements, time dimension parameters, environmental perception data, and dynamic event markers, all of which are modeling data for the reinforcement learning environment. Step S3 specifically includes the following:
[0036] Building an Actor-Critic architecture consisting of a policy network $\pi_\theta$ and a value network $V_\phi$; the former outputs the probability distribution over actions, while the latter evaluates the value of states.
Initialization: model each traffic light as an independent agent and initialize the policy network $\pi_\theta$ and the value network $V_\phi$. Set the quintuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the reward function, and γ is the discount factor; also set the time step T, the number of agents N, and the maximum number of training rounds U, where:
State space S: at time step t, the state of the i-th traffic light agent is defined as the joint set of the traffic flow feature vectors of all approach lanes at the current intersection and the current signal phase vector, denoted as $s_t^i = \{x_{t,1}^i, x_{t,2}^i, \ldots, x_{t,M}^i, p_t^i\}$, where $M$ is the total number of approach lanes at intersection i; $p_t^i$ is the current phase state of the traffic light at time t, represented using one-hot encoding; $x_{t,l}^i$ is the micro-level mixed traffic flow feature vector of the $l$-th lane, which includes the vehicle throughput, queue length, average vehicle speed, and vehicle type classification of that lane;
Action space: $A = \{a_1, a_2, a_3, a_4\}$, where $a_1$ is the north-south straight-ahead phase (green for north-south straight-ahead lanes, red for the rest); $a_2$ is the north-south left-turn phase (green for north-south left-turn lanes, red for the rest); $a_3$ is the east-west straight-ahead phase (green for east-west straight-ahead lanes, red for the rest); $a_4$ is the east-west left-turn phase (green for east-west left-turn lanes, red for the rest).
Action selection and execution: at time step t, each traffic light agent $i$ selects the optimal action $a_t^i$ according to the policy network $\pi_\theta$; the joint action $\mathbf{a}_t = (a_t^1, \ldots, a_t^N)$ is executed; the new state $s_{t+1}$, the reward $r_t$, and the termination flag $d_t$ are obtained; the experience $(s_t, \mathbf{a}_t, r_t, s_{t+1}, d_t)$ is stored in the buffer; and the state is updated as $s_t \leftarrow s_{t+1}$. The advantage function $\hat{A}_t$ and the cumulative return $G_t$ are calculated for each time step t;
The advantage function $\hat{A}_t$ measures how much better taking a specific action $a_t$ in a specific state $s_t$ is than the average performance in that state. It is calculated as follows:
$$\hat{A}_t = \sum_{l=0}^{T-t} (\gamma\lambda)^{l}\, \delta_{t+l}$$
where $\lambda$ is the smoothing parameter of the generalized advantage estimate and $\delta_t$ is the temporal-difference error, calculated using the following expression:
$$\delta_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)$$
$G_t$ is the cumulative sum of all discounted rewards starting from the current time t and continuing to the end of the horizon, expressed as:
$$G_t = \sum_{k=0}^{T-t} \gamma^{k}\, r_{t+k}$$
In each update iteration, a batch of experience is sampled from the buffer, the policy loss and value loss are calculated, the gradients are computed to update the policy network $\pi_\theta$ and the value network $V_\phi$, and the buffer is cleared. When the current training round reaches the maximum training round U, the result is output and the dynamic signal timing scheme is obtained.
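The advantage and return calculations described above can be sketched as two backward passes over a stored trajectory. This is an illustrative version using generalized advantage estimation; the smoothing parameter `lam` and the sample numbers are assumptions, since the text does not state concrete values.

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backward over the trajectory."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def gae_advantages(rewards, values, gamma, lam):
    """Generalized advantage estimates.

    values must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state reached after the final step.
    """
    advantages, running = [], 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages.append(running)
    return advantages[::-1]


rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(rewards, gamma=0.5)
# With a zero value function and lam = 1 the advantages equal the returns.
advantages = gae_advantages(rewards, [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
print(returns, advantages)
```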
[0037] In this embodiment, the reward function includes rewards for reduced waiting time, reduced queue length, improved traffic efficiency, improved global road network status, and traffic optimization targets related to inter-intersection cooperation. These are combined as a weighted sum:
$$R = w_1 R_{wait} + w_2 R_{queue} + w_3 R_{eff} + w_4 R_{global}$$
In the formula: $R_{wait}$, $R_{queue}$, $R_{eff}$, and $R_{global}$ are the rewards for reducing waiting time, reducing queue length, improving traffic efficiency, and improving the overall road network status, respectively; $w_1$, $w_2$, $w_3$, and $w_4$ are the weighting coefficients for the corresponding rewards. Each coefficient was determined through experimental optimization to balance the improvement effects of the various traffic indicators; in this embodiment, they are set to 0.35, 0.25, 0.20, and 0.20, respectively. This establishes a control hierarchy that prioritizes efficiency while also considering fairness, supports rapid convergence and stability of the system, and balances the conflict between "local optima" and "global coordination."
[0038] Specifically, the rewards are set as follows: (1) Waiting time reduction reward $R_{wait}$. The goal is to reward agents for reducing the total cumulative waiting time at intersections through reasonable timing. It is calculated as the total waiting time at the previous moment minus the total waiting time at the current moment:
$$R_{wait} = \sum_{l \in L_{in}} \left(W_l^{t-1} - W_l^{t}\right)$$
[0039] In the formula: $L_{in}$ is the set of all approach lanes at the current intersection; $W_l^{t-1}$ is the sum of the cumulative waiting times of all vehicles in lane $l$ at the previous decision step t-1; $W_l^{t}$ is the sum of the cumulative waiting times of all vehicles in lane $l$ at the current decision step t. A value of $R_{wait}$ greater than 0 indicates that the current action has reduced the cumulative waiting time at the intersection, thus providing a positive incentive.
[0040] (2) Queue length reduction reward $R_{queue}$. The goal is to reward agents for quickly dispersing queued vehicles. It is calculated as the queue length at the previous time step minus the queue length at the current time step:
$$R_{queue} = Q^{t-1} - Q^{t}$$
[0041] If $Q^{t} < Q^{t-1}$, the queue is shortening and the reward is positive; conversely, if the queue is growing, the reward is negative.
[0042] (3) Traffic efficiency improvement reward $R_{eff}$. The aim is to maximize the traffic throughput at intersections per unit of time:
$$R_{eff} = \sum_{l \in L_{out}} n_l^{t} + \beta\, I_{platoon}$$
[0043] In the formula: $L_{out}$ is the set of exit lanes of the intersection; $n_l^{t}$ is the total number of vehicles that successfully exit lane $l$ and pass through the intersection within the current time step t; $I_{platoon}$ is the formation integrity indicator factor (1 if the passing autonomous vehicle convoy is not cut off at the end of the current phase; otherwise 0 or a negative value); $\beta$ is the formation bonus coefficient. This reward encourages more vehicles to pass through and prioritizes the continuous passage of autonomous vehicle fleets.
[0044] (4) Global road network status improvement reward $R_{global}$. The aim is to guide the agent to focus on the overall traffic efficiency of the road network and avoid local optima. It is represented by the regional average normalized vehicle speed:
$$R_{global} = \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} \frac{v_j^{t}}{v_j^{free}}$$
[0045] In the formula: $\mathcal{N}_i$ is the set of intersections or road segments adjacent to the current intersection; $v_j^{t}$ is the average speed of neighboring road segment $j$ at time t; $v_j^{free}$ is the free-flow speed of the road (its design maximum speed limit). The closer the value is to 1, the smoother the surrounding roads are; this incentivizes the traffic light agent to actively cooperate in relieving pressure on the surrounding area when it is not congested itself.
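To make the arithmetic of the four reward terms concrete, the sketch below computes each one and combines them with the weights stated for this embodiment (0.35, 0.25, 0.20, 0.20). The additive form of the platoon bonus, the value of `beta`, and all sample measurements are assumptions for illustration.

```python
WEIGHTS = (0.35, 0.25, 0.20, 0.20)  # weights given in this embodiment


def waiting_reward(prev_wait, curr_wait):
    """Total waiting time at t-1 minus total waiting time at t."""
    return sum(prev_wait) - sum(curr_wait)


def queue_reward(prev_queue, curr_queue):
    """Queue length at t-1 minus queue length at t."""
    return sum(prev_queue) - sum(curr_queue)


def efficiency_reward(exited, platoon_intact, beta=1.0):
    """Vehicles discharged this step plus an assumed additive platoon bonus."""
    return sum(exited) + beta * platoon_intact


def global_reward(neighbor_speeds, free_flow_speeds):
    """Mean of speed / free-flow speed over neighboring segments."""
    ratios = [v / vf for v, vf in zip(neighbor_speeds, free_flow_speeds)]
    return sum(ratios) / len(ratios)


def total_reward(r_wait, r_queue, r_eff, r_global, weights=WEIGHTS):
    terms = (r_wait, r_queue, r_eff, r_global)
    return sum(w * r for w, r in zip(weights, terms))


r_wait = waiting_reward([30.0, 20.0], [25.0, 15.0])    # 10.0 s saved
r_queue = queue_reward([8, 6], [5, 5])                 # 4 vehicles fewer
r_eff = efficiency_reward([3, 2], platoon_intact=1)    # 5 exits + bonus
r_global = global_reward([10.0, 15.0], [20.0, 20.0])   # 0.625
reward = total_reward(r_wait, r_queue, r_eff, r_global)
print(reward)
```

The raw terms sit on different scales (seconds, vehicle counts, a ratio), so in practice some normalization before weighting would probably be needed; the text does not specify one.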
[0046] In this embodiment, the policy network parameters are updated using the multi-agent proximal policy optimization (MAPPO) algorithm.
[0047] The core optimization objective function of MAPPO is:
$$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid o_t)}{\pi_{\theta_{old}}(a_t \mid o_t)}$$
In the formula: $\mathbb{E}$ is the expected value; $r_t(\theta)$ is the probability ratio between the new and old policies, used to measure the magnitude of change of the updated policy relative to the original policy; $\pi_\theta(a_t \mid o_t)$ is the new policy probability; $\pi_{\theta_{old}}(a_t \mid o_t)$ is the old policy probability; $\theta$ denotes the parameters of the policy network to be optimized; $a_t$ is the action at time t, that is, the selection of the traffic light phase; $o_t$ is the local observation of the agent at time t, including the queue length, lane occupancy rate, and current traffic light phase status at this intersection; $s_t$ is the state at time t; $\hat{A}_t$ is the advantage function, computed from features that include the dynamic road network topology weights extracted by the graph attention network (GAT); $\mathrm{clip}$ is the clipping function; $\epsilon$ is the clipping parameter, set to 0.2 in this embodiment.
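A minimal sketch of the clipped surrogate objective on a small batch, using the clipping parameter 0.2 from this embodiment. The log-probabilities and advantages are made-up inputs, and the function name is hypothetical.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                       # new / old policy probability
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Sample 1: ratio 1.5 with positive advantage, so the gain is capped at 1.2.
# Sample 2: ratio 0.5 with negative advantage, so the clipped value -0.8 is used.
objective = clipped_surrogate(
    logp_new=[math.log(1.5), math.log(0.5)],
    logp_old=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(objective)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to move the policy ratio beyond the interval [1 - eps, 1 + eps] in a single update.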
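[0048] To ensure training stability and promote exploration, this embodiment simultaneously optimizes the value function loss and the entropy regularization term:
$$L^{VF}(\phi) = \mathbb{E}_t\left[\left(V_\phi(s_t) - V_t^{target}\right)^2\right], \qquad L^{ENT}(\theta) = \mathbb{E}_t\left[H\left(\pi_\theta(\cdot \mid o_t)\right)\right]$$
The final overall objective function is:
$$L(\theta, \phi) = L^{CLIP}(\theta) - c_1 L^{VF}(\phi) + c_2 L^{ENT}(\theta)$$
where $V_\phi(s_t)$ is the value predicted by the value network, $V_t^{target}$ is the target value, $H$ is the information entropy function, and $c_1$ and $c_2$ are the value function loss coefficient and the entropy regularization coefficient. Based on this, the present invention introduces a prioritized experience replay mechanism, which assigns a higher sampling probability to rare but important state-action pairs, thereby accelerating the learning process and improving the ability to handle complex traffic scenarios.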
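The way the value loss and entropy term enter the overall objective can be sketched as follows. The coefficients `c1=0.5` and `c2=0.01` are common defaults chosen for this example, not values given in the text, and the sign convention assumes the objective is maximized.

```python
import math


def value_loss(predicted, targets):
    """Mean squared error between predicted values and target values."""
    return sum((v - y) ** 2 for v, y in zip(predicted, targets)) / len(targets)


def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def total_objective(l_clip, l_vf, mean_entropy, c1=0.5, c2=0.01):
    """Clipped surrogate minus weighted value loss plus weighted entropy."""
    return l_clip - c1 * l_vf + c2 * mean_entropy


l_vf = value_loss([1.0, 2.0], [0.0, 4.0])       # (1 + 4) / 2 = 2.5
h_uniform = entropy([0.25, 0.25, 0.25, 0.25])   # log(4): maximal for four phases
h_certain = entropy([1.0, 0.0, 0.0, 0.0])       # zero: no exploration left
objective = total_objective(0.2, l_vf, h_uniform)
print(l_vf, h_uniform, h_certain, objective)
```

A policy that always picks the same phase has zero entropy, so the entropy bonus rewards keeping some probability on the other phases during training.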
[0049] S4. To improve interpretability, a large language model is used to build a decision interpreter to generate natural language explanations for traffic signal phase decisions that comply with traffic management regulations.
[0050] The decision interpreter generates natural language explanation files for the traffic light phase decisions corresponding to the dynamic signal timing scheme.
[0051] This embodiment uses a large language model to construct a traffic light decision interpreter. The interpreter is based on the Transformer architecture and uses a multi-layer self-attention mechanism to process input data, realizing the mapping from numerical signals to natural language interpretation.
[0052] The interpreter's input consists of three parts: 1) traffic state data, including indicators such as lane flow and waiting time; 2) decision data; 3) background knowledge, including traffic rules and best practices.
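One possible way to assemble the three input parts into a single text prompt is sketched below. All field names (for example flow_veh_h, wait_s, phase, duration_s), the phase label, and the prompt wording are hypothetical and chosen only for illustration; the embodiment does not specify a prompt layout.

```python
def build_interpreter_prompt(traffic_state: dict, decision: dict,
                             background: list) -> str:
    """Render traffic state, decision data and background knowledge as text."""
    state_lines = [
        f"- {lane}: flow {m['flow_veh_h']} veh/h, waiting time {m['wait_s']} s"
        for lane, m in traffic_state.items()
    ]
    knowledge_lines = [f"- {rule}" for rule in background]
    return "\n".join(
        ["Traffic state:"] + state_lines
        + ["Decision:",
           f"- switch to phase {decision['phase']} for {decision['duration_s']} s"]
        + ["Background knowledge:"] + knowledge_lines
        + ["Explain this phase decision in plain language."]
    )


prompt = build_interpreter_prompt(
    {"north_through": {"flow_veh_h": 820, "wait_s": 46}},
    {"phase": "NS_THROUGH", "duration_s": 35},
    ["Minimum green time must be respected."],
)
print(prompt)
```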
[0053] After the input is processed by the embedding layer, it is computed using a multi-head self-attention mechanism:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
In the formula: $Q$, $K$, and $V$ represent the query vector, key vector, and value vector, respectively, and $d_k$ is the vector dimension.
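For reference, scaled dot-product attention for a single head can be written in a few lines of dependency-free Python. This is a didactic sketch using nested lists, not the interpreter's actual implementation; a real model would use a tensor library and several heads in parallel.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors, column by column.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


# A zero query scores every key equally, so the output is the mean of V.
result = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 0.0], [0.0, 1.0]])
print(result)
```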
[0054] To enhance the professionalism and reliability of the explanations, a transportation-domain knowledge augmentation technique is employed: a large amount of traffic management literature and regulatory text is incorporated during the pre-training phase, enabling the model to master traffic terminology and standards. Explanation generation uses the beam search algorithm to select the explanation sequence with the highest probability:
$$Y^{*} = \arg\max_{Y} P\left(Y \mid X_{\mathrm{state}}, X_{\mathrm{dec}}, X_{\mathrm{know}}\right)$$
where $X_{\mathrm{state}}$, $X_{\mathrm{dec}}$, and $X_{\mathrm{know}}$ denote the traffic state data, decision data, and background knowledge inputs, respectively. The decision interpreter can generate multi-level explanations to meet the needs of different users: operational-level explanations (the specific reasons for a phase switch), strategy-level explanations (medium- and long-term traffic optimization goals), and coordination-level explanations (the coordination relationship with adjacent intersections).
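The beam search step can be illustrated with a toy next-token model. The lookup table, the token names, and the function names below are invented for the example and stand in for the large language model; the sketch only shows how keeping several partial sequences can find a higher-probability explanation than a greedy choice.

```python
import math


def beam_search(next_token_logprobs, beam_width, max_len, eos="<eos>"):
    """Return the highest-scoring (sequence, log-probability) pair found."""
    beams = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished: carry over as is
                continue
            for tok, lp in next_token_logprobs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Keep only the beam_width best partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq and seq[-1] == eos for seq, _ in beams):
            break
    return beams[0]


def toy_model(prefix):
    """Hypothetical stand-in for the language model's next-token distribution."""
    table = {
        (): {"extend": math.log(0.6), "switch": math.log(0.4)},
        ("extend",): {"green": math.log(0.5), "phase": math.log(0.5)},
        ("switch",): {"phase": math.log(0.9), "green": math.log(0.1)},
    }
    return table.get(prefix, {"<eos>": 0.0})


best_seq, best_score = beam_search(toy_model, beam_width=2, max_len=3)
greedy_seq, greedy_score = beam_search(toy_model, beam_width=1, max_len=3)
print(best_seq, math.exp(best_score))
```

With a beam width of 1 the search degenerates to greedy decoding and commits to the locally most likely first token, which in this toy table leads to a lower overall probability.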
[0055] S5. Send the dynamic signal timing plan to the traffic signal controller at the intersection for execution.
[0056] The system adopts a layered design comprising a cloud-based decision-making layer, an edge processing layer, and a field execution layer. For data transmission, it uses the lightweight MQTT protocol with a dedicated signal control message format.
[0057] To ensure transmission security, the TLS 1.3 encryption protocol is used together with X.509 certificate authentication, preventing signal instructions from being maliciously tampered with.
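A client-side TLS configuration of this kind can be sketched with Python's standard ssl module. The function name and the file path parameters are hypothetical; how the resulting context is handed to an MQTT client depends on the client library and is not shown here.

```python
import ssl


def make_tls13_client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a client context that refuses anything older than TLS 1.3."""
    # Server certificates are verified against ca_file, or the system
    # trust store when ca_file is None.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if cert_file:
        # Optional X.509 client certificate for mutual authentication.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = make_tls13_client_context()
print(ctx.minimum_version)
```

Setting the minimum version, rather than pinning a single protocol constant, lets the same code keep working if a later TLS version becomes available.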
[0058] The message queue buffering mechanism ensures reliable instruction delivery when the network is unstable.
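A minimal sketch of such a buffering mechanism is given below. It assumes a send callable that returns True on successful delivery; the class name, the JSON field names, and the bounded queue size are illustrative assumptions and do not reflect the dedicated message format of this embodiment.

```python
import json
from collections import deque


class InstructionBuffer:
    """Queue timing instructions and deliver them in order when the link is up."""

    def __init__(self, send, max_size=1000):
        self._send = send  # callable(payload) -> True on successful delivery
        # Assumption: when the queue is full, the oldest instruction is dropped.
        self._queue = deque(maxlen=max_size)

    @property
    def pending(self):
        return len(self._queue)

    def submit(self, intersection_id, phase, duration_s):
        payload = json.dumps({"intersection": intersection_id,
                              "phase": phase,
                              "duration_s": duration_s})
        self._queue.append(payload)
        return self.flush()

    def flush(self):
        """Send queued payloads oldest first; stop at the first failure."""
        sent = 0
        while self._queue and self._send(self._queue[0]):
            self._queue.popleft()
            sent += 1
        return sent


link = {"up": False}
delivered = []


def fake_send(payload):
    """Hypothetical transport: succeeds only while the link is up."""
    if link["up"]:
        delivered.append(payload)
        return True
    return False


buf = InstructionBuffer(fake_send)
first = buf.submit("J1", "NS_THROUGH", 30)   # link down: stays buffered
link["up"] = True
second = buf.submit("J1", "EW_THROUGH", 25)  # link up: both are delivered
```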
[0059] At the execution level, a degradation mechanism is designed: when communication is interrupted, the signal controller automatically switches to local adaptive control mode to ensure the continuous operation of the traffic system.
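The degradation logic can be sketched as a simple timeout on the last message received from the upstream layer. The class name, the mode labels, and the 5-second timeout are assumptions for illustration; the embodiment does not state how a communication interruption is detected.

```python
class SignalControllerMode:
    """Choose the control mode from the age of the last upstream message."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s  # assumed interruption threshold
        self.last_heard = None

    def on_upstream_message(self, now):
        """Record the time (in seconds) at which an instruction arrived."""
        self.last_heard = now

    def mode(self, now):
        """Fall back to local adaptive control when the link has gone quiet."""
        if self.last_heard is None or now - self.last_heard > self.timeout_s:
            return "local_adaptive"
        return "coordinated"


controller = SignalControllerMode(timeout_s=5.0)
startup_mode = controller.mode(now=0.0)
controller.on_upstream_message(now=10.0)
connected_mode = controller.mode(now=12.0)
interrupted_mode = controller.mode(now=20.0)
print(startup_mode, connected_mode, interrupted_mode)
```

Passing timestamps in explicitly keeps the decision logic deterministic and easy to test; a deployed controller would supply a monotonic clock reading.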
[0060] At the same time, a complete feedback loop is built, with intersection equipment reporting execution status and traffic data in real time, forming a closed-loop control that enables the system to continuously optimize timing strategies and respond quickly to sudden traffic events.
[0061] This embodiment implements an interpretable multi-node collaborative control method for traffic lights. Figure 1 is a flowchart of the method of the present invention. Figure 2 is a flowchart of the structured input process for real-time traffic data in this embodiment. Figure 3 illustrates the adaptive calculation of the road network topology using the graph attention network in this embodiment. Figure 4 is a flowchart of the multi-agent reinforcement learning algorithm for multi-node collaborative control of traffic lights in this embodiment. Figure 5 is a flowchart of the explanation generation process for traffic light decisions in this embodiment.
[0062] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A multi-node collaborative control method for traffic lights, characterized in that it comprises: S1. collecting road network status data in real time; S2. using a graph attention network to adaptively model the road network topology, taking into account dynamic changes in the road network; S3. after receiving real-time traffic flow data, using a multi-agent reinforcement learning algorithm to coordinate the timing of multiple traffic lights and generate a dynamic signal timing scheme; S4. using a large language model to build a decision interpreter that generates a natural language explanation file for the traffic light phase decisions corresponding to the dynamic signal timing scheme; S5. sending the dynamic signal timing scheme to the intersection signal controller for execution, and simultaneously pushing the natural language explanation file of the traffic light phase decisions to the traffic monitoring terminal as the semantic basis for real-time rationality evaluation of, and intervention in, the dynamic signal timing scheme.
2. The multi-node collaborative control method for traffic lights according to claim 1, characterized in that the road network status data in S1 includes: traffic flow parameters: vehicle throughput of each lane, queue length, average vehicle speed, and vehicle type classification, wherein the vehicle type classification includes motor vehicles, non-motor vehicles, and pedestrians; map elements: lane topology, variable lane markings, traffic light phase rules, and right-of-way markings, including bus lanes; time dimension parameters: the traffic mode classification of the current time period and a holiday flag, wherein the traffic mode classification includes morning peak, evening peak, and off-peak; environmental perception data: weather conditions, visibility, and road surface slipperiness coefficient; dynamic event markers: coordinates of temporary road closures, impact radius of traffic accidents, and emergency vehicle passage requests.
3. The multi-node collaborative control method for traffic lights according to claim 1, characterized in that, in step S2, using a graph attention network to adaptively model the road network topology based on the dynamically changing characteristics of the road network specifically includes: using an adjacency matrix $A \in \mathbb{R}^{N \times N}$ to characterize the connection relationships between intersections in the road network, where $N$ is the number of intersection nodes, and $A_{ij} = 1$ when node $i$ is connected to node $j$, otherwise $A_{ij} = 0$; and, based on a dynamic adjacency matrix update mechanism, automatically updating the adjacency matrix when a road network change is detected, the update expression being:
$$A' = A \odot M$$
In the formula: $A'$ is the adjusted adjacency matrix; $\odot$ denotes element-wise multiplication; $M$ is the mask matrix, in which the elements corresponding to connections on closed roads are set to 0; for newly added connections, a new edge is added to the adjacency matrix.
4. The multi-node collaborative control method for traffic lights according to claim 3, characterized in that, in the graph attention network, a multi-head attention mechanism is used so that each intersection simultaneously attends to the traffic conditions of multiple related intersections from different perspectives; attention weights are calculated dynamically, so that the degree of attention given to different neighboring intersections is adjusted automatically; the graph attention layer is calculated as:
$$e_{ij} = \mathrm{LeakyReLU}\left(a^{T}\left[W h_i \,\Vert\, W h_j\right]\right)$$
$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{m \in \mathcal{N}_i} \exp(e_{im})}$$
$$h_i' = \Big\Vert_{k=1}^{K}\, \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\, W^{k} h_j\Big)$$
In the formula: $\mathrm{LeakyReLU}$ is the leaky rectified linear unit; $h_i$ and $h_j$ are the feature vectors of nodes $i$ and $j$, respectively; $W$ is the weight matrix; $W^{k}$ is the weight matrix of the $k$-th attention head; $K$ is the number of heads; $a$ is the attention vector and $T$ denotes transpose; $e_{ij}$ is the attention coefficient of the edge between nodes $i$ and $j$; $\alpha_{ij}$ is the normalized attention coefficient of the edge between nodes $i$ and $j$; $\alpha_{ij}^{k}$ is the normalized attention coefficient of the edge between nodes $i$ and $j$ for the $k$-th attention head; $\mathrm{softmax}$ is the normalization function; $\sigma$ is an activation function; $\Vert$ is the vector concatenation operation; $\mathcal{N}_i$ is the set of neighboring nodes of node $i$; $h_i'$ is the updated feature vector of node $i$.
5. The multi-node collaborative control method for traffic lights according to claim 1, characterized in that, in S3, using a multi-agent reinforcement learning algorithm to coordinate the timing of multiple traffic lights and generate a dynamic signal timing scheme after receiving real-time traffic flow data specifically includes: building an Actor-Critic architecture based on a policy network $\pi_\theta$ and a value network $V_\phi$, in which the former outputs the probability distribution over actions and the latter evaluates the value of states; initialization: modeling each traffic light as an independent agent, initializing the policy network $\pi_\theta$ and the value network $V_\phi$, setting the quintuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the reward function, and γ is the discount factor, and setting the time step T, the number of agents N, and the maximum number of training rounds U; wherein the state space S is defined as the joint set of the traffic flow feature vectors of all approach lanes at the current intersection and the current signal phase vector; at time step t, the state of the i-th traffic light agent is represented as $s_{i,t} = [x_{1,t}, \dots, x_{L_i,t}, p_t]$, where $L_i$ is the total number of approach lanes at the intersection where the i-th traffic light agent is located, $p_t$ is the current phase state of the traffic light at time t, represented using one-hot encoding, and $x_{l,t}$ is the micro-level mixed traffic flow feature vector of the $l$-th lane, including the vehicle throughput, queue length, average vehicle speed, and vehicle type classification of each lane in the road network; action space: $A = \{a_1, a_2, a_3, a_4\}$, where $a_1$ is the north-south straight phase, $a_2$ is the north-south left-turn phase, $a_3$ is the east-west straight phase, and $a_4$ is the east-west left-turn phase; action selection and execution: at time step t, each traffic light agent $i$ selects the optimal action $a_{i,t}$ according to the policy network $\pi_\theta$, and the joint action $a_t$ is executed; the new state $s_{t+1}$, reward $r_t$, and termination flag $d_t$ are then obtained, the experience $(s_t, a_t, r_t, s_{t+1}, d_t)$ is stored in the buffer, the state is updated as $s_t \leftarrow s_{t+1}$, and the advantage function $\hat{A}_t$ and cumulative return $\hat{R}_t$ are calculated at each time step t; in each update iteration, a batch of experiences is sampled from the buffer, the policy loss and value loss are calculated, the policy network $\pi_\theta$ and the value network $V_\phi$ are updated by gradient, and the buffer is cleared; when the current training round reaches the maximum number of training rounds U, the result is output and the dynamic signal timing scheme is obtained.
6. The multi-node collaborative control method for traffic lights according to claim 5, characterized in that the reward function R is represented by a weighted sum of a waiting time reduction reward, a queue length reduction reward, a traffic efficiency improvement reward, a global road network condition improvement reward, and a traffic optimization target reward for inter-intersection cooperation; the waiting time reduction reward is the total waiting time at the previous time step minus the total waiting time at the current time step; the queue length reduction reward is the queue length at the previous time step minus the queue length at the current time step; the traffic efficiency improvement reward aims to maximize the traffic throughput of the intersection per unit time; the global road network condition improvement reward is represented by the regional average normalized vehicle speed.
7. The multi-node collaborative control method for traffic lights according to claim 5, characterized in that the policy network parameters are updated using a multi-agent proximal policy optimization algorithm.
8. The multi-node collaborative control method for traffic lights according to claim 7, characterized in that the optimization objective adopted by the multi-agent proximal policy optimization algorithm includes a multi-agent proximal policy optimization loss term $L^{CLIP}(\theta)$, whose expression is:
$$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid o_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid o_t)}$$
In the formula: $\mathbb{E}_t$ is the expected value; $r_t(\theta)$ is the probability ratio between the new and old policies, which measures how much the updated policy has changed relative to the original policy; $\pi_\theta(a_t \mid o_t)$ is the new policy probability; $\pi_{\theta_{\mathrm{old}}}(a_t \mid o_t)$ is the old policy probability; $\theta$ denotes the policy network parameters to be optimized; $a_t$ is the action at time t, representing the phase selection of the traffic light; $o_t$ is the agent's local observation at time t, including the queue length, lane occupancy rate, and current traffic light phase state at this intersection; $s_t$ is the state at time t; $\hat{A}_t = \hat{A}(s_t, a_t)$ is the advantage function; $\mathrm{clip}(\cdot)$ is the clipping function; $\epsilon$ is the clipping parameter.
9. The multi-node collaborative control method for traffic lights according to claim 7, characterized in that the optimization objective adopted by the multi-agent proximal policy optimization algorithm further includes a value function loss term $L^{VF}(\phi)$ and an entropy regularization term; the optimization objective is calculated as:
$$L^{VF}(\phi) = \mathbb{E}_t\left[\left(V_\phi(s_t) - V_t^{\mathrm{target}}\right)^2\right]$$
$$L(\theta, \phi) = L^{CLIP}(\theta) - c_1\,L^{VF}(\phi) + c_2\,\mathbb{E}_t\left[H\left(\pi_\theta(\cdot \mid o_t)\right)\right]$$
In the formula: $L^{CLIP}(\theta)$ is the multi-agent proximal policy optimization loss term; $c_1$ is the value function loss coefficient, used to balance the weights of policy optimization and value estimation; $V_\phi(s_t)$ is the predicted value of the value network under parameters $\phi$, i.e. the assessment of the value of the current state based on the current local observation and the global state; $V_t^{\mathrm{target}}$ is the target value, calculated from the real reward or the generalized advantage estimate and used as the supervision label for training the value network; $c_2$ is the entropy regularization coefficient, used to adjust the exploration intensity; $\pi_\theta(\cdot \mid o_t)$ is the action probability distribution output by the policy network; $\mathbb{E}_t\left[H\left(\pi_\theta(\cdot \mid o_t)\right)\right]$ is the entropy regularization term; the coefficients $c_1$ and $c_2$ are constants; $H$ is the information entropy function, used to measure the uncertainty of the policy distribution.
10. The multi-node collaborative control method for traffic lights according to claim 1, characterized in that, in step S4, using a large language model to build a decision interpreter that generates a natural language explanation file for the traffic light phase decisions corresponding to the dynamic signal timing scheme specifically includes: collecting, by the decision interpreter, key decision data of the traffic lights in real time, including the waiting time, queue length, traffic flow, average speed, and congestion level of each lane and the status of adjacent traffic lights; constructing structured prompts that transform numerical data and status information into descriptive text and add professional traffic management terminology and background knowledge; and sending the prompts to a large language model enhanced with traffic domain knowledge to generate decision explanations, which include operational-level explanations of the specific reasons for phase switching, strategy-level explanations of long-term traffic optimization goals, and coordination-level explanations clarifying the relationship with adjacent intersections.