Intersection signal hierarchical control method and system

By using a hierarchical signal control method, a decision state vector is generated by fusing features from historical data and vehicle status data, thus coordinating short-term and long-term information. This solves the problem of unstable traffic state perception under low network coverage and improves the stability and efficiency of traffic flow control.

CN122245087A · Pending · Publication Date: 2026-06-19 · ZHEJIANG SUPCON INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG SUPCON INFORMATION TECH CO LTD
Filing Date
2025-12-03
Publication Date
2026-06-19

Abstract

This invention discloses a hierarchical control method and system for intersection signals, comprising: collecting intersection traffic data, including lane state data, vehicle state data, and historical data; obtaining long-term memory features based on encoding time and location information from historical data, and capturing spatial dependencies to output a macroscopic state vector; obtaining vehicle-level car-following dependencies based on vehicle state data and outputting a road segment compression representation; integrating lane state data, long-term memory features, and road segment compression representation based on a cross-attention mechanism to generate a decision state vector; and employing a hierarchical control model, where the strategy layer generates a macroscopic control objective based on the macroscopic state vector, and the execution layer generates control actions in real time based on the decision state vector and the macroscopic control objective. This invention integrates historical data with short-term real-time data, overcoming the data sparsity problem of low network coverage. By setting macroscopic control objectives at the strategy layer and coordinating with the real-time action generation at the execution layer, a balance between robustness and flexibility is achieved.

Description

Technical Field

[0001] This invention relates to the field of intersection signal control technology, and in particular to a method and system for hierarchical control of intersection signals.

Background Technology

[0002] Traffic signal control focuses on optimizing signal phase sequence and timing to alleviate traffic conflicts and improve intersection efficiency. The widely used fixed-time control methods generate signal timing plans from human experience and historical traffic data. Actuated control applies simple logical rules to real-time data, providing better short-term adaptability than fixed-time control. Adaptive control uses optimization algorithms to manage traffic demand; existing research typically employs methods such as Mixed Integer Programming (MIP) and Dynamic Programming (DP), but these struggle to handle the complexity, nonlinearity, and stochasticity of traffic signal control, and often exhibit instability and decision delays. In recent years, deep reinforcement learning methods have been introduced into traffic signal control, modeling the control problem as a Markov decision process and optimizing it through states, actions, and reward functions. Common methods such as DQN and PPO outperform traditional methods under ideal conditions, but they have significant drawbacks in mixed traffic scenarios with low network coverage, that is, where only a small share of vehicles are connected. Data sparsity: low connected-vehicle coverage leads to insufficient real-time vehicle data, so deep reinforcement learning frameworks cannot obtain sufficient state information, which affects decision accuracy. Noise sensitivity: short-term reward functions are sensitive to traffic flow fluctuations, which easily leads to frequent phase switching and poor control stability. Insufficient capture of long-term dependencies: deep reinforcement learning struggles to extract experiential knowledge from sparse long-term rewards.

[0003] The "Intelligent Traffic Light Adjustment Method and System Integrating Vehicle-to-Everything (V2X) Information," published in Chinese patent literature (CN120220435A) on June 27, 2025, includes: collecting multi-source traffic data through roadside equipment, and performing spatiotemporal alignment and outlier filtering; constructing a deep learning prediction model based on the filtered data, combining historical and real-time V2X information to predict traffic demand; generating a dynamic timing strategy based on the prediction, and synchronizing it to related intersections via V2X; deploying edge computing nodes to adjust traffic light parameters in real time and sending suggested speeds to vehicles; and establishing a closed-loop feedback mechanism to correct the strategy and model online based on trajectory data and efficiency indicators. By integrating multi-source V2X data in real time, this technology addresses the inability of existing traffic light timing systems to accurately predict demand and adapt to it in dynamic traffic environments, achieving a significant improvement in traffic efficiency and a reduction in emergency response delays in complex traffic scenarios. However, because this technology relies on high-density V2X data to improve prediction and adjustment accuracy, it still cannot solve the problems of data sparsity, noise sensitivity, and insufficient capture of long-term dependencies in situations with low network coverage.

Summary of the Invention

[0004] This invention aims to overcome the problems in existing technologies where signal control methods based on deep reinforcement learning struggle to acquire sufficient real-time vehicle status information and extract empirical knowledge from sparse long-term rewards under conditions of low network coverage. It provides a hierarchical control method and system for intersection signals.

[0005] To achieve the above objectives, the present invention adopts the following technical solution: A method for hierarchical control of intersection signals, comprising: collecting traffic data at the intersection, including lane status data, vehicle status data, and historical data; obtaining long-term memory features by encoding time and location information based on the historical data, and capturing spatial dependencies to output a macroscopic state vector; obtaining vehicle-level car-following dependencies based on the vehicle status data, and outputting a road segment compression representation; integrating the lane status data, the long-term memory features, and the road segment compression representation based on a cross-attention mechanism to generate a decision state vector; and adopting a hierarchical control model, in which the strategy layer generates macroscopic control objectives based on the macroscopic state vector, and the execution layer generates control actions in real time based on the decision state vector and the macroscopic control objectives.

[0006] This invention proposes a hierarchical signal control method for mixed traffic scenarios with low network coverage, achieving significant performance improvements through multi-dimensional technology fusion. Enhanced state observation stability: The spatiotemporal feature fusion model based on an attention mechanism effectively integrates long-term historical data with short-term real-time data, overcoming the data sparsity problem under low network coverage, resulting in more accurate traffic state perception and stronger anti-interference capabilities. Optimized traffic flow processing efficiency: By setting macro-control objectives at the strategy layer and coordinating with real-time action generation at the execution layer, a balance is achieved between long-term strategy robustness and short-term execution flexibility, significantly improving the ability to handle traffic flow imbalances. Reduced delays and accelerated congestion dissipation: The hierarchical control structure reduces average intersection delays and accelerates the congestion dissipation process. Strong robustness and adaptability: In scenarios with low network coverage, the performance degradation of this invention is relatively small, demonstrating excellent environmental adaptability.

[0007] Preferably, the process of outputting the macroscopic state vector includes: inputting the historical data and environmental variables, using an encoder based on the Transformer architecture to encode temporal and location information, and retrieving contextualized memories from the historical view to obtain the long-term memory features; and then outputting the macroscopic state vector through a lane-dimension convolutional layer and an approach-dimension fully connected layer.

[0008] Preferably, the process of outputting the road segment compression representation includes: extracting local temporal features from the vehicle status data using a convolutional neural network; sorting the local temporal features of the vehicles in descending order of each vehicle's distance from the stop line to obtain a vehicle-level state tensor; and computing the road segment compression representation from that tensor using an encoder based on the Transformer architecture.

[0009] Preferably, the process of generating the decision state vector includes: calculating the attention product between the long-term memory features and the lane status data, and extracting long-short-term features through a decoder; mapping the long-short-term features to query vectors and the road segment compression representation to key vectors; calculating spatiotemporal attention weights through dot products; and then outputting the decision state vector through a lane-dimension convolutional layer and an approach-dimension fully connected layer.

[0010] Preferably, the strategy layer generating macroscopic control objectives based on the macroscopic state vector includes: a phase sequence manager generates a reference phase sequence for a preset future time period based on the macroscopic state vector; and a green light time manager generates green-end performance indicators based on the macroscopic state vector and environmental variables, the indicators including at least queue length, headway, and lane speed.

[0011] Preferably, the execution layer generates control actions in real time based on the decision state vector and the macro-control objective, including: The phase sequence executor selects the candidate sequence with the highest probability from the candidate sequence set as the phase sequence selection action for the next cycle, based on the decision state vector and the reference phase sequence. The green light timing executor calculates the green time adjustment value with the highest probability based on the decision state vector, the green end performance index, and the current phase, and uses this value as the green light duration adjustment action.
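
As a non-limiting illustration of the selection rule in the preceding paragraph, the following Python sketch picks the highest-probability entry from a candidate set. The function name `select_most_probable`, the candidate sequences, the adjustment values, and all probability values are hypothetical; in the method the probabilities would come from the executors, which are not reproduced here.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_most_probable(candidates: Sequence[T], probabilities: Sequence[float]) -> T:
    """Return the candidate with the highest probability (first one on ties)."""
    if not candidates or len(candidates) != len(probabilities):
        raise ValueError("candidates and probabilities must be non-empty and equal in length")
    best_index = max(range(len(candidates)), key=lambda i: probabilities[i])
    return candidates[best_index]


# Hypothetical candidate phase sequences (the phase labels are illustrative only).
candidate_sequences = [("A", "B", "C", "D"), ("A", "C", "B", "D"), ("B", "A", "D", "C")]
sequence_probs = [0.2, 0.5, 0.3]  # assumed output of the phase sequence executor
next_sequence = select_most_probable(candidate_sequences, sequence_probs)

# Hypothetical green time adjustment values, in seconds.
adjustments = [-10, -5, 0, 5, 10]
adjustment_probs = [0.05, 0.10, 0.15, 0.60, 0.10]  # assumed output of the green light timing executor
green_adjustment = select_most_probable(adjustments, adjustment_probs)
```

The same helper serves both executors because each reduces to an arg-max over a discrete set; only the candidate set differs.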

[0012] Preferably, the lane status data is measured by video or radar equipment installed at each approach of the intersection; the vehicle status data includes the data of each sensor-equipped vehicle itself and data on its surrounding vehicles; and the historical data is historical traffic data at the level of each approach lane.

[0013] As a preferred approach, after generating the control action, the control action is quantitatively evaluated using a reward function, and then trained and optimized using hierarchical reinforcement learning. The reward function includes a phase sequence reward based on changes in traffic flow and a green light time reward based on a reduction in average delay.
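
The text names the two reward terms but does not give their formulas. The Python sketch below shows one simple form they could take, purely as an assumption: the phase sequence reward as the change in served traffic flow between two evaluations, and the green light time reward as the drop in average delay. The function names, units, and sample values are hypothetical.

```python
def phase_sequence_reward(flow_before: float, flow_after: float) -> float:
    """Assumed form: change in traffic flow (veh/h); positive when flow increases."""
    return flow_after - flow_before


def green_time_reward(delay_before: float, delay_after: float) -> float:
    """Assumed form: reduction in average delay (s/veh); positive when delay falls."""
    return delay_before - delay_after


# Illustrative values only.
r_phase = phase_sequence_reward(flow_before=1200.0, flow_after=1260.0)
r_green = green_time_reward(delay_before=42.0, delay_after=38.5)
```

Keeping the two terms separate mirrors the two action dimensions, so each executor can be evaluated against the quantity its own action influences.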

[0014] Preferably, the training optimization using hierarchical reinforcement learning includes: Noise is injected into the policy layer for exploration in the first few training cycles. The execution layer evaluates the results through a reward function. The loss function in the network update process replaces the preset sub-objective with the sub-objective that is actually achieved. The preset sub-objective is the baseline phase sequence and the green end performance index in the policy layer.
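
The sub-objective replacement described above can be sketched in Python as a relabeling pass over stored transitions. This is a minimal sketch under stated assumptions: the `Transition` fields, the encoding of a sub-objective as a tuple of numbers, and the function name are hypothetical, and the reward is left unchanged because the text does not say whether it is recomputed.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]
    sub_goal: Tuple[float, ...]       # preset sub-objective issued by the strategy layer
    action: int
    reward: float
    achieved_goal: Tuple[float, ...]  # sub-objective actually reached after the action


def relabel_with_achieved_goal(batch: List[Transition]) -> List[Transition]:
    """Return copies in which the preset sub-objective is replaced by the achieved one."""
    return [replace(t, sub_goal=t.achieved_goal) for t in batch]


# Illustrative transition: the preset target was (12.0, 2.5), the outcome was (15.0, 2.1).
batch = [Transition(state=(0.4, 0.1), sub_goal=(12.0, 2.5), action=1,
                    reward=-3.0, achieved_goal=(15.0, 2.1))]
relabeled = relabel_with_achieved_goal(batch)
```

The relabeled copies would then feed the loss computation during the network update, while the original batch stays untouched.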

[0015] A traffic signal layered control system, comprising: The data acquisition module collects traffic data, vehicle status data, and historical data at intersections. The long-term fusion module extracts macroscopic state vectors based on historical data; The state embedding module extracts compressed representations of road segments based on vehicle state data; The long-term and short-term fusion module generates a decision state vector based on lane state data, long-term memory features, and road segment compression representation. The hierarchical control module is equipped with a hierarchical control model and uses a hierarchical reinforcement learning method to train and optimize the model.

[0016] This invention has the following beneficial effects: by fusing attention-based spatiotemporal features, a macroscopic state vector is obtained from historical data as a long-term representation, and a decision state vector is obtained by combining the macroscopic state vector, vehicle state data, and lane state data as a short-term fusion representation, thereby enhancing the stability of traffic state observation of mixed traffic flow under low network coverage; the long-term strategy layer generates macroscopic control objectives (such as baseline phase sequences and green end performance indicators), and the short-term execution layer adjusts actions in real time, balancing long-term robustness and short-term optimality; a hierarchical reinforcement learning framework is adopted, and the control actions generated by the model are quantitatively evaluated through a posterior transfer mechanism and a reward function, thereby improving the convergence and control effect of the hierarchical control model.

Attached Figure Description

[0017] Figure 1 is a flowchart of the intersection signal layered control method in this invention.

[0018] Figure 2 is a schematic diagram of the long-term and short-term fusion network model in this invention.

[0019] Figure 3 is a schematic diagram of the hierarchical control model in this invention.

[0020] Figure 4 is a schematic diagram of the action space definition for signal control in this invention.

[0021] Figure 5 is a flowchart of the hierarchical reinforcement learning in this invention.

[0022] Figure 6 is a schematic diagram comparing the signal control method of the present invention with other existing technologies.

Detailed Implementation

[0023] The present invention will now be further described with reference to the accompanying drawings and specific embodiments.

[0024] As shown in Figure 1, a hierarchical control method for intersection signals includes: collecting traffic data at the intersection, including lane status data, vehicle status data, and historical data; obtaining long-term memory features by encoding time and location information based on the historical data, and capturing spatial dependencies to output a macroscopic state vector; obtaining vehicle-level car-following dependencies based on the vehicle status data, and outputting a road segment compression representation; integrating the lane status data, the long-term memory features, and the road segment compression representation based on a cross-attention mechanism to generate a decision state vector; and adopting a hierarchical control model, in which the strategy layer generates macroscopic control objectives based on the macroscopic state vector, and the execution layer generates control actions in real time based on the decision state vector and the macroscopic control objectives.

[0025] This invention addresses mixed traffic scenarios with low network coverage by proposing a hierarchical signal control method, HiTSC. By fusing attention-based spatiotemporal features, HiTSC effectively solves the problem of vehicle data sparsity and significantly improves the stability of traffic state observation for mixed traffic flows under low network coverage. Specifically, this method integrates long-term roadside data with short-term vehicle-side data, using macroscopic state vectors from historical data as long-term representations and combining vehicle state data and lane state data to generate decision state vectors as short-term fusion representations, thereby enhancing the dynamic perception capability of mixed traffic flows.

[0026] In terms of control architecture, this invention adopts a hierarchical control model, divided into a long-term strategy layer and a short-term execution layer. The long-term strategy layer is responsible for setting control scenarios and generating macro-level control objectives (such as baseline phase sequences and green-end performance indicators) to ensure the robustness of decisions. The short-term execution layer adjusts control actions in real time under the constraints of macro-level control objectives to achieve short-term optimal responses. Through a posterior transfer mechanism and reward function, this method quantitatively evaluates control actions, improves model convergence and control effectiveness, effectively reduces average intersection delays, and accelerates congestion dissipation. Compared with existing technologies, this invention exhibits lower performance degradation in low-coverage scenarios, is more efficient in handling traffic flow imbalances, and achieves a balance between long-term robust decision-making and short-term optimal actions, thereby improving the overall stability and efficiency of traffic signal control.

[0027] In addition to a method for hierarchical control of intersection signals, the present invention also provides a hierarchical control system for intersection signals, comprising: The data acquisition module collects traffic data, vehicle status data, and historical data at intersections. The long-term fusion module extracts macroscopic state vectors based on historical data; The state embedding module extracts compressed representations of road segments based on vehicle state data; The long-term and short-term fusion module generates a decision state vector based on lane state data, long-term memory features, and road segment compression representation. The hierarchical control module is equipped with a hierarchical control model and uses a hierarchical reinforcement learning method to train and optimize the model.

[0028] A hierarchical control system in a mixed traffic environment can be represented as a Markov decision process (MDP). Formally, the MDP framework consists of the tuple <S, A, P, R>, where S represents the environmental state, A represents the feasible control actions, P models the stochastic state evolution conditioned on the current state and control action, and R is a multi-objective metric for quantifying the control effect.

[0029] In this invention, S represents the environmental traffic state, which includes multiple factors (such as weather, time of day, etc.). A represents the multidimensional discrete action space, including phase sequence selection (PSS) and green light time adjustment (GID). P represents the stochastic evolution of the state with actions, corresponding to the spatial state encoding process. R is used to quantify the multi-objective measure of control effectiveness, such as traffic flow changes and delay reduction.

[0030] The environment variables represent how traffic conditions are affected by multiple environmental factors, including weather, season, time of day, week, and holidays. In intersection modeling, n denotes the approach number and l denotes the lane number within that approach.

[0031] The lane status data, such as traffic flow, queue length, and speed in the approach area, is measured by video or radar equipment installed at each approach of the intersection. For the l-th lane of the n-th approach it forms a state vector, namely a feature vector composed of the lane configuration type (one-hot encoding), the lane flow permission (0/1 value), the traffic flow in the current period and in the past 5 minutes, the queue length, the average vehicle speed, the lane occupancy rate in the detection area, and the embedded features (position, direction, speed, acceleration) of the first and last vehicles.
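
By way of illustration, the lane-level feature vector described above could be assembled as in the following Python sketch. The lane type categories, the field order, the units, and the sample values are assumptions made for the example; the text lists the feature groups but does not fix their layout.

```python
from typing import List, Sequence

# Hypothetical lane configuration categories for the one-hot encoding.
LANE_TYPES = ("left", "through", "right", "through_right")


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Encode a category as a 0/1 vector with a single 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1.0 if value == c else 0.0 for c in categories]


def lane_state_vector(lane_type: str, permitted: bool, flow_current: float,
                      flow_past_5min: float, queue_length: float, avg_speed: float,
                      occupancy: float, first_vehicle: Sequence[float],
                      last_vehicle: Sequence[float]) -> List[float]:
    """Concatenate the feature groups of one lane into a flat vector.

    first_vehicle and last_vehicle each hold (position, direction, speed, acceleration).
    """
    return (one_hot(lane_type, LANE_TYPES)
            + [1.0 if permitted else 0.0]
            + [flow_current, flow_past_5min, queue_length, avg_speed, occupancy]
            + list(first_vehicle) + list(last_vehicle))


# Illustrative values only.
vec = lane_state_vector("through", True, 14.0, 62.0, 35.0, 8.3, 0.42,
                        first_vehicle=(3.0, 0.0, 0.0, 0.0),
                        last_vehicle=(48.0, 0.0, 6.5, -0.8))
```

With four lane types this yields an 18-element vector: 4 one-hot entries, 1 permission flag, 5 aggregate measurements, and 4 entries for each of the first and last vehicles.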

[0032] The vehicle status data uses data measured by onboard positioning and kinematic sensors, as well as data measured by vision or radar sensors on intelligent vehicles, and concatenates the position, orientation, speed, and acceleration of each vehicle into an embedded vector. The historical data represents data such as the 5-minute traffic flow, maximum queue length, and average speed for lane l of the n-th approach.

[0033] As a specific example, Figure 2 illustrates a long-term and short-term fusion network model. After collecting intersection traffic data, features are extracted and fused from heterogeneous data sources (such as historical data, vehicle status data, and lane status data) through a spatial state encoding process to support hierarchical signal control decisions. The overall technical framework is based on attention mechanisms and neural networks, emphasizing the synergy between long-term pattern mining and short-term dynamic capture. Through attention-based feature coordination, high-precision trajectories of connected vehicles and traditional sensor data are processed in a dual-stream manner. While ensuring spatiotemporal consistency, transient traffic states and historical evolution trends are jointly modeled, effectively overcoming some observability limitations.

[0034] It should be noted that traffic data contains complex spatiotemporal dynamic correlations, and the coexistence of real-time data streams and historical patterns requires multi-scale feature alignment, especially in scenarios with low network coverage. This invention uses an attention framework for extracting spatiotemporal features of traffic states in mixed environments. This state encoder establishes hierarchical representations for heterogeneous traffic observation data through a dual-branch neural network architecture.

[0035] To address the spatiotemporal hybrid characteristics of traffic conditions, separate data encoding processes are designed at the intersection level (corresponding to lane state data) and the road segment level (corresponding to vehicle state data, as road segments typically lack detectors and are almost entirely obtained through connected vehicle data). Notably, the basic units used in the model are the encoder (E unit) and decoder (D unit) of the Transformer architecture. Each encoder / decoder is constructed using a multi-head self-attention mechanism, a feedforward neural network, and layer normalization stacking.

[0036] The long-short fusion network model can be divided into three different branches: the first branch is the long-term feature fusion branch, the second branch is the long-short feature fusion branch, and the third branch is the state embedding branch. The three branches are combined with each other to finally output the macroscopic state vector representing the long-term features and the decision state vector representing the short-term fusion features.

[0037] Optionally, the process of outputting the macroscopic state vector in the long-term feature fusion branch includes: inputting the historical data and environmental variables, using an encoder based on the Transformer architecture to encode temporal and location information, and retrieving contextualized memories from the historical view to obtain the long-term memory features; and then outputting the macroscopic state vector through a lane-dimension convolutional layer and an approach-dimension fully connected layer.

[0038] Long-term feature fusion focuses on uncovering potential pattern characteristics of intersection flow dynamics. It addresses the data sparsity problem under low network coverage by analyzing long-term historical data (such as 5-minute traffic flow, maximum queue length, and average speed) to identify periodic patterns in traffic flow (such as morning and evening rush hour patterns), thus providing a macro-level basis for strategy-level decision-making. It provides long-term trend information to the strategy layer, reducing decision instability caused by short-term data fluctuations, and exhibits robust performance, especially in scenarios with low network coverage.

[0039] Similarity analysis between historical data and current traffic conditions can enhance the effectiveness of traffic pattern extraction (for example, when traffic signals are adjusted at 7:30, traffic flow data at 7:00 is more valuable than that at 5:00). This is a qualitative description; it captures and forms feature vectors through a time-attention mechanism between historical and current data, rather than an explicitly interpretable method.

[0040] The inputs for long-term feature fusion include historical data and environmental variables, with the historical data at action step t being the primary input. This input is formed by concatenating the environmental variables with the lane configuration, traffic flow, maximum queue length, and average speed over the recent time window H. The recent time window H is a time period that can be taken as the interval between two adjacent action steps; for example, 30 minutes can be selected as the time window.

[0041] For the concatenated input, the encoder encodes time and location information, retrieves contextualized memories from the historical view, and generates lane-level long-term memory features. The encoder includes three layers of E units. The historical view is a time series with a weekly period and a granularity of 30 minutes (matching the time window H).
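
A weekly series at 30-minute granularity has 7 × 48 = 336 slots. As a minimal sketch of how a timestamp could be mapped to its slot in such a historical view, the following Python snippet assumes that the week starts on Monday at 00:00; the function name and that convention are not taken from the text.

```python
from datetime import datetime

SLOT_MINUTES = 30
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 48 slots per day
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY       # 336 slots per week


def weekly_slot(ts: datetime) -> int:
    """Index of the 30-minute slot within the week (0 = Monday 00:00 to 00:29)."""
    minutes_into_day = ts.hour * 60 + ts.minute
    return ts.weekday() * SLOTS_PER_DAY + minutes_into_day // SLOT_MINUTES


# Wednesday 2025-12-03 at 07:30 falls into slot 2 * 48 + 15 = 111.
slot = weekly_slot(datetime(2025, 12, 3, 7, 30))
```

Two observations one week apart map to the same slot, which is what allows the encoder to retrieve memories from comparable periods.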

[0042] After the long-term memory features are acquired, discrete state space modeling is applied: a lane-axis convolutional layer and an approach-axis fully connected layer capture spatial dependencies and output the macroscopic state vector, which can effectively characterize latent traffic patterns such as vehicle arrival rate and turning ratio. The lane-axis convolutional layer is a conventional convolutional layer, where "lane axis" denotes the data dimension along which the 1D convolution is applied; the approach-axis fully connected layer is a conventional fully connected layer, where "approach axis" likewise denotes the dimension along which it is applied.
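
The two spatial layers can be illustrated with a small pure-Python sketch. It assumes one scalar feature per lane, zero padding with an odd-length kernel, and no activation function; a real model would use multi-channel tensors and learned weights. All names and numeric values are hypothetical.

```python
from typing import List


def conv1d_lane_axis(row: List[float], kernel: List[float]) -> List[float]:
    """1D convolution along the lanes of one approach (zero padded, same length)."""
    pad = len(kernel) // 2  # assumes an odd kernel length
    padded = [0.0] * pad + list(row) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(row))]


def dense_approach_axis(grid: List[List[float]], weights: List[List[float]],
                        bias: List[float]) -> List[List[float]]:
    """Fully connected layer along the approach axis, applied per lane position."""
    n_lanes = len(grid[0])
    return [[sum(w * grid[n][lane] for n, w in enumerate(w_row)) + b
             for lane in range(n_lanes)]
            for w_row, b in zip(weights, bias)]


def macroscopic_state(features: List[List[float]], kernel: List[float],
                      weights: List[List[float]], bias: List[float]) -> List[float]:
    """features[n][l] is the long-term feature of lane l on approach n."""
    convolved = [conv1d_lane_axis(row, kernel) for row in features]
    mixed = dense_approach_axis(convolved, weights, bias)
    return [value for row in mixed for value in row]


# Two approaches with three lanes each; the kernel sums each lane with its neighbours.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
state = macroscopic_state(features, kernel=[1.0, 1.0, 1.0],
                          weights=[[1.0, 1.0]], bias=[0.0])
```

The convolution mixes information between neighbouring lanes of one approach, and the fully connected layer then mixes information across approaches.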

[0043] Optionally, the process of outputting the road segment compression representation in the state embedding branch includes: extracting local temporal features from the vehicle status data using a convolutional neural network; sorting the local temporal features of the vehicles in descending order of each vehicle's distance from the stop line to obtain a vehicle-level state tensor; and computing the road segment compression representation from that tensor using an encoder based on the Transformer architecture.

[0044] The state embedding branch processes the vehicle status data within a short time window (missing points are filled with zeros). Compared to recurrent neural networks (RNNs), convolutional neural networks (CNNs) offer superior computational efficiency due to their parallel training architecture, and their local receptive fields can accurately capture short-term, rapidly changing features. The short time window spans 3 to 5 minutes, with data at one-second granularity.
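
The zero filling of missing points can be sketched as follows in Python. The input layout (a mapping from second offset to feature vector), the function name, and the sample values are assumptions for illustration.

```python
from typing import Dict, List


def fill_window(samples: Dict[int, List[float]], window_seconds: int,
                feature_dim: int) -> List[List[float]]:
    """Build a dense per-second series; seconds without a sample become zero vectors."""
    return [list(samples.get(second, [0.0] * feature_dim))
            for second in range(window_seconds)]


# Illustrative: samples exist at seconds 0 and 2 only, in a 4-second excerpt.
observed = {0: [12.5, 8.0], 2: [28.1, 7.6]}
dense = fill_window(observed, window_seconds=4, feature_dim=2)
```

For a 3-minute window at one-second granularity, `window_seconds` would be 180, giving the convolutional layers a fixed-length input regardless of how many samples a vehicle reported.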

[0045] By leveraging high-precision trajectory data (such as position, speed, and acceleration) from connected vehicles to supplement the deficiencies of traditional sensor data, the accuracy of short-term traffic condition observations is improved. This provides the execution layer with fine-grained real-time traffic conditions, supporting the optimization of short-term actions (such as adjusting green light times), and enhancing control response speed, especially when connected vehicle data is available.

[0046] Local temporal features are extracted from the vehicle status data using time-axis convolution operations. The corresponding vehicle state embedding layer consists of two convolutional layers, one max-pooling layer, and one fully connected layer. The local temporal features of the vehicles are arranged in descending order of distance from the stop line (from farthest to nearest), forming the vehicle-level state tensor. An encoder composed of three layers of E units then computes the road segment compression representation, simultaneously capturing vehicle-level car-following dependencies and upstream/downstream traffic wave propagation patterns. Vehicle-level car-following dependency refers to the car-following relationship between preceding and following vehicles.
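
The ordering step can be illustrated with a short Python sketch. The dictionary keys, the function name, and the sample values are hypothetical; only the farthest-to-nearest ordering comes from the text.

```python
from typing import Dict, List, Union

Vehicle = Dict[str, Union[float, List[float]]]


def vehicle_state_tensor(vehicles: List[Vehicle]) -> List[List[float]]:
    """Stack per-vehicle features, ordered from farthest to nearest the stop line."""
    ordered = sorted(vehicles, key=lambda v: v["distance_to_stop_line"], reverse=True)
    return [list(v["features"]) for v in ordered]


# Illustrative vehicles: distance in metres, features from the embedding layer.
vehicles = [
    {"distance_to_stop_line": 12.0, "features": [0.1, 0.9]},
    {"distance_to_stop_line": 85.0, "features": [0.7, 0.2]},
    {"distance_to_stop_line": 40.0, "features": [0.4, 0.5]},
]
tensor = vehicle_state_tensor(vehicles)
```

A fixed spatial ordering places each follower next to its leader in the tensor, which is what lets the encoder relate consecutive rows as car-following pairs.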

[0047] Optionally, the process of generating the decision state vector in the long-short-term feature fusion branch includes: calculating the attention product between the long-term memory features and the lane status data, and extracting long-short-term features through a decoder; mapping the long-short-term features to query vectors and the road segment compression representation to key vectors; calculating spatiotemporal attention weights through dot products; and then outputting the decision state vector through a lane-dimension convolutional layer and an approach-dimension fully connected layer.

[0048] Long-term and short-term feature fusion integrates long-term and short-term features through an attention mechanism to achieve multi-scale data alignment. It overcomes some observability limitations by fusing long-term historical patterns with short-term real-time dynamics to generate a comprehensive decision state vector. It ensures spatiotemporal consistency, improves the robustness of state coding, and enables the control model to maintain stable performance even with sparse data. It serves as a core bridge for hierarchical control, effectively balancing long-term strategy and short-term execution.

[0049] The inputs include long-term memory features, lane state data, and road segment compression representations.

[0050] Using the cross-attention mechanism, the attention product of the long-term memory features and the lane status data is first calculated for initial fusion, and the result is input into a decoder consisting of three layers of D units to extract long-short-term features. Then, the long-short-term features are mapped to query vectors and the road segment compression representation is mapped to key vectors, spatiotemporal attention weights are calculated via dot product, and a fusion tensor is output.
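
The dot-product weighting can be sketched in pure Python as standard scaled dot-product cross-attention. Several details are assumptions not stated in the text: the division by the square root of the key dimension, the softmax normalization, the omission of learned projection matrices, and the use of separate value vectors derived from the road segment representation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Each query attends over all keys; the output is a weighted sum of the values."""
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused


# Illustrative: two query rows (long-short-term features), two key/value rows (road segment).
queries = [[1.0, 0.0], [0.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 2.0]]
fusion = cross_attention(queries, keys, values)
```

The first query aligns with the first key and therefore weights the first value more heavily, while the all-zero query produces uniform weights and returns the plain average of the values.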

[0051] The decision state vector is generated after the fusion tensor is modeled with spatial dependencies (such as lane axis convolution and fully connected layers at the approach lanes). This vector serves as the direct input to the execution layer's actions. The lane axis convolutional layer uses a conventional convolutional layer, and the lane axis description is used to represent the data dimension along which the 1D convolution occurs; the approach lane fully connected layer uses a conventional fully connected layer, and the approach lane description is similar to that of the lane axis.

[0052] As a specific example, Figure 3 illustrates a hierarchical control model. The strategy layer and execution layer constitute the core architecture of the hierarchical signal control method of this invention, aiming to improve the stability and efficiency of traffic signal control in scenarios with low network coverage by decoupling long-term strategy from short-term execution.

[0053] First, it is necessary to define the discrete action space of the hierarchical control process. As shown in Figure 4, the multidimensional discrete action space achieves coordinated optimization of Phase Sequence Selection (PSS) and Green Light Timing Adjustment (GID). The phase sequence selection action is chosen from the candidate sequence set, and the green light time adjustment action is chosen from the set of adjustment quantities.

[0054] Phase control follows basic traffic logic: each phase activates the right-of-way for a specific movement (straight ahead or left turn), while right-turning vehicles are unrestricted (always green). A typical phase set contains 8 non-conflicting combinations (e.g., eastbound straight ahead + westbound straight ahead, denoted as (ES, WS)). The phase sequence is selected periodically: at each action step t, once per cycle, a new sequence is selected from the candidate sequence set.

[0055] In green light time control, the total phase duration is the sum of the red, flashing yellow, and green light times. With a decision step size h of 10 seconds, the phase green light time is adjusted in real time during the current green period by an amount in the range [-h, h] seconds (that is, from a decrease of h seconds to an increase of h seconds), subject to three constraints: the minimum green light time is at least 5 seconds, the maximum duration of a single phase is at most 120 seconds, and adjustment is prohibited during the countdown phase.
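The three constraints in [0055] can be expressed as a small guard function. This is a hedged sketch: the text states the constraints but not how a violating adjustment is handled, so clamping to the nearest feasible value is an assumption, as are the names `apply_green_adjustment`, `other_time`, and `in_countdown`.

```python
H_STEP = 10       # decision step size h, in seconds
MIN_GREEN = 5     # minimum green light time, in seconds
MAX_PHASE = 120   # maximum duration of a single phase, in seconds

def apply_green_adjustment(green_time, other_time, delta, in_countdown):
    """Return the green time after applying `delta` seconds.

    green_time: current green time of the phase (s)
    other_time: red plus flashing yellow time of the phase (s)
    delta: requested adjustment, must lie in [-H_STEP, H_STEP]
    in_countdown: True if the signal is already counting down
    """
    if not -H_STEP <= delta <= H_STEP:
        raise ValueError("adjustment must lie in [-h, h]")
    if in_countdown:
        return green_time                      # adjustment prohibited
    new_green = green_time + delta
    new_green = min(new_green, MAX_PHASE - other_time)  # cap total phase duration
    new_green = max(new_green, MIN_GREEN)               # enforce minimum green
    return new_green
```

For example, `apply_green_adjustment(8, 5, -10, False)` is clamped to the 5-second minimum rather than dropping to a negative green time.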

[0056] As the high-level decision-making unit, the strategy layer focuses on macro-level traffic pattern analysis and long-term goal setting, with a relatively long decision-making cycle (e.g., 30 minutes). It generates macroscopic control objectives, such as reference phase sequences and green light time strategies, that constrain and guide the execution layer. This strategic framework reduces frequent phase sequence switching, enhances the robustness of control under low network coverage, and ensures long-term traffic efficiency.

[0057] The execution layer is responsible for real-time signal control, with a short decision-making cycle (on the order of seconds). Under the objectives set by the strategy layer, it generates fine-grained actions for real-time signal optimization, such as adjusting the phase sequence and green light duration in response to instantaneous traffic changes (such as a surge in vehicle queues). Within the constraints of the strategy, it achieves local optima, reduces average delay, and improves intersection efficiency.
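The two decision timescales described in [0056] and [0057] can be sketched as a simple scheduling loop. This is only an illustration: the 10-second executor period is taken from the decision step size in [0055], and `run_schedule`, `demo_manager`, and `demo_executor` are hypothetical stand-ins for the trained networks.

```python
MANAGER_PERIOD_S = 30 * 60   # strategy layer decision cycle (e.g., 30 minutes)
EXECUTOR_PERIOD_S = 10       # execution layer decision step (assumed 10 seconds)

def run_schedule(total_seconds, manager, executor):
    """Call the manager on its long cycle and the executor on its short cycle."""
    goal = None
    log = []
    for t in range(0, total_seconds, EXECUTOR_PERIOD_S):
        if t % MANAGER_PERIOD_S == 0:
            goal = manager(t)              # refresh the macroscopic control objective
        action = executor(t, goal)         # act under the most recent objective
        log.append((t, goal, action))
    return log

def demo_manager(t):
    """Placeholder manager: labels the objective with its generation time."""
    return f"goal@{t}s"

def demo_executor(t, goal):
    """Placeholder executor: always returns a zero adjustment."""
    return 0

log = run_schedule(3600, demo_manager, demo_executor)
```

Over one hour, the executor acts 360 times while the manager issues only two objectives, which is the decoupling the hierarchy relies on.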

[0058] Optionally, the strategy layer generating macroscopic control objectives based on the macroscopic state vector includes: the phase sequence manager generates, from the macroscopic state vector, a reference phase sequence for a preset future time period H; the green light time manager generates, from the macroscopic state vector and environmental variables, green-end performance indicators for the preset future time period H, including at least queue length, headway, and lane speed. The indicators are chosen as the optimal option from several preset strategy combinations, such as balanced, single-approach priority, dual-approach priority, and overflow prevention.

[0059] Both the manager and the executor in this invention are deep neural networks. Because the action space has already been specified in detail, both use a conventional structure of three fully connected layers with activation layers.

[0060] In the green light time manager's optimization process, the deep neural network model within the manager calculates the probability that each option is optimal based on environmental variables. Specifically, there are several preset strategy combinations (covering queue length, headway, and lane speed), and each strategy contains configuration values for the different parameters at the different approaches. The deep neural network model calculates the probability of each strategy based on the state and selects the strategy with the highest probability to output the green-end performance index.
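The selection step in [0060] amounts to a softmax over preset strategies followed by an argmax. The sketch below assumes the network's output is a vector of logits; the strategy keys follow the strategy names listed in [0058], while all indicator values, the dictionary layout, and the name `select_strategy` are hypothetical, and per-approach configuration is omitted for brevity.

```python
import numpy as np

# Hypothetical green-end performance indices for each preset strategy.
PRESET_STRATEGIES = {
    "balanced":                 {"queue_length_m": 40.0, "headway_s": 2.5, "lane_speed_kmh": 30.0},
    "single_approach_priority": {"queue_length_m": 25.0, "headway_s": 2.2, "lane_speed_kmh": 35.0},
    "dual_approach_priority":   {"queue_length_m": 30.0, "headway_s": 2.3, "lane_speed_kmh": 33.0},
    "overflow_prevention":      {"queue_length_m": 15.0, "headway_s": 2.0, "lane_speed_kmh": 25.0},
}

def select_strategy(logits):
    """Convert network logits to probabilities and pick the most probable strategy."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    names = list(PRESET_STRATEGIES)
    best = names[int(np.argmax(probs))]
    return best, PRESET_STRATEGIES[best], probs

name, green_end_index, probs = select_strategy([0.1, 2.0, 0.3, -1.0])
```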

[0061] The output green-end performance index serves as the target for the green light time executor: at the end of the phase, the intersection traffic should be in a state as close to the index as possible. Accordingly, in the subsequent hierarchical learning framework, the executor's learning objective is evaluated by the similarity between the actual traffic state resulting from control and the preset target. The macroscopic control objectives output by the strategy layer comprise the reference phase sequence and the green-end performance index, both of which are sub-objectives for the execution layer.

[0062] Optionally, the execution layer generating control actions in real time based on the decision state vector and the macroscopic control objectives includes: the phase sequence executor, based on the decision state vector and the reference phase sequence (one of the macroscopic control objectives), selects the candidate sequence with the highest probability from the candidate sequence set as the phase sequence selection action for the next cycle; the green light time executor, based on the decision state vector, the green-end performance indicators (one of the macroscopic control objectives), and the current phase, calculates the green time adjustment value with the highest probability as the green light duration adjustment action. The final actions are executed by the signal controller to control the traffic lights.

[0063] The phase sequence executor in this invention calculates the probability of each sequence in the candidate sequence set (within a manually configured range) based on the state and the target set by the manager, and selects the sequence with the highest probability. The green light time executor, like the phase sequence executor, is a network model: based on the state and the target set by the manager, it calculates the probability of each integer value in the range of -10 seconds to 10 seconds and selects the value with the highest probability as the green light duration adjustment.
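Both executors in [0063] reduce to choosing the most probable element of a discrete set. A minimal sketch follows; the logits stand in for the outputs of the trained networks, and the candidate sequences, phase labels, and the helper name `pick` are hypothetical.

```python
import numpy as np

GREEN_ADJUSTMENTS = list(range(-10, 11))   # integer seconds from -10 to 10 (21 values)

def pick(options, logits):
    """Softmax over logits, then return the option with the highest probability."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return options[int(np.argmax(probs))], probs

# Phase sequence executor: choose among manually configured candidate sequences.
candidate_sequences = [("A", "B", "C", "D"), ("A", "C", "B", "D"), ("B", "A", "D", "C")]
sequence, _ = pick(candidate_sequences, [0.2, 1.5, -0.3])

# Green light time executor: choose an integer adjustment in seconds.
green_logits = np.zeros(len(GREEN_ADJUSTMENTS))
green_logits[13] = 4.0                      # index 13 corresponds to +3 seconds
delta, delta_probs = pick(GREEN_ADJUSTMENTS, green_logits)
```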

[0064] As a specific implementation, after generating control actions through a hierarchical control model, the control actions are quantitatively evaluated through a reward function, and hierarchical reinforcement learning is used for training and optimization. The reward function includes a phase sequence reward based on changes in traffic flow and a green light time reward based on a reduction in average delay.

[0065] The control system uses environmental reward functions to quantitatively evaluate the executed actions. Because the two long-term manager decisions are only weakly coupled, two independent environmental reward indices are defined. Phase sequence selection (PSS) decisions primarily affect the timing characteristics of the signal timing structure and the traffic efficiency associated with the sequence; the PSS reward is defined as the change in intersection traffic flow between action step t and historical step t-H. Green light time adjustment (GID) decisions dominate the traffic priority of different turning movements; the GID reward is defined as the reduction in the average delay of passing vehicles between action step t and historical step t-H.
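The two environmental rewards in [0065] can be written directly as differences over a look-back window. This sketch assumes the historical step is t - H and that flow and delay are recorded once per action step; the function names and sample series are hypothetical.

```python
def phase_sequence_reward(flow, t, horizon):
    """PSS reward: change in intersection traffic flow from step t - H to step t."""
    return flow[t] - flow[t - horizon]

def green_time_reward(avg_delay, t, horizon):
    """GID reward: reduction in average vehicle delay from step t - H to step t."""
    return avg_delay[t - horizon] - avg_delay[t]

# Hypothetical per-step measurements.
flow = [100, 110, 120, 135]          # vehicles per step
avg_delay = [40.0, 38.0, 35.0, 30.0] # seconds per vehicle

r_pss = phase_sequence_reward(flow, t=3, horizon=2)
r_gid = green_time_reward(avg_delay, t=3, horizon=2)
```

Both rewards are positive when control improves the intersection: more vehicles served for PSS, less delay for GID.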

[0066] Training optimization using hierarchical reinforcement learning includes: in the first few training cycles, noise is injected into the strategy layer for exploration; the execution layer is evaluated using the reward function; and during network updates, the loss function replaces the preset sub-objectives with the sub-objectives actually achieved, the preset sub-objectives being the reference phase sequence and the green-end performance indicators from the strategy layer.

[0067] The hierarchical reinforcement learning in this invention introduces a posterior transfer mechanism to achieve parallel policy learning. The strategy layer and the execution layer can be co-trained using any off-policy algorithm (such as Q-learning or the actor-critic algorithm DDPG). In each training cycle, an exploration method injects noise into the policy of the strategy layer. At each moment of each signal cycle, execution layer performance is assessed dually, through the intersection reward and the sub-goal reward.

[0068] The evaluation reward for phase sequence selection is based on the phase utilization rate (total traffic flow at the intersection divided by total green light time), weighted by the percentage to which the selected phase sequence action implements the sub-target. The manager's action step is 30 minutes, so the sub-target in use at any given time was generated at an earlier step. The sub-target therefore carries the superscript t1 to reflect the asynchrony between the manager and the executor.
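The sub-goal reward in [0068] can be sketched as utilization multiplied by an implementation percentage. The text does not specify how the percentage is computed, so measuring it as the fraction of positions at which the selected sequence matches the reference sequence is an assumption, as are the function names and sample values.

```python
def phase_utilization(total_flow, total_green_time):
    """Phase utilization rate: total intersection flow divided by total green time."""
    return total_flow / total_green_time

def implementation_percentage(selected, reference):
    """Assumed measure: share of positions where the selected sequence
    matches the reference phase sequence issued by the manager."""
    matches = sum(s == r for s, r in zip(selected, reference))
    return matches / len(reference)

def phase_sequence_subgoal_reward(total_flow, total_green_time, selected, reference):
    """Utilization weighted by how fully the sub-target was implemented."""
    return phase_utilization(total_flow, total_green_time) * \
        implementation_percentage(selected, reference)

reward = phase_sequence_subgoal_reward(
    total_flow=120, total_green_time=60,
    selected=("A", "B", "C", "D"), reference=("A", "B", "D", "C"),
)
```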

[0069] The evaluation reward for green light time adjustment is measured by the Euclidean distance between the target state and the actual state at the end of the phase (the actual state can be observed through intersection sensing devices).
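The distance measure in [0069] is straightforward to compute. In this sketch the reward is taken as the negative Euclidean distance so that a closer match scores higher; that sign convention, the unnormalized state layout (queue length, headway, lane speed), and the function name are assumptions.

```python
import math

def green_time_subgoal_reward(target_state, actual_state):
    """Negative Euclidean distance between the green-end target and the
    state actually observed at the end of the phase."""
    return -math.dist(target_state, actual_state)

# Hypothetical states: (queue length, headway, lane speed).
target = (30.0, 2.5, 30.0)
observed = (33.0, 2.5, 26.0)
reward = green_time_subgoal_reward(target, observed)
```

In practice the indicators have different units, so some normalization before taking the distance would likely be needed; the text does not say which, so none is applied here.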

[0070] Figure 5 shows the flowchart of the hierarchical reinforcement learning in this invention (the manager's policy corresponds to the strategy layer, and the executor's policy corresponds to the execution layer). The strategy layer and execution layer networks can be updated using the commonly used Q-learning algorithm. To overcome the exploration challenge caused by the dynamic changes of the lower-level policy, the loss function replaces the preset sub-objectives with the sub-objectives actually achieved when updating the network. All actor and critic networks contain 3 hidden layers, each with 64 nodes and the ReLU activation function.
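The sub-objective replacement in [0070] can be illustrated as relabeling a stored manager transition before it is used in the loss. This is a sketch of the general idea only: the `ManagerTransition` record, its fields, and `relabel_with_achieved` are hypothetical, and the network update itself is not shown.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class ManagerTransition:
    state: Tuple[float, ...]        # macroscopic state when the sub-objective was issued
    subgoal: Tuple[float, ...]      # preset sub-objective issued by the manager
    reward: float                   # environmental reward collected over the period
    next_state: Tuple[float, ...]   # macroscopic state at the end of the period

def relabel_with_achieved(transition, achieved_subgoal):
    """Return a copy whose sub-objective is the one the executor actually achieved."""
    return replace(transition, subgoal=tuple(achieved_subgoal))

original = ManagerTransition(
    state=(0.4, 0.7), subgoal=(30.0, 2.5, 30.0), reward=12.0, next_state=(0.5, 0.6),
)
# Observed green-end state at the end of the phase (hypothetical values).
relabeled = relabel_with_achieved(original, (33.0, 2.5, 26.0))
```

Training on the relabeled transition credits the manager for the outcome the lower level could actually deliver, which keeps the manager's targets meaningful while the executor's policy is still changing.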

[0071] As shown in Figure 6, compared with existing traditional control methods and DRL methods, this invention achieves lower traffic delay and lower delay volatility in scenarios with low network coverage, reducing average delay by 6.7%, and its performance is less affected by changes in network coverage. Furthermore, although the model does not learn the fastest because of the introduction of long-term strategies, its final convergence is superior to that of similar methods.

[0072] The above embodiments are further elaborations and descriptions of the present invention to facilitate understanding, and are not intended to limit the present invention in any way. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for intersection signal hierarchical control, characterized by comprising: collecting traffic data at an intersection, including lane state data, vehicle state data, and historical data; obtaining long-term memory features by encoding time and location information based on the historical data, and capturing spatial dependencies to output a macroscopic state vector; obtaining vehicle-level car-following dependencies based on the vehicle state data, and outputting a road segment compression representation; integrating the lane state data, the long-term memory features, and the road segment compression representation based on a cross-attention mechanism to generate a decision state vector; and adopting a hierarchical control model, in which the strategy layer generates macroscopic control objectives based on the macroscopic state vector, and the execution layer generates control actions in real time based on the decision state vector and the macroscopic control objectives.

2. The intersection signal hierarchical control method according to claim 1, wherein the process of outputting the macroscopic state vector includes: inputting historical data and environmental variables, using an encoder based on the Transformer architecture to encode time and location information, and retrieving contextualized memories from the historical view to obtain the long-term memory features; and outputting the macroscopic state vector through a lane-dimension convolutional layer and an approach-dimension fully connected layer.

3. The intersection signal hierarchical control method according to claim 1, wherein the process of outputting the road segment compression representation includes: extracting features from the vehicle state data using a convolutional neural network to obtain local temporal features; sorting the local temporal features of each vehicle in descending order of the distance between the vehicle and the stop line to obtain a vehicle-level state tensor; and calculating the road segment compression representation using an encoder based on the Transformer architecture.

4. The intersection signal hierarchical control method according to claim 1, 2, or 3, wherein the process of generating the decision state vector includes: calculating the attention product between the long-term memory features and the lane state data, and extracting long- and short-term features through a decoder; and mapping the long- and short-term features and the road segment compression representation to query vectors and key vectors, respectively, calculating spatiotemporal attention weights through dot products, and outputting the decision state vector through a lane-dimension convolutional layer and an approach-dimension fully connected layer.

5. The intersection signal hierarchical control method according to claim 1, wherein the strategy layer generating macroscopic control objectives based on the macroscopic state vector includes: the phase sequence manager generating a reference phase sequence for a preset future time period based on the macroscopic state vector; and the green light time manager generating green-end performance indicators based on the macroscopic state vector and environmental variables, the indicators including at least queue length, headway, and lane speed.

6. The intersection signal hierarchical control method according to claim 1 or 5, wherein the execution layer generating control actions in real time based on the decision state vector and the macroscopic control objectives includes: the phase sequence executor selecting, based on the decision state vector and the reference phase sequence, the candidate sequence with the highest probability from the candidate sequence set as the phase sequence selection action for the next cycle; and the green light time executor calculating, based on the decision state vector, the green-end performance indicators, and the current phase, the green time adjustment value with the highest probability, and using this value as the green light duration adjustment action.

7. The intersection signal hierarchical control method according to claim 1, wherein the lane state data is measured by video or radar equipment installed at each approach of the intersection; the vehicle state data includes data of the vehicle itself and data of surrounding vehicles from onboard sensors; and the historical data is historical traffic data at the level of each approach lane.

8. The intersection signal hierarchical control method according to claim 1, 2, 3, 5, or 7, wherein, after the control actions are generated, the control actions are quantitatively evaluated using a reward function, and training and optimization are performed using hierarchical reinforcement learning; the reward function includes a phase sequence reward based on changes in traffic flow and a green light time reward based on a reduction in average delay.

9. The intersection signal hierarchical control method according to claim 8, wherein the training optimization using hierarchical reinforcement learning includes: injecting noise into the strategy layer for exploration in the first few training cycles; evaluating the execution layer through the reward function; and, in the loss function used during network updates, replacing the preset sub-objectives with the sub-objectives actually achieved, the preset sub-objectives being the reference phase sequence and the green-end performance indicators from the strategy layer.

10. A hierarchical intersection signal control system adapted to the intersection signal hierarchical control method according to any one of claims 1 to 9, characterized by comprising: a data acquisition module, which collects traffic data, vehicle state data, and historical data at the intersection; a long-term fusion module, which extracts the macroscopic state vector based on the historical data; a state embedding module, which extracts the road segment compression representation based on the vehicle state data; a long- and short-term fusion module, which generates the decision state vector based on the lane state data, the long-term memory features, and the road segment compression representation; and a hierarchical control module, which is equipped with the hierarchical control model and uses a hierarchical reinforcement learning method to train and optimize the model.