A holographic perception traffic signal dynamic timing method and system based on proximal policy optimization
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SICHUAN RONGHAI ZHICHENG TECH GRP CO LTD
- Filing Date
- 2026-05-14
- Publication Date
- 2026-06-12
AI Technical Summary
Existing traffic signal control systems struggle to respond accurately to complex traffic scenarios, especially failing to adequately consider the needs of vulnerable groups such as pedestrians and cyclists, and lacking regional-level optimization, resulting in low traffic efficiency and insufficient safety.
A holographic perception system based on smart light poles and a proximal policy optimization (PPO) reinforcement learning model are constructed. Data is collected through multimodal perception devices, and features are extracted by combining convolutional neural networks and long short-term memory networks. PPO reinforcement learning is used for policy iteration to dynamically adjust the green light duration. Finally, a genetic algorithm is used for regional coordination to generate an enhanced safety priority timing scheme.
It enables comprehensive perception and dynamic adjustment of traffic conditions, improves intersection traffic efficiency, reduces vehicle delays, ensures pedestrian safety when crossing the street, forms a closed loop of full-link adaptive evolution, and improves system operation and maintenance efficiency and equipment reliability.
Figure CN122201017A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of intelligent traffic control and artificial intelligence, and specifically to a holographic perception traffic signal dynamic timing method and system based on proximal policy optimization.
Background Technology
[0002] Traffic signal control, as a crucial component of urban traffic management, plays an irreplaceable role in alleviating congestion, improving traffic efficiency, and ensuring travel safety. With the acceleration of urbanization and the increasing complexity of traffic demands, optimizing signal timing and improving intersection management have become critical tasks urgently needing resolution in urban governance. However, current technologies are still insufficient in dealing with complex traffic scenarios, necessitating innovative breakthroughs to meet new challenges. Existing traffic signal control methods have revealed some deep-seated problems in practical applications, particularly in their inability to respond accurately to dynamic changes in the traffic environment. Many systems rely too heavily on fixed rules or preset schemes in their design, lacking comprehensive consideration for all traffic participants, such as insufficient consideration of the needs of pedestrians and cyclists, resulting in a poor travel experience for some groups. Furthermore, when handling overall coordination between intersections, these systems often fail to achieve regional-level optimization due to a lack of information sharing and linkage mechanisms, thus affecting the traffic capacity of the entire road or even the entire road network.
[0003] Against this backdrop, the core technical challenges have become apparent. The first is capturing traffic conditions comprehensively. Because traffic scenarios involve multiple factors such as vehicles, pedestrians, and environmental conditions, existing technologies often fail to integrate all of this information, making it difficult to form a complete picture of the traffic situation. This inadequate information integration leads directly to a second, deeper challenge: limited capability to adjust signal timing dynamically. For example, at a busy intersection during the morning rush hour, vehicle traffic surges while nearby schools dismiss students, significantly increasing pedestrian crossing demand. If the traffic lights cannot adjust the green light duration in a timely manner based on vehicle queue length and pedestrian waiting times, vehicles remain congested for prolonged periods and pedestrians are forced to wait for extended periods or even take risks to cross the street, threatening both traffic efficiency and safety.
Summary of the Invention
[0004] The purpose of this invention is to provide a holographic perception traffic signal dynamic timing method and system based on proximal policy optimization. By constructing a holographic perception system based on smart light poles and a proximal policy optimization reinforcement learning model, it solves the problems of low intersection traffic efficiency and insufficient travel safety for vulnerable groups that arise because existing traffic signal control perceives only a single dimension and cannot adapt dynamically to complex traffic flow changes.
[0005] The objective of this invention can be achieved through the following technical solutions. This application provides a holographic perception traffic signal dynamic timing method based on proximal policy optimization, including the following steps:
S1. Multimodal sensing devices deployed on smart light poles collect traffic data and device status data at intersections, integrate them to form a comprehensive state vector, and upload it to the cloud data processing and optimization center. The traffic data includes vehicle queue length, instantaneous vehicle speed, vehicle type classification, number of waiting pedestrians, non-motorized vehicle traffic flow, environmental visibility, and light intensity; the equipment status data includes the current value, voltage value, and power consumption data of traffic light fixtures.
S2. The data processing and optimization center uses a fusion model of a convolutional neural network and a long short-term memory network to extract spatiotemporal features from the comprehensive state vector, identify the current intersection congestion level, predict short-term traffic trends, and generate state prior information.
S3. When the congestion level exceeds the preset threshold, the data processing and optimization center calls the proximal policy optimization reinforcement learning agent, using the comprehensive state vector as the state space, the green light duration adjustment amount of each phase as the action space, and the composite traffic efficiency index as the reward function, to perform policy iteration and generate the optimized green light duration sequence.
S4. The regional coordination unit interacts with the intelligent traffic light control systems of adjacent intersections and iteratively updates the signal cycle parameters and phase difference through a genetic algorithm to generate a synchronization scheme for the intersection group and realize green wave coordinated control on the trunk line.
S5. Based on the three built-in working modes of vehicle priority, pedestrian priority, and normal time period, the system monitors the number of waiting pedestrians and their waiting time in real time, dynamically adjusts the green light duration for pedestrian crossing, and generates an enhanced safety priority timing scheme.
S6. The intelligent transportation big data management platform obtains real-time execution logs. Its built-in intelligent work order alarm center executes closed-loop work order processes for equipment abnormalities and traffic incidents, and feeds back the execution effect data to the data processing and optimization center for continuous iterative training of the reinforcement learning agent.
[0006] This application also provides a holographic perception traffic signal dynamic timing system based on proximal policy optimization, applied to the above holographic perception traffic signal dynamic timing method based on proximal policy optimization, including:
The holographic perception and data fusion module, deployed on the smart light pole, which includes an AI camera, an environmental detector, a one-button alarm device, and a single-lamp controller. It is used to collect multi-source data and fuse it to form a comprehensive state vector.
The feature extraction and congestion prediction module, built into the data processing and optimization center, which is used to extract spatiotemporal features using a fusion model of convolutional neural networks and long short-term memory networks, identify congestion levels, predict traffic trends, and generate state prior information.
The PPO dynamic timing optimization module, which is activated when the congestion level exceeds a dynamic threshold. It calls the proximal policy optimization reinforcement learning agent to iterate the policy and generate an optimized green light duration sequence.
The regional collaborative control module, which communicates with adjacent intersections through the regional coordination unit. It is used to optimize the signal cycle and phase difference using a genetic algorithm to generate a synchronization scheme for the intersection group and realize green wave coordinated control.
The safety priority decision module, which has built-in vehicle priority, pedestrian priority, and regular time period modes. It is used to monitor pedestrian demand in real time, dynamically adjust the green light duration for pedestrian crossing, and generate an enhanced safety priority timing scheme.
The intelligent operation and maintenance and model evolution module, which has a built-in intelligent work order alarm center. It is used to execute closed-loop work order processes and feed back the execution effect data to the PPO reinforcement learning agent for continuous iterative training, forming a full-link adaptive evolution closed loop.
[0007] The beneficial effects of this invention are as follows:
This invention addresses the problems of incomplete traffic status capture and insufficient information integration by constructing a holographic perception and data fusion system based on smart light poles. Step S1 deploys multimodal perception devices such as AI cameras, environmental detectors, one-click alarm devices, and single-lamp controllers to collect traffic data and device status data, such as vehicle queue length, number of waiting pedestrians, non-motorized vehicle flow, environmental visibility, and traffic light current and voltage. A classification algorithm is then used to fuse the multi-source heterogeneous data into a comprehensive state vector. This achieves a leap from single vehicle detection to full-element perception of people, vehicles, roads, and the environment, solving the pain point that existing technologies perceive incompletely and cannot form a complete traffic status picture, and providing an accurate and comprehensive data foundation for subsequent dynamic decision-making.
By employing a collaborative mechanism of spatiotemporal feature extraction and reinforcement learning-based dynamic timing, the invention solves the problem that signal timing has limited dynamic adjustment capability and cannot adapt to changes in complex traffic scenarios. Step S2 uses a parallel architecture of convolutional neural networks and long short-term memory networks to extract spatiotemporal features of traffic conditions, combined with a pre-adjustment simulation mechanism to predict short-term traffic flow trends. Step S3 calls the proximal policy optimization reinforcement learning agent to generate an optimized green light duration sequence. Step S4 optimizes the signal cycle and phase difference through a genetic algorithm to achieve regional green wave coordinated control. This transforms signal timing from fixed rules or preset schemes into adaptive and predictive dynamic optimization, enabling intelligent adjustment based on real-time traffic flow, pedestrian flow, and environmental changes. This improves intersection efficiency, reduces vehicle delays and the number of stops, and addresses traffic management challenges in complex scenarios such as overlapping morning rush hours and school dismissal times.
By employing a pedestrian-priority decision-making mechanism and an intelligent operation and maintenance closed loop, the system addresses the issues of insufficient attention to vulnerable groups such as pedestrians and the lack of self-evolution capabilities. Step S5 incorporates three working modes: vehicle priority, pedestrian priority, and regular time periods. It monitors the number and waiting time of waiting pedestrians in real time, dynamically adjusts the pedestrian crossing green light duration by extracting historical data through feedback loops, generates an enhanced safety priority timing scheme, and links LED screens and broadcasts to issue prompts, ensuring pedestrian safety and travel experience. Step S6 obtains execution logs through the intelligent transportation big data management platform. The intelligent work order alarm center executes a closed-loop work order process for equipment anomalies and traffic events, including alarm, dispatch, order acceptance, repair, review, and completion. It also feeds back the execution effect data to the data processing and optimization center for continuous iterative training of the reinforcement learning agent. The entire process forms a closed-loop chain from perception, decision-making, and execution to operation, maintenance, and evolution, which not only improves the system's operation and maintenance efficiency and equipment reliability but also enables the model to keep optimizing based on actual operating results, achieving adaptive evolution and long-term performance improvement of the system.
Attached Figure Description
[0008] To better understand and implement this application, the technical solution is described in detail below with reference to the accompanying drawings.
[0009] Figure 1 is a flowchart of a holographic perception traffic signal dynamic timing method based on proximal policy optimization provided in Embodiment 1 of this application.
Figure 2 is a flowchart of step S2 in the holographic perception traffic signal dynamic timing method based on proximal policy optimization provided in Embodiment 1 of this application.
Figure 3 is a flowchart of step S3 in the holographic perception traffic signal dynamic timing method based on proximal policy optimization provided in Embodiment 1 of this application.
Figure 4 is a schematic diagram of the structure of a holographic perception traffic signal dynamic timing system based on proximal policy optimization provided in Embodiment 2 of this application.
Detailed Implementation
[0010] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, exemplary embodiments will be described in detail below, examples of which are illustrated in the accompanying drawings. In the following description, when referring to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application.
[0011] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a,” “an,” and “the” used herein are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
[0012] The specific implementation methods, features, and effects of the present invention are described in detail below in conjunction with the accompanying drawings and preferred embodiments.
[0013] Embodiment 1: Please refer to Figures 1-3. This embodiment provides a holographic perception traffic signal dynamic timing method based on proximal policy optimization, including the following steps: S1. AI cameras, environmental detectors, one-click alarm devices, and single-lamp controllers deployed on smart light poles collect traffic data and equipment status data at intersections. The AI cameras collect data on vehicle queue length, instantaneous vehicle speed, vehicle type classification, number of pedestrians waiting, and non-motorized vehicle traffic flow. The environmental detectors collect data on ambient visibility and light intensity. The one-click alarm device receives active alarm signals. The detection circuit built into the single-lamp controller collects the current, voltage, and power consumption data of the traffic lights. The above traffic data and equipment data are fused to form a comprehensive state vector and uploaded to the cloud-based data processing and optimization center.
[0014] Further, step S1 specifically includes: S11. Using AI cameras and environmental detectors deployed on smart light poles, real-time data on vehicle queue length, instantaneous vehicle speed, vehicle type classification, number of waiting pedestrians, non-motorized vehicle traffic flow, environmental visibility, and light intensity are collected. A data standardization process is employed to uniformly format and timestamp-align the multi-source raw data with different dimensions and sampling frequencies, eliminating data bias caused by sensor heterogeneity and obtaining a preliminary traffic and environmental dataset. This step utilizes smart light poles as the physical carrier, leveraging the hardware integration advantages of multiple poles combined into one, and providing a unified input benchmark for subsequent holographic perception. Furthermore, after receiving the traffic and environmental dataset, the data processing and optimization center first establishes a spatial index based on the deployment location coding of the smart light poles. It then uses the 3σ criterion to detect outliers in the collected data, eliminating outlier data points caused by sensor obstruction or momentary communication interruptions. For data gaps caused by network jitter, a collaborative interpolation method based on adjacent time points and adjacent light pole data is used to fill in the gaps. Specifically, when the vehicle queue length data for a certain light pole is missing, a weighted estimate is made based on the queue data of upstream and downstream light poles at the same time and historical passage times. Simultaneously, the environmental visibility and light intensity data are smoothed and filtered according to the time series to eliminate the interference of instantaneous fluctuations on subsequent congestion identification, forming a continuous, complete, and reliable input dataset.
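The cleaning described in S11 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the equal weights `w_time` and `w_space`, and the use of the population standard deviation are all assumptions, and the "historical passage times" term mentioned in the text is omitted because the source does not specify how it enters the estimate.

```python
from statistics import mean, pstdev


def three_sigma_filter(values):
    """Replace points more than 3 standard deviations from the mean with None."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)
    return [v if abs(v - mu) <= 3 * sigma else None for v in values]


def collaborative_fill(series, upstream, downstream, w_time=0.5, w_space=0.5):
    """Fill None gaps from adjacent time points and adjacent light poles.

    w_time and w_space are hypothetical weights; the source only says the
    estimate is "weighted".
    """
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        # Temporal estimate: valid neighbours in the same series.
        temporal = [series[j] for j in (i - 1, i + 1)
                    if 0 <= j < len(series) and series[j] is not None]
        # Spatial estimate: upstream and downstream poles at the same time.
        spatial = [s[i] for s in (upstream, downstream) if s[i] is not None]
        parts = []
        if temporal:
            parts.append((w_time, mean(temporal)))
        if spatial:
            parts.append((w_space, mean(spatial)))
        if parts:
            filled[i] = sum(w * e for w, e in parts) / sum(w for w, _ in parts)
    return filled
```

Note that a 3σ test cannot flag anything in very short windows (with fewer than 11 samples no point can lie more than 3 population standard deviations from the mean), so the window length matters in practice.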
[0015] S12. Based on the preliminary traffic and environmental dataset, and combined with the current, voltage, and power consumption data of each traffic light fixture synchronously collected by the single-lamp controller, a monitoring channel for equipment operation status is constructed. The collected current and voltage values are compared with preset threshold ranges. If any value exceeds the threshold range, the equipment's operation data is automatically marked as abnormal, generating a preliminary judgment result of equipment malfunction. This step, through the refined detection of the single-lamp controller, achieves a dimensional expansion from intersection-level equipment management to light group-level status awareness, providing a data foundation for preventative maintenance. Furthermore, the detection circuit built into the single-lamp controller continuously collects the current and voltage waveform data of the traffic light fixtures at a millisecond-level sampling frequency. After converting the analog signals into digital signals through an analog-to-digital converter, a fast Fourier transform is used to extract the fundamental component and each harmonic component. When the current harmonic distortion rate exceeds a preset threshold, it is determined that there is a hidden fault in the lamp drive circuit. When the instantaneous power calculated by multiplying the collected current effective value and voltage effective value deviates from the standard power model of the same model of lamp under the same temperature conditions by more than 15%, the aging degree of the lamp or the poor contact of the power supply line is comprehensively judged by combining the phase angle analysis of the current and voltage. The above abnormal judgment results, together with the corresponding lamp number, occurrence time, and abnormality type, constitute the equipment operation abnormality record, providing data support for subsequent fault confirmation.
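The two checks in S12 (harmonic distortion and deviation from a reference power) can be illustrated with the sketch below. It is an assumption-laden example: it uses a direct discrete Fourier transform evaluated only at the harmonic bins rather than an FFT library, which yields the same amplitudes for those bins while keeping the example dependency-free; the function names and the choice of five harmonics are hypothetical; and the 15% limit is the only numeric value taken from the text.

```python
import cmath
import math


def harmonic_amplitudes(samples, cycles, max_harmonic=5):
    """Amplitudes of the fundamental and its harmonics via a direct DFT.

    `samples` must span exactly `cycles` whole periods of the fundamental.
    """
    n = len(samples)
    amps = []
    for h in range(1, max_harmonic + 1):
        k = h * cycles  # DFT bin of the h-th harmonic
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        amps.append(2 * abs(coeff) / n)
    return amps


def total_harmonic_distortion(amps):
    """Ratio of the RMS sum of harmonics 2..N to the fundamental amplitude."""
    fundamental, harmonics = amps[0], amps[1:]
    return math.sqrt(sum(a * a for a in harmonics)) / fundamental


def power_deviation(i_rms, v_rms, reference_power):
    """Relative deviation of (I_rms * V_rms) from the reference power."""
    return abs(i_rms * v_rms - reference_power) / reference_power


def exceeds_power_limit(i_rms, v_rms, reference_power, limit=0.15):
    """True when the deviation is more than the 15% stated in the text."""
    return power_deviation(i_rms, v_rms, reference_power) > limit
```

The distortion threshold itself is left to the caller because the source only calls it "preset".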
[0016] S13. Based on the preliminary judgment results of equipment operation anomalies generated in S12, the active alarm signals received by the one-button alarm device are integrated to construct a spatiotemporal correlation analysis model. If the occurrence time of the active alarm signal overlaps with the time window of the equipment anomaly marker and their spatial locations are correlated, the anomaly state is reconfirmed to comprehensively determine the probability of equipment operation failure and generate a fault priority label based on the fault type and historical fault data. This step, through a human-machine collaborative anomaly confirmation mechanism, effectively reduces the false alarm rate of a single sensor, improves the accuracy and reliability of fault judgment, and provides accurate input information for the intelligent work order alarm center. Furthermore, the spatiotemporal correlation analysis model employs a bidirectional sliding time window mechanism. Centered on the time t0 when the one-click alarm device receives the active alarm signal, a detection interval is formed by extending the time window forward by Δt1 and backward by Δt2. Within this interval, the device anomaly markers reported under the same smart light pole number are retrieved. If a matching anomaly marker is found, historical video clips from the light pole's AI camera within the time interval [t0-Δt1, t0+Δt2] are called. The target detection algorithm identifies visual features of device malfunction in the alarm scene, including traffic light lights going out, abnormal brightness flickering, light pole tilting, or equipment emitting smoke. The video recognition results are cross-validated with the device anomaly markers. If they match, the fault status is confirmed. 
Based on factors such as the historical frequency of the fault type, the number of intersections affected, and whether main roads are involved, the analytic hierarchy process (AHP) is used to calculate the fault priority label, which is classified into three levels: high-risk, critical, and general. The high-risk level triggers an immediate dispatch response.
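The bidirectional sliding time window in S13 amounts to an interval lookup keyed on the light pole number. The sketch below shows only that matching step; the `AnomalyMarker` record, the `match_alarm` name, and the default window lengths are hypothetical, since the source leaves Δt1 and Δt2 unspecified. Video cross-validation and priority scoring are not modelled.

```python
from dataclasses import dataclass


@dataclass
class AnomalyMarker:
    pole_id: str
    timestamp: float  # seconds on a shared clock
    anomaly_type: str


def match_alarm(pole_id, t0, markers, dt1=60.0, dt2=30.0):
    """Return markers from the same pole inside [t0 - dt1, t0 + dt2].

    t0 is the time the one-click alarm was received; dt1 and dt2 are
    assumed window lengths in seconds.
    """
    return [m for m in markers
            if m.pole_id == pole_id and t0 - dt1 <= m.timestamp <= t0 + dt2]
```

An empty result would mean the alarm has no corroborating device anomaly, so under this reading the fault is not re-confirmed from device data alone.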
[0017] S14. The traffic and environment dataset, equipment operation data, and fault priority labels are fused from multiple sources to form a comprehensive state vector. This vector is then classified using a support vector machine algorithm to obtain a comprehensive evaluation category for the intersection's traffic conditions and equipment status. Based on this comprehensive evaluation category, and considering the distribution of vehicle queue length and instantaneous speed in the traffic situation, if the queue length exceeds a preset threshold and the instantaneous speed is below a preset range, a traffic congestion warning signal is generated, and the necessary traffic control parameters are determined. Simultaneously, through correlation analysis between the traffic congestion warning signal and environmental visibility and light intensity data, if environmental visibility is below a preset standard and light intensity is insufficient, the brightness parameters of the traffic lights are dynamically adjusted, generating optimized lighting control instructions. Furthermore, based on the fault priority labels in the equipment operation data, if the fault priority label indicates a high-risk state, an emergency maintenance request is transmitted to the cloud data processing and optimization center, and corresponding dispatch instructions are generated, providing complete input state information for the subsequent spatiotemporal feature extraction and congestion prediction in step S2.
[0018] Furthermore, the comprehensive state vector is constructed using a multimodal feature concatenation method, connecting the normalized traffic feature vector, equipment feature vector, and fault label vector to form a fixed-dimensional input. The support vector machine algorithm uses a radial basis function kernel, optimizing the penalty parameter and kernel function parameter through grid search to perform nonlinear classification of the comprehensive state vector, outputting four levels of intersection traffic conditions and four levels of equipment conditions, forming 16 comprehensive evaluation categories. The traffic congestion warning employs a dual-threshold hysteresis mechanism: a warning is activated when, for three consecutive cycles, the vehicle queue length exceeds its first threshold and the instantaneous vehicle speed is below its first threshold; when both breach their second thresholds simultaneously, immediate intervention is initiated, transmitting the congestion warning signal and necessary traffic control parameters to the data processing and optimization center. The traffic light brightness adjustment uses a fuzzy control rule base, classifying environmental visibility into low, medium, and high levels, and light intensity into nighttime, dusk / dawn, and daytime periods. A lookup table method is used to determine the traffic light brightness adjustment level, and brightness adjustment commands are issued through individual light controllers. Emergency maintenance requests include the faulty equipment number, fault type, fault priority label, and GPS coordinates. Dispatch instructions are dynamically allocated based on the distance between the maintenance personnel's current location and the fault point, as well as the match between personnel skills and the fault, ensuring that high-risk faults are prioritized.
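The dual-threshold hysteresis warning can be sketched as a small state machine. This is one possible reading rather than the claimed logic: it assumes the queue and speed conditions must hold together in the same three consecutive cycles, and that the second speed threshold is breached from below. The class name, parameter names, and returned labels are hypothetical.

```python
class CongestionWarning:
    """Dual-threshold congestion warning with a consecutive-cycle hold."""

    def __init__(self, queue_t1, speed_t1, queue_t2, speed_t2, hold_cycles=3):
        self.queue_t1, self.speed_t1 = queue_t1, speed_t1
        self.queue_t2, self.speed_t2 = queue_t2, speed_t2
        self.hold_cycles = hold_cycles
        self.streak = 0  # consecutive cycles breaching the first thresholds

    def update(self, queue_len, speed):
        """Feed one signal cycle; return 'normal', 'warning' or 'intervene'."""
        level1 = queue_len > self.queue_t1 and speed < self.speed_t1
        self.streak = self.streak + 1 if level1 else 0
        # Second thresholds: act immediately, without waiting for the hold.
        if queue_len > self.queue_t2 and speed < self.speed_t2:
            return "intervene"
        return "warning" if self.streak >= self.hold_cycles else "normal"
```

The three-cycle hold suppresses one-off spikes, while the second threshold pair bypasses the hold for severe congestion.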
[0019] Specifically, by collecting traffic and environmental data through multimodal sensing devices deployed on smart light poles, and combining the refined detection of single-lamp controllers with the spatiotemporal correlation verification of one-click alarms, accurate perception and fault self-diagnosis of all elements of people, vehicles, roads, environment and equipment are achieved, generating comprehensive state vectors and priority warnings, thereby providing reliable input for dynamic timing optimization and intelligent operation and maintenance.
[0020] S2. The data processing and optimization center uses a convolutional neural network to extract features from the comprehensive state vector in terms of time and space dimensions, identify the current traffic congestion level at the intersection, predict the short-term traffic trend in the next 1 to 3 signal cycles, and generate state prior information.
[0021] Furthermore, step S2 specifically includes: S21. The data processing and optimization center receives the comprehensive state vector and inputs it into a pre-built convolutional neural network model. Through the stacking of convolutional layers and pooling layers, the comprehensive state vector is subjected to temporal feature extraction in the time dimension and distribution feature extraction in the spatial dimension to identify the congestion level of the current intersection and generate a preliminary congestion judgment result. Among them, the time dimension features include the temporal change rate of vehicle queue length and the fluctuation range of instantaneous vehicle speed, while the spatial dimension features include the balance of queue distribution at each entrance lane and the distribution pattern of non-motorized vehicle and pedestrian gathering areas. Furthermore, the convolutional neural network model adopts a two-dimensional convolutional kernel structure, organizing the comprehensive state vector into a two-dimensional matrix of time step × spatial position. The time dimension takes the time-series data of the past 5 signal cycles, with 4 time points sampled in each cycle. The spatial dimension covers the 4 entrance directions and 8 lane groups of the intersection. The convolutional layer uses a 3×3 convolutional kernel for feature extraction, and introduces nonlinearity through the ReLU activation function. The pooling layer uses max pooling to retain significant features. After 3 layers of convolutional pooling are stacked alternately, the probability distribution of congestion level is output through a fully connected layer. The temporal change rate of vehicle queue length is calculated using a linear regression method to calculate the slope value of the past 5 cycles. The fluctuation amplitude of instantaneous vehicle speed is quantified by the ratio of standard deviation to mean. 
The balance of queue distribution in each entrance lane is measured by the Gini coefficient. The distribution pattern of non-motorized vehicle and pedestrian gathering areas is analyzed by spatial clustering using heat maps output by AI cameras.
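Three of the hand-crafted features named in S21 have simple closed forms, illustrated below. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which estimator is used. The heat-map clustering feature is not covered.

```python
from statistics import mean, pstdev


def trend_slope(values):
    """Least-squares slope of values against cycle index 0..n-1."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def coefficient_of_variation(values):
    """Speed fluctuation amplitude: standard deviation divided by mean."""
    return pstdev(values) / mean(values)


def gini(values):
    """Gini coefficient of queue lengths: 0 means perfectly balanced lanes."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, four lanes with equal queues give a Gini coefficient of 0, while a single loaded lane out of four gives 0.75, the maximum for n = 4.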
[0022] S22. Based on the preliminary congestion assessment results generated in S21, and combining historical time-series data with current time-series characteristics, a fusion model of long short-term memory network and convolutional neural network is used to predict short-term traffic trends within the next 1 to 3 signal cycles, obtain traffic fluctuation data, and determine traffic flow direction. If the traffic trend indicates that the congestion will worsen, the system will activate a pre-adjustment simulation mechanism to dynamically adjust the signal cycle parameters, generate adjusted signal cycle data, and simulate the traffic distribution under the adjustment scheme through a traffic simulation model to determine whether it can alleviate intersection congestion pressure. Furthermore, the fusion model of the Long Short-Term Memory Network (LSTM) and the Convolutional Neural Network (CNN) adopts a parallel architecture. The CNN branch is used to extract spatial features, and the LSTM branch is used to extract temporal dependencies. The output feature vectors of the two branches are concatenated and input into the fully connected layer for traffic flow prediction. The input of the CNN branch is the queue distribution matrix of each approach lane in the current cycle, and the input of the LSTM branch is the historical traffic flow sequence of the past 12 cycles. The pre-adjustment simulation mechanism adopts an online simulation module based on the SUMO microscopic traffic simulation model. The adjusted signal cycle parameters are input into the simulation model. With the current measured traffic flow as the initial condition, the simulation runs for 3 signal cycles and outputs the queue length change curve of each approach lane. 
If the simulation results show that the maximum queue length has decreased by more than 20% compared with before the adjustment and the average number of stops has decreased by more than 15%, the adjustment scheme is deemed effective; otherwise, the scheme is abandoned and a second adjustment is triggered.
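The acceptance rule applied to the simulation output reduces to two relative-reduction checks. The sketch below takes the before and after figures as plain numbers and does not call SUMO; the function and parameter names are hypothetical, and only the 20% and 15% limits come from the text.

```python
def adjustment_effective(max_queue_before, max_queue_after,
                         stops_before, stops_after,
                         queue_drop=0.20, stops_drop=0.15):
    """True when both reductions are strictly greater than their limits."""
    queue_reduction = (max_queue_before - max_queue_after) / max_queue_before
    stops_reduction = (stops_before - stops_after) / stops_before
    return queue_reduction > queue_drop and stops_reduction > stops_drop
```

A scheme that cuts the maximum queue from 100 m to 75 m and average stops from 2.0 to 1.6 would pass; cutting the queue only to 85 m would fail and, per the text, trigger a second adjustment.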
[0023] SUMO (Simulation of Urban MObility) is an open-source microscopic urban traffic simulation package.
[0024] S23. Based on the traffic distribution under the simulation, the updated state vector data is extracted and input into the convolutional neural network again for secondary feature extraction to obtain a more accurate congestion prediction result. This secondary feature extraction process introduces the feedback data after simulation adjustment, and gradually reduces the prediction error through the iterative optimization mechanism to improve the model's adaptability to complex traffic scenarios. For the congestion prediction result obtained from the secondary feature extraction, the corresponding state prior data is generated to determine the direction of traffic control in future signal cycles. Furthermore, the input data for the secondary feature extraction includes three parts: the original comprehensive state vector, the traffic distribution data after the S22 simulation adjustment, and the comparative difference feature vector before and after the simulation adjustment. The iterative optimization mechanism adopts a residual learning structure, which connects the features extracted by the first convolutional neural network with the features extracted by the second convolutional neural network using residual connections. The network weights are updated by backpropagating the gradient of the prediction error. After each round of simulation adjustment, the root mean square error between the predicted value and the actual simulation value is calculated. The iteration stops when the error decreases for three consecutive rounds or reaches the preset accuracy threshold. The generated state prior data includes the predicted number of arriving vehicles, the predicted peak queue length, and the recommended traffic light timing adjustment direction for each phase in the next three cycles. The adjustment direction is represented by three levels: extension, shortening, and maintenance, and is accompanied by a confidence score.
[0025] S24. By continuously updating the state prior data, real-time intersection congestion feedback data is obtained, and a dynamically updated state prior information database is constructed. This database not only includes the current intersection congestion level and traffic trend, but also integrates the equipment health status data collected by the single-lamp controller in S1, traffic event data identified by the AI camera, and the collaborative status data of adjacent intersections. Based on the updated state prior information, the system determines the optimization direction of subsequent signal cycles and feeds the optimization results back to the convolutional neural network model for online fine-tuning, realizing the adaptive evolution of the feature extraction model. At the same time, this state prior information serves as the input prior for the PPO reinforcement learning agent in step S3, forming a complete data link from perception to decision-making.
[0026] The aforementioned PPO (Proximal Policy Optimization) is a policy-gradient reinforcement learning algorithm that limits how far each update can move the policy.
[0027] Furthermore, the dynamically updated state prior information database adopts a time-series database architecture, storing the state prior data for each period using intersection number and timestamp as a composite primary key. The data retention period is 30 days, and a time-partitioned index is established to support fast querying. The equipment health status data includes the cumulative running time, recent failure count, and remaining life prediction value of each traffic light reported by the single-lamp controller. The traffic event data identified by the AI camera includes event types such as traffic accidents, illegal parking, and pedestrians running red lights, as well as the occurrence time and duration. The collaborative status data of adjacent intersections includes the green light phase difference between upstream and downstream intersections, the current period timing scheme, and queue overflow warning signs. The online fine-tuning adopts an incremental learning method. After accumulating 24 hours of actual operating data, the deviation between the actual congestion situation and the predicted congestion situation is used as the loss function to update the parameters of the last fully connected layer of the convolutional neural network model. At the same time, elastic weight consolidation technology is used to prevent catastrophic forgetting, ensuring that the model can adapt to new scenarios while maintaining the ability to recognize historical scenarios.
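The composite primary key and the 30-day retention period can be illustrated with a small sketch. SQLite is used here only as a stand-in for the time-series database, and a plain index on the timestamp stands in for the time-partitioned index; the table name, column names, and helper functions are hypothetical.

```python
import sqlite3

RETENTION_SECONDS = 30 * 24 * 3600  # 30-day retention period from the text


def open_store(path=":memory:"):
    """Create the state prior table keyed by (intersection number, timestamp)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS state_prior (
               intersection_id TEXT NOT NULL,
               ts INTEGER NOT NULL,      -- Unix timestamp of the period
               payload TEXT NOT NULL,    -- serialized state prior data
               PRIMARY KEY (intersection_id, ts)
           )"""
    )
    # Plain index on time; a real time-series store would partition by time.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_state_prior_ts ON state_prior (ts)")
    return conn


def upsert(conn, intersection_id, ts, payload):
    """Insert or overwrite the record for one intersection and period."""
    conn.execute(
        "INSERT OR REPLACE INTO state_prior VALUES (?, ?, ?)",
        (intersection_id, ts, payload),
    )


def purge_expired(conn, now):
    """Delete records older than the retention period; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM state_prior WHERE ts < ?", (now - RETENTION_SECONDS,)
    )
    return cur.rowcount
```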
[0028] Specifically, by extracting the spatiotemporal features of the comprehensive state vector through a fusion model of convolutional neural networks and long short-term memory networks, and combining it with a pre-adjustment simulation mechanism to predict short-term traffic trends and the effectiveness of optimization schemes, the problem of traditional signal control lacking foresight and being unable to predict traffic flow changes, resulting in lag in timing adjustments, is solved. This enables accurate prediction and proactive intervention for the next 1 to 3 cycles, improving congestion response speed and the foresight of timing decisions.
[0029] S3. When the congestion level exceeds the preset threshold, the data processing and optimization center calls the proximal policy optimization (PPO) reinforcement learning agent, using the comprehensive state vector as the state space, the green light duration adjustment of each phase as the action space, and the composite traffic efficiency index as the reward function, to simulate multiple rounds of signal adjustment scenarios for policy iteration and generate an optimized green light duration sequence.
[0030] Furthermore, step S3 specifically includes: S31. By using prior state information, the congestion level of the current intersection is monitored in real time to determine whether it exceeds a preset threshold and generate a preliminary congestion judgment result. If the congestion level exceeds the preset threshold, the judgment result is transmitted to the data processing and optimization center to trigger the dynamic timing optimization process. The data processing and optimization center obtains the comprehensive state vector generated by S1 to form a complete digital description of the current traffic situation. This triggering mechanism ensures that the PPO reinforcement learning agent is only called when necessary, avoiding unnecessary consumption of computing resources. Furthermore, the preset threshold adopts a dynamic threshold mechanism rather than a fixed value. Specifically, the system calculates the 75th percentile of the congestion index as a baseline threshold based on the congestion level distribution of historical data (such as the same time period of the previous week or the same week type of the previous month), and dynamically adjusts it in combination with the characteristics of the current time period (morning peak, evening peak, off-peak, holidays). When the congestion level exceeds the dynamic threshold for two consecutive monitoring cycles (each cycle is the signal cycle length), the system determines it as persistent congestion and triggers the PPO optimization process to avoid frequent triggering due to instantaneous fluctuations. The complete digital description of the current traffic conditions includes the real-time queue length matrix of each approach lane of the intersection, the current green light duration of each phase, the temporal distribution of the number of pedestrians waiting, the non-motorized vehicle flow density map, and the equipment health score reported by the single-lamp controller. 
The above data is encapsulated in a unified data structure and then passed into the input interface of the PPO reinforcement learning agent.
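The dynamic threshold trigger of S31 can be sketched as below. The sketch assumes the congestion index is a plain number and uses the inclusive quantile method for the 75th percentile; the time-period adjustment is omitted because the text gives no concrete factors. The class and function names are hypothetical.

```python
import statistics


def baseline_threshold(historical_congestion):
    """75th percentile of the historical congestion index (baseline threshold)."""
    return statistics.quantiles(historical_congestion, n=4, method="inclusive")[2]


class CongestionTrigger:
    """Fires only after the threshold is exceeded in consecutive monitoring cycles."""

    def __init__(self, threshold, required_cycles=2):
        self.threshold = threshold
        self.required_cycles = required_cycles  # two consecutive cycles in the text
        self._streak = 0

    def update(self, congestion_level):
        """Feed one monitoring cycle; return True when congestion is persistent."""
        if congestion_level > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # an instantaneous fluctuation resets the count
        return self._streak >= self.required_cycles
```

Requiring a streak rather than a single exceedance is what keeps the PPO optimization from being triggered by momentary spikes.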
[0031] S32. Based on a complete digital description of the current traffic conditions, the data processing and optimization center calls a pre-trained proximal policy optimization (PPO) reinforcement learning agent. This agent captures the holographic traffic state of the current intersection using the comprehensive state vector as the state space, defines the executable timing adjustment range using the green light duration adjustment for each phase as the action space, and uses a composite traffic efficiency index as the reward function. This composite index is a weighted combination of the reduction in total intersection delay, the total number of passing vehicles, and the number of stops. The agent iterates its policy by simulating multiple rounds of signal adjustment scenarios. In each iteration, a clipping mechanism limits the amplitude of the policy update, ensuring the stability and convergence of the optimization process, and an optimized green light duration sequence is generated. Furthermore, the pre-trained PPO reinforcement learning agent adopts an Actor-Critic network architecture, where the Actor network outputs the action policy and the Critic network evaluates state values. Both networks use a three-layer fully connected neural network with 256, 128, and 64 hidden layer neurons, respectively. The activation function is ReLU (Rectified Linear Unit). The state space is 128-dimensional, containing 28 normalized traffic features from the comprehensive state vector generated in S1, 12 predicted features from the prior state information generated in S2, and 8 phase features of the current timing scheme. The action space is defined as the continuous adjustment of the green light duration for each phase, ranging from -8 seconds to +8 seconds, with a step size of 1 second.
A maximum of two phase durations can be adjusted within a single cycle to ensure the stability of the timing scheme. The weighting coefficients of the reward function are dynamically adjusted according to the optimization objective. During peak hours, the weight of the total number of passing vehicles is increased (set to 0.5); during off-peak hours, the weight of the reduction in total intersection delay is increased (set to 0.6); and during periods of high pedestrian traffic, the weight of the number of stops is increased (set to 0.4). The optimal combination of weight coefficients is determined during the offline training phase using a grid search method. The clipping mechanism sets the clipping parameter ε=0.2, meaning the probability ratio between the updated policy and the previous policy is limited to the interval [0.8, 1.2]. The advantage function is calculated using generalized advantage estimation (GAE), with a discount factor γ=0.99 and a GAE parameter λ=0.95, to ensure the smoothness of the policy iteration process. The policy iteration adopts a multi-round parallel simulation method, generating 32 parallel simulation scenarios in each round. Each scenario simulates the traffic evolution of the next 5 signal cycles, and iteration continues for 100 rounds or until the reward function converges.
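The two numerical ingredients named above, generalized advantage estimation with γ=0.99 and λ=0.95 and the limit of the probability ratio to [0.8, 1.2] with ε=0.2, can be sketched in plain Python. This is an illustrative sketch of the standard PPO formulas for a single trajectory, not the patented agent; the function names are assumptions.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory.

    `values` holds one more entry than `rewards`: the final entry is the
    bootstrap value of the state reached after the last step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for one sample.

    `ratio` is the new-policy probability divided by the old-policy
    probability; it is clipped to [1 - epsilon, 1 + epsilon].
    """
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

Taking the minimum of the clipped and unclipped terms means a sample cannot gain objective value by pushing the ratio outside [0.8, 1.2], which is what bounds the size of each policy update.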
[0032] S33. Based on the optimized green light duration sequence, the final signal control command is output, specifying the green light duration allocated to each phase. The command is transmitted to the single-lamp controller at the intersection via a wired or wireless communication network, and the single-lamp controller switches the traffic lights on and off accordingly. After execution, the single-lamp controller collects execution feedback data through its built-in detection circuit, including changes in current, voltage, and power consumption, and sends the feedback data back to the data processing and optimization center to determine whether the command was correctly issued to the device and whether the execution effect meets expectations, thus constructing a complete decision-execution-verification closed loop.
[0033] Furthermore, the signal control command adopts a protocol encapsulation format, including a command type identifier (0x01 indicates green light duration adjustment), intersection number, phase number, target green light duration, execution timestamp, and checksum. It is transmitted to the target single-lamp controller via the MQTT protocol through a 4G / 5G communication module. Communication uses TLS encryption to ensure the security of command transmission. Upon receiving the command, the single-lamp controller first verifies the checksum. After confirming the command is complete and error-free, it writes the target green light duration into its local configuration register and executes the new timing scheme at the start of the next signal cycle. After execution, the built-in detection circuit collects the current and voltage waveforms during the execution of that phase. The waveform is compared with the expected standard waveform to calculate the waveform similarity. If the similarity is less than 90%, the execution is deemed abnormal, and the abnormality type (such as current overload, voltage drop, or switching device failure) is encoded and sent back. If the similarity is greater than 95%, the execution is deemed successful, and confirmation information and the start and end timestamps of the actual execution green light are sent back. After receiving the feedback data, the data processing and optimization center writes the execution result into the execution log database. At the same time, the execution deviation value (the difference between the actual execution time and the instruction time) is used as a correction term for the reward function to adjust the confidence of subsequent PPO strategy iterations, forming a complete closed-loop control link from decision generation, instruction issuance, equipment execution to effect verification.
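The command encapsulation and the controller-side checks can be sketched as follows. The text lists the fields but not their sizes or the checksum algorithm, so the byte layout and the use of CRC-32 here are assumptions, as are all function names. The MQTT and TLS transport is left out.

```python
import struct
import zlib

CMD_ADJUST_GREEN = 0x01  # command type identifier given in the text

# Hypothetical layout: type (1 byte), intersection number (4), phase number (1),
# target green duration in seconds (2), execution timestamp (8); big-endian.
_BODY = struct.Struct(">BIBHQ")


def encode_command(intersection, phase, green_seconds, exec_ts):
    """Pack the command fields and append a CRC-32 checksum (assumed algorithm)."""
    body = _BODY.pack(CMD_ADJUST_GREEN, intersection, phase, green_seconds, exec_ts)
    return body + struct.pack(">I", zlib.crc32(body))


def decode_command(packet):
    """Verify the checksum first, then unpack the fields, as the controller would."""
    body, tail = packet[:-4], packet[-4:]
    (checksum,) = struct.unpack(">I", tail)
    if zlib.crc32(body) != checksum:
        raise ValueError("checksum mismatch")
    cmd, intersection, phase, green_seconds, exec_ts = _BODY.unpack(body)
    return {
        "cmd": cmd,
        "intersection": intersection,
        "phase": phase,
        "green_seconds": green_seconds,
        "exec_ts": exec_ts,
    }


def classify_execution(similarity):
    """Map waveform similarity (0.0 to 1.0) to the outcomes named in the text."""
    if similarity < 0.90:
        return "abnormal"
    if similarity > 0.95:
        return "success"
    return "undetermined"  # the text does not define the band from 90% to 95%
```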
[0034] TLS (Transport Layer Security) is a transport layer security protocol.
[0035] Specifically, by combining a dynamic threshold triggering mechanism with a proximal policy optimization (PPO) reinforcement learning agent, a pre-trained model is invoked when congestion occurs to perform multiple rounds of parallel simulation and clipped policy iteration on the holographic state of the intersection, generating an optimized green light duration sequence. Execution feedback and deviation correction from the single-lamp controller form a decision-execution-verification closed loop, which addresses the lag, lack of adaptive capability, and untraceable execution effects of traditional signal timing adjustment. The result is accurate dynamic optimization, stable convergence, and closed-loop assurance that the timing scheme is executed as intended.
[0036] S4. The regional coordination unit interacts with the intelligent traffic light control system of adjacent intersections to obtain the traffic status and timing scheme of adjacent intersections. If it is determined that there is an imbalance in regional traffic flow, the signal cycle parameters and phase difference are iteratively updated through a genetic algorithm to generate a synchronization scheme for the intersection group and realize the coordinated control of green wave on the trunk line.
[0037] Furthermore, step S4 specifically includes: S41. The regional coordination unit establishes a communication connection with the intelligent traffic light control system of adjacent intersections through wired or wireless communication networks, obtains traffic status data and current timing scheme information of adjacent intersections, and completes initial data collection; based on the obtained traffic status data, it analyzes the traffic flow distribution between intersections; if uneven traffic flow distribution is detected, it triggers the optimization process to determine the range of intersection groups that need to be adjusted, providing target areas for subsequent collaborative optimization. Furthermore, the regional coordination unit and the intelligent traffic light control system at adjacent intersections use the MQTT protocol for data interaction, with a communication cycle of 3 seconds. The data includes the queue length, average vehicle speed, signal cycle duration, green light duration for each phase, and remaining time for the current phase at each entrance lane of the adjacent intersection. Data synchronization adopts a master-slave mechanism, using the clock of the main intersection (intersection with high traffic volume) as a reference, and achieving microsecond-level time synchronization through the NTP protocol to ensure the accuracy of phase difference calculation. Uneven traffic distribution is detected using a sliding window analysis method. When the coefficient of variation of the upstream and downstream traffic ratio of three adjacent intersections exceeds 0.3 within five consecutive cycles, it is determined that the traffic distribution is uneven. At this time, the system automatically identifies traffic congestion intersections and sparse traffic intersections, and expands the intersection group range by 1 to 2 intersections upstream and downstream from the traffic congestion intersection. 
The group size is dynamically determined according to the traffic flow propagation distance, and is generally controlled within the range of 3 to 5 intersections to ensure a balance between optimization calculation efficiency and coordination effect.
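The sliding window check for uneven traffic distribution can be sketched as below. One interpretation is assumed: in each signal cycle the coefficient of variation is computed over the upstream/downstream flow ratios of the three adjacent intersections, and imbalance is declared once it exceeds 0.3 in five consecutive cycles. The population standard deviation and all names are assumptions.

```python
import statistics
from collections import deque


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form assumed)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean flow ratio must be non-zero")
    return statistics.pstdev(values) / mean


class ImbalanceDetector:
    """Flags imbalance when the CV exceeds the limit in every cycle of the window."""

    def __init__(self, cv_limit=0.3, window=5):
        self.cv_limit = cv_limit
        self._flags = deque(maxlen=window)  # one boolean per signal cycle

    def update(self, flow_ratios):
        """Feed the flow ratios of the adjacent intersections for one cycle."""
        self._flags.append(coefficient_of_variation(flow_ratios) > self.cv_limit)
        return len(self._flags) == self._flags.maxlen and all(self._flags)
```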
[0038] MQTT (Message Queuing Telemetry Transport) is a message queue telemetry transport protocol, and NTP (Network Time Protocol) is a network time protocol.
[0039] S42. Based on the determined intersection group, the signal cycle and phase difference of each intersection are extracted as optimization variables, and a genetic algorithm is used for iterative parameter optimization. With the optimization objectives of minimizing the total regional delay and maximizing the green wave bandwidth, a preliminary combination of synchronization control parameters is generated through selection, crossover, and mutation operations. Based on the optimized combination of synchronization control parameters, a timing adjustment scheme for the intersection group is generated, focusing on the impact of the phase difference on green wave coordination, and the final signal timing scheme is determined to achieve trunk green wave coordination control. Furthermore, the genetic algorithm employs real-number encoding, encoding the signal cycle length (range 40 to 180 seconds) of each intersection and the phase difference (range -180 to 180 seconds) of each intersection relative to the reference intersection into chromosomes. The population size is set to 50, and the maximum number of iterations is 200 generations. The fitness function F adopts a weighted summation, defined as F = w1 × (1 - D / D_max) + w2 × (B / B_max), where D is the total regional delay time calculated through the SUMO simulation model, D_max is the preset maximum allowable delay value, B is the green wave bandwidth (i.e., the width of the time window in which a vehicle platoon can pass continuously through multiple intersections), and B_max is the theoretical maximum bandwidth. The weight coefficients w1 and w2 are dynamically adjusted according to the coordination objective. During trunk line coordination, w2 is set to 0.6 and w1 to 0.4. The selection operation adopts a tournament selection mechanism with a tournament size of 3; the crossover operation adopts simulated binary crossover with a crossover probability of 0.9; the mutation operation adopts polynomial mutation with a mutation probability of 0.1. During the optimization process, for each generated combination of synchronization control parameters, a traffic simulation model simulates the operation of each intersection in the group under that combination and calculates the total regional delay and green wave bandwidth. Iteration continues until the fitness function converges or the maximum number of iterations is reached, and the optimal combination of synchronization control parameters is output. When determining the phase difference value in the final timing scheme, two-way green wave coordination is adopted: with the goal of minimizing the average delay of the two-way traffic flow on the trunk line, a coordinated phase difference optimization algorithm calculates a compromise phase difference that yields good traffic performance in both directions.
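The fitness function F and the tournament selection step can be sketched as follows, using the trunk line weights w1=0.4 and w2=0.6 and a tournament size of 3. The chromosome layout (all cycle lengths followed by all phase differences) and the function names are assumptions; the SUMO evaluation that would supply D and B is not shown.

```python
import random


def fitness(total_delay, max_delay, bandwidth, max_bandwidth, w1=0.4, w2=0.6):
    """F = w1 * (1 - D / D_max) + w2 * (B / B_max)."""
    return w1 * (1.0 - total_delay / max_delay) + w2 * (bandwidth / max_bandwidth)


def random_chromosome(n_intersections, rng):
    """Real-coded chromosome: cycle lengths first, then phase differences."""
    cycles = [rng.uniform(40.0, 180.0) for _ in range(n_intersections)]
    offsets = [rng.uniform(-180.0, 180.0) for _ in range(n_intersections)]
    return cycles + offsets


def tournament_select(population, scores, rng, size=3):
    """Pick `size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda i: scores[i])
    return population[best]
```

For example, with D equal to half of D_max and B equal to half of B_max, the fitness is 0.4 × 0.5 + 0.6 × 0.5 = 0.5.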
[0040] S43. After obtaining the final signal timing scheme, the regional coordination unit issues adjustment instructions to the intelligent traffic light control system at each intersection, and updates the signal cycle and phase difference value synchronously to achieve coordinated control of the intersection group. Based on the real-time equipment execution feedback data collected by the single light controller and the traffic status data monitored by the AI camera, the green wave coordination effect is periodically analyzed. If it is found that the traffic distribution is uneven again or the coordination effect decreases, the optimization process of S41 is triggered again to dynamically adjust the synchronous control parameters and form a closed-loop optimization mechanism for regional collaboration.
[0041] Furthermore, the adjustment instructions employ a tiered issuance mechanism. The regional coordination unit first sends the target signal cycle and target phase difference to the data processing and optimization centers at each intersection. Each intersection center then locally completes the transition between the current timing scheme and the target scheme. The transition period is set to three signal cycles, during which the phase difference is gradually adjusted to avoid traffic disruptions caused by sudden parameter changes. After the instructions are issued, the regional coordination unit monitors the response status of the control systems at each intersection through a heartbeat mechanism. If any intersection does not return a confirmation signal within 5 seconds, a retransmission mechanism is initiated. After three consecutive failed retransmissions, the intersection is marked as having communication anomalies and temporarily excluded from coordinated control. The green wave coordination effect evaluation cycle is set to 15 seconds. The evaluation metrics include the average number of stops at each intersection within the group, the average travel speed on main roads, and the green wave bandwidth retention rate. The green wave bandwidth retention rate is defined as the ratio of the current actual green wave bandwidth to the optimized target bandwidth. When the average travel speed at any key intersection within the group decreases by more than 15% from the initial optimization value or the green wave bandwidth retention rate falls below 70%, the coordination effect is deemed to have declined, and the system automatically triggers the S41 optimization process. Simultaneously, the system stores the coordination effect evaluation data for each instance into a historical database to build a knowledge base of optimal coordination parameters for different time periods and weather conditions. 
In subsequent optimizations, the historical optimal parameters for similar scenarios can be used as the initial population for the genetic algorithm to improve optimization efficiency.
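The three-cycle transition and the re-optimization conditions can be sketched as below. The text says only that the phase difference is adjusted gradually, so the equal-step (linear) transition is an assumption, as are the function names.

```python
def transition_offsets(current_offset, target_offset, cycles=3):
    """Phase difference to apply in each transition cycle (equal steps assumed)."""
    step = (target_offset - current_offset) / cycles
    return [current_offset + step * k for k in range(1, cycles + 1)]


def coordination_degraded(speed_now, speed_initial, bandwidth_now, bandwidth_target):
    """True when the S41 optimization process should be triggered again.

    Conditions from the text: average travel speed has dropped by more than
    15% from the initial optimization value, or the green wave bandwidth
    retention rate has fallen below 70%.
    """
    speed_drop = (speed_initial - speed_now) / speed_initial
    retention = bandwidth_now / bandwidth_target
    return speed_drop > 0.15 or retention < 0.70
```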
[0042] Specifically, by interacting with the control system of adjacent intersections through regional coordination units, the uneven distribution of traffic flow is detected by sliding window analysis, and the signal period and phase difference are iteratively optimized by genetic algorithm. This solves the problem of low traffic efficiency on trunk lines caused by isolated operation of intersections and lack of regional coordination in traditional signal control. It realizes green wave coordinated control and dynamic adaptive adjustment of intersection groups, which significantly improves the average travel speed of trunk road sections and reduces the number of vehicle stops.
[0043] S5. Based on the three built-in working modes of vehicle priority, pedestrian priority, and normal time period, and on the basis of the intersection group synchronization scheme or the optimized green light duration sequence, it monitors the number of pedestrians waiting and the waiting time in real time. When the number of pedestrians waiting exceeds the set threshold or the waiting time exceeds the maximum tolerance time, it integrates historical pedestrian waiting time data into the decision through feedback loop, dynamically adjusts the pedestrian crossing green light duration, and generates an enhanced safety priority timing scheme.
[0044] Furthermore, step S5 specifically includes: S51. AI cameras deployed on smart light poles collect real-time data on the number of pedestrians waiting at the intersection and how long they have waited. This data is combined with vehicle flow information to form a comprehensive traffic status dataset that determines the current traffic demand at the intersection. If the number of pedestrians waiting exceeds a preset threshold, the pedestrian priority mode is triggered, and the current data is compared with historical data to determine whether the pedestrian green light duration needs to be adjusted. Furthermore, the AI camera employs a deep learning object detection algorithm, using human key point detection to distinguish pedestrians who are waiting from those who are only stopping briefly. It counts only pedestrians who remain continuously within the designated waiting area marked by the curb for more than 3 seconds, excluding those passing through or looking around, to ensure accurate waiting counts. Pedestrian waiting time is tracked using queue tracking technology, which assigns a unique identifier to each pedestrian entering the waiting area and records their entry time. When a pedestrian's waiting time exceeds a preset threshold, an individual alarm is triggered. The preset threshold uses a dual-threshold mechanism: the base threshold is 10 people for normal time periods, and it is dynamically raised to 15 during peak hours and lowered to 8 during nighttime hours. The threshold adjustment is adaptively corrected based on the ambient light intensity collected in S1 and the historical pedestrian traffic data for the same period.
The historical data comparison uses a time series similarity algorithm to extract the distribution of pedestrian waiting numbers in the same time period (within 30 minutes before and after) in the past 7 days, calculate the deviation rate between the current waiting number and the historical median. If the deviation rate exceeds 30% and the waiting number shows a continuous upward trend, an emergency adjustment request is triggered. At the same time, combined with the traffic trend prediction in the state prior information generated by S2, the peak duration of pedestrian demand is predicted to provide a basis for decision-making on the subsequent adjustment range.
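The waiting-count thresholds and the emergency adjustment check can be sketched as follows. The deviation rate is assumed to be (current count minus historical median) divided by the historical median, and "continuous upward trend" is assumed to mean that each recent count is higher than the one before it. The names are hypothetical, and the ambient light correction is omitted.

```python
import statistics

# Waiting-count thresholds from the text, by time period.
WAITING_THRESHOLDS = {"normal": 10, "peak": 15, "night": 8}


def pedestrian_priority_triggered(waiting_count, period):
    """True when the waiting count exceeds the threshold for the time period."""
    return waiting_count > WAITING_THRESHOLDS[period]


def emergency_adjustment_needed(recent_counts, historical_counts, limit=0.30):
    """Check the deviation rate against the historical median and the trend.

    `recent_counts` lists the latest waiting counts, oldest first; the last
    entry is the current count. `historical_counts` holds counts from the
    same time period over the past 7 days.
    """
    median = statistics.median(historical_counts)
    if median <= 0:
        raise ValueError("historical median must be positive")
    deviation_rate = (recent_counts[-1] - median) / median
    rising = len(recent_counts) >= 2 and all(
        later > earlier for earlier, later in zip(recent_counts, recent_counts[1:])
    )
    return deviation_rate > limit and rising
```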
[0045] S52. Based on the triggered pedestrian priority mode, pedestrian waiting time data is extracted. If the waiting time exceeds the preset maximum tolerance time, historical waiting time records are extracted through a feedback loop mechanism to obtain long-term trend information and construct a time-series distribution model of pedestrian waiting time. A support vector machine algorithm classifies the comprehensive traffic status dataset. Combining the requirements of the pedestrian priority and vehicle priority modes, historical pedestrian waiting time data is incorporated into the decision as a weight factor, the adjustment range of the pedestrian crossing green light duration is dynamically calculated, and a preliminary green light duration allocation scheme is obtained for each mode. Furthermore, the feedback loop mechanism employs a sliding window memory structure to store the waiting time data, adjusted green light durations, and dissipation effects from each trigger of the pedestrian priority mode over the past 30 days. The window size is 1000 records, and a first-in-first-out (FIFO) strategy is used for updating. Historical waiting time records are categorized and stored by time period (morning peak, evening peak, off-peak, night), weather conditions (sunny, rainy, foggy, snowy), and holiday type, constructing a multi-dimensional historical data index. The temporal distribution model of pedestrian waiting time adopts a mixture model of Poisson and normal distributions. The parameters are fitted using the EM (expectation-maximization) algorithm, and the cumulative probability of the current waiting time is calculated. When the cumulative probability exceeds 85%, an abnormally high demand state is determined. The input features of the support vector machine (SVM) classifier include the ratio of the current number of waiting pedestrians to the waiting time, the ratio of vehicle queue length to traffic capacity, time period type, weather conditions, and historical pedestrian dispersal efficiency for the same period.
The output is a probability distribution for three categories: prioritizing pedestrian green light adjustment, maintaining the status quo, and prioritizing vehicle passage. The weighting factor is dynamically calculated using the ratio of the historical median waiting time to the current waiting time, combined with the multiple by which the current waiting time exceeds the historical maximum. The adjustment range is controlled between the minimum adjustment unit (3 seconds) and the maximum allowable adjustment range (15 seconds), with an adjustment step size of 3 seconds. The adjusted vehicle delay increment is verified through a traffic simulation model. If the delay increment exceeds the preset upper limit, the adjustment range is adjusted back to ensure a balance between pedestrian priority and vehicle efficiency.
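The FIFO history window and the bounded green light extension can be sketched as below. The weighting factor formula is not fully specified in the text, so the sketch starts from an already computed raw extension. Rolling back one 3-second step at a time is an assumption, and `delay_increment` is a hypothetical callback standing in for the traffic simulation model.

```python
from collections import deque


def make_history(max_records=1000):
    """Sliding window memory: the oldest record is dropped first (FIFO)."""
    return deque(maxlen=max_records)


def quantize_extension(raw_seconds, step=3, minimum=3, maximum=15):
    """Snap a raw extension to a multiple of `step` within [minimum, maximum]."""
    snapped = round(raw_seconds / step) * step
    return max(minimum, min(maximum, snapped))


def settle_extension(raw_seconds, delay_increment, delay_limit, step=3, minimum=3):
    """Reduce the extension step by step while vehicle delay exceeds the limit.

    `delay_increment(extension)` returns the simulated increase in vehicle
    delay for a given extension (hypothetical stand-in for the simulator).
    """
    extension = quantize_extension(raw_seconds, step=step, minimum=minimum)
    while extension > minimum and delay_increment(extension) > delay_limit:
        extension -= step
    return extension
```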
[0046] S53. Coordinate and optimize the initial green light duration allocation scheme through the intersection group synchronization scheme, obtain the signal linkage relationship between adjacent intersections, and ensure that the adjustment of pedestrian crossing green light does not affect the coordination effect of the green wave on the main road; dynamically update the green light duration data of each intersection according to the final timing scheme, and generate an enhanced timing scheme with safety priority; record the timing scheme and corresponding traffic status data after each adjustment through system logs, obtain feedback information on the adjustment effect, determine the historical data basis for subsequent decisions, and form a continuous optimization closed loop of the pedestrian priority mode.
[0047] Furthermore, the coordination optimization employs phase difference fine-tuning technology. When a single intersection needs to extend the pedestrian crossing green light duration due to pedestrian priority requirements, the system first checks whether the phase adjustment at that intersection will disrupt the preset green wave phase difference with upstream and downstream intersections. If the impact is within the allowable range (phase difference deviation not exceeding 3 seconds), compensation is made by compressing the green light time for vehicles in the same phase or adjusting the start time of the next phase to maintain the stability of the trunk green wave. If the impact exceeds the allowable range, the adjustment request is reported to the regional coordination unit, which then uniformly adjusts the phase difference of all intersections within the group. The group coordination parameters are recalculated using a genetic algorithm to generate a compromise solution that balances pedestrian needs and green wave effectiveness. In addition to adjusting the green light duration, the enhanced safety priority timing scheme also triggers the LED display screen and broadcasting equipment on the smart light pole to announce a warning message that the pedestrian crossing green light has been extended. The LED screen displays a countdown, guiding pedestrians to cross the street in an orderly manner and discouraging rushing. The system log uses a structured storage format to record the adjustment trigger time, the timing scheme before and after the adjustment, the change curve of the number of pedestrians waiting after the adjustment, the change in vehicle delay, and the execution feedback data collected by the single-lamp controller (including the actual execution time of the pedestrian crossing phase and the change in the current waveform). The log data is stored in time partitions and retained for 180 days.
The feedback information of the adjustment effect is evaluated by calculating the improvement rate of the average waiting time of pedestrians, the maximum waiting time, and the number of pedestrian illegal crossing events (identified by AI camera) for three consecutive cycles before and after the adjustment. The evaluation results are stored in the historical database as a reference for decision-making in similar scenarios in the future, forming a continuous improvement closed loop of perception-decision-execution-evaluation-optimization.
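The 3-second tolerance decision and the improvement rate used in the evaluation can be sketched as follows. The return labels and function names are hypothetical, and the improvement rate is assumed to be the relative reduction from the value before the adjustment to the value after it.

```python
def handle_pedestrian_extension(offset_deviation_seconds, tolerance=3.0):
    """Decide where a pedestrian green extension is handled.

    Deviations within the tolerance are compensated at the intersection;
    larger ones are reported to the regional coordination unit.
    """
    if abs(offset_deviation_seconds) <= tolerance:
        return "compensate_locally"
    return "escalate_to_regional_unit"


def improvement_rate(before, after):
    """Relative reduction of a metric such as average pedestrian waiting time."""
    if before <= 0:
        raise ValueError("the pre-adjustment value must be positive")
    return (before - after) / before
```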
[0048] Specifically, by accurately collecting the number and duration of pedestrian waiting through AI cameras, and combining historical feedback loops and support vector machine classification decisions, the green light duration for pedestrian crossings is dynamically adjusted within the framework of regional green wave coordination. This solves the problems of excessively long pedestrian waiting times and safety hazards caused by insufficient response to the needs of vulnerable groups in traditional signal control. It achieves synergistic optimization of pedestrian priority and arterial road traffic efficiency, significantly improving the human-centered service level and safety of the transportation system.
[0049] S6, the intelligent transportation big data management platform obtains real-time execution logs, including timing execution status, equipment status and traffic incident handling records. The platform's built-in intelligent work order alarm center executes a closed-loop work order process of alarm, dispatch, order acceptance, repair, review and completion for equipment abnormalities and traffic incidents identified by AI. At the same time, it feeds back the execution effect data to the data processing and optimization center for continuous iterative training of reinforcement learning agents to achieve long-term optimization of the model.
[0050] Furthermore, step S6 specifically includes: S61. The intelligent transportation big data management platform acquires execution log data from AI cameras, single-lamp controllers, and traffic light control devices deployed on smart light poles through a real-time acquisition interface. This log data includes timing execution status, equipment operating status, and traffic event handling records, forming a preliminary operating status dataset. For the operating status dataset, pre-established anomaly detection rules are used to compare the current, voltage, and power consumption data reported by the single-lamp controllers with preset thresholds. At the same time, combined with traffic events identified by the AI cameras, if the equipment status deviates from the preset threshold or the traffic event triggers the alarm condition, the anomaly category and priority are determined, and anomaly detection results are generated. Furthermore, the real-time acquisition interface adopts a message queue middleware architecture, using Kafka as the data bus. Three independent topics are set up to receive timing execution data, device status data, and traffic event data respectively. The data acquisition frequency is set to once every 5 seconds. The current, voltage, and power consumption data reported by the single-lamp controller are transmitted using Protobuf serialization format. The data fields include device ID, timestamp, three-phase current value, three-phase voltage value, active power, reactive power, and device temperature. The anomaly detection rules adopt a three-layer progressive architecture: the first layer is a threshold detection layer, setting three threshold levels for current, voltage, and power consumption (yellow warning threshold is ±10% of the rated value, orange warning threshold is ±20%, and red alarm threshold is ±30%). 
When the monitored value exceeds a threshold for three consecutive acquisition cycles, an anomaly of the corresponding level is triggered. The second layer is a trend detection layer, which uses linear regression to analyze the current and voltage trends of the same device over the past 24 hours; if the absolute value of the slope exceeds the preset range and the R² value is greater than 0.7, the trend is judged to be an abnormal trend of equipment performance degradation. The third layer is the correlation detection layer, which performs spatiotemporal correlation analysis between equipment anomalies and traffic events identified by AI cameras (such as traffic accidents, vehicle collisions with light poles, and pedestrians pressing one-button alarm devices). When the same light pole or adjacent light poles report equipment anomalies within 5 minutes before or after a traffic event, the anomaly priority is automatically raised by one level. The anomaly categories are classified and coded by equipment type (traffic lights, AI cameras, environmental detectors, broadcasts, LED displays) and anomaly type (communication interruption, power supply anomaly, light fixture failure, sensor failure) to form a standardized anomaly classification system. The priority labels are divided into three levels: high risk, critical, and general. The high risk level corresponds to anomalies that affect the passage of main roads or involve public safety; once triggered, dispatching must be completed within 30 seconds.
[0051] Protobuf (Protocol Buffers) is a binary serialization format for structured data; Kafka (Apache Kafka) is a distributed message queue system.
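The three detection layers of step S61 can be illustrated with a short standard-library sketch. The ±10% / ±20% / ±30% bands, the three-cycle rule, the R² cutoff of 0.7, and the 5-minute window come from the text; the function names, the numeric level encoding (0 to 3), and the `slope_limit` parameter are hypothetical and stand in for the "preset range" that the source does not quantify.

```python
PRIORITIES = ["general", "critical", "high risk"]  # ascending severity


def threshold_level(value, rated):
    """Layer 1: 0 = normal, 1 = yellow (>10%), 2 = orange (>20%), 3 = red (>30%)."""
    deviation = abs(value - rated) / rated
    if deviation > 0.30:
        return 3
    if deviation > 0.20:
        return 2
    if deviation > 0.10:
        return 1
    return 0


def sustained_level(samples, rated, cycles=3):
    """Highest level exceeded in each of the last `cycles` acquisition cycles."""
    if len(samples) < cycles:
        return 0
    return min(threshold_level(v, rated) for v in samples[-cycles:])


def linear_trend(ys):
    """Layer 2: least-squares slope and R^2 of ys against their sample index."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return sxy / sxx, r2


def is_degrading(ys, slope_limit, min_r2=0.7):
    slope, r2 = linear_trend(ys)
    return abs(slope) > slope_limit and r2 > min_r2


def escalate(priority, anomaly_ts, event_ts, window_s=300):
    """Layer 3: raise priority one level if a traffic event lies within 5 minutes."""
    if abs(anomaly_ts - event_ts) <= window_s:
        return PRIORITIES[min(PRIORITIES.index(priority) + 1, len(PRIORITIES) - 1)]
    return priority
```

Taking the minimum level over the last three samples means a single red reading between two orange readings is reported as orange, which is one reasonable reading of "exceeds the threshold for three consecutive acquisition cycles".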
[0052] S62. Based on the anomaly detection results, activate the platform's built-in intelligent work order alarm center to automatically generate an intelligent work order. This work order includes a detailed description of the equipment anomaly or traffic incident, the anomaly category, priority label, and spatial location information to determine the urgency of the dispatch task. Through the closed-loop process management system, the intelligent work order is distributed to the corresponding processing unit according to the six processes of alarm, dispatch, order acceptance, repair, review, and completion. The execution status of the work order is tracked, and real-time progress information on order acceptance confirmation, repair progress, and review results is obtained to form a complete operation and maintenance closed loop. Furthermore, the intelligent work order alarm center adopts a collaborative architecture of rule engine and workflow engine. The rule engine is built on the Drools framework and has a built-in dispatch rule library. It matches and calculates based on the priority tag of the anomaly category, the geographical location of the equipment, the skill tag of the current on-duty maintenance personnel, and the load status to generate dispatch tasks. The work order structure adopts JSON format, including work order number (generated according to the rule of year, month, day + equipment type + serial number), anomaly description text, anomaly category code, priority level (high risk / critical / general), GPS coordinates (longitude, latitude), equipment code, smart light pole code, intersection name, administrative division, estimated processing time limit (high risk 30 minutes, critical 2 hours, general 24 hours), and attachment information (AI camera screenshot or video clip at the time of the anomaly, data waveform diagram reported by the single light controller). 
The urgency level is judged using a comprehensive scoring model that calculates the urgency score E = w1×P + w2×L + w3×T, where P is the priority weight (high risk 1.0, critical 0.6, general 0.3), L is the location weight (1.0 for main roads, 0.7 for secondary roads, and 0.4 for branch roads), and T is the time-period impact weight, set according to whether the event falls in peak hours (1.0), off-peak hours, or nighttime. Emergency dispatch is triggered when E ≥ 0.8. The closed-loop process management system uses a state machine model, defining the work order lifecycle states as: pending dispatch, dispatched, accepted, under maintenance, pending review, reviewed, and completed. Each state transition records the operator, operation time, and operation notes, and pushes a message to the corresponding responsible person via a mobile app. After the maintenance personnel accept the order, the system automatically plans the optimal route from their current location to the fault point, provides navigation suggestions based on real-time traffic information, and simultaneously starts a timer to monitor the maintenance response time. The review process uses a dual-person review mechanism, in which personnel with review authority check the before-and-after photos, maintenance records, and replacement material list. After the review is passed, the work order moves to the completed state, and the maintenance records and equipment health scores in the equipment ledger are updated.
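As a rough illustration of the urgency score and the work order state machine, the sketch below uses only the standard library. The priority weights (1.0 / 0.6 / 0.3), road weights (1.0 / 0.7 / 0.4), peak-hour weight (1.0), the E ≥ 0.8 trigger, and the seven lifecycle states follow the text; the coefficients `W1`, `W2`, `W3`, the off-peak and nighttime weights, the key name `"branch"`, and the `WorkOrder` class are assumptions introduced for the example, since the source does not give them.

```python
from datetime import datetime

PRIORITY_WEIGHT = {"high risk": 1.0, "critical": 0.6, "general": 0.3}
ROAD_WEIGHT = {"main": 1.0, "secondary": 0.7, "branch": 0.4}
TIME_WEIGHT = {"peak": 1.0, "off_peak": 0.6, "night": 0.3}  # off_peak/night values assumed
W1, W2, W3 = 0.5, 0.3, 0.2  # assumed coefficients, not given in the source

STATES = ["pending dispatch", "dispatched", "accepted", "under maintenance",
          "pending review", "reviewed", "completed"]


def urgency(priority, road, period):
    """E = w1*P + w2*L + w3*T."""
    return (W1 * PRIORITY_WEIGHT[priority] + W2 * ROAD_WEIGHT[road]
            + W3 * TIME_WEIGHT[period])


def needs_emergency_dispatch(priority, road, period, threshold=0.8):
    return urgency(priority, road, period) >= threshold


class WorkOrder:
    """Linear lifecycle; every transition records operator, time, and notes."""

    def __init__(self, order_id):
        self.order_id = order_id
        self.state = STATES[0]
        self.history = []  # (from_state, to_state, operator, time, notes)

    def advance(self, operator, notes, when=None):
        idx = STATES.index(self.state)
        if idx == len(STATES) - 1:
            raise ValueError("work order is already completed")
        nxt = STATES[idx + 1]
        self.history.append((self.state, nxt, operator, when or datetime.now(), notes))
        self.state = nxt
        return nxt
```

A strictly linear state list is the simplest reading of the lifecycle; a rejected review that sends the order back to maintenance would need an extra transition that the source does not describe.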
[0053] S63. Extract execution effect data from the work order execution progress information, including fault repair time, equipment recovery status, and traffic incident handling efficiency, and transmit it to the data processing and optimization center; the reinforcement learning agent feeds the execution effect evaluation results into the iterative training module as feedback signals, adjusts the state space weights and reward function parameters of the PPO reinforcement learning model, optimizes the timing decision strategy and anomaly detection rules, and obtains the updated strategy configuration. Furthermore, execution effect data extraction follows an ETL data processing workflow: the execution records of completed work orders are extracted from the work order database, and key performance indicators are calculated using Python scripts, including Mean Time To Repair (MTTR), equipment availability, first-time repair success rate, and average response time for traffic incidents. MTTR is measured from the dispatch time to the review approval time, and is grouped and analyzed by equipment type, anomaly category, and time period to generate baseline values for repair efficiency along each dimension. Equipment recovery status is verified after the work order passes review: the current, voltage, and power consumption data reported by the single-lamp controller over 24 consecutive hours are compared with the standard baseline, and if all parameters are within the threshold range and no alarm recurs, the recovery is considered successful. Traffic incident handling efficiency is quantified as the time difference between the occurrence of the incident and its archiving, where the archiving time is defined as the time when the event status in the AI event alarm center changes to "completed." The time frame is the primary factor. 
The iterative training module of the reinforcement learning agent combines offline training with online fine-tuning. Offline training runs once a week, using the execution effect data of the past 7 days as training samples to incrementally train the PPO model. Online fine-tuning runs daily, using the execution effect evaluation results of the day as reward signal correction terms to adjust the weight coefficients of the device health features in the state space. When the failure frequency of a certain type of device increases, its penalty term in the reward function is increased accordingly, so that the PPO model tends to avoid over-reliance on this type of device during timing optimization. Anomaly detection rules are optimized with the Apriori association rule mining algorithm, which analyzes the change patterns of operating parameters in the 24 hours before a device anomaly occurs and discovers precursor features such as an increased current harmonic distortion rate and a continuous temperature rise. The identified precursor features are used as thresholds for new anomaly detection rules, realizing the evolution from passive alarm to active early warning.
[0054] The ETL (Extract, Transform, Load) refers to the data processing flow of extracting, transforming, and loading data.
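The KPI step of the ETL workflow can be sketched in plain Python. MTTR is measured from dispatch time to review approval time and grouped by equipment type, as the text describes; the record layout (`equipment_type`, `dispatched_at`, `approved_at`, `repair_attempts`) and the sample records are hypothetical, and "first-time repair success" is assumed to mean a work order closed after a single repair attempt.

```python
from collections import defaultdict
from datetime import datetime


def mttr_minutes(orders, group_key="equipment_type"):
    """Mean time to repair per group, in minutes (dispatch -> review approval)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for order in orders:
        elapsed = order["approved_at"] - order["dispatched_at"]
        sums[order[group_key]] += elapsed.total_seconds() / 60.0
        counts[order[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


def first_time_fix_rate(orders):
    """Share of work orders closed after exactly one repair attempt."""
    return sum(1 for o in orders if o["repair_attempts"] == 1) / len(orders)


# Hypothetical completed work orders extracted from the work order database.
orders = [
    {"equipment_type": "AI camera", "repair_attempts": 1,
     "dispatched_at": datetime(2026, 5, 1, 8, 0), "approved_at": datetime(2026, 5, 1, 8, 30)},
    {"equipment_type": "AI camera", "repair_attempts": 2,
     "dispatched_at": datetime(2026, 5, 2, 9, 0), "approved_at": datetime(2026, 5, 2, 10, 30)},
    {"equipment_type": "traffic light", "repair_attempts": 1,
     "dispatched_at": datetime(2026, 5, 3, 7, 0), "approved_at": datetime(2026, 5, 3, 7, 45)},
]

baseline = mttr_minutes(orders)
```

Passing `group_key="anomaly_category"` or a time-period field would give the other baselines mentioned in the text, provided the records carry those fields.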
[0055] S64. The updated strategy configuration is reloaded into the intelligent transportation big data management platform and the data processing and optimization center, and is applied to the subsequent real-time acquisition system and intelligent work order alarm center; the log data acquisition, anomaly detection, work order processing, and effect feedback processes of S61 to S63 are executed in a loop, forming a continuous improvement closed-loop mechanism from data acquisition and intelligent decision-making through operation and maintenance execution to model optimization, so as to realize the long-term optimization and adaptive evolution of the reinforcement learning agent.
[0056] Furthermore, the strategy configuration employs a hot-loading mechanism. Updated model parameters, rule bases, and threshold configurations are stored as configuration files in a distributed configuration center (such as Apollo) and are automatically pushed to each service node by listening for configuration change events, so that they take effect without a service restart and system continuity is ensured. The strategy configuration includes three parts: the weight parameter file for the PPO reinforcement learning model (in .pth or .h5 format), the anomaly detection rule base (in .drl rule file format), and the device threshold configuration table (stored in a MySQL database, containing fields such as device model, parameter name, yellow threshold, orange threshold, red threshold, and trend detection slope threshold). Configuration updates follow a canary release strategy: the update is first piloted at a single intersection or in a single area and, after 48 hours of continuous observation without anomalies, is gradually extended across the entire domain. During the pilot phase, A/B testing is used to compare equipment failure rates, work order processing efficiency, and signal timing effects under the old and new strategies to ensure the effectiveness of the updated strategy. The cyclic execution mechanism adopts an event-driven architecture, in which newly generated execution log data triggers subsequent processing flows in real time through a message queue, forming a data closed loop with millisecond-level response. The continuous improvement closed-loop mechanism builds an operation and maintenance data lake to uniformly store work order data, equipment status data, execution effect data, and model iteration records, supporting retrospective analysis of the model optimization process. 
Every 30 days, the system summarizes historical optimization records and generates a model evolution report, including performance comparisons of different model versions, key optimization points, and subsequent optimization suggestions, providing data support for continuous system upgrades and ultimately achieving end-to-end adaptive evolution from data collection, intelligent decision-making, operation and maintenance execution to model optimization.
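The canary release gate can be expressed as a small decision function. The 48-hour observation window and the three A/B comparison dimensions follow the text; the function name, the metric keys, the "halt" outcome, and the rule that the candidate must not be worse than the baseline on any metric are assumptions made for illustration.

```python
OBSERVATION_HOURS = 48  # continuous anomaly-free observation required by the text


def canary_decision(hours_observed, anomaly_count, baseline, candidate):
    """Decide the next rollout step for a piloted strategy configuration.

    `baseline` and `candidate` map metric names to values where lower is better
    (hypothetical keys: failure_rate, work_order_minutes, avg_delay_s).
    """
    if anomaly_count > 0:
        return "halt"  # assumed reaction; the source only requires an anomaly-free window
    if hours_observed < OBSERVATION_HOURS:
        return "keep_observing"
    # A/B comparison: any metric that got worse blocks the wider rollout.
    regressed = [name for name in baseline if candidate[name] > baseline[name]]
    return "halt" if regressed else "expand"


old = {"failure_rate": 0.04, "work_order_minutes": 55.0, "avg_delay_s": 38.0}
new = {"failure_rate": 0.03, "work_order_minutes": 50.0, "avg_delay_s": 35.0}
decision = canary_decision(50, 0, old, new)
```

A production gate would more likely use a statistical test than a strict comparison, but the control flow stays the same.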
[0057] Specifically, the system obtains execution logs through the intelligent transportation big data management platform and activates the intelligent work order alarm center. It achieves accurate response to equipment anomalies and traffic incidents through a three-layer progressive anomaly detection rule and a six-step closed-loop process. At the same time, it feeds back execution effect data such as fault repair time and equipment recovery status to the reinforcement learning agent for iterative training and strategy hot loading. This solves the problem of the traditional operation and maintenance and decision-making system being disconnected and unable to form a continuous optimization closed loop. It realizes a full-link adaptive closed loop from data collection, intelligent decision-making, operation and maintenance execution to model evolution, which significantly improves the system's operation and maintenance efficiency and the long-term optimization capability of the reinforcement learning model.
[0058] Example 2: referring to Figure 4, this embodiment provides a holographic perception traffic signal dynamic timing system based on proximal policy optimization (PPO), applied to the holographic perception traffic signal dynamic timing method based on proximal policy optimization, including the following modules.
The holographic perception and data fusion module is deployed on a smart light pole and includes an AI camera, an environmental detector, a one-button alarm device, and a single-lamp controller. The AI camera collects data on vehicle queue length, instantaneous vehicle speed, vehicle type classification, number of waiting pedestrians, and non-motorized vehicle traffic flow. The environmental detector collects data on environmental visibility and light intensity. The one-button alarm device receives active alarm signals. The single-lamp controller has a built-in detection circuit for collecting current, voltage, and power consumption data of the traffic light fixtures. The collected multi-source data is standardized, cleaned of outliers, and filled in where values are missing; it is then fused into a comprehensive state vector and uploaded to a cloud-based data processing and optimization center.
The feature extraction and congestion prediction module is built into the data processing and optimization center. It receives the comprehensive state vector and uses a convolutional neural network to extract features from it in both the time and spatial dimensions to identify the current congestion level at the intersection. It employs a fusion model of a long short-term memory network and a convolutional neural network to predict short-term traffic trends over the next 1 to 3 signal cycles, generating prior state information. Through a secondary feature extraction and iterative optimization mechanism, it outputs accurate prior state data, providing decision-making priors for dynamic timing optimization.
The PPO dynamic timing optimization module is activated when the congestion level exceeds a dynamic threshold. It receives the comprehensive state vector and prior state information and calls a proximal policy optimization reinforcement learning agent, using the comprehensive state vector as the state space, the green light duration adjustment of each phase as the action space, and the composite traffic efficiency index as the reward function. It adopts an Actor-Critic network architecture and generates an optimized green light duration sequence through multiple rounds of parallel-simulation policy iteration with clipped policy updates. The optimized sequence is then output to the control execution module.
The regional collaborative control module communicates with the intelligent traffic light control systems of adjacent intersections through the regional coordination unit to obtain their traffic status and timing schemes. It uses a sliding window analysis method to detect regional traffic imbalance, optimizes the signal period and phase difference with a genetic algorithm, and generates a group synchronization scheme for the intersections with the goal of minimizing the total regional delay and maximizing the green wave bandwidth. It realizes coordinated control of trunk green waves through a hierarchical distribution mechanism, periodically evaluates the coordination effect, and outputs the optimized collaborative parameters.
The safety priority decision-making module has a built-in vehicle priority mode, pedestrian priority mode, and regular time period mode. It receives data on the number of waiting pedestrians and their waiting time collected by AI cameras, uses human key point detection and queue tracking to count pedestrian demand, and triggers the pedestrian priority mode when the demand exceeds a preset threshold. It extracts historical waiting time data through a feedback loop, uses support vector machine classification and a hybrid distribution model to dynamically calculate the adjustment range of the pedestrian crossing green light, generates an enhanced safety priority timing scheme, and drives the LED display screen and broadcast equipment to release prompt information.
The intelligent operation and maintenance and model evolution module is part of the smart transportation big data management platform. It has a built-in intelligent work order alarm center that receives timing execution logs, equipment status data, and traffic incident handling records. It applies a three-layer progressive anomaly detection rule to identify equipment anomalies and traffic incidents, automatically generates intelligent work orders, and executes a closed-loop operation and maintenance process in six steps: alarm, dispatch, order acceptance, repair, review, and completion. It extracts execution effect data such as fault repair time and equipment recovery status and feeds them back to the PPO reinforcement learning agent for continuous iterative training. Through hot loading and canary release mechanisms, it updates strategy configurations, forming a closed loop of adaptive evolution that extends from data collection and intelligent decision-making through operation and maintenance execution to model optimization.
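The policy update used by the PPO dynamic timing optimization module can be illustrated with the two standard PPO ingredients: generalized advantage estimation and the clipped surrogate objective. The sketch below is written in plain Python for readability and is not the embodiment's training code; the discount `gamma`, the GAE parameter `lam`, and the clipping parameter `eps` are common defaults chosen for illustration, and the source does not specify their values.

```python
import math


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation.

    `values` holds one state-value estimate per step plus a final bootstrap
    value, so len(values) == len(rewards) + 1.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error from the Critic's value estimates.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped objective (to be maximised by the Actor)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes any incentive to move the ratio outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In this setting each reward would be the composite traffic efficiency index for one signal cycle, and each action a vector of per-phase green time adjustments; in practice the same computation would be carried out on tensors inside a deep learning framework.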
[0059] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any brief modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.
Claims
1. A holographic sensing traffic signal dynamic timing method based on near-end strategy optimization, characterized in that: Includes the following steps: S1. Multimodal sensing devices deployed on smart light poles collect traffic data and device status data at intersections, integrate them to form a comprehensive status vector, and upload it to the cloud data processing and optimization center; The traffic data includes vehicle queue length, instantaneous vehicle speed, vehicle type classification, number of waiting pedestrians, non-motorized vehicle traffic flow, environmental visibility and light intensity; the equipment status data includes the current value, voltage value and power consumption data of traffic light fixtures. S2. The data processing and optimization center uses a fusion model of convolutional neural network and long short-term memory network to extract spatiotemporal features from the comprehensive state vector, identify the current intersection congestion level and predict short-term traffic trends, and generate state prior information. S3. When the congestion level exceeds the preset threshold, the data processing and optimization center calls the near-end strategy optimization reinforcement learning agent, using the comprehensive state vector as the state space, the green light duration adjustment amount of each phase as the action space, and the composite traffic efficiency index as the reward function to perform strategy iteration and generate the optimized green light duration sequence. S4. The regional coordination unit interacts with the intelligent traffic light control system of adjacent intersections, and iteratively updates the signal cycle parameters and phase difference through a genetic algorithm to generate a synchronization scheme for the intersection group and realize the coordinated control of the green wave on the trunk line. 
S5: Based on the three built-in working modes of vehicle priority, pedestrian priority, and normal time period, it monitors the number of pedestrians waiting and the waiting time in real time, dynamically adjusts the green light duration for pedestrian crossing, and generates an enhanced safety priority timing scheme. S6, the intelligent transportation big data management platform obtains real-time execution logs. Its built-in intelligent work order alarm center executes closed-loop work order processes for equipment abnormalities and traffic incidents, and feeds back the execution effect data to the data processing and optimization center for continuous iterative training of reinforcement learning agents.
2. The method for dynamic timing of holographic perception traffic signals based on near-end strategy optimization according to claim 1, characterized in that: Step S1 specifically includes: S11. Traffic and environmental data are collected in real time by AI cameras and environmental detectors deployed on smart light poles. Data is then standardized and formatted and timestamped to obtain a preliminary traffic and environmental dataset. S12. Combine the current, voltage and power consumption data of each traffic light fixture collected synchronously by the single-lamp controller to build a monitoring channel for equipment operation status. Compare the collected data with the preset threshold. If the threshold is exceeded, mark the abnormal status and generate a preliminary judgment result of abnormal equipment operation. S13. Integrate the active alarm signals received by the one-button alarm device, construct a spatiotemporal correlation analysis model, conduct secondary confirmation of abnormal states, comprehensively determine the possibility of equipment malfunction and generate fault priority labels. S14. The traffic and environment dataset, equipment operation data, and fault priority labels are fused into a comprehensive state vector. A classification algorithm is used to process the data to obtain a comprehensive evaluation category of the intersection traffic conditions and equipment conditions. Based on this, traffic congestion warning signals, lighting control instructions, and emergency maintenance requests are generated.
3. The method for dynamic timing of holographic perception traffic signals based on near-end strategy optimization according to claim 1, characterized in that: Step S2 specifically includes: S21. Input the comprehensive state vector into the convolutional neural network model. Through the stacking of convolutional layers and pooling layers, extract the temporal features in the time dimension and the distribution features in the spatial dimension to identify the congestion level of the current intersection. S22. Combining historical time series data with the characteristics of the current time period, a fusion model of long short-term memory network and convolutional neural network is adopted to predict the short-term traffic trend in the next 1 to 3 signal cycles. If the traffic trend shows that the congestion will intensify, the pre-adjustment simulation mechanism is activated to dynamically adjust the signal cycle parameters, and the traffic distribution under the adjustment plan is simulated through a traffic simulation model. S23. Based on the traffic distribution under the simulation operation, extract the updated state vector data for secondary feature extraction to obtain the congestion prediction result and generate the corresponding state prior data. S24. By continuously updating the state prior data, a dynamically updated state prior information database is constructed, which integrates equipment health status data, traffic event data, and collaborative status data of adjacent intersections to provide input priors for reinforcement learning agents.
4. The method for dynamic timing of holographic perception traffic signals based on near-end strategy optimization according to claim 3, characterized in that: The fusion model of the Long Short-Term Memory Network and the Convolutional Neural Network adopts a parallel architecture. The Convolutional Neural Network branch is used to extract spatial features, and the Long Short-Term Memory Network branch is used to extract temporal dependencies. The output feature vectors of the two branches are concatenated and then input into the fully connected layer for traffic prediction. The pre-adjustment simulation mechanism uses the online simulation module of the microscopic traffic simulation model. The adjusted signal cycle parameters are input into the simulation model, and the simulation is run with the current measured traffic flow as the initial condition. The effectiveness of the adjustment plan is determined based on the simulation results.
5. The method for dynamic timing of holographic perception traffic signals based on near-end strategy optimization according to claim 1, characterized in that: Step S3 specifically includes: S31. Real-time monitoring of the current intersection's congestion level through prior state information to determine whether it exceeds a preset threshold. If it does, a dynamic timing optimization process is triggered to obtain a comprehensive state vector to form a complete digital description of the current traffic situation. S32. Call the pre-trained proximal policy optimization reinforcement learning agent, with the comprehensive state vector as the state space, the green light duration adjustment of each phase as the action space, and the composite traffic efficiency index as the reward function. Use the pruning policy update amplitude mechanism to perform multiple rounds of policy iteration to generate the optimized green light duration sequence. S33. Output the final signal control command, transmit it to the single-lamp controller at the intersection via the communication network for execution, and collect execution feedback data to send back to the data processing and optimization center.
6. The method for dynamic timing of holographic perception traffic signals based on near-end strategy optimization according to claim 5, characterized in that: the pre-trained PPO reinforcement learning agent adopts an Actor-Critic network architecture, where the Actor network outputs the action policy and the Critic network evaluates state values. The composite traffic efficiency index is a weighted combination of the reduction in total intersection delay, the total number of passing vehicles, and the number of stops, and its weighting coefficients are dynamically adjusted according to time period characteristics and traffic demand. The pruning policy update amplitude mechanism, i.e. the PPO clipping mechanism, sets a clipping parameter that limits the probability ratio between the policy after the update and the policy before the update to a preset range, and uses generalized advantage estimation (GAE) to calculate the advantage function. The strategy iteration adopts a multi-round parallel simulation approach, generating multiple parallel simulation scenarios and accumulating iterations until the reward function converges.
7. The method for dynamic timing of holographic perception traffic signals based on near-end strategy optimization according to claim 1, characterized in that: Step S4 specifically includes: S41. The regional coordination unit establishes a communication connection with the intelligent traffic light control system of the adjacent intersection, obtains traffic status data and current timing scheme information of the adjacent intersection, and triggers the optimization process if uneven traffic distribution is detected to determine the range of intersection groups that need to be adjusted. S42. Extract the signal period and phase difference of each intersection as optimization variables, use a genetic algorithm to perform parameter iterative optimization, generate a combination of synchronization control parameters through selection, crossover and mutation operations, and determine the final signal timing scheme. S43. Issue adjustment instructions to the intelligent traffic light control system at each intersection, synchronously update the signal cycle and phase difference, periodically analyze the coordination effect based on the equipment execution feedback data and traffic status data, dynamically adjust the synchronous control parameters, and form a closed-loop optimization mechanism for regional collaboration.
8. The method for dynamic timing of holographic perception traffic signals based on near-end strategy optimization according to claim 1, characterized in that: Step S5 specifically includes: S51. Real-time data on the number of pedestrians waiting and the waiting time at intersections are collected using AI cameras and combined with vehicle flow information to form a comprehensive traffic status dataset. If the number of pedestrians waiting exceeds a preset threshold, the pedestrian priority mode is triggered. S52. Extract pedestrian waiting time data. If the waiting time exceeds the preset maximum tolerance time, extract historical waiting time records through a feedback loop mechanism to construct a time series distribution model. Use a classification algorithm to process the comprehensive traffic status dataset, incorporate historical data as a weight factor into the decision, and dynamically calculate the adjustment range of pedestrian crossing green light time. S53. The initial green light duration allocation scheme is coordinated and optimized through the intersection group synchronization scheme to generate a safety-first enhanced timing scheme, and the adjustment effect is recorded to form a continuous optimization closed loop.
9. The holographic perception traffic signal dynamic timing method based on proximal policy optimization according to claim 1, characterized in that step S6 specifically includes:
S61. The intelligent transportation big data management platform obtains execution log data from the sensing and control devices deployed on the smart light poles, compares the device operation data with preset thresholds using anomaly detection rules, and generates anomaly detection results in combination with the traffic events identified by the AI cameras.
S62. The intelligent work order alarm center is activated based on the anomaly detection results and automatically generates intelligent work orders containing the anomaly description, category, priority and location information; the work orders are distributed to the corresponding processing units according to the six stages of alarm, dispatch, order acceptance, repair, review and completion, and their execution status is tracked to form a complete operation and maintenance closed loop.
S63. Execution effect data are extracted from the work order progress information and transmitted to the data processing and optimization center, where the reinforcement learning agent inputs the execution effect evaluation result as a feedback signal into the iterative training module to adjust the state space weights and reward function parameters of the PPO reinforcement learning model.
S64. The updated policy configuration is reloaded into the intelligent transportation big data management platform and the data processing and optimization center, and the process of log data acquisition, anomaly detection, work order processing and effect feedback is executed cyclically to form a closed-loop mechanism for continuous improvement.
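The threshold comparison of S61 and the six-stage work order flow of S62 can be pictured with the following sketch. Only the stage names and the four work order fields come from the claim; the metric names, the limit values, the priority rule and the helper names (`detect_anomalies`, `open_work_orders`, `WorkOrder.advance`) are hypothetical.

```python
from dataclasses import dataclass, field

# The six stages named in S62, in order.
STAGES = ["alarm", "dispatch", "order_acceptance", "repair", "review", "completion"]


@dataclass
class WorkOrder:
    description: str
    category: str
    priority: str
    location: str
    stage: str = "alarm"
    history: list = field(default_factory=lambda: ["alarm"])

    def advance(self):
        """Move the order to the next stage; a completed order cannot advance."""
        idx = STAGES.index(self.stage)
        if idx == len(STAGES) - 1:
            raise ValueError("work order already completed")
        self.stage = STAGES[idx + 1]
        self.history.append(self.stage)
        return self.stage


def detect_anomalies(readings, limits):
    """Return (metric, value) pairs whose value lies outside its (low, high) limits."""
    anomalies = []
    for metric, (low, high) in limits.items():
        if metric in readings and not (low <= readings[metric] <= high):
            anomalies.append((metric, readings[metric]))
    return anomalies


def open_work_orders(device_id, location, readings, limits):
    """Create one work order per detected anomaly (the priority rule is illustrative)."""
    orders = []
    for metric, value in detect_anomalies(readings, limits):
        low, high = limits[metric]
        orders.append(
            WorkOrder(
                description=f"{device_id}: {metric}={value} outside [{low}, {high}]",
                category=metric,
                priority="high" if metric == "voltage_v" else "normal",
                location=location,
            )
        )
    return orders


if __name__ == "__main__":
    limits = {"voltage_v": (200.0, 240.0), "temperature_c": (-20.0, 70.0)}
    readings = {"voltage_v": 250.0, "temperature_c": 40.0}
    for order in open_work_orders("lamp-controller-01", "pole 17", readings, limits):
        while order.stage != "completion":
            order.advance()
        print(order.description, order.history)
```

The `history` list is the kind of progress record from which S63 could derive execution effect data, for example the time spent between the repair and review stages.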
10. A holographic perception traffic signal dynamic timing system based on proximal policy optimization, applied to the holographic perception traffic signal dynamic timing method based on proximal policy optimization according to any one of claims 1-9, characterized in that it includes:
a holographic perception and data fusion module, deployed on the smart light pole and including an AI camera, an environmental detector, a one-button alarm device and a single-lamp controller, used to collect multi-source data and fuse it into a comprehensive state vector;
a feature extraction and congestion prediction module, built into the data processing and optimization center, used to extract spatiotemporal features with a fusion model of convolutional neural networks and long short-term memory networks, identify congestion levels, predict traffic trends, and generate state prior information;
a PPO dynamic timing optimization module, activated when the congestion level exceeds a dynamic threshold, which calls the proximal policy optimization reinforcement learning agent to iterate the policy and generate an optimized green light duration sequence;
a regional collaborative control module, which communicates with adjacent intersections through the regional coordination unit and is used to optimize the signal cycle and phase difference with a genetic algorithm, generating a synchronization scheme for the intersection group and realizing green wave coordinated control;
a safety priority decision module, with built-in vehicle priority, pedestrian priority and regular time period modes, used to monitor pedestrian demand in real time, dynamically adjust the green light duration for pedestrian crossing, and generate an enhanced safety priority timing scheme;
an intelligent operation and maintenance and model evolution module, with a built-in intelligent work order alarm center, used to execute the closed-loop work order process and feed the execution effect data back to the PPO reinforcement learning agent for continuous iterative training, forming a full-link adaptive evolution closed loop.
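For reference, the policy iteration performed by the PPO dynamic timing optimization module is commonly formulated with the clipped surrogate objective below. This is the standard textbook form of proximal policy optimization; the claims do not state which PPO variant, clipping range or advantage estimator is used, so the symbols here are assumptions: $\pi_\theta$ is the timing policy, $s_t$ the comprehensive state vector, $a_t$ the green light duration action, $\hat{A}_t$ an advantage estimate and $\epsilon$ the clipping parameter.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) =
\hat{\mathbb{E}}_t\!\left[
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
\right]
```

Clipping the probability ratio $r_t(\theta)$ limits how far a single update can move the policy away from the one that collected the data, which keeps each change to the timing policy incremental.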