Traffic signal adaptive control method and system based on congestion feedback adjustment

CN122290342A · Pending · Publication Date: 2026-06-26 · ANHUI KELI INFORMATION IND

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ANHUI KELI INFORMATION IND
Filing Date
2026-04-13
Publication Date
2026-06-26


Abstract

This invention discloses a traffic signal adaptive control method and system based on congestion feedback adjustment, belonging to the field of traffic signal control technology. The method includes: S1, acquiring the travel speed of multiple turns at a target intersection; S2, determining the congestion level of the corresponding turn based on the travel speed, and generating the corresponding turn timing demand according to the congestion level; S3, determining the turn timing action for each signal timing stage based on the turn timing demand and the subordinate relationship between the turn and the signal timing stage; S4, adjusting and executing a new signal timing scheme based on the turn timing action; S5, acquiring the saturation change rate for each turn after the new signal timing scheme is executed; S6, evaluating the control effect of the turn timing action based on the saturation change rate, and performing feedback correction on the control parameters based on the evaluation results. This method overcomes the problems of poor generalization ability and indirect optimization objectives inherent in traditional methods based on fixed mathematical models; moreover, this system does not rely on high-cost connected vehicles or trajectory prediction equipment.

Description

Technical Field

[0001] This invention relates to the field of traffic signal control technology, and in particular to an adaptive traffic signal control method and system based on congestion feedback adjustment.

Background Technology

[0002] Traffic signal control is a core tool in urban road network management, and the quality of its control strategies directly affects road traffic efficiency and congestion mitigation. With rapid urbanization, the dynamics and complexity of traffic flow are increasing, making traditional timed control or single-point sensor control insufficient for the demands of refined management. Therefore, adaptive traffic signal control systems have emerged, aiming to automatically adjust signal timing parameters based on real-time traffic conditions to optimize the overall operational efficiency of intersections and even entire areas.

[0003] Currently, research and practice in the field of adaptive signal control mainly revolve around two core questions: "what to sense" and "how to make decisions," and have formed the following three typical technical routes: The first approach is reactive control based on traditional detection parameters. This method relies on fixed detectors to collect basic parameters such as traffic flow, time occupancy, and queue length. Through mathematical models (such as the Webster algorithm) or inductive delay mechanisms, it adjusts signal timing in real time to minimize average vehicle delay, queue length, or balance phase saturation. While this approach is relatively mature and has manageable implementation costs, its control effectiveness is highly dependent on the pre-set mathematical model. When facing complex and variable real-world traffic flows, the fixed algorithm model has limited generalization ability, making it difficult to achieve continuous global optimization.

[0004] The second approach is predictive control based on vehicle-to-infrastructure (V2I) communication and trajectory data. This method utilizes connected vehicles (CVs) or wide-area detection to acquire high-precision vehicle trajectory, speed, and location information. By predicting the time a vehicle arrives at the stop line, it achieves vehicle speed guidance (GLOSA) or dynamically optimizes signal timing. Although this approach theoretically offers superior forward-looking control, its actual effectiveness heavily relies on high connected vehicle penetration and a robust V2I communication infrastructure. The system is costly to build and maintain, and its optimization effectiveness is limited when penetration is insufficient, making fairness difficult to guarantee. Therefore, it is currently difficult to promote and apply it on a large scale at the city level.

[0005] The third approach is intelligent optimization based on data-driven approaches and reinforcement learning. This method treats intersections or areas as intelligent agents, traffic states (such as lane queuing and waiting times) as the state space, and traffic light actions as the action space. Through deep reinforcement learning (DRL) algorithms, the agent autonomously learns the optimal control strategy through interaction with the environment. This approach demonstrates strong potential for handling complex problems, but its models are often considered "black boxes," lacking interpretability in the decision-making process and posing risks to safety and robustness. Furthermore, such methods typically require massive amounts of training data and enormous computational resources, making the engineering transition from simulation environments to real-world scenarios extremely challenging. Most research remains at the theoretical or small-scale simulation verification stage.

[0006] It is evident that existing mainstream technologies face challenges when it comes to large-scale, low-cost, and highly reliable applications: they may be limited by the generalization ability of the model and the indirectness of the optimization objectives, or constrained by high deployment costs and stringent prerequisites, or trapped by the black box nature of the model, dependence on data and computing power, and engineering difficulties.

[0007] Therefore, there is an urgent need for a traffic signal adaptive control method that makes full use of existing road perception infrastructure, has transparent and explainable control logic, achieves closed-loop online learning and continuous performance evolution, and is easy to deploy and maintain in engineering practice.

Summary of the Invention

[0008] In a first aspect, to solve the above-mentioned technical problems, the present invention provides a traffic signal adaptive control method based on congestion feedback adjustment, comprising the following steps:
S1. Obtain the travel speed of multiple turns at the target intersection;
S2. Based on the travel speed, determine the congestion level of the corresponding turn, and generate the corresponding turn timing demand according to the congestion level;
S3. Based on the timing demand of each turn and the subordinate relationship between turns and signal timing stages, determine the timing adjustment action for each signal timing stage, wherein the timing adjustment action includes increasing, decreasing, or maintaining the green time of the corresponding stage;
S4. Based on the timing adjustment action, adjust and execute a new signal timing scheme;
S5. After the new signal timing scheme is executed, obtain the saturation change rate for each turn;
S6. Based on the saturation change rate, evaluate the control effect of the timing adjustment action, and perform feedback correction on the control parameters according to the evaluation results.

[0009] Furthermore, in step S6, performing feedback correction on the control parameters based on the evaluation results includes: for the timing adjustment action selected for each stage, examining the saturation change rate of that stage's dominant turn, and obtaining the saturation of that turn within one cycle after the adjustment.
When the timing adjustment action increases the green time of the stage:
(a1) if […], where the threshold is a non-negative constant close to zero, the result is judged to be negative feedback, and the corresponding feedback coefficient is decreased by one adjustment step;
(b1) if […], the result is judged to be general positive feedback, and the feedback coefficient is kept constant;
(c1) if […], the result is judged to be strong positive feedback, and the corresponding feedback coefficient is increased by one adjustment step.
When the timing adjustment action decreases the green time of the stage:
(a2) if […] or […], the result is judged to be negative feedback, and the corresponding feedback coefficient is increased by one adjustment step;
(b2) if […] and […], the result is judged to be general positive feedback, and the feedback coefficient is kept constant;
(c2) if […] and […], the result is judged to be strong positive feedback, and the corresponding feedback coefficient is decreased by one adjustment step.
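The branching in [0009] can be sketched in Python as follows. The inequalities themselves are not present in this text, so every threshold (`eps`, `strong`, `sat_limit`), the step size, and the exact form of each test are hypothetical choices made for illustration; only the direction in which the coefficient moves in each case follows the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class FeedbackParams:
    # All four values are illustrative assumptions, not taken from the patent.
    eps: float = 0.02        # "non-negative constant close to zero"
    strong: float = 0.10     # magnitude of change treated as "strong"
    sat_limit: float = 0.90  # acceptable saturation after the adjustment
    step: float = 0.05       # adjustment step of the feedback coefficient


def update_coefficient(k, action, d_sat, sat_after, p=FeedbackParams()):
    """Return the corrected feedback coefficient.

    action:    +1 = green time increased, -1 = decreased, 0 = maintained
    d_sat:     saturation change rate of the stage's dominant turn
               (negative means saturation fell)
    sat_after: saturation of that turn within one cycle after adjustment
    """
    if action > 0:
        if d_sat >= -p.eps:      # (a1) saturation did not fall: negative feedback
            return k - p.step
        if d_sat <= -p.strong:   # (c1) saturation fell markedly: strong positive
            return k + p.step
        return k                 # (b1) general positive feedback
    if action < 0:
        if d_sat > p.strong or sat_after > p.sat_limit:   # (a2) negative feedback
            return k + p.step
        if d_sat <= p.eps and sat_after <= p.sat_limit:   # (c2) strong positive
            return k - p.step
        return k                 # (b2) general positive feedback
    return k                     # green time maintained: nothing to evaluate
```

For example, `update_coefficient(1.0, +1, 0.0, 0.8)` lowers the coefficient by one step, because extra green time that leaves saturation unchanged is treated here as ineffective.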

[0010] Furthermore, the method also includes a neutral attenuation step for the feedback coefficient: in the absence of strong positive feedback or negative feedback, the feedback coefficient slowly regresses to a preset baseline value over time.
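One simple way to realise such a regression is an exponential relaxation toward the baseline, sketched below. The text only says the coefficient "slowly regresses"; the exponential form, the `rate` value, and the use of 1.0 as the baseline are assumptions for illustration.

```python
def neutral_decay(k, baseline=1.0, rate=0.1):
    """Move the feedback coefficient a fraction `rate` of the way to `baseline`.

    Intended to be called once per cycle in which neither strong positive
    nor negative feedback was observed. `baseline` and `rate` are
    hypothetical parameters.
    """
    return k + rate * (baseline - k)


# Repeated neutral cycles shrink the gap to the baseline geometrically.
k = 1.5
for _ in range(3):
    k = neutral_decay(k)
```

A small `rate` keeps the coefficient close to what feedback learned, while still preventing it from staying at a stale value indefinitely.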

[0011] Further, S1 includes: S11. Based on the vehicle identity and time information collected by the vehicle identification equipment deployed in the road network, match each vehicle's trajectory between consecutive intersections, and calculate the first turn travel speed from the intersection spacing and the travel time difference; S12. Obtain the second turn travel speed within the same time and space range, where the second turn travel speed is derived from internet or floating car data; S13. Based on a dynamic credibility calculation model, assign a weight to the first turn travel speed and to the second turn travel speed, and obtain the final turn travel speed through weighted fusion of the two: […]

[0012] In the formula, the two terms being fused are the first turn travel speed and the second turn travel speed.

[0013] Furthermore, the dynamic credibility calculation model calibrates the weight of the first turn travel speed based on the effective sample rate and the coefficient of variation of the first turn travel speed data, and the calibrated weight satisfies the expression: […]

[0014] In the formula, the two correction terms are calibration functions: when the effective sample rate is below the preset effective sample rate threshold, the first term is negative; when the coefficient of variation is above the preset coefficient of variation threshold, the second term is negative.

[0015] Furthermore, the specific method for obtaining the travel speed of multiple turns at the target intersection in S1 is as follows: obtain the travel speed sequence of the turn over multiple consecutive statistical time windows; perform linear regression on the travel speed sequence to obtain the slope, the intercept, and the goodness of fit; and determine the travel speed input value for S2 from the relationship between the goodness of fit and a prediction threshold and a stability threshold: if […], then […]; if […], then […]; if […], then […], where […].

[0016] Further, in step S2, generating the corresponding turn timing demand based on the congestion level includes: presetting at least two speed thresholds for each turn, dividing the turn travel speed into multiple levels, and converting these levels into quantifiable timing demand values, where a level representing more severe congestion corresponds to a larger timing demand value.

[0017] Furthermore, in S3, determining the timing adjustment action for each signal timing stage includes: S31. For each signal timing stage, calculate the stage timing demand from its associated set of turns: […]

[0018] In the formula, the terms are the feedback coefficient and the turn timing demand value; S32. Compare the stage timing demand with the preset demand threshold range to determine the adjustment category to which the stage belongs; S33. Determine the timing action for the stage based on the adjustment category, the current prior saturation level of the stage, and the preset dynamic step size rule.

[0019] Furthermore, S31 also includes a process for allocating exclusive and non-exclusive turn demands so that the same demand is not answered more than once: a. For each signal timing stage, distinguish its exclusive turn set from its non-exclusive turn set; b. Iteratively determine a dominant turn and its demand value for each stage; once the dominant turn of a signal timing stage has been determined, remove that turn from the non-exclusive turn sets of the other stages; c. Calculate the stage timing demand using the dominant turn demand values determined for each stage in step b.
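Steps a to c can be sketched as below. The text does not specify the iteration order or the rule for choosing the dominant turn; this sketch assumes stages are visited in scheme order and that the dominant turn is the candidate with the largest demand value. The stage and turn names are hypothetical.

```python
def allocate_dominant_turns(stage_turns, demand):
    """Pick one dominant turn per stage without answering a turn twice.

    stage_turns: dict mapping stage -> list of turns released in that stage
                 (dict order is taken as the stage order of the scheme)
    demand:      dict mapping turn -> timing demand value
    Returns a dict mapping stage -> (dominant_turn, demand_value), or
    (None, 0) when no candidate turn is left for a stage.
    """
    # Step a: a turn released by exactly one stage is exclusive to it.
    count = {}
    for turns in stage_turns.values():
        for t in turns:
            count[t] = count.get(t, 0) + 1
    exclusive = {s: [t for t in ts if count[t] == 1] for s, ts in stage_turns.items()}
    shared = {s: [t for t in ts if count[t] > 1] for s, ts in stage_turns.items()}

    # Step b: choose the dominant turn stage by stage (assumed rule: largest
    # demand), then remove it from the non-exclusive sets of other stages.
    result = {}
    for s in stage_turns:
        candidates = exclusive[s] + shared[s]
        if not candidates:
            result[s] = (None, 0)
            continue
        dom = max(candidates, key=lambda t: demand[t])
        result[s] = (dom, demand[dom])
        for other in shared:
            if other != s and dom in shared[other]:
                shared[other].remove(dom)
    # Step c: the returned demand values feed the stage demand calculation.
    return result


stages = {"S1": ["NS_through", "NS_left"], "S2": ["NS_left"], "S3": ["EW_through"]}
demands = {"NS_through": 1, "NS_left": 5, "EW_through": 3}
allocation = allocate_dominant_turns(stages, demands)
```

In this example the left turn is released in both S1 and S2; once S1 claims it as its dominant turn, S2 has no remaining candidate and therefore does not respond to the same demand a second time.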

[0020] Furthermore, the method also includes a system evolution step: steps S1 to S6 are executed continuously, and the state data, the stage timing action data, and the reward values calculated from the saturation change rate and the travel speed change are recorded during each execution to form a historical experience dataset; based on the historical experience dataset, a traffic control knowledge graph is constructed and/or an offline reinforcement learning model is trained; in subsequent executions of step S3, the determined timing action or feedback coefficient is corrected by combining the reasoning results of the traffic control knowledge graph and/or the recommendation results of the offline reinforcement learning model.

[0021] Furthermore, in the system evolution step, the recommendation strategy output by the offline reinforcement learning model is converted into a correction amount for the feedback coefficient, and the feedback coefficient is then updated with this correction using a smooth update algorithm.

[0022] A second aspect of the present invention provides a traffic signal adaptive control system based on congestion feedback adjustment for implementing the method, comprising:
a data sensing module, configured to acquire the travel speed of multiple turns at the target intersection;
a demand analysis module, configured to determine the congestion level of the corresponding turn based on the travel speed, and to generate the corresponding turn timing demand according to the congestion level;
a decision module, configured to determine the timing adjustment action for each signal timing stage based on the timing demand of each turn and the subordinate relationship between turns and signal timing stages;
a control execution module, configured to adjust the signal timing scheme based on the timing action and to issue it for execution;
a verification module, configured to obtain the saturation change rate of each turn after the adjusted signal timing scheme is executed; and
a feedback learning module, configured to evaluate the effect of the timing adjustment action based on the saturation change rate and to perform feedback correction on the control parameters according to the evaluation results, wherein the control parameters include at least a feedback coefficient for mapping the turn timing demand to the stage timing demand.

[0023] Furthermore, the data sensing module includes: a trajectory reconstruction unit, used to calculate the first turn travel speed based on vehicle identification data; and a multi-source fusion unit, used to perform weighted fusion of the first turn travel speed and the second turn travel speed from internet or floating car data, the multi-source fusion unit determining the fusion weight based on a dynamic credibility model.

[0024] Furthermore, the demand analysis module includes: a state estimation unit, used to perform linear regression analysis on the travel speed sequence of multiple consecutive time windows and to dynamically select the historical mean or the regression prediction value as the travel speed input based on the goodness of fit; and a hierarchical mapping unit, used to map the travel speed input to the congestion level and the corresponding timing demand value based on preset speed thresholds.

[0025] Furthermore, the decision module includes: a demand allocation unit, used to calculate the timing demand of each stage based on the subordinate relationship between turns and stages and the feedback coefficient, and, when a turn is released by multiple stages, to execute the exclusive and non-exclusive turn demand allocation algorithm to avoid conflicts; and an action decision unit, used to determine the timing action for each stage based on the stage timing demand, the prior saturation level, and the dynamic step size rule.

[0026] Furthermore, the system also includes an intelligent evolution module, which comprises: a knowledge graph engine, used to build and update a traffic control knowledge graph based on historical operational data and to provide rule-based reasoning; an offline reinforcement learning model, trained on historical state-action-reward data and used to provide policy recommendations; and a collaborative arbitration engine, used to fuse and adjudicate the reasoning results of the knowledge graph engine and the recommendation results of the offline reinforcement learning model, and to feed the adjudication results back to the decision module or the feedback learning module to correct the timing adjustment action or update the feedback coefficients.

[0027] Compared with the prior art, the embodiments of the present invention have the following beneficial effects: This invention constructs a closed-loop control mechanism that uses travel speed as the excitation input and saturation change rate as the verification feedback. It directly addresses the core traffic management objective of "increasing vehicle speed and alleviating congestion," overcoming the problems of poor generalization and indirect optimization objectives inherent in traditional adaptive methods based on fixed mathematical models. Furthermore, this system does not rely on high-cost connected vehicles or trajectory prediction equipment. It fully utilizes internet data and existing data collection facilities, achieving rapid response, logical transparency, and easy engineering deployment through clear and interpretable rules combined with online parameter self-calibration. The saturation change rate feedback verification effectively avoids ineffective green time allocation, improving single-point traffic efficiency and congestion relief while ensuring system reliability, interpretability, and long-term adaptive optimization potential.

Attached Figure Description

[0028] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0029] Figure 1(a) is a schematic diagram of the construction of a typical intersection device disclosed in an embodiment of the present invention.

[0030] Figure 1(b) is a schematic diagram of the construction of the 4×2 road network equipment disclosed in the embodiment of the present invention.

[0031] Figure 2 is a schematic diagram of the standard phase scheme topology disclosed in this invention.

[0032] Figure 3 is a flowchart of the cross-cycle closed-loop system disclosed in this invention.

[0033] Figure 4 is a schematic diagram of the two-layer architecture of the intelligent evolution of the system disclosed in this invention.

Detailed Implementation

[0034] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0035] This invention aims to provide a traffic signal adaptive control method and system based on congestion feedback adjustment. Turn-level travel speed and traffic flow data, which directly reflect the degree of congestion, are acquired and analyzed from internet data, from existing enforcement equipment such as electronic police checkpoints, and from a self-built low-cost Bluetooth identity collection system. These are combined with signal operation data and saturation indexes to construct an "excitation-observation-feedback-adjustment" closed-loop control system whose goal is to raise travel speed and thereby alleviate congestion. This enables dynamic adjustment of signal timing schemes and improves the ability of the signals to respond to congestion and adapt to changes in traffic flow.

[0036] The traffic signal adaptive control method based on congestion feedback adjustment provided by this invention mainly includes the following steps: S1: Obtain the travel speed of multiple turns at the target intersection.

[0037] This solution fully utilizes internet data and existing data collection facilities. For example, by reusing or supplementing electronic police checkpoints, Bluetooth identity collectors, and other identification devices, a road network-level vehicle identification system is constructed. Based on the road network topology and a unified clock, vehicle identity and location information is collected to reconstruct each vehicle's trajectory between intersections, and turn travel speed data at intersections is then calculated from the time differences and the intersection spacing. This data is further fused with internet or floating car data: the high-precision trajectories of the self-built system serve as the core, while internet data is used for calibration, gap filling, and systematic verification, ultimately yielding highly reliable turn travel speed data.

[0038] During implementation, electronic police checkpoints or other identity data collection devices covering the target intersection and its upstream and downstream intersections are reused or supplemented to form a road network-level data collection network. As shown in Figure 1(a), a typical intersection requires data collection devices to be deployed simultaneously at the four surrounding related intersections, giving an equipment efficiency of 1/5. Although controlling a single intersection requires equipment coverage at the surrounding intersections, enforcement equipment such as electronic police checkpoints is itself a necessary facility for intersection traffic management. Moreover, when the target intersections form a region, the equipment and cost required per intersection are greatly reduced, which makes the approach well suited to urban road traffic scenarios. As shown in Figure 1(b), the equipment efficiency of a 4×2 road network rises to 2/5. The vehicle data collected at fixed points includes vehicle identity or ID, collection time, intersection, direction, and lane (lane is available only from electronic police checkpoints).

[0039] S11. Trajectory reconstruction, turn travel speed, and vehicle passage statistics. In this solution, for an approach of the target intersection, the vehicle data obtained by the data collection equipment at the target intersection is matched, by vehicle identity or ID, with the vehicle data obtained at the upstream and downstream intersections. This gives the times at which each vehicle passes the consecutive collection points and the time differences between successive passes. All matched vehicles within the statistical time range are aggregated, and samples with abnormal time differences are eliminated using the standard deviation method or the IQR method. The remaining valid samples, whose passing time differences are all valid, reconstruct the trajectories of vehicles that passed the collection points consecutively within the statistical time range; each trajectory identifies the turn the vehicle made at that approach of the target intersection. Averaging the time difference over all valid samples of a turn yields the turn travel time, and dividing the corresponding physical distance by this travel time yields the turn travel speed.
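A minimal sketch of the matching and speed calculation in S11 is given below, in Python. It simplifies the description to a single pair of collection points and uses the IQR rule with the conventional 1.5 multiplier; the function names, the multiplier, and the sample data are illustrative assumptions rather than details from the text.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed default."""
    if len(values) < 4:
        return list(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]


def turn_travel_speed(upstream, downstream, distance_m):
    """Estimate the travel speed for one turn from two collection points.

    upstream, downstream: dict mapping vehicle ID -> passing time in seconds
    distance_m:           physical distance between the two points in metres
    Returns (speed_kmh or None, matched_count, valid_count).
    """
    # Match vehicles by ID and keep only forward-in-time passes.
    diffs = [downstream[v] - upstream[v]
             for v in upstream
             if v in downstream and downstream[v] > upstream[v]]
    valid = iqr_filter(diffs)
    if not valid:
        return None, len(diffs), 0
    mean_time = statistics.fmean(valid)        # turn travel time in seconds
    return distance_m / mean_time * 3.6, len(diffs), len(valid)


up = {"a": 0, "b": 10, "c": 20, "d": 30, "e": 40, "f": 50}
down = {"a": 60, "b": 70, "c": 82, "d": 88, "e": 100, "f": 650, "g": 5}
speed, matched, valid = turn_travel_speed(up, down, 500)
```

Vehicle "f" takes 600 s, far longer than the others (it may have stopped along the way), so the IQR rule drops it before the mean travel time is computed; vehicle "g" is ignored because it was never seen upstream.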

[0040] To obtain complete vehicle passage statistics, the scope of outlier removal must be narrowed: when the standard deviation method or the IQR method is applied, only samples with outlier time differences (OL→J) are excluded, and the remaining sample size is used as the vehicle passage count within the statistical time range. For data collection systems that count by lane, such as electronic police checkpoints, the number of vehicles passing through the designated turning lane is used as the vehicle passage count.

[0041] S12. Multi-source data access and preprocessing. This step achieves time window alignment and spatial road segment alignment. Specifically, a unified aggregation time window (e.g., a 5-minute statistical window updated every 1 minute) is defined for the real-time speeds from the internet / floating car data and from the vehicle identification system, to ensure the data is comparable in the time dimension. The smallest road segment unit of the internet or floating car system is matched with the road segment-turn data of the self-built system. For internet and floating car data that cannot distinguish turns, the same road segment speed is matched to each of that segment's turns, but its basic confidence weight must be reduced.

[0042] S13. By constructing a dynamic credibility calculation model, each data source is scored within each time window to determine the weight of the two sets of data.

[0043] Because internet data comes from an external source, the data of the self-built system is given the higher basic confidence weight. The confidence weight of the self-built system data is then calibrated using data features, namely sample coverage and data consistency. 1) Calculate the effective sample rate, i.e., the ratio of the effective sample size of the self-built system's turn travel speed data to the total vehicle passage count; when it falls below a certain threshold (e.g., 50%), calibrate the confidence weight according to the corresponding calibration function.

[0044] 2) Calculate the coefficient of variation of the self-built system's turn travel speed data; when it exceeds a certain threshold (e.g., 30%), calibrate the confidence weight according to the corresponding calibration function. The confidence weight of the self-built system data is then adjusted to […], where the two calibration functions can follow the linear design below as a reference:

[0045]

[0046] In the formula, the two coefficients are adjustable parameters, and the two thresholds are the expected thresholds for the effective sample rate and the coefficient of variation, respectively.

[0047] Subsequently, the fused turn travel speed is calculated; it satisfies the expression: […]

[0048] To ensure the robustness of the final output data, the stability of the fusion can be verified by cross-checking the system data and external data. If the two sets of data are seriously conflicting, manual correction will be triggered.
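The calibration, fusion, and conflict check of S13 can be sketched as below. The source formulas are not reproduced in this text, so the additive linear penalties, the coefficients `a` and `b`, the convention that the two weights sum to 1, the base weight of 0.7, and the `rel_tol` conflict limit are all assumptions; only the 50% and 30% thresholds come from the examples in the description.

```python
def calibrated_weight(w_base, sample_rate, cv,
                      rate_thr=0.5, cv_thr=0.3, a=0.5, b=0.5):
    """Calibrate the self-built system's confidence weight.

    A linear penalty is applied when the effective sample rate is below
    rate_thr, and another when the coefficient of variation exceeds cv_thr.
    a and b are hypothetical tuning parameters.
    """
    f = -a * (rate_thr - sample_rate) if sample_rate < rate_thr else 0.0
    g = -b * (cv - cv_thr) if cv > cv_thr else 0.0
    return min(1.0, max(0.0, w_base + f + g))


def fuse(v_own, v_ext, w_own):
    """Weighted fusion, assuming the external weight is 1 - w_own."""
    return w_own * v_own + (1.0 - w_own) * v_ext


def needs_manual_review(v_own, v_ext, rel_tol=0.5):
    """Flag a serious conflict between the two sources (rel_tol is assumed)."""
    return abs(v_own - v_ext) > rel_tol * max(v_own, v_ext)


w = calibrated_weight(0.7, sample_rate=0.3, cv=0.5)  # both penalties apply
v = fuse(30.0, 40.0, w)
```

With a sparse, noisy self-built sample, the weight falls from 0.7 to 0.5 and the fused speed moves toward the external source.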

[0049] In one specific embodiment, linear regression analysis is performed, over a sliding window, on the travel speed data of five consecutive statistical time windows for each turn, yielding the linear slope, the intercept, and the goodness of fit. The travel speed input mechanism is designed around the goodness of fit, so that the travel speed fed into the control logic is a high-quality state estimate that combines stability (the mean) and foresight (the prediction). The prediction threshold is set to 0.7 and the stability threshold to 0.5, and the travel speed input value is determined from the relationship between the goodness of fit and these two thresholds, specifically: when the goodness of fit reaches the prediction threshold, there is a clear upward or downward trend, and the linear regression predicted value is used as the travel speed input.

[0050] When the goodness of fit is below the stability threshold, the data is highly volatile, and its arithmetic mean is used as the travel speed input to avoid noise.

[0051] When the goodness of fit lies between the two thresholds, the upward or downward trend is unclear, and a prudent compromise strategy is adopted: […]

[0052] In the formula, […].
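The selection logic of this embodiment can be sketched as follows, using the thresholds 0.7 and 0.5 given above. Two points are assumptions because the source formula is not reproduced here: the prediction is taken one window ahead of the last sample, and the compromise is a linear blend whose weight grows with the goodness of fit.

```python
def travel_speed_input(speeds, pred_thr=0.7, stab_thr=0.5):
    """Choose the travel speed fed to S2 from a window of recent speeds.

    speeds: travel speeds of consecutive statistical windows (at least two),
            oldest first.
    """
    n = len(speeds)
    mean_x = (n - 1) / 2
    mean_v = sum(speeds) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (v - mean_v) for x, v in enumerate(speeds))
    syy = sum((v - mean_v) ** 2 for v in speeds)

    slope = sxy / sxx
    intercept = mean_v - slope * mean_x
    r2 = 0.0 if syy == 0 else sxy * sxy / (sxx * syy)  # goodness of fit
    predicted = slope * n + intercept                   # assumed: next window

    if r2 >= pred_thr:        # clear trend: use the regression prediction
        return predicted
    if r2 < stab_thr:         # volatile data: use the arithmetic mean
        return mean_v
    # Unclear trend: hypothetical linear blend between mean and prediction.
    alpha = (r2 - stab_thr) / (pred_thr - stab_thr)
    return alpha * predicted + (1 - alpha) * mean_v


rising = travel_speed_input([20, 22, 24, 26, 28])
noisy = travel_speed_input([30, 20, 30, 20, 30])
mixed = travel_speed_input([20, 23, 22, 25, 24])
```

A steadily rising sequence yields the extrapolated value, an oscillating one falls back to its mean, and the third sequence (goodness of fit of about 0.68) lands between the two.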

[0053] S2. Based on the travel speed, determine the congestion level of the corresponding turn, and generate the corresponding turn timing requirement according to the congestion level.

[0054] Based on the travel speed input determined above, a speed classification threshold system is established for the subsequent timing demand analysis. Thresholds are defined independently for each turn at the intersection: […] speed thresholds divide the turn travel speed into […] speed levels.

[0055] In one specific embodiment, […] is set for a given turn, where […] is the permitted travel speed of the turn, and the turn travel speed is divided into the following 5 levels: Level 1: […]; Level 2: […]; Level 3: […]; Level 4: […]; Level 5: […].
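Mapping a travel speed to one of the levels amounts to a threshold lookup, sketched below. The four threshold values are hypothetical (the source values are not reproduced in this text), as is the choice to count a speed equal to a threshold in the higher level. Level 1 is taken to be the slowest, most congested level, which matches the S22 principle that a larger level number carries a smaller demand value.

```python
from bisect import bisect_right

# Hypothetical per-turn thresholds in km/h; four thresholds give five levels.
EXAMPLE_THRESHOLDS = [10.0, 20.0, 30.0, 40.0]


def speed_level(speed, thresholds=EXAMPLE_THRESHOLDS):
    """Return the 1-based speed level; thresholds must be ascending."""
    return bisect_right(thresholds, speed) + 1


levels = [speed_level(v) for v in (5.0, 10.0, 25.0, 45.0)]
```

Because thresholds are defined per turn, each turn would carry its own list, for example a lower set for left turns than for through movements.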

[0056] To ensure that timing adjustments comply with basic traffic signal control theory and constraints, timing is adjusted stage by stage in steps, subject to a preset maximum cycle length, a stage adjustment step size, and upper and lower limits on stage duration. The adjustment step size can be set as a fixed step or a dynamic step according to actual application requirements.

[0057] A fixed step size offers system stability and ease of debugging, and performs well when the initial timing reflects good prior experience and traffic flow is relatively stable. A dynamic step size is more intelligent: it can iterate quickly from a mediocre initial timing and respond to extreme changes in traffic flow, but its more aggressive adjustment strategy can easily produce extreme changes. In practical applications, the choice between fixed and dynamic step sizes, and their parameter settings, should be based on existing conditions.

[0058] In addition to the pressure level (5 levels) determined from the speed classification, the dynamic step size also takes a prior saturation level into account, so that extreme or ineffective step adjustments are not generated. Considering the timing adjustment needs of real traffic scenarios, the relationship between the dynamic step size, the speed level, and the prior saturation level is shown in Table 1.

Table 1. Relationship between dynamic step size, vehicle speed classification, and prior saturation level

[0059] S21. Phase scheme identification and saturation calculation. First, the computational relationship between the phase scheme and the turns must be identified, since it is the basis for both the timing adjustments and the prior saturation calculation mentioned above. According to the phase scheme structure in the GB/T 20999-2017 standard, a stage is the carrier of release (green) time, and multiple consecutive stages constitute a complete scheme, as shown in Figure 2. A stage contains phases, which are in turn matched with turns. The stage-turn relationship is therefore the computational link between timing (green ratio) on one side and turn travel speed and traffic volume on the other. The calculation of prior saturation can be expressed as:

[0060]

[0061] In the formula, the terms are: the prior saturation of a turn; the number of stages in the scheme; the green time of a stage in the scheme; the variable expressing whether a turn belongs to a stage; and the set of turns associated with a stage.

[0062] S22. Calculation of graded timing adjustment demand. In this solution, the turn travel speed is graded and converted into a quantifiable timing demand. Each level corresponds to a different demand value, following the principle that the larger the level number, the smaller the demand value. A demand value can be negative, indicating that the time allocation should be reduced. Taking the five-level vehicle speed classification in Table 1 as an example, the demand values of a turn for levels 1 to 5 can be set to [5, 3, 1, -1, -3].
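The grading in S22 can be illustrated with a minimal sketch. The demand values [5, 3, 1, -1, -3] come from the example above; the speed thresholds are hypothetical placeholders (Table 1 is not reproduced here), and the function names are illustrative only.

```python
from bisect import bisect_right

# Hypothetical speed thresholds in km/h, ascending; real values would come from Table 1.
SPEED_THRESHOLDS_KMH = [10.0, 20.0, 30.0, 40.0]
# Demand values for levels 1..5, taken from the example in the text.
LEVEL_DEMANDS = [5, 3, 1, -1, -3]


def congestion_level(speed_kmh: float) -> int:
    """Return level 1 (slowest, most congested) to 5 (fastest) for a turn."""
    return bisect_right(SPEED_THRESHOLDS_KMH, speed_kmh) + 1


def timing_demand(speed_kmh: float) -> int:
    """Map a turn's travel speed to its timing demand value."""
    return LEVEL_DEMANDS[congestion_level(speed_kmh) - 1]
```

With these assumed thresholds, a turn travelling at 5 km/h falls in level 1 and requests the largest increase, while a turn at 45 km/h falls in level 5 and yields a negative demand.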

[0063] S3. Based on the timing requirements of each turn and the subordinate relationship between the turn and the signal timing phase, determine the timing adjustment actions for each signal timing phase; wherein, the timing adjustment actions include increasing, decreasing or maintaining the green light time of the corresponding phase.

[0064] In this solution, the stage timing demand is calculated from the turn timing demands and the subordinate relationship between turns and stages. Generally, the maximum of the timing demands of the turns associated with a stage, each weighted by its feedback coefficient, is used as the stage timing demand. This ensures that the timing needs of the most congested turn are addressed. The feedback coefficient is defined per stage and turn; it is initially set to 1.0 and can be corrected based on expert experience or by feedback correction using the saturation verification results after timing adjustment.
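As an illustrative sketch of this rule, the stage demand can be computed as a maximum over the turns of a stage. It is assumed here that the feedback coefficient multiplies the turn demand and defaults to the initial value 1.0; the dictionary layout and names are hypothetical.

```python
def stage_timing_demand(stage, turns, turn_demand, feedback_coeff):
    """Stage demand = max over the stage's turns of coefficient * turn demand.

    turn_demand: dict turn -> demand value
    feedback_coeff: dict (stage, turn) -> coefficient; missing pairs use 1.0
    (assumed multiplicative weighting).
    """
    return max(
        feedback_coeff.get((stage, turn), 1.0) * turn_demand[turn]
        for turn in turns
    )
```

For example, with demands `{"east_left": 5, "east_through": 1}` and a coefficient of 0.8 for the pair `("S1", "east_left")`, the stage demand is 4.0.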

[0065] In a further solution, under special phase schemes several stages may release the same turn, so that the timing demand of that turn would be answered by several stages at once. To eliminate this problem, the exclusive turns and non-exclusive turns of each stage are distinguished when allocating turn timing demands. The specific method is as follows:

a. For each signal timing stage, distinguish its exclusive turn set from its non-exclusive turn set.

[0066] b. Iteratively determine a dominant turn and its demand value for each stage; once the dominant turn of a stage has been determined, it is removed from the non-exclusive turn sets of the other stages. Specifically:

b1. Calculate the maximum exclusive demand and the maximum non-exclusive demand of each stage. For each unmarked stage:

[0067]

[0068] b2. Identify stages dominated by an exclusive turn demand. If any exist, proceed to step b4; otherwise, proceed to step b3. For each unmarked stage that satisfies the exclusive-dominance condition, set the stage timing demand, mark the dominant turn, and remove that turn from the non-exclusive turn sets of all other stages.

[0069] b3. Identify the stage dominated by a non-exclusive turn demand, then proceed to step b4. Among the unmarked stages, select the stage with the highest maximum non-exclusive demand. If several stages tie, select the one with the highest average demand over all of its turns; if they still tie, select the first in sequence. For the selected stage, set the stage timing demand, mark the dominant turn, and remove that turn from the non-exclusive turn sets of all other stages.

[0070] b4. Determine whether a dominant turn has been marked for every stage. If yes, exit the iteration; otherwise, repeat from step b1.

[0071] c. Calculate the stage timing demand from the demand value of the dominant turn determined for each stage in step b. This eliminates the effect of a repeatedly released turn on the timing demands of multiple stages.
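Steps a to c can be sketched as follows. This is only one possible reading: the conditions in b1 and b2 are not reproduced in the text, so the sketch assumes that a stage is exclusive-dominated when its largest exclusive demand is at least its largest non-exclusive demand, that a stage left with no turns receives demand 0, and that feedback coefficients are omitted for brevity. All names are hypothetical.

```python
def allocate_dominant_turns(stage_turns, demand):
    """stage_turns: dict stage -> list of turns, in stage sequence order.
    demand: dict turn -> timing demand value.
    Returns dict stage -> (dominant turn or None, stage demand)."""
    # a. A turn is exclusive if exactly one stage releases it.
    count = {}
    for turns in stage_turns.values():
        for t in turns:
            count[t] = count.get(t, 0) + 1
    exclusive = {s: [t for t in ts if count[t] == 1] for s, ts in stage_turns.items()}
    shared = {s: [t for t in ts if count[t] > 1] for s, ts in stage_turns.items()}
    result = {}

    def mark(stage, turn):
        result[stage] = (turn, demand[turn])
        for other in shared:  # remove the turn from the other stages' shared sets
            if other != stage and turn in shared[other]:
                shared[other].remove(turn)

    def mean_demand(stage):
        turns = stage_turns[stage]
        return sum(demand[t] for t in turns) / len(turns)

    while len(result) < len(stage_turns):  # b4: loop until every stage is marked
        unmarked = [s for s in stage_turns if s not in result]
        # b1. Largest exclusive / non-exclusive demand per unmarked stage.
        best_ex = {s: max(exclusive[s], key=demand.get, default=None) for s in unmarked}
        best_sh = {s: max(shared[s], key=demand.get, default=None) for s in unmarked}
        # b2. Exclusive-dominated stages (assumed condition: exclusive >= shared).
        progressed = False
        for s in unmarked:
            ex, sh = best_ex[s], best_sh[s]
            if ex is None and sh is None:
                result[s] = (None, 0)  # assumption: nothing left to serve
                progressed = True
            elif ex is not None and (sh is None or demand[ex] >= demand[sh]):
                mark(s, ex)
                progressed = True
        if progressed:
            continue
        # b3. Highest shared demand; ties by mean demand, then stage order
        # (max() returns the first maximal item).
        s = max(unmarked, key=lambda st: (demand[best_sh[st]], mean_demand(st)))
        mark(s, best_sh[s])
    return result
```

In a scheme where stage S1 releases an east left turn and the east through movement, and S2 releases the east through and west through movements, a high demand on the shared east through movement is answered by only one of the two stages.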

[0072] By setting thresholds on the stage timing demand, this scheme divides all stages into three sets: a set to be adjusted significantly upward, a set with no significant timing adjustment, and a set to be adjusted significantly downward.

[0073] S4. Based on the timing adjustment action, adjust and execute the new signal timing scheme.

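[0074] In one specific embodiment, based on the example demand values [5, 3, 1, -1, -3] above, the interval thresholds can be set to [5, 2), [2, -1), and [-1, -3]; that is, in ascending notation, (2, 5] for the upward set, (-1, 2] for the set with no significant adjustment, and [-3, -1] for the downward set.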
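A short sketch of this classification follows. It reads the example intervals as: demand above 2 goes to the upward set, demand above -1 and up to 2 to the no-adjustment set, and demand of -1 or below to the downward set. Treating values outside [-3, 5] (possible once feedback coefficients differ from 1.0) as upward or downward is an assumption of the sketch.

```python
def classify_stage(demand: float, up_threshold: float = 2.0,
                   down_threshold: float = -1.0) -> str:
    """Assign a stage to the 'up', 'hold', or 'down' set by its timing demand."""
    if demand > up_threshold:
        return "up"
    if demand > down_threshold:
        return "hold"
    return "down"
```

With the example demand values, levels 1 and 2 (demands 5 and 3) are adjusted upward, level 3 (demand 1) is held, and levels 4 and 5 (demands -1 and -3) are adjusted downward.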

[0075] Combining the upper and lower limits of each stage with the cycle constraint, the complete timing adjustment process is as follows:

(1) Initialization: cycle C = the cycle length of the current scheme; the timing adjustment of every stage is 0.

[0076] (2) Check the upward set. If it is not empty, every stage in the set is adjusted upward, using either a fixed step size or a dynamic step size according to the rules in Table 1. If the adjusted stage time is greater than the stage upper limit, it is set equal to the upper limit. Proceed to step (3).

[0077] (3) Check the downward set. If it is not empty, every stage in the set is adjusted downward, using either a fixed step size or a dynamic step size according to the rules in Table 1. If the adjusted stage time is less than the stage lower limit, it is set equal to the lower limit. Proceed to step (4).

[0078] (4) Cycle upper limit check: if the current cycle is not greater than the maximum cycle length, proceed to step (5); otherwise, cancel the adjustment of the stage with the smallest demand among the stages currently adjusted upward, and repeat step (4).

[0079] (5) Record the timing adjustment selected for every stage and the current saturation of each turn, then output the new timing scheme.
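Steps (1) to (5) can be sketched as below, under simplifying assumptions: a fixed step size is used, the cycle length is taken as the sum of the stage green times (intergreen and lost time are ignored), and the recording in step (5) is reduced to returning the new green times. All names and the data layout are hypothetical.

```python
def adjust_timing(green, category, demand, lower, upper, max_cycle, step=2):
    """green, demand, lower, upper: dicts keyed by stage (seconds / demand values).
    category: dict stage -> 'up', 'hold', or 'down'.
    Returns a dict of new green times per stage."""
    new = dict(green)  # (1) start from the current scheme, no adjustment yet
    for stage, cat in category.items():
        if cat == "up":      # (2) raise, clamped to the stage upper limit
            new[stage] = min(green[stage] + step, upper[stage])
        elif cat == "down":  # (3) lower, clamped to the stage lower limit
            new[stage] = max(green[stage] - step, lower[stage])
    # (4) While the cycle is too long, cancel the raised stage with the
    # smallest demand and check again.
    raised = sorted((s for s in new if new[s] > green[s]), key=lambda s: demand[s])
    while sum(new.values()) > max_cycle and raised:
        stage = raised.pop(0)
        new[stage] = green[stage]
    return new  # (5) the new timing scheme
```

If two stages are raised by 2 s each and the resulting cycle exceeds the maximum by 2 s, only the raise with the larger demand is kept.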

[0080] S5. After the new signal timing scheme is executed, obtain the saturation change rate of each turn.

[0081] After the new timing scheme is issued, vehicle traffic data is collected for one statistical period, and the turn saturation for the optimized period is obtained using the saturation calculation method described above. The saturation change rate is then calculated:

[0082] For the timing adjustment selected at each stage, the saturation change rate of its dominant turn is examined.

[0083] S6. Based on the saturation change rate, evaluate the control effect of the time adjustment action, and make feedback corrections to the control parameters according to the evaluation results.

[0084] a. When the timing action increases the green time of a stage, three cases are distinguished by the saturation change rate of the dominant turn. In the first case, the result is strong positive feedback: the green time added to relieve congestion also releases additional actual throughput for the congested turn, so congestion relief and efficiency improvement are achieved at the same time.

[0085] In the second case, the result is general positive feedback: the green time added to relieve congestion does not significantly reduce the traffic efficiency of the congested turn, meets the turn's traffic demand, and effectively relieves congestion. The threshold used to evaluate the change rate is a small non-negative value, usually set to 0.1.

[0086] In the third case, the result is negative feedback: the green time added to relieve congestion actually reduces the traffic efficiency of the congested turn. Even if it relieves congestion at the target turn, it lowers the overall efficiency of the intersection.

[0087] b. When the timing action reduces the green time of a stage, three cases are distinguished by the saturation change rate and the saturation of the dominant turn. In the first case, where two conditions hold together, the result is strong positive feedback: the reduction in green time at the current stage significantly reduces the green time lost by the target turn.

[0088] In the second case, where two conditions hold together, the result is general positive feedback: the reduction in green time at the current stage does not affect the traffic capacity of the target turn.

[0089] In the third case, where either of two conditions holds, the result is negative feedback: the reduction in green time at the current stage impairs the traffic efficiency of the turn, or the traffic pressure reaches a critical value, which easily leads to congestion.
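The evaluation in S5 and S6 can be sketched as below. The exact conditions are not reproduced in the text, so every numeric rule here is an assumption: the change rate is taken as the relative change in saturation, the bands are split at plus and minus the small threshold (0.1, as suggested above), and for green time reductions a hypothetical critical saturation of 0.9 triggers negative feedback.

```python
def saturation_change_rate(before: float, after: float) -> float:
    """Relative change in turn saturation (assumed definition)."""
    return (after - before) / before


def classify_feedback(green_delta: float, change_rate: float,
                      saturation_after: float, eps: float = 0.1,
                      critical: float = 0.9) -> str:
    """Classify the effect of a timing action on its dominant turn.

    green_delta > 0 means green time was increased, < 0 reduced.
    eps and critical are illustrative thresholds, not values from the source.
    """
    if green_delta == 0:
        return "none"
    # Assumed rule: after a reduction, saturation at or above the critical
    # value counts as negative feedback regardless of the change rate.
    if green_delta < 0 and saturation_after >= critical:
        return "negative"
    if change_rate > eps:
        return "strong_positive"
    if change_rate >= -eps:
        return "general_positive"
    return "negative"
```

Under these assumptions, a 4 s increase followed by a 20% rise in saturation is strong positive feedback, while a 4 s reduction that leaves the turn at a saturation of 0.95 is negative feedback and a candidate for rollback.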

[0090] This solution provides both real-time and experience-based feedback responses through two mechanisms: timing rollback and coefficient correction. Timing rollback quickly reverts a timing adjustment that produced negative feedback, based on the saturation verification results.

[0091] Coefficient correction sets a basic adjustment step size (e.g., 0.05 or 0.1) and corrects the feedback coefficient according to the saturation verification results. When the verification result is strong positive feedback, the feedback coefficient is corrected by one adjustment step.

[0092] When the verification result is negative feedback, the feedback coefficient is corrected by one adjustment step in the opposite direction.

[0093] When the verification result is general positive feedback, no correction is required.

[0094] To prevent the coefficients from becoming rigid, a small neutral attenuation is introduced: any feedback coefficient that receives no correction drifts slightly back toward its initial value (1.0 or an expert-set value), which encourages continued exploration.
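A minimal sketch of coefficient correction with neutral attenuation follows. The correction directions follow claim 2 (for a green time increase, strong positive feedback raises the coefficient and negative feedback lowers it; for a reduction, the directions are reversed). The attenuation rate of 0.01 per cycle and the string labels are assumptions.

```python
def update_coefficient(alpha: float, feedback: str, green_delta: float,
                       step: float = 0.05, baseline: float = 1.0,
                       decay: float = 0.01) -> float:
    """Return the corrected feedback coefficient for one stage-turn pair."""
    direction = 1.0 if green_delta > 0 else -1.0
    if feedback == "strong_positive":
        return alpha + direction * step
    if feedback == "negative":
        return alpha - direction * step
    # No correction: drift slightly back toward the baseline (assumed rate).
    return alpha + decay * (baseline - alpha)
```

A coefficient of 1.2 that receives only general positive feedback therefore moves to 1.198 after one cycle and keeps approaching 1.0 unless new strong or negative feedback arrives.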

[0095] Referring to Figure 3, which shows the closed-loop execution flow of the core control logic of this invention, the process embodies a dynamic cyclic mechanism of "excitation-observation-feedback-adjustment." The process begins with the "excitation" module: the system acquires high-quality turn travel speed data, processed through multi-source fusion and goodness-of-fit analysis, and converts it into specific timing demands based on preset speed thresholds. This represents the system's proactive perception of, and quantitative response to, traffic congestion. The "observation and execution" phase follows: based on the timing demands, the system calculates a specific stage-level timing scheme using the phase scheme, real-time prior saturation constraints, and the conflict resolution algorithm, and then issues it for execution. This phase completes the decision and action output from "perceiving congestion" to "implementing intervention." After the scheme is executed, the process enters the crucial "feedback" module: after one signal cycle, the system collects actual vehicle data and calculates the saturation change rate of each turn. This indicator serves as the core effect verifier, used to quantitatively evaluate the true effect of the previous timing action and to determine whether it is strong positive feedback (effective), general positive feedback (acceptable), or negative feedback (ineffective or harmful). Finally, in the "adjustment" module, the system corrects the control parameters in real time based on the verification results. For timing actions that produce strong positive or negative feedback, the system optimizes subsequent decisions by adjusting the feedback coefficient or performing timing rollback; for general positive feedback, it keeps the parameters stable. The corrected parameters directly affect the calculation of the timing demand in the next control cycle, thus forming a continuously self-optimizing closed loop.

[0096] Figure 3 shows how this invention uses the posterior indicator "saturation change rate" to verify and correct the effects of decisions made on the basis of the prior indicator "travel speed," thereby ensuring that the control strategy is not only responsive but also effective in the long term and free of strategy drift.

[0097] S7. System Evolution Steps.

[0098] The closed-loop control scheme described above, based on "excitation, observation, feedback, and adjustment" (referred to as the "basic control layer"), is not only a complete, independently operating adaptive system but also a platform capable of continuous evolution, providing core data and an algorithmic framework for next-generation intelligent control systems. The evolution goal is to upgrade the system from a "rule-driven, instant feedback" operating mode to a hybrid intelligent control mode integrating "perception-decision-memory," that is, to build an adaptive control system based on collaborative reasoning between offline reinforcement learning and an interpretable knowledge graph. The core path of this evolution is to accumulate high-quality "state-action-reward" triplet data through long-term operation of the basic control layer and to use this data to build two core upper-layer structures: a traffic control knowledge graph (a structured, interpretable base of expert experience) and an offline reinforcement learning model (an agent capable of mining strategies beyond existing experience from historical data). Together these form a two-layer structure in which the "basic control layer" and the "intelligent decision-making layer" work collaboratively, as shown in Figure 4.

[0099] S71. Basic control layer. In the hybrid intelligent architecture, the basic control layer serves as a highly reliable execution terminal; its functions extend and consolidate the original closed-loop control:

1) Core execution: makes rapid decisions and executes signal control commands at high frequency (for example, every signal cycle or every 5 minutes), based on real-time sensing data and on control parameters that are either preset or optimized by the upper layer.

[0100] 2) Data production: in each complete "excitation-observation-verification" control loop, the system automatically encapsulates and records a structured "experience data package," which includes:

State (S): a panoramic snapshot at the moment of decision, including the travel speed and saturation of each turn, the currently running signal scheme, and time context information (day of the week, time period).

[0101] Action (A): the specific timing scheme actually executed, accurate to the green time increment of each signal stage.

[0102] Reward (R): calculated with a predefined reward function from the effect feedback over a period of time after the action is performed, namely the saturation change rate and the change in travel speed. For example, a weighted function of two terms can be used, in which the two weighting coefficients are applied, respectively, to the saturation change rate of the dominant turn and to the change in average travel speed at the intersection. The reward value quantifies the overall effectiveness of the control action.

[0103] 3) Safety mechanisms: Embedded rules such as saturation verification, step size constraint, period constraint, and feedback coefficient correction ensure that the system's actions are always within a safe and reasonable range, providing a safe "simulation environment" for the exploration of upper-level intelligent models.
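The "experience data package" of item 2) can be sketched as a small record type with a reward helper. The linear weighted-sum form of the reward, the default weights of 0.5, the use of a relative speed change, and all field names are assumptions made for illustration; the text only states that two weighted terms are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    """One state-action-reward record produced by a control loop."""
    state: dict    # e.g. per-turn speed and saturation, scheme id, time context
    action: dict   # green time increment (s) per signal stage
    reward: float


def reward(sat_change_rate: float, speed_change: float,
           w_sat: float = 0.5, w_speed: float = 0.5) -> float:
    """Assumed reward: weighted sum of the dominant turn's saturation change
    rate and the (relative) change in average travel speed."""
    return w_sat * sat_change_rate + w_speed * speed_change


example = Experience(
    state={"speed_kmh": {"east_left": 12.0}, "saturation": {"east_left": 0.8},
           "weekday": True, "period": "am_peak"},
    action={"S1": 4},
    reward=reward(0.2, 0.1),
)
```

Records of this shape are what the intelligent decision-making layer would later consume as training and knowledge-graph input.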

[0104] S72. Intelligent decision-making layer. The intelligent decision-making layer operates at a lower frequency (for example, once a day or once a week) and mines and learns from the standardized historical data accumulated by the basic control layer. Its core consists of two mutually synergistic engines.

[0105] 1) Construction and application of the traffic control knowledge graph

Construction: the system automatically cleans, summarizes, and correlates the large volume of "experience data packages" generated by the basic control layer. In particular, the evolution history of the feedback coefficients described above in different "time-state" contexts is transformed into structured knowledge.

[0106] Storage: a graph network is formed with "intersection-turn" as the entity and "time pattern," "traffic state interval," "effective control action," and "historical statistical utility" as key attributes. For example, a typical knowledge entry can be described as: "For the left turn at the east approach of intersection A, during the weekday morning peak (7:30-8:30), when the saturation ∈ [0.7, 0.85) and the vehicle speed ∈ [15, 20] km/h, the green time for the east-west left turn is increased by 6±2 seconds. Historical statistics show that the probability of obtaining positive feedback is 85%, and the average comprehensive utility value is +0.12."

Reasoning and pre-tuning: when the real-time system re-enters a spatiotemporal and state context similar to a knowledge entry, the graph reasoning engine is activated and provides forward-looking guidance to the basic control layer: ① Pre-tuning of feedback coefficients: based on the historical utility of the matched knowledge, key parameters in the basic layer algorithm, such as the related feedback coefficient, are dynamically pre-tuned, so that the decision logic of the basic layer naturally leans toward actions that have been verified as effective in history; ② Risk warning: "state-action" pairs with a high historical probability of negative feedback are identified and marked, and warning signals are generated to actively inhibit the basic layer from making the same decision in similar situations.

[0107] Supporting model training: ① Sample selection: based on the confidence and utility values of knowledge entries, thresholds are set to select high-quality subsets of experience as priority training samples for the reinforcement learning model, guiding it to learn effective strategies quickly; ② Action space constraints: the reasonable and safe boundaries of control actions under various traffic conditions (such as the adjustment range of green time in each stage) are clearly defined, to trim and regularize the output action space of the reinforcement learning model; ③ Policy interpretation: "post-hoc interpretation" is provided for counterintuitive but efficient new strategies generated by the reinforcement learning model. By finding partially similar states or effects in the graph, understandable correlations are provided for "black box" decisions.
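The knowledge entry and the context matching used for reasoning can be sketched with a simple record type. The field values below reproduce the example entry quoted in the text; the field names, the flat (non-graph) representation, and the exact-interval matching rule are simplifying assumptions.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class KnowledgeEntry:
    intersection: str
    turn: str
    weekday: bool
    start: time
    end: time
    saturation: tuple      # half-open interval [low, high)
    speed_kmh: tuple       # closed interval [low, high]
    green_delta_s: float   # recommended green time change in seconds
    positive_prob: float   # historical probability of positive feedback
    utility: float         # average comprehensive utility

    def matches(self, intersection, turn, weekday, now, saturation, speed_kmh):
        """True if the real-time context falls inside this entry's context."""
        return (self.intersection == intersection
                and self.turn == turn
                and self.weekday == weekday
                and self.start <= now <= self.end
                and self.saturation[0] <= saturation < self.saturation[1]
                and self.speed_kmh[0] <= speed_kmh <= self.speed_kmh[1])


entry = KnowledgeEntry("A", "east_left", True, time(7, 30), time(8, 30),
                       (0.7, 0.85), (15.0, 20.0), 6.0, 0.85, 0.12)
```

A call such as `entry.matches("A", "east_left", True, time(8, 0), 0.8, 18.0)` returns `True`, at which point the entry's recommended action and confidence could be passed on as a rule suggestion.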

[0108] 2) Training and deployment of the offline reinforcement learning model

Training: the system uses the long-term accumulated structured "state-action-reward" sequences and trains with an offline reinforcement learning (Offline RL) algorithm. This approach makes full use of historical data for policy optimization and requires no high-risk online trial and error at real intersections, making it safe and efficient.

[0109] Objective: to train a policy network that, given the current integrated state vector, directly outputs the timing adjustment action that maximizes the expected long-term cumulative reward, together with its estimated value.

[0110] Deployment: the trained model runs in "AI advisor" mode: periodically (e.g., every 5 or 15 minutes) it receives the panoramic state from the basic layer and outputs one or more high-value recommended actions together with their predicted values.

[0111] 3) Collaborative decision-making and evolution mechanism

In this solution, the collaborative decision engine serves as the "command center" of the hybrid intelligence and is responsible for fusing and arbitrating the outputs of the knowledge graph and the RL model. Its workflow is as follows:

① Feasibility filtering: receive the rule suggestion (action and confidence level) from the knowledge graph and the policy recommendation (action and Q value) from the RL model, and perform feasibility filtering (for example, checking whether the confidence is too low or whether the Q value is negative).

② Effect quantification and normalization: the confidence (a probability value) and the Q value (an expected cumulative reward) have different dimensions, so the engine maps both to a unified "expected benefit score." For example, a graph benefit score is defined from the confidence and the basic weight coefficient of the knowledge graph, which can be dynamically adjusted based on the graph's long-term historical accuracy. For the RL model, the rolling mean and standard deviation of historical Q values are used to apply Z-score normalization to the current Q value; the result is then mapped to the vicinity of (0, 1) and combined with the weight coefficient of the RL model.

[0112] ③ Context-weighted arbitration: the two scores are dynamically weighted according to the safety risk level of the current traffic situation and the coverage of historical experience. In scenarios with high safety risk, or in routine scenarios with ample historical experience, the weight of the knowledge graph is raised significantly to ensure system robustness; in non-critical scenarios with sparse historical experience, the weight of the RL model is raised cautiously, allowing the system to conduct controlled exploration.

[0113] ④ Decision output: Based on the weighted comprehensive score, select the final action plan to be executed and send it to the basic control layer for execution.
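Steps ② to ④ can be sketched as follows. The exact formulas are not reproduced in the text, so this sketch assumes that the graph score is the weight times the confidence, that the normalized Q value is mapped into (0, 1) with a logistic function, and that the action with the higher weighted score is selected. Feasibility filtering and context-dependent weight adjustment are left out; all names are hypothetical.

```python
import math
from statistics import mean, pstdev


def graph_score(confidence: float, w_kg: float = 1.0) -> float:
    """Expected benefit score of the knowledge-graph suggestion (assumed form)."""
    return w_kg * confidence


def rl_score(q_value: float, q_history: list, w_rl: float = 1.0) -> float:
    """Z-score the current Q value against its history, then squash to (0, 1)."""
    mu, sigma = mean(q_history), pstdev(q_history)
    z = 0.0 if sigma == 0 else (q_value - mu) / sigma
    return w_rl / (1.0 + math.exp(-z))  # logistic mapping is an assumption


def arbitrate(kg_action, confidence, rl_action, q_value, q_history,
              w_kg: float = 1.0, w_rl: float = 1.0):
    """Return (chosen action, its score); ties favor the knowledge graph."""
    kg = graph_score(confidence, w_kg)
    rl = rl_score(q_value, q_history, w_rl)
    return (kg_action, kg) if kg >= rl else (rl_action, rl)
```

Raising `w_kg` in high-risk contexts or `w_rl` in non-critical, sparsely covered contexts would correspond to the context-weighted arbitration of step ③.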

[0114] To ensure safety, interpretability, and knowledge retention, the system does not directly execute the raw actions output by the RL model. Instead, it traces effective strategies back and attributes them to the core parameters of the basic control layer, namely the feedback coefficients. The specific mechanism is as follows: when an RL strategy is adopted by the collaborative decision engine under a specific traffic state context, the system analyzes the strategy and identifies the target stage and traffic flow that its core logic primarily serves. Based on the effectiveness of the strategy (e.g., excess reward), the adjustment amount of the corresponding feedback coefficient is calculated. The feedback coefficient is extended into a context-dependent function and corrected with a smooth update algorithm that uses a forgetting factor (e.g., 0.8) and a baseline value (e.g., 1.0). This mechanism ensures smooth parameter adjustment and avoids abrupt changes.
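One possible form of the smooth update is sketched below. The update formula itself is not reproduced in the text; the sketch assumes an exponential blend in which the forgetting factor weights the old coefficient and the remainder weights the baseline plus the adjustment amount. The context key layout is hypothetical.

```python
def smooth_update(alpha_old: float, delta: float,
                  forgetting: float = 0.8, baseline: float = 1.0) -> float:
    """Assumed update: blend the old coefficient with (baseline + adjustment)."""
    return forgetting * alpha_old + (1.0 - forgetting) * (baseline + delta)


def update_context_coefficient(coeffs: dict, key: tuple, delta: float,
                               forgetting: float = 0.8,
                               baseline: float = 1.0) -> float:
    """Update the coefficient stored for a (stage, turn, context) key in place."""
    old = coeffs.get(key, baseline)
    coeffs[key] = smooth_update(old, delta, forgetting, baseline)
    return coeffs[key]
```

With a forgetting factor of 0.8, an adjustment amount of 0.5 moves a coefficient of 1.0 only to 1.1, which illustrates how abrupt changes are avoided.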

[0115] The above process constitutes a complete closed loop for the system's self-improvement: First, through the mechanism of indirect parameter correction based on RL policy feedback, the "super-empirical" strategies explored by RL are transformed into interpretable and storable parametric knowledge, enhancing the depth and breadth of the knowledge graph.

[0116] Next, all decision results and their effects are recorded as new "experience data packages" and imported into the historical database.

[0117] Finally, the offline reinforcement learning model is retrained periodically using the enriched historical data; at the same time, the updated knowledge graph and feedback coefficient parameter library provide better prior guidance for training the next model. The system thus achieves a spiral evolution from perception and execution, to decision optimization, and on to knowledge accumulation and model iteration.

[0118] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A traffic signal adaptive control method based on congestion feedback regulation, characterized in that it includes the following steps: S1. Obtain the travel speed of multiple turns at the target intersection; S2. Based on the travel speed, determine the congestion level of the corresponding turn, and generate the corresponding turn timing demand according to the congestion level; S3. Based on the timing demand of each turn and the subordinate relationship between turns and signal timing stages, determine the timing adjustment action for each signal timing stage, wherein the timing adjustment action includes increasing, decreasing, or maintaining the green time of the corresponding stage; S4. Based on the timing adjustment action, adjust and execute a new signal timing scheme; S5. After the new signal timing scheme is executed, obtain the saturation change rate of each turn; S6. Based on the saturation change rate, evaluate the control effect of the timing adjustment action, and correct the control parameters according to the evaluation results.

2. The traffic signal adaptive control method based on congestion feedback regulation according to claim 1, characterized in that, in step S6, performing feedback correction of the control parameters based on the evaluation results includes: for the timing adjustment selected at each stage, examining the saturation change rate of its dominant turn, and obtaining the turn saturation within one cycle after optimization; when the timing adjustment action increases the green time of the stage: (a1) if the negative feedback condition on the saturation change rate is met, in which the threshold is a non-negative constant close to zero, the result is determined to be negative feedback, and the corresponding feedback coefficient is decreased by one adjustment step; (b1) if the general positive feedback condition is met, the result is determined to be general positive feedback, and the feedback coefficient is kept constant; (c1) if the strong positive feedback condition is met, the result is determined to be strong positive feedback, and the corresponding feedback coefficient is increased by one adjustment step; when the timing adjustment action reduces the green time of the stage: (a2) if either of the two negative feedback conditions is met, the result is determined to be negative feedback, and the corresponding feedback coefficient is increased by one adjustment step; (b2) if both general positive feedback conditions are met, the result is determined to be general positive feedback, and the feedback coefficient is kept constant; (c2) if both strong positive feedback conditions are met, the result is determined to be strong positive feedback, and the corresponding feedback coefficient is decreased by one adjustment step.

3. The traffic signal adaptive control method based on congestion feedback regulation according to claim 2, characterized in that it further includes a neutral attenuation step for the feedback coefficient, such that, in the absence of the strong positive feedback or the negative feedback, the feedback coefficient slowly regresses to a preset baseline value over time.

4. The traffic signal adaptive control method based on congestion feedback regulation according to claim 1, characterized in that S1 includes: S11. Based on the vehicle identity and time information collected by vehicle identity recognition equipment deployed in the road network, match the vehicle's driving trajectory between consecutive intersections, and calculate the first turn travel speed from the intersection spacing and the travel time difference; S12. Obtain the second turn travel speed within the same time and space range, where the second turn travel speed is derived from Internet or floating car data; S13. Based on the dynamic reliability calculation model, assign a weight to the first turn travel speed and to the second turn travel speed respectively, and obtain the final turn travel speed through weighted fusion of the first turn travel speed and the second turn travel speed.

5. The traffic signal adaptive control method based on congestion feedback regulation according to claim 4, characterized in that the dynamic reliability calculation model calibrates the weight based on the effective sample rate and the coefficient of variation of the first turn travel speed data, the calibrated weight being expressed through calibration functions of these two quantities; when the effective sample rate is below a preset effective sample rate threshold, the corresponding calibration function takes a negative value; when the coefficient of variation is higher than a preset coefficient of variation threshold, the corresponding calibration function takes a negative value.

6. The traffic signal adaptive control method based on congestion feedback regulation according to claim 1, characterized in that the specific method for obtaining the travel speed of multiple turns at the target intersection in S1 is as follows: obtain the travel speed sequence of the turn over multiple consecutive statistical time windows; perform linear regression on the travel speed sequence to obtain the slope, the intercept, and the goodness of fit; determine the travel speed input value for S2 according to the relationship between the goodness of fit and a prediction threshold and a stability threshold, with three cases each assigning a corresponding input value.

7. The traffic signal adaptive control method based on congestion feedback regulation according to claim 2, characterized in that, in step S2, generating the corresponding turn timing demand according to the congestion level includes: presetting at least two speed thresholds for each turn, dividing the turn travel speed into multiple levels, and converting these levels into quantifiable timing demand values, wherein a level representing more severe congestion corresponds to a greater timing demand value.

8. The traffic signal adaptive control method based on congestion feedback regulation according to claim 7, characterized in that, in step S3, determining the timing adjustment action for each signal timing stage includes: S31. For each signal timing stage, calculate the stage timing demand from its associated turn set, using the feedback coefficient and the timing demand value of each turn f; S32. Compare the stage timing demand with the preset demand threshold intervals to determine the adjustment category to which the stage belongs; S33. Determine the timing action for the stage based on the adjustment category, the current prior saturation level of the stage, and the preset dynamic step size rule.

9. The traffic signal adaptive control method based on congestion feedback regulation according to claim 8, characterized in that S31 further includes a process for allocating exclusive and non-exclusive turn demands to avoid duplicate responses: a. For each signal timing stage, distinguish its exclusive turn set from its non-exclusive turn set; b. Iteratively determine a dominant turn and its demand value for each stage, and, once the dominant turn of a signal timing stage has been determined, remove that dominant turn from the non-exclusive turn sets of the other stages; c. Calculate the stage timing demand from the demand value of the dominant turn determined for each stage in step b.

10. The traffic signal adaptive control method based on congestion feedback regulation according to claim 2, characterized in that, The method also includes a system evolution step: Steps S1 to S6 are executed continuously, and the status data, stage timing action data, and reward values ​​calculated based on the saturation change rate and travel speed change are recorded during each execution to form a historical experience dataset. Based on the historical experience dataset, construct a traffic control knowledge graph and / or train an offline reinforcement learning model; In the subsequent execution of step S3, the determined timing action or feedback coefficient is corrected by combining the reasoning results of the traffic control knowledge graph and / or the recommendation results of the offline reinforcement learning model.

11. The traffic signal adaptive control method based on congestion feedback regulation according to claim 10, characterized in that, in the system evolution step, the recommendation strategy output by the offline reinforcement learning model is converted into a correction amount for the feedback coefficient, and the feedback coefficient is updated with the correction amount using a smooth update algorithm.

12. A traffic signal adaptive control system based on congestion feedback regulation that implements the method of any one of claims 1-11, characterized in that it includes: a data sensing module configured to acquire the travel speed of multiple turns at the target intersection; a demand analysis module configured to determine the congestion level of the corresponding turn based on the travel speed, and to generate the corresponding turn timing demand according to the congestion level; a decision module configured to determine the timing adjustment action for each signal timing stage based on the timing demand of each turn and the subordinate relationship between turns and signal timing stages; a control execution module configured to adjust the signal timing scheme based on the timing adjustment action and issue it for execution; a verification module configured to obtain the saturation change rate of each turn after the adjusted signal timing scheme is executed; and a feedback learning module configured to evaluate the effect of the timing adjustment action based on the saturation change rate and to perform feedback correction on the control parameters according to the evaluation results; wherein the control parameters include at least a feedback coefficient for mapping the turn timing demand to the stage timing demand.

13. The traffic signal adaptive control system based on congestion feedback regulation according to claim 12, characterized in that the data sensing module includes: a trajectory reconstruction unit used to calculate a first turn travel speed based on vehicle identification data; and a multi-source fusion unit used to perform weighted fusion of the first turn travel speed and a second turn travel speed derived from Internet floating car data, wherein the multi-source fusion unit determines the fusion weights based on a dynamic credibility model.
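
The weighted fusion can be illustrated with a minimal sketch. The dynamic credibility model is not defined in this claim, so the sketch simply takes two non-negative credibility scores as inputs and normalizes them into weights; the function name `fuse_speeds` and the score semantics are assumptions.

```python
def fuse_speeds(v_ident, cred_ident, v_float, cred_float):
    """Fuse two travel-speed estimates for one turn.

    v_ident: speed reconstructed from vehicle identification data.
    v_float: speed from Internet floating car data.
    cred_*:  non-negative credibility scores (assumed to come from
             some credibility model that is not shown here).
    """
    total = cred_ident + cred_float
    if total <= 0:
        raise ValueError("at least one source must have positive credibility")
    w_ident = cred_ident / total
    return w_ident * v_ident + (1.0 - w_ident) * v_float


fused = fuse_speeds(30.0, 0.75, 20.0, 0.25)  # weights 0.75 and 0.25
```

Normalizing the scores keeps the fused speed between the two source values regardless of how the credibility scores are scaled.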

14. The traffic signal adaptive control system based on congestion feedback regulation according to claim 12, characterized in that the demand analysis module includes: a state estimation unit used to perform linear regression analysis on the travel speed sequence over multiple consecutive time windows, and to dynamically select either the historical mean or the regression prediction value as the travel speed input based on the goodness of fit; and a hierarchical mapping unit used to map the travel speed input to a congestion level and the corresponding timing demand value based on preset speed thresholds.
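
The two units of this claim might be sketched as below. The sketch assumes ordinary least squares over the window index, the coefficient of determination R² as the goodness of fit, and a one-step-ahead prediction; the cutoff `r2_min`, the speed thresholds, the level numbers, and the demand values in `LEVELS` are all hypothetical placeholders, not values from the patent.

```python
def select_speed_input(speeds, r2_min=0.8):
    """Return the regression prediction if the trend fits well, else the mean."""
    n = len(speeds)
    mean_v = sum(speeds) / n
    if n < 3:
        return mean_v
    xs = range(n)
    mean_x = (n - 1) / 2
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (v - mean_v) for x, v in zip(xs, speeds))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_x
    ss_tot = sum((v - mean_v) ** 2 for v in speeds)
    if ss_tot == 0:
        return mean_v  # constant sequence: the mean is already exact
    ss_res = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(xs, speeds))
    r2 = 1.0 - ss_res / ss_tot
    if r2 >= r2_min:
        return intercept + slope * n  # prediction for the next window
    return mean_v


# (minimum speed in km/h, congestion level, timing demand in seconds);
# every number here is an assumed placeholder.
LEVELS = [(30.0, 0, 0), (20.0, 1, 2), (10.0, 2, 4)]


def map_congestion(speed, levels=LEVELS, worst=(3, 6)):
    """Map a travel speed to (congestion level, timing demand value)."""
    for min_speed, level, demand in levels:
        if speed >= min_speed:
            return level, demand
    return worst


trend_input = select_speed_input([40.0, 35.0, 30.0, 25.0])
noisy_input = select_speed_input([30.0, 20.0, 32.0, 22.0])
```

With a clean downward trend the regression prediction is used, so the congestion level reflects where the speed is heading; with a noisy sequence the historical mean is the more conservative input.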

15. The traffic signal adaptive control system based on congestion feedback regulation according to claim 12, characterized in that the decision module includes: a demand allocation unit used to calculate the timing demand of each stage based on the feedback coefficient and the subordinate relationship between the turns and the stages, wherein, when a turn is permitted in multiple stages, an exclusive and non-exclusive turn demand allocation algorithm is executed to avoid conflicts; and an action decision unit used to determine the timing action for each stage based on the stage timing demand, the prior saturation level, and dynamic step size rules.

16. The traffic signal adaptive control system based on congestion feedback regulation according to claim 12, characterized in that it further includes an intelligent evolution module comprising: a knowledge graph engine used to build and update a traffic control knowledge graph based on historical operational data and to provide rule-based reasoning; an offline reinforcement learning model, trained on historical state-action-reward data, used to provide policy recommendations; and a collaborative arbitration engine used to fuse and adjudicate the reasoning results of the knowledge graph engine and the recommendation results of the offline reinforcement learning model, and to feed the adjudication results back to the decision module or the feedback learning module to correct the timing action or update the feedback coefficient.