A proportionally variable speed limit control method for mixed traffic flow
By constructing a probabilistic variable speed limit strategy based on Markov decision models and deep learning algorithms, the problem of unstable execution of speed limit strategies in mixed traffic flow environments was solved, and collaborative speed limiting between autonomous vehicles and manually driven vehicles was realized, thereby improving the stability and safety of traffic flow.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HARBIN INST OF TECH
- Filing Date
- 2025-04-24
- Publication Date
- 2026-06-26
AI Technical Summary
Existing variable speed limit strategies are difficult to ensure speed limit effectiveness in mixed traffic flow environments. Drivers of autonomous vehicles disengaging from autonomous driving mode leads to traffic instability, and the speed limit values are not precise enough, affecting traffic flow stability and safety.
A probabilistic variable speed limit control method based on a Markov decision model and the twin delayed deep deterministic policy gradient (TD3) algorithm is adopted. Vehicle and environmental information is obtained through roadside terminals, a probabilistic variable speed limit policy is constructed, the optimal combination of vehicles to implement the speed limit is selected, and autonomous vehicles are used to guide human-driven vehicles to follow the speed limit policy, thereby reducing the autonomous driving disengagement rate.
This improved the effectiveness of speed limit enforcement, reduced traffic instability caused by drivers disobeying speed limits, and enhanced the stability and safety of traffic flow.
[Figure: abstract drawing, CN120299248B_ABST]
Description
Technical Field
[0001] This invention belongs to the field of intelligent traffic control technology, and specifically relates to a proportional variable speed limit control method for mixed traffic flows.
Background Technology
[0002] In high-traffic environments, non-periodic congestion on highways not only severely reduces road traffic efficiency but may also trigger a chain reaction, increasing the probability of traffic accidents.
[0003] Variable speed limit is a traffic management strategy that dynamically adjusts road speed limits based on traffic conditions. It aims to reduce speed differences between vehicles near congested areas and balance the distribution of traffic flow, thereby improving road capacity and alleviating congestion. However, this control mode still has limitations: (1) Existing variable speed limit control strategies rely on roadside signs, so the speed limit effect is difficult to guarantee when driver compliance is low. (2) Speed limits are usually expressed as coarse integer values rather than optimal dynamic speed limits derived from real-time traffic conditions, which reduces the precision of the speed limit strategy to some extent. (3) Suddenly issuing uniform speed limit information to all vehicles on the entire road segment may make drivers nervous and cause sudden braking, which may exacerbate the instability of traffic flow.
[0004] With the rapid development of autonomous driving technology, a mixed traffic flow environment where autonomous and human-driven vehicles coexist is gradually taking shape and will continue to exist for a considerable period. Autonomous vehicles have the ability to interact with highway variable speed limit management platforms in real time, allowing them to receive speed limit commands and guide human-driven vehicles to follow variable speed limit strategies, thus optimizing traffic flow. However, fully autonomous driving technology is not yet mature, and autonomous vehicles are still in a stage where both human and autonomous control coexist. When an autonomous vehicle receives a speed limit command, some drivers may choose to disengage from autonomous driving mode and switch to manual driving due to distrust of the system. This behavior may cause additional speed fluctuations, thereby affecting the stability of traffic flow.
[0005] Therefore, how to fully consider the driving characteristics of autonomous vehicles in mixed traffic flow environments, design more intelligent, precise, and dynamically adaptable variable speed limit strategies, and improve traffic flow stability and road safety has become an important research direction for the construction of intelligent connected highways, and is also one of the key issues for the optimization of future traffic management systems.
Summary of the Invention
[0006] The problem this invention aims to solve is to improve the effectiveness of speed limit enforcement and reduce traffic instability caused by drivers' non-compliance with speed limits. It proposes a proportional variable speed limit control method for mixed traffic flows.
[0007] To achieve the above objectives, the present invention provides the following technical solution:
[0008] A proportionally variable speed limit control method for mixed traffic flow includes the following steps:
[0009] S1. Obtain vehicle and environmental information of the current road segment through the roadside terminal, obtain the trip data of autonomous vehicles through information interaction, and retrieve the historical trip data of autonomous vehicles of the road segment for a certain period of time to construct the historical traffic dataset of the road segment.
[0010] S2. Based on the Markov decision model, construct the state, action, and reward functions in the probabilistic variable speed limit control strategy;
[0011] S3. Construct a probabilistic variable speed limit control strategy solution model based on the twin delayed deep deterministic policy gradient (TD3) algorithm. Train the model on the historical traffic dataset of the road segment obtained in step S1, and use the trained model to solve the optimal probabilistic variable speed limit control strategy for the current traffic state.
[0012] S4. Construct a structural equation model to calculate the autonomous driving disengagement rate of all autonomous vehicles in the controlled road segment under the optimal probability variable speed limit control strategy. Autonomous vehicles with excessively high autonomous driving disengagement rates will not be used to receive speed limit control commands.
[0013] S5. Construct speed limit control guidance capability indicators based on speed limit ratios and current traffic conditions, evaluate the control effectiveness of each speed limit enforcement vehicle combination, and select the speed limit enforcement vehicle combination with the best control effectiveness as the final speed limit strategy implementation target.
[0014] Furthermore, the specific implementation method of step S1 includes the following steps:
[0015] S1.1. The roadside terminal acquires vehicle information for the current road segment, including driving parameters and location information for all vehicles within the segment. The driving parameters include the speed $v_i$ and the acceleration $a_i$ of vehicle i; the location information is the centroid coordinates $(x_i, y_i)$ of vehicle i;
[0016] S1.2. The roadside terminal acquires environmental information for the current road segment, including traffic environment information and natural environment information. The traffic environment information includes the current road occupancy rate q and the average vehicle speed within each sub-segment. The current road segment is divided into controlled and congested sections based on the differences in average vehicle speed among sub-segments: a sub-segment whose average speed satisfies the congestion condition is classified as a congested segment $r_{con}$, where $\alpha$ is the congestion judgment threshold and $v^*$ is the free-flow speed; the sub-segments within 1000 meters upstream of the congested segment $r_{con}$ are designated as the controlled segment $r_{vsl}$, where speed limits are implemented. The natural environment information includes weather, lighting, and road surface humidity.
[0017] S1.3. The roadside terminal obtains vehicle type information through information interaction, including manually driven vehicles or autonomous vehicles, and collects the driver's personal attributes and vehicle trip attributes for each autonomous vehicle. The driver's personal attributes include the driver's age group, gender, and current autonomous driving monitoring status, while the vehicle trip attributes include the total trip duration and the autonomous driving trip duration.
[0018] Furthermore, the specific implementation method of step S2 includes the following steps:
[0019] S2.1. The state S is set to consist of four elements: the road occupancy of the congested segment $r_{con}$, the average vehicle speed of the congested segment $r_{con}$, the road occupancy of the controlled segment $r_{vsl}$, and the average vehicle speed of the controlled segment $r_{vsl}$;
[0020] S2.2. The action A is set to consist of two continuous variables, the speed limit value and the speed limit ratio. The speed limit value satisfies $v \in [v_{min}, v_{max}]$, where $v_{min}$ is the minimum speed limit of the road segment and $v_{max}$ is the maximum speed limit of the road segment; the speed limit ratio $p \in (0,1)$ is the proportion of autonomous vehicles that receive the speed limit command in the current control cycle. The action $A = (v, p)$ is therefore a two-dimensional continuous action;
[0021] S2.3. The reward includes four indicators: vehicle delay, vehicle CO2 emissions, fuel consumption, and driving risk.
[0022] S2.3.1. The expression for the vehicle delay index is:
[0023]
[0024] where $D_i$ is the delay time of vehicle i, L is the total road segment length, and the two speed terms are the average vehicle speed and the free-flow speed of vehicle i, respectively.
[0025] S2.3.2. The expression for the vehicle CO2 emission index is:
[0026] $E_i(t) = \max\left(E_0,\; e_1 + e_2 v_i(t) + e_3 v_i(t)^2 + e_4 a_i(t) + e_5 a_i(t)^2 + e_6 v_i(t)\,a_i(t)\right)$
[0027]
[0028] where $E_i(t)$ is the CO2 emission rate of vehicle i at time t, $v_i(t)$ and $a_i(t)$ are the speed and acceleration of vehicle i at time t, respectively, and $E_i$ is the CO2 emissions of vehicle i during the observation time T. The other parameters are: $E_0 = 5.11 \times 10^{-6}$, $e_1 = 5.53 \times 10^{-1}$, $e_2 = 1.61 \times 10^{-1}$, $e_3 = -2.89 \times 10^{-2}$, $e_4 = 2.66 \times 10^{-1}$, $e_5 = 5.11 \times 10^{-1}$, $e_6 = 1.83 \times 10^{-1}$;
[0029] S2.3.3. The expression for the fuel consumption index is:
[0030] $\mathrm{VSP}_i(t) = v_i(t)\left(1.1\,a_i(t) + 0.132\right) + 0.000302\,v_i(t)^3$,
[0031]
[0032] where $\mathrm{VSP}_i(t)$ is the vehicle specific power of vehicle i at time t, $\mathrm{NFR}_i(t)$ is the fuel consumption rate of vehicle i at time t, and $\mathrm{NFR}_i$ is the fuel consumption of vehicle i during the observation time T;
[0033] S2.3.4. The expression for the driving risk index is:
[0034]
[0035]
[0036] where $TTC_i(t)$ is the time that would elapse before vehicle i collides with the preceding vehicle if their speed difference remained constant, $x_{i-1}(t)$ and $x_i(t)$ are the positions of vehicle i-1 and vehicle i at time t, respectively, $l$ is the vehicle body length, $v_{i-1}(t)$ and $v_i(t)$ are the speeds of vehicle i-1 and vehicle i at time t, respectively, and $TTC_i$ is the total cumulative time during which vehicle i is exposed to collision risk within the observation time T;
[0037] S2.3.5. Construct the weighted reward R as follows:
[0038]
[0039] Where ω1, ω2, ω3, and ω4 are the weighting coefficients for vehicle delay indicators, vehicle CO2 emission indicators, fuel consumption indicators, and driving risk indicators, respectively, and N is the total number of vehicles.
[0040] Furthermore, the specific implementation method of step S3 includes the following steps:
[0041] S3.1. Construct the network components of the probabilistic variable speed limit control strategy solution model based on the twin delayed deep deterministic policy gradient (TD3) algorithm, including the policy network $\mu(S_t \mid \theta^{\mu})$, the evaluation network $Q(S_t, A_t \mid \theta^{Q})$, and the target networks of the policy network and of the evaluation network. The evaluation network and its target network each consist of two sub-networks. Initialize the parameters of all networks;
[0042] S3.2. Collect experience data and store it in the experience replay pool: for the state $S_t$ at time step t, the model selects an action $A_t$ from the output of the current policy network $\mu(S_t \mid \theta^{\mu})$ plus a noise term that ensures exploration;
[0043] Executing action $A_t$ in the environment yields the next-step state $S_{t+1}$ and the reward $R_t$; the tuple $(S_t, A_t, R_t, S_{t+1})$ is stored in the experience replay pool;
[0044] S3.3. Sample batch data from the experience replay pool: Each time an update is performed, a batch of data is randomly sampled from the experience replay pool;
[0045] S3.4. Update the parameters of the evaluation network using the batch data sampled in step S3.3. First, calculate the target Q value. The calculation formula is as follows:
[0046]
[0047] where γ is the discount factor and the target action is generated by the target network of the policy network.
[0048] The evaluation network parameters are then updated by minimizing the mean squared error loss $L(\theta^{Q})$, whose expression is:
[0049]
[0050] Where m is the size of the sampling batch in S3.3;
[0051] S3.5. Optimize the policy network using the updated evaluation network: compute the gradient of the policy network through the evaluation network, with the aim of maximizing the Q value output by the evaluation network, and update the policy network parameters $\theta^{\mu}$. The calculation formula is as follows:
[0052]
[0053] where the three terms of the formula are, respectively, the policy gradient, the gradient of the evaluation network with respect to the action, and the gradient of the policy network with respect to its network parameters;
[0054] S3.6. Perform soft updates to the target network periodically: The soft update method for the target network parameters is as follows:
[0055]
[0056] where τ is the soft update factor, the target-side symbols are the target network parameters of the policy network and of the evaluation network, respectively, $\theta^{\mu}$ are the policy network parameters, and $\theta^{Q}$ are the evaluation network parameters;
[0057] S3.7. Repeat steps S3.2 to S3.6 until the predetermined number of training iterations is reached or the model converges, to obtain the trained probabilistic variable speed limit control strategy solution model;
[0058] S3.8. Using the trained probabilistic variable speed limit control strategy solution model obtained in step S3.7, solve for the optimal control strategy $(v_{vsl}, p_{vsl})$ under the current traffic conditions.
[0059] Furthermore, the specific implementation method of step S4 includes the following steps:
[0060] S4.1. Construct a structural equation model for computation, the expression of which is:
[0061] $\eta = B\eta + \Gamma\xi + \zeta$
[0062] $Z = \Lambda_z \xi + \delta$
[0063] $W = \Lambda_w \eta + \varepsilon$
[0064] where η is the endogenous latent variable matrix, ξ is the exogenous latent variable matrix, and ζ is the disturbance term matrix of the structural equation; B describes the interaction among the endogenous latent variables, and Γ is the coefficient matrix of the influence of the exogenous latent variables on the endogenous latent variables; Z and W are the observed manifest variable matrices corresponding to ξ and η, respectively; $\Lambda_z$ and $\Lambda_w$ are the loading matrices relating Z to ξ and W to η; and δ and ε are the measurement error terms of Z and W, respectively;
[0065] S4.2. Construct a structural equation path diagram and set the intrinsic interaction relationship between exogenous and endogenous latent variables as follows: the autonomous driving disengagement rate η1 is affected by the natural environment ξ1, traffic environment ξ2, driving parameters ξ3, driver personal attributes ξ4, trip attributes ξ5, and control strategy ξ6.
[0066] S4.3. Based on the structural equation model formulas and the structural equation path diagram, set up the system of equations jointly. Substitute the encoded manifest variables from the historical traffic dataset of the road segment into the system of equations to fit the undetermined matrices $B$, $\Gamma$, $\zeta$, $\Lambda_z$, $\Lambda_w$, $\delta$, and $\varepsilon$ of the structural equation model;
[0067] S4.4. For the controlled road segment $r_{vsl}$ under the control strategy obtained in step S3, substitute the variable information corresponding to each autonomous vehicle into the structural equation model to obtain the autonomous driving disengagement rate η1 of each autonomous vehicle.
[0068] S4.5. When the autonomous driving disengagement rate η1 of any autonomous vehicle is higher than the prescribed speed limit control threshold, it will not be used to receive speed limit control instructions. Except for the autonomous vehicles selected to receive speed limit instructions, the remaining autonomous vehicles are equivalent to manually driven vehicles in this round of control and receive guidance from the speed limit enforcement vehicle.
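Steps S4.4 and S4.5 can be illustrated with a minimal sketch. It assumes a single endogenous latent variable η1, so that the Bη term vanishes, takes the disturbance ζ at its expected value of zero, and treats the six exogenous latent scores ξ1 to ξ6 as already estimated. The coefficient values, the vehicle identifiers, and the threshold are hypothetical and are not taken from the patent.

```python
def disengagement_rate(gamma, xi):
    """Structural equation eta1 = sum_k gamma_k * xi_k (B*eta and zeta omitted)."""
    if len(gamma) != len(xi):
        raise ValueError("gamma and xi must have the same length")
    return sum(g * x for g, x in zip(gamma, xi))


def eligible_vehicles(xi_by_vehicle, gamma, threshold):
    """Keep autonomous vehicles whose predicted rate does not exceed the threshold."""
    return [vid for vid, xi in xi_by_vehicle.items()
            if disengagement_rate(gamma, xi) <= threshold]


# Hypothetical fitted path coefficients for xi1..xi6 (natural environment,
# traffic environment, driving parameters, driver attributes, trip attributes,
# control strategy).
gamma = [0.1, 0.2, 0.1, 0.3, 0.1, 0.2]
xi_by_vehicle = {
    "av1": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "av2": [0.0, 0.5, 0.0, 0.5, 0.0, 0.0],
}
selected = eligible_vehicles(xi_by_vehicle, gamma, threshold=0.5)
print(selected)
```

Vehicles filtered out here would be treated like manually driven vehicles in the current control round, as stated in S4.5.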
[0069] Furthermore, the specific implementation method of step S5 includes the following steps:
[0070] S5.1. Construct a speed limit control guidance capability index to obtain the guidance capability $\Phi_{ij}$ of autonomous vehicle i over manually driven vehicle j. The calculation is as follows:
[0071] $\Phi_{ij} = f(d_{ij}, \Delta v_{ij}, \rho_{ij}, \kappa_i)$
[0072] where $d_{ij}$ is the distance between autonomous vehicle i and manually driven vehicle j along the road direction, $\Delta v_{ij}$ is the speed difference between i and j along the road direction, $\rho_{ij}$ is the traffic flow density of the sub-segments near i and j, $\kappa_i$ is the stability factor of the current speed limit command of vehicle i, and f(·) is the comprehensive control effect function, which may be a weighted linear combination or a nonlinear function;
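Since the text leaves f(·) open, one possible weighted linear form can be sketched as follows. The exponential distance and speed-difference terms, the weights, and the scale constants `d0`, `v0`, and `rho_max` are illustrative assumptions and are not specified by the patent.

```python
import math


def guidance_capability(d, dv, rho, kappa,
                        weights=(0.4, 0.2, 0.2, 0.2),
                        d0=100.0, v0=5.0, rho_max=0.15):
    """One assumed form of Phi_ij = f(d_ij, dv_ij, rho_ij, kappa_i).

    d: longitudinal distance (m), dv: speed difference (m/s),
    rho: nearby density (veh/m), kappa: command stability factor in [0, 1].
    """
    terms = (
        math.exp(-d / d0),          # closer vehicles guide more strongly
        math.exp(-abs(dv) / v0),    # similar speeds guide more strongly
        min(rho / rho_max, 1.0),    # denser traffic leaves less room to overtake
        kappa,                      # stable commands guide more reliably
    )
    return sum(w * t for w, t in zip(weights, terms))


near = guidance_capability(d=50.0, dv=1.0, rho=0.05, kappa=0.9)
far = guidance_capability(d=200.0, dv=1.0, rho=0.05, kappa=0.9)
print(near, far)
```

With the weights summing to one and every term bounded by one, the value stays in [0, 1], which makes it easier to compare against a threshold.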
[0073] S5.2. Calculate the comprehensive control benefit under the speed limit control guidance capability index. During speed limit guidance, a guided manually driven vehicle whose comprehensive guidance value is lower than a preset threshold $\varphi^*$ is considered not to have received effective guidance. The calculation method is as follows:
[0074] S5.2.1. Calculate the effective guidance rate: let $N_h$ be the number of manually driven vehicles in all controlled road sections, and take the comprehensive guidance value of the j-th manually driven vehicle as given. The effective guidance rate is expressed as:
[0075]
[0076] Where I(·) is an indicator function that takes the value 1 when the condition is true and 0 otherwise;
[0077] S5.2.2. Calculate the comprehensive guidance value: since each manually driven vehicle may be guided by several autonomous vehicles simultaneously, consider the set of autonomous vehicles; for an autonomous vehicle i in that set, its guidance capability over manually driven vehicle j is $\Phi_{ij}$ and its effect weight is $\chi_{ij}$. The comprehensive guidance value of vehicle j is then calculated as follows:
[0078]
[0079] where $\chi_{ij} \in [0,1]$ is set according to factors such as a distance decay function, the relative speed, or the sensing range;
[0080] S5.2.3. Final evaluation value of the plan: to comprehensively measure the advantages and disadvantages of different implementation plans, a comprehensive control benefit index $G_{vsl}$ is introduced, obtained by weighting the effective guidance rate and the average comprehensive guidance value:
[0081]
[0082] where λ1 and λ2 are weight parameters, set according to which objective the system design favors.
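The formulas of S5.2.1 to S5.2.3 are not reproduced in this text, so the following is only a sketch of one reading consistent with the definitions: the comprehensive guidance value of vehicle j is taken as the χ-weighted sum of $\Phi_{ij}$ over autonomous vehicles, the effective guidance rate as the share of manually driven vehicles at or above the threshold, and $G_{vsl}$ as a λ-weighted sum of that rate and the mean guidance value. Function names and the sample numbers are hypothetical.

```python
def comprehensive_guidance(phi, chi):
    """phi[i][j], chi[i][j]: capability and weight of AV i over manual vehicle j."""
    n_h = len(phi[0])
    return [sum(chi[i][j] * phi[i][j] for i in range(len(phi)))
            for j in range(n_h)]


def effective_rate(values, threshold):
    """Share of manually driven vehicles with guidance value >= threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)


def control_benefit(values, threshold, lam1, lam2):
    """Assumed G_vsl = lam1 * effective rate + lam2 * mean guidance value."""
    mean_value = sum(values) / len(values)
    return lam1 * effective_rate(values, threshold) + lam2 * mean_value


def select_best(candidates, threshold, lam1, lam2):
    """Pick the candidate vehicle combination with the highest benefit."""
    return max(candidates,
               key=lambda name: control_benefit(candidates[name],
                                                threshold, lam1, lam2))


phi = [[0.8, 0.2], [0.4, 0.1]]   # two autonomous vehicles, two manual vehicles
chi = [[1.0, 0.5], [0.5, 1.0]]
values = comprehensive_guidance(phi, chi)
g_vsl = control_benefit(values, threshold=0.5, lam1=0.5, lam2=0.5)
best = select_best({"combo_a": values, "combo_b": [0.1, 0.1]}, 0.5, 0.5, 0.5)
print(values, g_vsl, best)
```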
[0083] The beneficial effects of this invention are:
[0084] The present invention provides a proportional variable speed limit control method for mixed traffic flow. Addressing the shortcomings of existing variable speed limit methods in mixed traffic environments, the significant impact of driver compliance on execution effectiveness, and the potential for traffic fluctuations caused by large-scale speed limit commands, this invention constructs a refined proportional speed limit scheme based on reinforcement learning. By sending speed limit commands to some autonomous vehicles, the method guides manually driven vehicles to follow the speed limit strategy, thereby improving the speed limit execution effectiveness and reducing traffic instability caused by driver non-compliance.
Attached Figure Description
[0085] Figure 1 is a flowchart of the proportional variable speed limit control method for mixed traffic flow described in this invention;
[0086] Figure 2 is the structural equation path diagram of the present invention.
Detailed Implementation
[0087] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention; that is, the described specific embodiments are merely a part of the embodiments of the invention, and not all of them. The components of the specific embodiments of the invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations, and the invention may also have other embodiments.
[0088] Therefore, the following detailed description of specific embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected specific embodiments of the invention. All other specific embodiments obtained by those skilled in the art based on these specific embodiments without inventive effort are within the scope of protection of this invention.
[0089] To further clarify the content, features, and effects of the invention, the following specific embodiments are described in detail with reference to Figure 1 and Figure 2:
[0090] Example 1:
[0091] A proportionally variable speed limit control method for mixed traffic flow includes the following steps:
[0092] S1. Obtain vehicle and environmental information of the current road segment through the roadside terminal, obtain the trip data of autonomous vehicles through information interaction, and retrieve the historical trip data of autonomous vehicles of the road segment for a certain period of time to construct the historical traffic dataset of the road segment.
[0093] Furthermore, the specific implementation method of step S1 includes the following steps:
[0094] S1.1. The roadside terminal acquires vehicle information for the current road segment, including driving parameters and location information for all vehicles within the segment. The driving parameters include the speed $v_i$ and the acceleration $a_i$ of vehicle i; the location information is the centroid coordinates $(x_i, y_i)$ of vehicle i;
[0095] S1.2. The roadside terminal acquires environmental information for the current road segment, including traffic environment information and natural environment information. The traffic environment information includes the current road occupancy rate q and the average vehicle speed within each sub-segment. The current road segment is divided into controlled and congested sections based on the differences in average vehicle speed among sub-segments: a sub-segment whose average speed satisfies the congestion condition is classified as a congested segment $r_{con}$, where $\alpha$ is the congestion judgment threshold and $v^*$ is the free-flow speed; the sub-segments within 1000 meters upstream of the congested segment $r_{con}$ are designated as the controlled segment $r_{vsl}$, where speed limits are implemented. The natural environment information includes weather, lighting, and road surface humidity.
[0096] S1.3. The roadside terminal obtains vehicle type information through information interaction, including manually driven vehicles or autonomous vehicles, and collects the driver's personal attributes and vehicle trip attributes for each autonomous vehicle. The driver's personal attributes include the driver's age group, gender, and current autonomous driving monitoring status, while the vehicle trip attributes include the total trip duration and the autonomous driving trip duration.
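The segmentation in S1.2 can be sketched as below. The congestion condition itself is not reproduced in this text; the sketch assumes it is "average speed below α times the free-flow speed", which is one common reading of a threshold α paired with $v^*$. The `SubSegment` type, the direction convention, and the sample speeds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubSegment:
    start_m: float      # upstream boundary (m); traffic is assumed to flow toward larger positions
    end_m: float        # downstream boundary (m)
    mean_speed: float   # average vehicle speed in the sub-segment (m/s)


def classify(subsegments, alpha, v_free, upstream_range_m=1000.0):
    """Return (congested, controlled) lists of sub-segment indices."""
    # Assumed congestion criterion: mean speed below alpha * free-flow speed.
    congested = [k for k, s in enumerate(subsegments)
                 if s.mean_speed < alpha * v_free]
    controlled = []
    for k, s in enumerate(subsegments):
        if k in congested:
            continue
        for c in congested:
            start_c = subsegments[c].start_m
            # Sub-segment lies entirely upstream of the congestion and within range.
            if s.end_m <= start_c and start_c - s.start_m <= upstream_range_m:
                controlled.append(k)
                break
    return congested, controlled


# Five 500 m sub-segments; only the most downstream one is slow.
segments = [SubSegment(500.0 * k, 500.0 * (k + 1), v)
            for k, v in enumerate([30.0, 29.0, 28.0, 27.0, 10.0])]
congested, controlled = classify(segments, alpha=0.6, v_free=30.0)
print(congested, controlled)
```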
[0097] S2. Based on the Markov decision model, construct the state, action, and reward functions in the probabilistic variable speed limit control strategy;
[0098] Furthermore, the specific implementation method of step S2 includes the following steps:
[0099] S2.1. The state S is set to consist of four elements: the road occupancy of the congested segment $r_{con}$, the average vehicle speed of the congested segment $r_{con}$, the road occupancy of the controlled segment $r_{vsl}$, and the average vehicle speed of the controlled segment $r_{vsl}$;
[0100] S2.2. The action A is set to consist of two continuous variables, the speed limit value and the speed limit ratio. The speed limit value satisfies $v \in [v_{min}, v_{max}]$, where $v_{min}$ is the minimum speed limit of the road segment and $v_{max}$ is the maximum speed limit of the road segment; the speed limit ratio $p \in (0,1)$ is the proportion of autonomous vehicles that receive the speed limit command in the current control cycle. The action $A = (v, p)$ is therefore a two-dimensional continuous action;
[0101] S2.3. The reward includes four indicators: vehicle delay, vehicle CO2 emissions, fuel consumption, and driving risk.
[0102] These metrics reflect the overall performance of the transportation system. When training the model, single-reward training can be used as needed, or weighted training can be performed according to the weights.
[0103] S2.3.1. The expression for the vehicle delay index is:
[0104]
[0105] where $D_i$ is the delay time of vehicle i, L is the total road segment length, and the two speed terms are the average vehicle speed and the free-flow speed of vehicle i, respectively.
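The delay expression is not reproduced in this text. A minimal sketch, assuming the usual definition of delay as actual travel time minus free-flow travel time over the segment length L, is shown below; the function name and the clipping at zero are illustrative choices.

```python
def vehicle_delay(length_m, mean_speed, free_flow_speed):
    """Assumed delay D_i = L / mean speed - L / free-flow speed, in seconds."""
    if mean_speed <= 0 or free_flow_speed <= 0:
        raise ValueError("speeds must be positive")
    # Clip at zero so a vehicle faster than free flow is not credited negative delay.
    return max(0.0, length_m / mean_speed - length_m / free_flow_speed)


delay = vehicle_delay(length_m=3000.0, mean_speed=20.0, free_flow_speed=30.0)
print(delay)
```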
[0106] S2.3.2. The expression for the vehicle CO2 emission index is:
[0107] $E_i(t) = \max\left(E_0,\; e_1 + e_2 v_i(t) + e_3 v_i(t)^2 + e_4 a_i(t) + e_5 a_i(t)^2 + e_6 v_i(t)\,a_i(t)\right)$
[0108]
[0109] where $E_i(t)$ is the CO2 emission rate of vehicle i at time t, $v_i(t)$ and $a_i(t)$ are the speed and acceleration of vehicle i at time t, respectively, and $E_i$ is the CO2 emissions of vehicle i during the observation time T. The other parameters are: $E_0 = 5.11 \times 10^{-6}$, $e_1 = 5.53 \times 10^{-1}$, $e_2 = 1.61 \times 10^{-1}$, $e_3 = -2.89 \times 10^{-2}$, $e_4 = 2.66 \times 10^{-1}$, $e_5 = 5.11 \times 10^{-1}$, $e_6 = 1.83 \times 10^{-1}$;
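The emission-rate expression can be evaluated directly. The sketch below uses the coefficients as they appear in this text, which may carry transcription errors (the value of E0 in particular is hard to read) and should be checked against the original publication. The cumulative emission over T is approximated by a sum over sampled time steps of length `dt`, which is an assumed discretization.

```python
# Coefficients copied from the text above; treat them as unverified.
COEFFS = {"E0": 5.11e-6, "e1": 5.53e-1, "e2": 1.61e-1, "e3": -2.89e-2,
          "e4": 2.66e-1, "e5": 5.11e-1, "e6": 1.83e-1}


def co2_rate(v, a, c=COEFFS):
    """Instantaneous emission rate E_i(t) for speed v and acceleration a."""
    poly = (c["e1"] + c["e2"] * v + c["e3"] * v * v
            + c["e4"] * a + c["e5"] * a * a + c["e6"] * v * a)
    return max(c["E0"], poly)   # the rate is floored at E0


def co2_total(speeds, accels, dt):
    """Assumed discretization of E_i: sum of rates times the step length."""
    return sum(co2_rate(v, a) for v, a in zip(speeds, accels)) * dt


idle_rate = co2_rate(0.0, 0.0)
total = co2_total([0.0, 0.0], [0.0, 0.0], dt=0.5)
print(idle_rate, total)
```

With these printed coefficients the polynomial turns negative at moderate constant speeds, so the floor E0 becomes active there; that behaviour is one reason to verify the values before use.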
[0110] S2.3.3. The expression for the fuel consumption index is:
[0111] $\mathrm{VSP}_i(t) = v_i(t)\left(1.1\,a_i(t) + 0.132\right) + 0.000302\,v_i(t)^3$,
[0112]
[0113] where $\mathrm{VSP}_i(t)$ is the vehicle specific power of vehicle i at time t, $\mathrm{NFR}_i(t)$ is the fuel consumption rate of vehicle i at time t, and $\mathrm{NFR}_i$ is the fuel consumption of vehicle i during the observation time T;
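The specific-power expression can be computed as a short sketch. Speed in m/s and acceleration in m/s² are assumed units, since the text does not state them. The mapping from VSP to the fuel consumption rate NFR is not reproduced in this text, so it is deliberately left out.

```python
def vsp(v, a):
    """Vehicle specific power from speed v and acceleration a, per the formula above."""
    return v * (1.1 * a + 0.132) + 0.000302 * v ** 3


def vsp_series(speeds, accels):
    """VSP at each sampled time step of one vehicle's trajectory."""
    return [vsp(v, a) for v, a in zip(speeds, accels)]


cruise = vsp(10.0, 0.0)
series = vsp_series([0.0, 10.0], [1.0, 0.0])
print(cruise, series)
```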
[0114] S2.3.4. The expression for the driving risk index is:
[0115]
[0116] where $TTC_i(t)$ is the time that would elapse before vehicle i collides with the preceding vehicle if their speed difference remained constant, $x_{i-1}(t)$ and $x_i(t)$ are the positions of vehicle i-1 and vehicle i at time t, respectively, $l$ is the vehicle body length, $v_{i-1}(t)$ and $v_i(t)$ are the speeds of vehicle i-1 and vehicle i at time t, respectively, and $TTC_i$ is the total cumulative time during which vehicle i is exposed to collision risk within the observation time T;
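The risk expressions are not reproduced in this text. The sketch below assumes the standard time-to-collision form, net gap divided by closing speed, defined only when the follower is faster than the leader, and accumulates exposure as the time spent with TTC below a threshold. The threshold and the sampling step are hypothetical parameters.

```python
import math


def ttc(x_lead, x_follow, v_lead, v_follow, length):
    """Assumed TTC = (x_lead - x_follow - length) / (v_follow - v_lead)."""
    closing_speed = v_follow - v_lead
    if closing_speed <= 0:
        return math.inf            # follower is not closing in, no collision course
    return (x_lead - x_follow - length) / closing_speed


def exposure_time(samples, dt, ttc_threshold):
    """Cumulative time with TTC below the threshold over sampled steps."""
    return sum(dt for s in samples if ttc(*s) < ttc_threshold)


samples = [
    (50.0, 20.0, 10.0, 15.0, 5.0),   # TTC = 25 / 5 = 5.0 s
    (50.0, 20.0, 15.0, 10.0, 5.0),   # leader is faster, TTC is infinite
    (30.0, 20.0, 10.0, 20.0, 5.0),   # TTC = 5 / 10 = 0.5 s
]
risk_time = exposure_time(samples, dt=0.5, ttc_threshold=3.0)
print(risk_time)
```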
[0117] S2.3.5. Construct the weighted reward R as follows:
[0118]
[0119] Where ω1, ω2, ω3, and ω4 are the weighting coefficients for vehicle delay indicators, vehicle CO2 emission indicators, fuel consumption indicators, and driving risk indicators, respectively, and N is the total number of vehicles.
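Because the reward expression is not reproduced in this text, the following is only a sketch under an assumed form: the negative of the per-vehicle average of the weighted indicator sum, so that lower delay, emissions, fuel use, and risk yield a higher reward. The sign convention and the sample values are assumptions.

```python
def weighted_reward(delays, emissions, fuels, risks, weights):
    """Assumed R = -(1/N) * sum_i (w1*D_i + w2*E_i + w3*NFR_i + w4*TTC_i)."""
    n = len(delays)
    if not (n == len(emissions) == len(fuels) == len(risks)) or n == 0:
        raise ValueError("indicator lists must be non-empty and of equal length")
    w1, w2, w3, w4 = weights
    total = sum(w1 * d + w2 * e + w3 * f + w4 * r
                for d, e, f, r in zip(delays, emissions, fuels, risks))
    return -total / n


reward = weighted_reward(delays=[10.0, 20.0], emissions=[1.0, 3.0],
                         fuels=[2.0, 2.0], risks=[0.0, 4.0],
                         weights=(0.1, 1.0, 1.0, 0.5))
print(reward)
```

Setting three of the four weights to zero reproduces the single-reward training option mentioned in the text.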
[0120] S3. Construct a probabilistic variable speed limit control strategy solution model based on the twin delayed deep deterministic policy gradient (TD3) algorithm. Train the model on the historical traffic dataset of the road segment obtained in step S1, and use the trained model to solve the optimal probabilistic variable speed limit control strategy for the current traffic state.
[0121] Furthermore, the specific implementation method of step S3 includes the following steps:
[0122] S3.1. Construct the network components of the probabilistic variable speed limit control strategy solution model based on the twin delayed deep deterministic policy gradient (TD3) algorithm, including the policy network $\mu(S_t \mid \theta^{\mu})$, the evaluation network $Q(S_t, A_t \mid \theta^{Q})$, and the target networks of the policy network and of the evaluation network. The evaluation network and its target network each consist of two sub-networks. Initialize the parameters of all networks;
[0123] S3.2. Collect experience data and store it in the experience replay pool: for the state $S_t$ at time step t, the model selects an action $A_t$ from the output of the current policy network $\mu(S_t \mid \theta^{\mu})$ plus a noise term that ensures exploration;
[0124] Executing action $A_t$ in the environment yields the next-step state $S_{t+1}$ and the reward $R_t$; the tuple $(S_t, A_t, R_t, S_{t+1})$ is stored in the experience replay pool;
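The experience replay pool of S3.2 and the sampling of S3.3 can be sketched with the standard library alone. The class name, the fixed capacity with oldest-first eviction, and the seeded generator are illustrative assumptions.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity store of (S_t, A_t, R_t, S_t+1) transitions."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)   # oldest transitions are evicted first
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample a batch without replacement."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


pool = ReplayPool(capacity=3, seed=0)
for t in range(5):
    # State: (occupancy, speed) pairs; action: (speed limit, ratio). Values are made up.
    pool.store((0.3, 20.0 + t), (25.0, 0.5), -1.0 * t, (0.3, 21.0 + t))
batch = pool.sample(2)
print(len(pool), len(batch))
```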
[0125] S3.3. Sample batch data from the experience replay pool: Each time an update is performed, a batch of data is randomly sampled from the experience replay pool;
[0126] S3.4. Update the parameters of the evaluation network using the batch data sampled in step S3.3. First, calculate the target Q value. The calculation formula is as follows:
[0127]
[0128] where γ is the discount factor and the target action is generated by the target network of the policy network.
[0129] The evaluation network parameters are then updated by minimizing the mean squared error loss $L(\theta^{Q})$, whose expression is:
[0130]
[0131] Where m is the size of the sampling batch in S3.3.
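The target and loss formulas of S3.4 are not reproduced in this text. The sketch below follows the standard TD3 form, in which the target uses the smaller of the two target sub-network estimates, and should be read as an assumption about what the missing formulas contain. Scalar Q values stand in for network outputs.

```python
def td3_target(reward, gamma, q1_next, q2_next):
    """Assumed target y = R_t + gamma * min(Q1'(S', A'), Q2'(S', A'))."""
    return reward + gamma * min(q1_next, q2_next)


def mse_loss(targets, q_values):
    """Mean squared error over a batch of size m."""
    m = len(targets)
    if m == 0 or m != len(q_values):
        raise ValueError("targets and q_values must be non-empty and of equal length")
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / m


y = td3_target(reward=1.0, gamma=0.9, q1_next=10.0, q2_next=8.0)
loss = mse_loss([1.0, 2.0], [0.0, 4.0])
print(y, loss)
```

Taking the minimum of the two estimates is what counters the overestimation that a single evaluation network tends to accumulate.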
[0132] S3.5. Optimize the policy network using the updated evaluation network: compute the gradient of the policy network through the evaluation network, with the aim of maximizing the Q value output by the evaluation network, and update the policy network parameters $\theta^{\mu}$. The calculation formula is as follows:
[0133]
[0134] where the three terms of the formula are, respectively, the policy gradient, the gradient of the evaluation network with respect to the action, and the gradient of the policy network with respect to its network parameters;
[0135] Furthermore, minimizing this loss allows the policy network to select actions with higher Q values, thereby optimizing the policy.
[0136] S3.6. Perform soft updates to the target network periodically: The soft update method for the target network parameters is as follows:
[0137]
[0138] where τ is the soft update factor, the target-side symbols are the target network parameters of the policy network and of the evaluation network, respectively, $\theta^{\mu}$ are the policy network parameters, and $\theta^{Q}$ are the evaluation network parameters;
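The soft update formula is not reproduced in this text. A minimal sketch, assuming the usual Polyak averaging in which each target parameter moves a fraction τ toward its online counterpart, is given below with parameters represented as flat lists of floats.

```python
def soft_update(target_params, online_params, tau):
    """Assumed rule: theta_target <- tau * theta + (1 - tau) * theta_target."""
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [tau * w + (1.0 - tau) * w_target
            for w_target, w in zip(target_params, online_params)]


updated = soft_update(target_params=[0.0, 1.0], online_params=[1.0, 3.0], tau=0.1)
print(updated)
```

A small τ keeps the target networks changing slowly, which stabilizes the targets used in S3.4.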
[0139] S3.7. Repeat steps S3.2 to S3.6 until the predetermined number of training iterations is reached or the model converges, to obtain the trained probabilistic variable speed limit control strategy solution model;
[0140] S3.8. Using the trained probabilistic variable speed limit control strategy solution model obtained in step S3.7, solve for the optimal control strategy $(v_{vsl}, p_{vsl})$ under the current traffic conditions.
[0141] S4. Construct a structural equation model to calculate the autonomous driving disengagement rate of all autonomous vehicles in the controlled road segment under the optimal probability variable speed limit control strategy. Autonomous vehicles with excessively high autonomous driving disengagement rates will not be used to receive speed limit control commands.
[0142] Furthermore, the specific implementation method of step S4 includes the following steps:
[0143] S4.1. Construct a structural equation model for computation, the expression of which is:
[0144] $\eta = B\eta + \Gamma\xi + \zeta$
[0145] $Z = \Lambda_z \xi + \delta$
[0146] $W = \Lambda_w \eta + \varepsilon$
[0147] where η represents the endogenous latent variable matrix, ξ represents the exogenous latent variable matrix, and ζ is the disturbance term matrix in the structural equation; B describes the interaction relationships among the endogenous latent variables, and Γ represents the influence coefficient matrix of the exogenous latent variables on the endogenous latent variables; Z and W are the observed manifest variable matrices corresponding to ξ and η, respectively; $\Lambda_z$ and $\Lambda_w$ are the coefficient (loading) matrices relating Z to ξ and W to η; and δ and ε are the measurement error terms for Z and W, respectively;
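For reference, the structural equation can be solved for η whenever $I - B$ is invertible, which is an assumption not stated in the text. The short derivation below uses only the symbols already defined and shows how the observed variables W depend on the exogenous latent variables ξ.

```latex
\begin{aligned}
\eta &= B\eta + \Gamma\xi + \zeta \\
(I - B)\,\eta &= \Gamma\xi + \zeta \\
\eta &= (I - B)^{-1}\,(\Gamma\xi + \zeta) \\
W &= \Lambda_w\,(I - B)^{-1}\,(\Gamma\xi + \zeta) + \varepsilon
\end{aligned}
```

When there is a single endogenous latent variable with no self-dependence, B is zero and the reduced form is simply $\eta = \Gamma\xi + \zeta$.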
[0148] The specific variable breakdown is shown in Table 1:
[0149] Table 1:
[0150]
[0151]
[0152] When the historical trip data of autonomous vehicles are processed, the manifest (explicit) variables in each record are encoded. These variables fall into two categories, ordinal and categorical: ordinal variables are normalized according to their numerical range, while categorical variables are encoded using a binary encoding method. The detailed encoding of each variable is shown in Table 2.
[0153] Table 2:
[0154]
[0155]
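The two encoding rules described above might be sketched as follows. This is one possible reading: ordinal variables are min-max normalized to [0, 1], and "binary encoding" is interpreted as writing the category index in base 2. The function names, category counts, and example values are hypothetical and are not taken from Table 2.

```python
from typing import List


def min_max_normalize(value: float, low: float, high: float) -> float:
    """Scale an ordinal value into [0, 1] according to its numerical range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def binary_encode(index: int, n_categories: int) -> List[int]:
    """Encode a category index as fixed-width binary digits, most significant first."""
    if not 0 <= index < n_categories:
        raise ValueError("index out of range")
    width = max(1, (n_categories - 1).bit_length())
    return [(index >> bit) & 1 for bit in reversed(range(width))]


# Hypothetical examples: an age group on a 0..4 ordinal scale,
# and a weather variable with four categories.
age_code = min_max_normalize(2, 0, 4)
weather_code = binary_encode(2, 4)
```

If Table 2 instead uses one-hot or simple 0/1 flags, only `binary_encode` would need to change.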
[0156] S4.2. Construct a structural equation path diagram and set the intrinsic interaction relationship between exogenous and endogenous latent variables as follows: the autonomous driving disengagement rate η1 is affected by the natural environment ξ1, traffic environment ξ2, driving parameters ξ3, driver personal attributes ξ4, trip attributes ξ5, and control strategy ξ6.
[0157] S4.3. Combine the structural equation model formulas with the structural equation path diagram into a system of equations, and substitute the encoded manifest variables from the historical traffic dataset of the road segment into that system to fit the undetermined terms B, Γ, ζ, $\Lambda_z$, $\Lambda_w$, δ, and ε of the structural equation model;
[0158] S4.4. For the controlled road segment $r_{vsl}$ obtained in step S3, substitute the variable information corresponding to each autonomous vehicle into the structural equation model to obtain the autonomous driving disengagement rate η1 of each autonomous vehicle;
[0159] S4.5. When the autonomous driving disengagement rate η1 of an autonomous vehicle is higher than the prescribed speed limit control threshold, that vehicle is not used to receive speed limit control instructions. Autonomous vehicles other than those selected to receive speed limit instructions are treated as manually driven vehicles in this round of control and are guided by the speed limit enforcement vehicles.
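Steps S4.4 and S4.5 can be sketched as a scoring step followed by a threshold filter. The sketch assumes a single endogenous variable η1 computed as a linear combination of the six exogenous latent variables ξ1 to ξ6 with fitted coefficients, with the disturbance term omitted; the coefficient values, vehicle identifiers, and threshold are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def disengagement_rate(gamma: Sequence[float], xi: Sequence[float]) -> float:
    """Assumed linear form eta1 = sum_k gamma_k * xi_k (disturbance term omitted)."""
    if len(gamma) != len(xi):
        raise ValueError("gamma and xi must have the same length")
    return sum(g * x for g, x in zip(gamma, xi))


def split_by_threshold(rates: Dict[str, float],
                       threshold: float) -> Tuple[List[str], List[str]]:
    """Vehicles with a rate above the threshold are excluded from speed limit commands."""
    eligible = [vid for vid, r in rates.items() if r <= threshold]
    excluded = [vid for vid, r in rates.items() if r > threshold]
    return eligible, excluded


# Hypothetical fitted coefficients for xi1..xi6 and two autonomous vehicles.
gamma = [0.25, 0.25, 0.0, 0.0, 0.0, 0.5]
rates = {
    "av_1": disengagement_rate(gamma, [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]),
    "av_2": disengagement_rate(gamma, [1.0, 1.0, 0.0, 0.0, 0.0, 1.0]),
}
eligible, excluded = split_by_threshold(rates, threshold=0.75)
```

Excluded vehicles would then be handled like manually driven vehicles in the guidance evaluation of step S5.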
[0160] S5. Construct speed limit control guidance capability indicators based on speed limit ratios and current traffic conditions, evaluate the control effectiveness of each speed limit enforcement vehicle combination, and select the speed limit enforcement vehicle combination with the best control effectiveness as the final speed limit strategy implementation target.
[0161] Furthermore, the specific implementation method of step S5 includes the following steps:
[0162] S5.1. Construct a speed limit control guidance capability index to obtain the speed limit control guidance capability $\Phi_{ij}$ of autonomous vehicle i over manually driven vehicle j. The specific calculation method is as follows:
[0163] $\Phi_{ij} = f(d_{ij}, \Delta v_{ij}, \rho_{ij}, \kappa_i)$
[0164] where $d_{ij}$ is the distance between autonomous vehicle i and manually driven vehicle j along the road direction, $\Delta v_{ij}$ is the speed difference between i and j along the road direction, $\rho_{ij}$ is the traffic flow density of the sub-road segments near i and j, $\kappa_i$ is the stability factor of the speed limit command currently being executed by i, and f(·) is the comprehensive control effect function, which may be a weighted linear combination or a nonlinear function;
[0165] The linear weighted form can be written as:
[0166]
[0167] β1, β2, β3, and β4 are normalized weight coefficients, which can be set or trained based on historical data or strategy design.
[0168] To measure the guiding role of autonomous vehicles in the execution of speed-limit policies, a speed-limit control guidance capability index is proposed. This capability is influenced by factors such as its driving state, location, surrounding traffic flow density, and vehicle spacing stability. The core of the guidance capability is represented by the control field, which indicates the spatial control coverage area of an autonomous vehicle.
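A weighted linear form of the guidance capability might look like the sketch below. It assumes that each of the four inputs has first been mapped to a score in [0, 1] where larger means stronger guidance, and that the weights β1 to β4 sum to 1; the exponential distance decay is a hypothetical example of such a mapping and is not specified in the text.

```python
import math
from typing import Sequence


def distance_score(distance_m: float, scale_m: float) -> float:
    """Hypothetical decay mapping: closer vehicles receive a score nearer to 1."""
    if scale_m <= 0:
        raise ValueError("scale_m must be positive")
    return math.exp(-abs(distance_m) / scale_m)


def guidance_capability(scores: Sequence[float], betas: Sequence[float]) -> float:
    """Weighted linear combination of four normalized scores (d, dv, rho, kappa)."""
    if len(scores) != 4 or len(betas) != 4:
        raise ValueError("exactly four scores and four weights are expected")
    if abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("weights must be normalized to sum to 1")
    return sum(b * s for b, s in zip(betas, scores))


# Hypothetical scores for one (autonomous vehicle i, manually driven vehicle j) pair.
phi_ij = guidance_capability(
    scores=(distance_score(0.0, 50.0), 0.5, 0.5, 1.0),
    betas=(0.25, 0.25, 0.25, 0.25),
)
```

Whether a higher traffic density strengthens or weakens guidance is a design decision that would be reflected in how the density score is defined.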
[0169] S5.2. Calculate the comprehensive control benefit under the speed limit control guidance capability index. During speed limit guidance, if the comprehensive guidance value of a guided manually driven vehicle is lower than the preset threshold $\varphi^{*}$, that vehicle is considered not to have received effective guidance. The calculation method is as follows:
[0170] S5.2.1. Calculate the effective guidance rate: let $N_h$ be the number of manually driven vehicles in all controlled road sections. Given the comprehensive guidance value of the j-th manually driven vehicle, the effective guidance rate is expressed as:
[0171]
[0172] Where I(·) is an indicator function that takes the value 1 when the condition is true and 0 otherwise;
[0173] S5.2.2. Calculate the comprehensive guidance value: since each manually driven vehicle may be guided by several autonomous vehicles simultaneously, let $\chi_{ij}$ be the effect weight of the guidance capability $\Phi_{ij}$ of autonomous vehicle i, taken from the set of autonomous vehicles, on manually driven vehicle j. The comprehensive guidance value of vehicle j is then calculated as follows:
[0174]
[0175] where $\chi_{ij} \in [0,1]$ is set according to factors such as a distance decay function, relative speed, or sensing range;
[0176] S5.2.3. Final evaluation value of the plan: to measure the advantages and disadvantages of different implementation plans comprehensively, a comprehensive control benefit index $G_{vsl}$ is introduced, obtained by weighting the effective guidance rate and the average comprehensive guidance value:
[0177]
[0178] where λ1 and λ2 are weight parameters whose relative emphasis is set according to the system design requirements.
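Steps S5.2.1 to S5.2.3 and the final selection in step S5 can be sketched together. The sketch assumes that the comprehensive guidance value is the χ-weighted sum of the individual guidance capabilities, that a vehicle counts as effectively guided when its value is at least the threshold, and that the benefit index is λ1 times the effective guidance rate plus λ2 times the mean comprehensive guidance value. The function names, candidate labels, and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def comprehensive_guidance(phi: Sequence[float], chi: Sequence[float]) -> float:
    """Assumed form: sum over autonomous vehicles i of chi_ij * Phi_ij."""
    if len(phi) != len(chi):
        raise ValueError("phi and chi must have the same length")
    return sum(c * p for c, p in zip(chi, phi))


def effective_guidance_rate(values: Sequence[float], threshold: float) -> float:
    """Share of manually driven vehicles whose guidance value reaches the threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)


def control_benefit(values: Sequence[float], threshold: float,
                    lam1: float, lam2: float) -> float:
    """Weighted sum of the effective guidance rate and the mean guidance value."""
    mean_value = sum(values) / len(values)
    return lam1 * effective_guidance_rate(values, threshold) + lam2 * mean_value


def best_combination(candidates: Dict[str, List[float]], threshold: float,
                     lam1: float, lam2: float) -> str:
    """Pick the enforcement vehicle combination with the highest benefit index."""
    return max(candidates,
               key=lambda name: control_benefit(candidates[name], threshold, lam1, lam2))


# Hypothetical guidance values of two manually driven vehicles under two combinations.
candidates = {"combo_a": [0.8, 0.2], "combo_b": [0.75, 0.75]}
best = best_combination(candidates, threshold=0.5, lam1=0.5, lam2=0.5)
```

Raising λ1 favors plans that reach more vehicles above the threshold, while raising λ2 favors plans with stronger average guidance.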
[0179] It should be noted that relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0180] Although this application has been described above with reference to specific embodiments, various modifications can be made and components can be replaced with equivalents without departing from the scope of this application. In particular, as long as there is no structural conflict, the features in the specific embodiments disclosed in this application can be combined with each other in any way. The lack of an exhaustive description of these combinations in this specification is merely for the sake of brevity and resource conservation. Therefore, this application is not limited to the specific embodiments disclosed herein, but includes all technical solutions falling within the scope of the claims.
Claims
1. A proportionally variable speed limit control method for mixed traffic flow, characterized in that it includes the following steps:
S1. Obtain vehicle and environmental information of the current road segment through the roadside terminal, obtain the trip data of autonomous vehicles through information interaction, and retrieve the historical trip data of autonomous vehicles on the road segment over a certain period of time to construct the historical traffic dataset of the road segment;
S2. Based on the Markov decision model, construct the state, action, and reward functions of the probabilistic variable speed limit control strategy. The specific implementation method of step S2 includes the following steps:
S2.1. Set the state S to consist of four elements: the road occupancy of the congested road section, the average vehicle speed of the congested road section, the road occupancy of the controlled road section, and the average vehicle speed of the controlled road section;
S2.2. Set the action A to consist of two continuous variables, the speed limit value and the speed limit ratio. The speed limit value lies between the minimum speed limit and the maximum speed limit of the road section. The speed limit ratio is the proportion of autonomous vehicles that receive a speed limit command during the current control cycle. Together they constitute the two-dimensional continuous action A;
S2.3. Set the reward to comprise four indicators: vehicle delay, vehicle emissions, fuel consumption, and driving risk;
S2.3.1. The vehicle delay index is expressed in terms of the delay time of vehicle i, the total road segment length L, and the average vehicle speed and free-flow speed of vehicle i;
S2.3.2. The vehicle emission index is expressed in terms of the emission rate of vehicle i at time t, the velocity and acceleration of vehicle i at time t, the emissions of vehicle i within the observation time T, and the remaining parameters of the emission model;
S2.3.3. The fuel consumption index is expressed in terms of the specific power of vehicle i at time t, the fuel consumption rate of vehicle i at time t, and the fuel consumption of vehicle i during the observation time T;
S2.3.4. The driving risk index is expressed in terms of the time required for a collision to occur if the speed difference between vehicle i and the other vehicle remains constant, the quantities describing vehicle i and the other vehicle at time t, the vehicle body length l, the velocities of vehicle i and the other vehicle at time t, and the total cumulative time during which vehicle i experiences collisions within the observation time T;
S2.3.5. Construct the weighted reward from the four indicators using the weighting coefficients of the vehicle delay, vehicle emission, fuel consumption, and driving risk indicators, where N is the total number of vehicles;
S3. Construct a probabilistic variable speed limit control strategy solution model based on the dual-delay deep deterministic policy gradient algorithm. Use the historical traffic dataset of the road segment obtained in step S1 to train the model, and use the trained model to solve for the optimal probabilistic variable speed limit control strategy for the current traffic state;
S4. Construct a structural equation model to calculate the autonomous driving disengagement rate of all autonomous vehicles in the controlled road segment under the optimal probabilistic variable speed limit control strategy. Autonomous vehicles with excessively high autonomous driving disengagement rates are not used to receive speed limit control commands. The specific implementation method of step S4 includes the following steps:
S4.1. Construct a structural equation model for computation, with the following expressions: $\eta = B\eta + \Gamma\xi + \zeta$; $Z = \Lambda_z \xi + \delta$; $W = \Lambda_w \eta + \varepsilon$; where η represents the endogenous latent variable matrix, ξ represents the exogenous latent variable matrix, and ζ is the disturbance term matrix in the structural equation; B describes the interaction relationships among the endogenous latent variables, and Γ represents the influence coefficient matrix of the exogenous latent variables on the endogenous latent variables; Z and W are the observed manifest variable matrices corresponding to ξ and η, respectively; $\Lambda_z$ and $\Lambda_w$ are the coefficient (loading) matrices relating Z to ξ and W to η; and δ and ε are the measurement error terms for Z and W, respectively;
S4.2. Construct a structural equation model path diagram, setting the intrinsic interaction relationship between exogenous and endogenous latent variables as follows: the autonomous driving disengagement rate η1 is affected by the natural environment ξ1, traffic environment ξ2, driving parameters ξ3, driver personal attributes ξ4, trip attributes ξ5, and control strategy ξ6;
S4.3. Combine the structural equation model formulas with the structural equation path diagram into a system of equations, and substitute the encoded manifest variables from the historical traffic dataset of the road segment into that system to fit the undetermined terms B, Γ, ζ, $\Lambda_z$, $\Lambda_w$, δ, and ε of the structural equation model;
S4.4. For the controlled road segment $r_{vsl}$ obtained in step S3, substitute the variable information corresponding to each autonomous vehicle into the structural equation model to obtain the autonomous driving disengagement rate η1 of each autonomous vehicle;
S4.5. When the autonomous driving disengagement rate η1 of an autonomous vehicle is higher than the prescribed speed limit control threshold, that vehicle is not used to receive speed limit control instructions. Autonomous vehicles other than those selected to receive speed limit instructions are treated as manually driven vehicles in this round of control and are guided by the speed limit enforcement vehicles;
S5. Construct speed limit control guidance capability indicators based on the speed limit ratio and current traffic conditions, evaluate the control effectiveness of each speed limit enforcement vehicle combination, and select the combination with the best control effectiveness as the final speed limit strategy implementation target.
2. The proportionally variable speed limit control method for mixed traffic flow according to claim 1, characterized in that the specific implementation method of step S1 includes the following steps:
S1.1. The roadside terminal acquires vehicle information for the current road segment, including the driving parameters and location information of all vehicles within the segment. The driving parameters specifically include the speed and acceleration of vehicle i, and the location information refers to the coordinates of the centroid of vehicle i;
S1.2. The roadside terminal acquires environmental information for the current road segment, including traffic environment information and natural environment information. The traffic environment information includes the current road occupancy rate q and the average vehicle speed within each sub-segment. The current road segment is divided into controlled and congested sections based on the differences in average vehicle speed across sub-segments: a sub-segment whose average vehicle speed meets the congestion condition, defined by the congestion judgment threshold and the free-flow speed, is classified as a congested section; the sub-segments within 1000 meters upstream of a congested section are designated as the controlled section, where speed limits are implemented. The natural environment information includes weather, lighting, and road humidity;
S1.3. The roadside terminal obtains vehicle type information, namely whether each vehicle is manually driven or autonomous, through information interaction, and collects the driver's personal attributes and the vehicle trip attributes for each autonomous vehicle. The driver's personal attributes include the driver's age group, gender, and current autonomous driving monitoring status, while the vehicle trip attributes include the total trip duration and the autonomous driving trip duration.
3. The proportionally variable speed limit control method for mixed traffic flow according to claim 2, characterized in that the specific implementation method of step S3 includes the following steps:
S3.1. Construct the network components of the probabilistic variable speed limit control strategy solution model based on the dual-delay deep deterministic policy gradient algorithm, including the policy network, the evaluation network, and the target networks of the policy network and the evaluation network. The evaluation network and the target network of the evaluation network each consist of two sub-networks. Initialize all network parameters;
S3.2. Collect experience data and store it in the experience replay pool: for the state at time step t, the model selects an action based on the current policy network, with a noise term added to ensure exploration; the action is applied to the environment to obtain the state at the next time step and the reward, and the resulting transition is stored in the experience replay pool;
S3.3. Sample batch data from the experience replay pool: each time an update is performed, a batch of data is randomly sampled from the experience replay pool;
S3.4. Update the parameters of the evaluation network using the batch data sampled in step S3.3. First, calculate the target Q value, which involves the discount factor and the action generated through the target network; then update the evaluation network parameters using the mean squared error loss function, where m is the size of the batch sampled in S3.3;
S3.5. Optimize the policy network using the updated evaluation network: use the evaluation network to calculate the gradient of the policy network, with the aim of maximizing the Q-value output by the evaluation network, and update the policy network parameters $\theta^{\mu}$. The calculation involves the gradient of the policy network, the gradient of the evaluation network with respect to the action, and the gradient of the policy network with respect to its parameters;
S3.6. Perform soft updates to the target networks periodically: the soft update of the target network parameters is defined in terms of the soft update factor τ, the target network parameters of the policy network, the target network parameters of the evaluation network, the policy network parameters $\theta^{\mu}$, and the evaluation network parameters $\theta^{Q}$;
S3.7. Repeat steps S3.2 to S3.6 until the predetermined number of training iterations is reached or the model converges, yielding the trained probabilistic variable speed limit control strategy solution model;
S3.8. Using the trained probabilistic variable speed limit control strategy solution model obtained in step S3.7, solve for the optimal control strategy $(v_{vsl}, p_{vsl})$ under the current traffic conditions.
4. The proportionally variable speed limit control method for mixed traffic flow according to claim 3, characterized in that the specific implementation method of step S5 includes the following steps:
S5.1. Construct a speed limit control guidance capability index to obtain the speed limit control guidance capability $\Phi_{ij}$ of autonomous vehicle i over manually driven vehicle j, calculated as $\Phi_{ij} = f(d_{ij}, \Delta v_{ij}, \rho_{ij}, \kappa_i)$, where $d_{ij}$ is the distance between autonomous vehicle i and manually driven vehicle j along the road direction, $\Delta v_{ij}$ is the speed difference between i and j along the road direction, $\rho_{ij}$ is the traffic flow density of the sub-road segments near i and j, $\kappa_i$ is the stability factor of the speed limit command currently being executed by i, and f(·) is the comprehensive control effect function, which may be a weighted linear combination or a nonlinear function;
S5.2. Calculate the comprehensive control benefit under the speed limit control guidance capability index. During speed limit guidance, if the comprehensive guidance value of a guided manually driven vehicle is lower than the preset threshold $\varphi^{*}$, that vehicle is considered not to have received effective guidance. The calculation method is as follows:
S5.2.1. Calculate the effective guidance rate: let $N_h$ be the number of manually driven vehicles in all controlled road sections. The effective guidance rate is expressed in terms of the comprehensive guidance value of each manually driven vehicle j and the indicator function I(·), which takes the value 1 when its condition is true and 0 otherwise;
S5.2.2. Calculate the comprehensive guidance value: since each manually driven vehicle may be guided by several autonomous vehicles simultaneously, let $\chi_{ij}$ be the effect weight of the guidance capability $\Phi_{ij}$ of autonomous vehicle i, taken from the set of autonomous vehicles, on manually driven vehicle j. The comprehensive guidance value of vehicle j is calculated from these weights and capabilities, where $\chi_{ij} \in [0,1]$ is set according to factors such as a distance decay function, relative speed, or sensing range;
S5.2.3. Final evaluation value of the plan: to measure the advantages and disadvantages of different implementation plans comprehensively, a comprehensive control benefit index $G_{vsl}$ is introduced, obtained by weighting the effective guidance rate and the average comprehensive guidance value with the weight parameters λ1 and λ2, whose relative emphasis is set according to the system design requirements.