A vehicle passing control method, system and device for a signal-free intersection
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHANGAN UNIV
- Filing Date
- 2024-08-14
- Publication Date
- 2026-06-19
Figure CN119068660B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of traffic control technology for unsignalized intersections, and specifically to a method, system, and device for controlling vehicle traffic at unsignalized intersections.
Background Technology
[0002] Unsignalized intersections are a crucial component of urban transportation systems, and their efficiency directly affects the overall traffic flow of a city. Traditional management of unsignalized intersections relies on traffic control facilities or fixed traffic rules. Although many researchers have optimized vehicle passage at unsignalized intersections on the basis of traffic rules, proposing rules and protocols that outperform signal control, these rules and protocols struggle to adapt to complex, changing traffic environments and therefore to improve intersection efficiency. To optimize traffic performance, researchers have begun to build optimization models and solve them for optimal control schemes. However, as the number of vehicles to be planned grows, the number of feasible passing-sequence combinations explodes, computational complexity rises sharply, and the models become too hard to solve within the time limits of real-time traffic control. Deep reinforcement learning (DRL), which combines deep learning and reinforcement learning, offers strong perception and decision-making capabilities. In the DRL framework, each vehicle is treated as an independent agent that learns through trial and error to adapt to changing environments and handle complex decision-making tasks. Using DRL to design passing strategies that adapt to changes in the traffic environment of unsignalized intersections has become a research focus. However, this approach requires large amounts of data and computational resources for training, and as the number of agents increases, training time and difficulty increase significantly, which directly affects model performance.
[0003] Vehicle platooning, by grouping similar vehicles together, aims to reduce communication overhead, improve traffic efficiency, and reduce energy consumption. Furthermore, platooning allows multiple vehicles to be controlled as a single vehicle, thereby reducing control overhead. Applying vehicle platooning to multi-agent deep reinforcement learning (MADRL) algorithms can reduce the number of agents, decrease training difficulty, and optimize training performance. In existing research on vehicle platooning at unsignalized intersections, most studies tend to focus on fixed platooning, meaning that once formed, the platoon maintains a fixed size and coordinates its movement as a whole throughout the journey.
[0004] While fixed platooning is stable and reliable, in dynamically changing traffic environments it may produce unreasonable platoon sizes, causing unnecessary deceleration or stopping and reducing traffic efficiency. A few studies use DRL-based adaptive platooning to determine the optimal platoon size, but they rely on inefficient rules or optimization methods for platoon coordination. Other studies have introduced Deep Q-Networks (DQN) to achieve both adaptive platooning and efficient platoon coordination. However, the actions generated by the DQN algorithm are discrete, whereas platoon driving actions are continuous. Applying DQN to a continuous action space requires discretizing it, which reduces control accuracy, affects traffic safety and efficiency, and neglects driving comfort.
Summary of the Invention
[0005] Existing technologies may produce unreasonable platoon sizes in dynamically changing traffic environments, leading to unnecessary deceleration or stopping and reduced traffic efficiency. To address these shortcomings, this invention proposes a vehicle traffic control method, system, and device for unsignalized intersections. It employs the multi-agent deep deterministic policy gradient (MADDPG) algorithm, which is applicable to continuous action space problems, and performs adaptive vehicle platooning and platoon coordination control with safety, traffic efficiency, and driving comfort as objectives, thereby solving the problems of the prior art.
[0006] A method for controlling vehicle traffic at an unsignalized intersection includes the following steps:
[0007] The current observation states of the target convoy and the decision reference convoy in an unsignalized intersection environment are obtained; the observation state includes the position, size, acceleration, and lane of the target convoy and the decision reference convoy; the decision reference convoy is a convoy whose lane conflicts with the lane of the target convoy.
[0008] The current observation states of the target convoy and the decision reference convoy are input into the actor network of the MADDPG algorithm to generate the action to be performed by the target convoy at the next moment. The action includes adjusting the size of the target convoy and the acceleration of the lead vehicle of the target convoy. Based on the adjusted size of the target convoy and the acceleration of its lead vehicle, the car-following model IDM is used to generate the acceleration of the vehicles in the target convoy that follow the lead vehicle.
[0009] The current observation states of the target convoy and the decision reference convoy, as well as the action to be performed by the target convoy in the next moment, are input into the critic network to obtain the estimated Q value. The action to be performed by the target convoy in the next moment is evaluated based on the Q value. The evaluation result is fed back to the actor network and combined with the total reward to continuously iterate and optimize the action to be performed by the target convoy in the next moment, generating a strategy to adjust the optimal convoy size and the optimal acceleration of the lead car of the target convoy.
[0010] The target convoy in the unsignalized intersection environment is coordinated and controlled according to the strategy for adjusting the optimal convoy size and the optimal acceleration of the lead vehicle of the target convoy, together with the accelerations of the vehicles following the lead vehicle. Within one control cycle, every convoy in the unsignalized intersection environment becomes the target convoy in turn and performs its action, completing the passage control of all convoys at the unsignalized intersection.
[0011] Furthermore, the current observation states of the target convoy and the decision reference convoy in the unsignalized intersection environment comprise the observation state used to generate the target convoy size and the observation state used to generate the acceleration of the lead vehicle of the target convoy; the observation state used to generate the target convoy size includes:
[0012] the state of the target convoy at time t; and
[0013] the states of the decision reference convoys, represented as follows:
[0014]
[0015] where the state variables are, in order: the position of convoy i; the size of convoy i; the speed of the lead vehicle of convoy i; the lane number of convoy i; and, for the n-th decision reference convoy around convoy i, its position, its size, the speed of its lead vehicle, and its lane number;
[0016] The observation state used to generate the acceleration of the lead vehicle of the target convoy includes:
[0017] the state of the target convoy at time t; and
[0018] the states of the decision reference convoys, represented as follows:
[0019]
[0020] where the additional state variables are the acceleration of the lead vehicle of convoy i and the acceleration of the lead vehicle of the n-th decision reference convoy around convoy i.
[0021] Furthermore, the total reward includes the total reward for generating the optimal convoy size strategy, with fairness, traffic efficiency, and safety as optimization objectives, and the total reward for generating the optimal lead-vehicle acceleration strategy, with safety, traffic efficiency, and driving comfort as optimization objectives.
[0022] Furthermore, the total reward for generating the optimal convoy size strategy, with fairness, traffic efficiency, and safety as optimization objectives, is represented as:
[0023]
[0024] where ω_{pf} is the weight of the fairness reward, ω_{pe} is the weight of the traffic efficiency reward, and ω_{ps} is the weight of the safety reward; the fairness reward is measured by the time each convoy stays in the control area: the longer a convoy stays in the control area, the smaller its fairness reward. It is represented as:
[0025]
[0026] where the variables are: the total number of time steps for which the lead vehicle of the i-th convoy is stopped; the size of the i-th convoy; the waiting-time threshold w_{thr}; the upper-limit control parameter c; and the time the i-th convoy has stayed in the control area;
[0027] The traffic efficiency reward is expressed in terms of the average speed of the convoy relative to the maximum speed v_M: the closer the average speed is to the maximum speed, the higher the traffic efficiency and the greater the traffic efficiency reward fed back to the convoy. It is represented as:
[0028]
[0029] The safety reward is expressed as the ratio of the distance between convoys to the corresponding safe distance. The smaller the spatial or temporal distance, the higher the risk of a collision between convoys and the smaller the safety reward. It is represented as:
[0030]
[0031] where S_d(t) and S_t(t) are the spatial and temporal distances between the two convoys, and d_{p_thr} and t_{p_thr} are the corresponding spatial and temporal safe-distance thresholds for the two convoys.
[0032] Furthermore, the total reward for generating the optimal lead-vehicle acceleration strategy, with safety, traffic efficiency, and driving comfort as optimization objectives, is represented as:
[0033]
[0034] If the convoy travels normally, the total reward is the weighted sum of the rewards for each objective; if the convoy collides, the total reward is -10;
[0035] The safety reward r_s(t), the traffic efficiency reward r_e(t), and the driving comfort reward r_j(t) are respectively represented as:
[0036]
[0037] where S_d(t) and S_t(t) are the spatial and temporal distances between the two convoys, respectively; (x_i(t), y_i(t)) is the position of convoy i at time t, and (x_{i-1}(t), y_{i-1}(t)) is the position of convoy i-1 at time t; l_{i-1} is the length of convoy i-1, calculated as d_{last} - d_{first}, where d_{last} is the distance between the last vehicle of the convoy and the stop line and d_{first} is the distance between the lead vehicle of the convoy and the stop line; (0, d_{p_thr}) is the spatial distance range in which a collision risk exists, with d_{p_thr} = D_{thr} + l_{i-1}, where D_{thr} is the safe distance threshold; v_i(t) is the speed of convoy i at time t; t_r is the delay time, which includes the driver's reaction time and the system's reaction time during braking; a_{max} is the maximum braking deceleration; (0, t_{p_thr}) is the temporal distance range in which a collision risk exists; v_M is the maximum speed limit at the intersection, and v_m is the minimum speed limit at the intersection; a_t is the acceleration at the current moment, a_{t-1} is the acceleration at the previous moment, and Δa_{max} is the maximum change in acceleration.
[0038] The criteria for determining whether a collision has occurred in the convoy are as follows:
[0039]
[0040] where D_p(t) is the distance between convoys p1 and p2 at time t; (x_{p1}(t), y_{p1}(t)) and (x_{p2}(t), y_{p2}(t)) are the positions of convoys p1 and p2 at time t; l1 and l2 are the lengths of convoys p1 and p2, respectively; and d_{gap} is the collision gap.
[0041] Furthermore, the actions include adjusting the size of the target convoy and the acceleration of the lead vehicle in the target convoy; the action space for the size of the target convoy is represented as follows:
[0042]
[0043] The action of convoy i is represented as:
[0044]
[0045] where θ_l denotes the weights of the actor network that generates the convoy size, and the policy term denotes the action value computed from the state according to policy π.
[0046] The action space of the lead-vehicle acceleration of the target convoy and the action of convoy i at time step t are represented as:
[0047]
[0048] where π is the control policy of the actor network, and θ_i denotes the weights of the actor network.
[0049] Furthermore, based on the adjusted target convoy size and the acceleration of its lead vehicle, the Intelligent Driver Model (IDM) car-following model is used to generate the accelerations of the vehicles following the lead vehicle in the target convoy, represented as:
[0050]
[0051] where the variables are: the expected following distance of the j-th vehicle in convoy i; the minimum safe following distance s0; the current speed v_j of the j-th vehicle in convoy i; the safe time headway T; the speed difference Δv_j between the j-th vehicle in convoy i and the vehicle in front; the maximum acceleration a_{max}; the comfortable deceleration a_c; the acceleration sensitivity parameter δ; the following acceleration of the j-th vehicle in convoy i (the left-hand side of the formula); the initial velocity v0 of the j-th vehicle in convoy i; and the distance Δs_j between the j-th vehicle and the vehicle in front.
[0052] Furthermore, the coordinated control of the target convoy in the unsignalized intersection environment based on the strategy of adjusting the optimal convoy size and the optimal acceleration of the lead vehicle, as well as the acceleration of vehicles following the lead vehicle, specifically includes the following steps:
[0053] If the size of the target convoy is smaller than the optimal convoy size, vehicles from the convoy behind it are added to the target convoy, and the corresponding convoy information is updated at the same time;
[0055] If the size of the target convoy is greater than the optimal convoy size, the excess vehicles are separated and reorganized into a new convoy;
[0056] The passage of each vehicle in the convoy is controlled according to the generated lead-vehicle acceleration and following-vehicle accelerations, represented as follows:
[0057]
[0058] where the variables are: the position of convoy i at time t and at time t+Δt, with Δt being the duration of one control cycle; the acceleration of convoy i at time t; and the speed of convoy i at time t and at time t+Δt.
[0059] The present invention also includes a vehicle traffic control system for unsignalized intersections, comprising:
[0060] The acquisition module is used to acquire the current observation states of the target convoy and the decision reference convoy in an unsignalized intersection environment; the observation state includes the position, size, acceleration, and lane of the target convoy and the decision reference convoy; the decision reference convoy is a convoy whose lane conflicts with the lane of the target convoy.
[0061] The action calculation module is used to input the current observation states of the target convoy and the decision reference convoy into the actor network of the MADDPG algorithm to generate the action to be performed by the target convoy at the next moment; the action includes adjusting the size of the target convoy and the acceleration of the lead vehicle of the target convoy; based on the adjusted size of the target convoy and the acceleration of its lead vehicle, the Intelligent Driver Model (IDM) car-following model is used to generate the accelerations of the vehicles in the target convoy that follow the lead vehicle.
[0062] The optimization module is used to input the current observation state of the target convoy and the decision reference convoy, as well as the action to be performed by the target convoy in the next moment, into the critic network to obtain the estimated Q value; the action to be performed by the target convoy in the next moment is evaluated based on the Q value, and the evaluation result is fed back to the actor network and combined with the total reward to continuously iterate and update the action to be performed by the target convoy in the next moment, generating the optimal convoy size and the optimal acceleration of the lead car of the target convoy;
[0063] The control module is used to coordinate and control the target convoy in an unsignalized intersection environment based on the strategy of adjusting the optimal convoy size and the optimal acceleration of the lead vehicle, as well as the acceleration of vehicles following the lead vehicle. Within one control cycle, all convoys in the unsignalized intersection environment will sequentially become the target convoy and perform actions to complete the passage control of all convoys in the unsignalized intersection.
[0064] The present invention also includes a computer device for vehicle traffic control at an unsignalized intersection, comprising: a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the vehicle traffic control method at an unsignalized intersection.
[0065] This invention provides a method, system, and device for vehicle traffic control at unsignalized intersections, which have the following advantages:
[0066] This invention employs the MADDPG algorithm, suitable for continuous action space problems, to obtain the size of the target platoon and the acceleration of the lead vehicle in the actor network. It then obtains Q-values by inputting the observed states and actions of the target platoon and the decision reference platoon into the critic network. Based on the Q-values and the total reward, it evaluates the optimal platoon size and the optimal lead vehicle acceleration strategy. Adaptive vehicle formation and platoon coordination control are then performed based on this strategy and the accelerations of following vehicles within the platoon. This method can generate reasonable platoon sizes in dynamically changing traffic environments, control the passage of all platoons at unsignalized intersections, avoid unnecessary deceleration or stopping, and improve the traffic efficiency at unsignalized intersections.
Attached Figure Description
[0067] Figure 1 is a flowchart of the vehicle traffic control method for an unsignalized intersection in an embodiment of the present invention;
[0068] Figure 2 is a schematic diagram of the environment interaction mechanism and network structure in an embodiment of the present invention;
[0069] Figure 3 is a flowchart of the MADDPG algorithm in an embodiment of the present invention.
Detailed Implementation
[0070] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments.
[0071] This invention employs the MADDPG algorithm, applicable to continuous action space problems, to perform adaptive vehicle platooning with fairness, traffic efficiency, and safety as optimization objectives, and to perform platoon coordination control with safety, traffic efficiency, and driving comfort as optimization objectives. As shown in Figure 1, the specific steps include:
[0072] S1. Using MADDPG, the optimal size N of each convoy is generated based on the state obtained from the current unsignalized intersection environment (initially, each vehicle is its own convoy). MADDPG mainly comprises the following elements: state, action, multi-objective reward function, and network.
[0073] The state consists of key information such as the convoy's position (p), size (N), speed (v), and lane number. When decision-making agent i takes an action at time t, the influence of conflicting lanes on agent i must also be considered, so the state information of the other agents involved in the decision is taken as input as well. With the state information of the target convoy and n reference convoys as input, the algorithm can infer what convoy size the environment calls for, which provides the basis for generating the decision action. In the MADDPG algorithm for adaptive convoy formation (generating the target convoy size), the input state comprises the target convoy and its n surrounding decision reference convoys. The state of the target convoy at time t and the states of the n surrounding decision reference convoys that influence the convoy size are represented as follows:
[0074]
[0075] where the state variables are, in order: the position of convoy i; the size of convoy i; the speed of the lead vehicle of convoy i; the lane number of convoy i; and, for the n-th convoy surrounding convoy i, its position, its size, the speed of its lead vehicle, and its lane number.
[0076] In the MADDPG algorithm for adaptive formation, the action is the size of the target convoy, calculated by the actor network from the current state. Within time step t, the actions of n+1 convoys (convoy i and its n surrounding decision reference convoys) in the current environment need to be calculated. The action space and the action of convoy i are calculated as follows:
[0077]
[0078] where θ_l denotes the weights of the actor network that generates the convoy size, and the policy term denotes the action value computed from the state according to policy π.
[0079] In the MADDPG algorithm for adaptive formation, the multi-objective reward comprehensively considers three objectives: fairness, efficiency, and safety. The fairness reward aims to ensure that all convoys have an equal opportunity to pass through intersections, measured by the time each convoy spends in the controlled area. The longer a convoy spends in the controlled area, the smaller its fairness reward. The calculation is as follows:
[0080]
[0081] where the variables are: the total number of time steps for which the lead vehicle of the i-th convoy is stopped; the control period Δt; the size of the i-th convoy; the waiting-time threshold w_{thr}; and the upper-limit control parameter c.
[0082] The efficiency reward aims to make the learned strategy generate convoy sizes that improve traffic efficiency. It is expressed in terms of the average speed of the convoy relative to the maximum speed v_M: the closer the average speed is to the maximum speed, the higher the traffic efficiency and the greater the traffic efficiency reward returned to the convoy. The traffic efficiency reward is calculated as follows:
[0083]
[0084] The safety reward aims to make the learned strategy generate convoy sizes that reduce collision risk. It is expressed as the ratio of the distance between convoys to the corresponding safe distance. A smaller spatial or temporal distance indicates a higher risk of collision between convoys, and therefore yields a smaller safety reward. The calculation is as follows:
[0085]
[0086] where S_d(t) and S_t(t) are the spatial and temporal distances between the two convoys, and d_{p_thr} and t_{p_thr} are the corresponding spatial and temporal safe-distance thresholds for the two convoys.
[0087] To simultaneously optimize the three objectives of fairness, efficiency, and safety, the total reward of the adaptive formation algorithm is calculated as follows:
[0088]
[0089] The weights corresponding to the adaptive formation objectives are the fairness weight ω_{pf}, the efficiency weight ω_{pe}, and the safety weight ω_{ps}.
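Read as a weighted sum of the three objectives, the total formation reward could be written as below. This is a sketch only: the reward symbols r_p, r_{pf}, r_{pe}, and r_{ps} are placeholder names assumed here for the total, fairness, efficiency, and safety rewards; only the weights ω_{pf}, ω_{pe}, and ω_{ps} are named in the text.

```latex
r_{p}(t) = \omega_{pf}\, r_{pf}(t) + \omega_{pe}\, r_{pe}(t) + \omega_{ps}\, r_{ps}(t)
```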
[0090] In the MADDPG algorithm, a convoy is treated as an agent, and each agent has an action network (actor network) and a value function network (critic network). As shown in Figure 1, the actor network takes the local state observed by the agent in the environment as input (the state for convoy size or the state for acceleration), calculates the corresponding action, and outputs that action. The critic network takes the global state observed in the environment together with the actions as input and estimates the Q-value; the executed action is evaluated by the temporal-difference error (TD-error) based on that value. The result is fed back to the actor network to guide it in optimizing its policy by gradient descent.
[0091] Both the actor and critic networks contain an input layer and three fully connected layers (as shown in Figure 2). The specific construction order of the actor network is as follows:
[0092] (1) The input layer receives input data (state information of n+1 agents) and performs layer normalization preprocessing.
[0093] (2) Fully connected layer (64 neurons), layer normalization, ReLU activation function introduces nonlinear characteristics.
[0094] (3) Fully connected layer (64 neurons), layer normalization, ReLU activation function introduces nonlinear characteristics.
[0095] (4) Fully connected layer (1 neuron), processed by Tanh activation function to obtain output action value.
[0096] The critic network differs from the actor network in that it introduces action information and in how it produces the final output.
[0097] The specific construction order of the critic network is as follows:
[0098] (1) The input layer receives input data (state information of n+1 agents) and performs layer normalization preprocessing.
[0099] (2) Fully connected layer (64 neurons), layer normalization, ReLU activation function introduces nonlinear characteristics.
[0100] (3) Action information (a_t) is introduced, which enhances the network's ability to assess the value of actions.
[0101] (4) Fully connected layer (64 neurons), layer normalization, ReLU activation function introduces non-linear characteristics.
[0102] (5) Fully connected layer (1 neuron) directly obtains the output Q value.
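The two layer stacks listed above can be sketched as plain-Python forward passes. This is an illustrative sketch, not the patented implementation: the class names, the random weight initialization, the parameter-free layer normalization, and the point at which the action vector is concatenated in the critic are all assumptions; only the layer order, the 64-neuron widths, and the ReLU/Tanh activations come from the text.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


class Dense:
    """Fully connected layer with randomly initialized weights (assumed init)."""

    def __init__(self, n_in, n_out, rng):
        bound = 1.0 / math.sqrt(n_in)
        self.w = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]


class Actor:
    """Input layer norm -> FC(64)+LN+ReLU -> FC(64)+LN+ReLU -> FC(1)+Tanh."""

    def __init__(self, state_dim, rng, hidden=64):
        self.fc1 = Dense(state_dim, hidden, rng)
        self.fc2 = Dense(hidden, hidden, rng)
        self.out = Dense(hidden, 1, rng)

    def __call__(self, state):
        h = layer_norm(state)
        h = relu(layer_norm(self.fc1(h)))
        h = relu(layer_norm(self.fc2(h)))
        return math.tanh(self.out(h)[0])  # action value in [-1, 1]


class Critic:
    """Input layer norm -> FC(64)+LN+ReLU -> append actions -> FC(64)+LN+ReLU -> FC(1)."""

    def __init__(self, state_dim, action_dim, rng, hidden=64):
        self.fc1 = Dense(state_dim, hidden, rng)
        self.fc2 = Dense(hidden + action_dim, hidden, rng)
        self.out = Dense(hidden, 1, rng)

    def __call__(self, state, actions):
        h = layer_norm(state)
        h = relu(layer_norm(self.fc1(h)))
        h = relu(layer_norm(self.fc2(h + list(actions))))  # assumed concatenation
        return self.out(h)[0]  # raw Q-value estimate


# Example: target convoy plus n = 3 reference convoys, 4 state values each.
rng = random.Random(0)
actor = Actor(state_dim=16, rng=rng)
critic = Critic(state_dim=16, action_dim=4, rng=rng)
state = [float(i) for i in range(16)]
action = actor(state)
q_value = critic(state, [action, 0.0, 0.0, 0.0])
```

The Tanh output lies in [-1, 1], so a real controller would still need to map it to a convoy size or an acceleration range; that scaling is not specified here and is left out of the sketch.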
[0103] S2. Using MADDPG, generate the optimal acceleration 'a' for the lead vehicle of each target platoon based on the current unsignalized intersection environment. Use the car-following model IDM to generate the accelerations of following vehicles in the platoon.
[0104] The MADDPG model for vehicle coordination also includes elements such as states, actions, multi-objective reward functions, and networks.
[0105] The state information of each agent is represented by the information of the lead vehicle of the convoy, where the position is the center position of the convoy and the speed is the speed of the lead vehicle. In the MADDPG algorithm for generating the acceleration of the lead vehicle of the target convoy, the input state comprises the target convoy and its n surrounding decision reference convoys. The state of the target convoy at time t and the states of the n decision reference convoys are represented as follows:
[0106]
[0107] where the additional state variables are the acceleration of the lead vehicle of convoy i and the acceleration of the lead vehicle of the n-th decision reference convoy around convoy i.
[0108] The convoy coordination action is an acceleration. Within time step t, the accelerations of the n+1 agents in the environment need to be determined. The action space and the action of agent i at time step t are calculated as follows:
[0109]
[0110] where π is the control policy of the actor network, θ_i denotes the weights of the actor network, and the policy inputs are the states of the target convoy and of the decision reference convoys.
[0111] MADDPG generates the acceleration of the lead car in the convoy. The other vehicles in the convoy follow the lead car, and their acceleration is calculated as follows:
[0112]
[0113] where the variables are: the expected following distance of the j-th vehicle in convoy i; the minimum safe following distance s0; the current speed v_j of the j-th vehicle in convoy i; the safe time headway T; the speed difference Δv_j between the j-th vehicle in convoy i and the vehicle in front; the maximum acceleration a_{max}; the comfortable deceleration a_c; the acceleration sensitivity parameter δ; the following acceleration of the j-th vehicle in convoy i (the left-hand side of the formula); the initial velocity v0 of the j-th vehicle in convoy i; and the distance Δs_j between the j-th vehicle and the vehicle in front.
[0114] Convoy coordination takes into account the safety reward r_s(t), the traffic efficiency reward r_e(t), and the driving comfort reward r_j(t).
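For orientation, the following-vehicle acceleration can be sketched with the standard published form of the Intelligent Driver Model. This is a sketch under stated assumptions rather than the exact formula of this document: the function name and all default parameter values are hypothetical, v0 is used as the desired speed as in the standard model, and T is treated as a time headway in seconds.

```python
import math


def idm_acceleration(v, delta_v, gap, v0,
                     s0=2.0, T=1.5, a_max=2.0, a_c=3.0, delta=4.0):
    """Standard IDM acceleration for one following vehicle.

    v        current speed of the follower, v_j (m/s)
    delta_v  follower speed minus speed of the vehicle in front (m/s)
    gap      distance to the vehicle in front, Δs_j (m)
    v0       desired speed (m/s), assumed meaning of v0
    s0       minimum safe following distance (m), assumed default
    T        safe time headway (s), assumed default
    a_max    maximum acceleration (m/s^2), assumed default
    a_c      comfortable deceleration (m/s^2), assumed default
    delta    acceleration sensitivity exponent, assumed default
    """
    if gap <= 0.0:
        raise ValueError("gap to the vehicle in front must be positive")
    if v0 <= 0.0:
        raise ValueError("desired speed v0 must be positive")
    # Expected (desired) following distance s*.
    s_star = s0 + v * T + v * delta_v / (2.0 * math.sqrt(a_max * a_c))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


# A follower closing in on a slower vehicle brakes (negative acceleration).
braking = idm_acceleration(v=10.0, delta_v=5.0, gap=5.0, v0=15.0)
```

With a very large gap the interaction term vanishes and the vehicle accelerates toward v0; with a small gap and a positive closing speed the interaction term dominates and the result is negative.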
[0115]
[0116] where S_d(t) and S_t(t) are the spatial and temporal distances between the two convoys, respectively; (x_i(t), y_i(t)) is the position of convoy i at time t, and (x_{i-1}(t), y_{i-1}(t)) is the position of convoy i-1 at time t; l_{i-1} is the length of convoy i-1, calculated as d_{last} - d_{first}, where d_{last} is the distance between the last vehicle of the convoy and the stop line and d_{first} is the distance between the lead vehicle of the convoy and the stop line; (0, d_{p_thr}) is the spatial distance range in which a collision risk exists, with d_{p_thr} = D_{thr} + l_{i-1}, where D_{thr} is the safe distance threshold; v_i(t) is the speed of convoy i at time t; t_r is the delay time, which includes the driver's reaction time and the system's reaction time during braking; a_{max} is the maximum braking deceleration; (0, t_{p_thr}) is the temporal distance range in which a collision risk exists; v_M is the maximum speed limit at the intersection, and v_m is the minimum speed limit at the intersection; a_t is the acceleration at the current moment, a_{t-1} is the acceleration at the previous moment, and Δa_{max} is the maximum change in acceleration.
[0117] If the convoy travels normally, the total reward is the weighted sum of the rewards for each objective; if a vehicle collision occurs, the total reward is -10. The total reward is calculated as follows:
[0118]
[0119] The criteria for determining a vehicle collision are as follows:
[0120]
[0121] Where l1 and l2 are the lengths of the two teams, respectively; d gap This indicates the collision gap. If the collision conditions are met, it is determined that a collision has occurred between the two teams.
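The collision test and the coordination reward can be sketched together as follows. This is a sketch under explicit assumptions: the collision threshold (center-to-center distance no greater than half the sum of the two convoy lengths plus d_gap) is an assumed reading based on positions being convoy centers, and the weight names and default values are hypothetical. Only the weighted-sum rule and the -10 collision reward are taken from the text.

```python
import math

COLLISION_REWARD = -10.0  # value stated in the description


def convoys_collide(pos1, pos2, l1, l2, d_gap):
    """Assumed criterion: centre distance <= (l1 + l2) / 2 + d_gap."""
    distance = math.hypot(pos1[0] - pos2[0], pos1[1] - pos2[1])
    return distance <= (l1 + l2) / 2.0 + d_gap


def coordination_reward(r_s, r_e, r_j, collided, w_s=0.5, w_e=0.3, w_j=0.2):
    """Weighted sum of safety, efficiency and comfort rewards, or -10 on collision.

    w_s, w_e, w_j are hypothetical weights chosen only for illustration.
    """
    if collided:
        return COLLISION_REWARD
    return w_s * r_s + w_e * r_e + w_j * r_j


# Two convoys 5 m apart (centre to centre), lengths 6 m and 4 m, gap 0.5 m.
hit = convoys_collide((0.0, 0.0), (3.0, 4.0), l1=6.0, l2=4.0, d_gap=0.5)
reward = coordination_reward(r_s=0.8, r_e=0.6, r_j=0.9, collided=hit)
```

Returning a fixed penalty instead of the weighted sum makes any collision strictly worse than every collision-free outcome as long as the individual rewards are bounded below by a larger value.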
[0122] The environmental interaction mechanism and network of the fleet coordination Figure 2 The networks are the same.
[0123] S3. Adjust the convoy according to N.
[0124] After the actor network has determined the optimal convoy size for the target convoy based on the current state, the target convoy must be adjusted to that size. If the size of the target convoy is smaller than the decided size, vehicles from the convoy following it are added to the target convoy, and the information of the affected convoys is updated at the same time; if the size of the target convoy is larger than the decided size, the extra vehicles are separated and reassembled into a new convoy.
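The merge-or-split adjustment in step S3 can be illustrated with ordered lists of vehicle identifiers. This is a simplified sketch: the data layout (lists ordered from lead vehicle to tail) and the function name are assumptions, and the real method also updates position and speed information that is omitted here.

```python
def adjust_convoy(target, following, n):
    """Adjust the target convoy to the decided size n.

    target, following : lists of vehicle ids ordered from lead vehicle to tail
    n                 : convoy size chosen by the size policy (n >= 1)

    Returns (new_target, new_following, split_off), where split_off holds the
    vehicles separated from an oversized target convoy (empty otherwise).
    """
    if n < 1:
        raise ValueError("convoy size must be at least 1")
    target, following = list(target), list(following)
    split_off = []
    if len(target) < n:
        # Too small: take the missing vehicles from the front of the
        # following convoy (as many as are available).
        need = n - len(target)
        target.extend(following[:need])
        following = following[need:]
    elif len(target) > n:
        # Too large: the excess vehicles at the tail form a new convoy.
        split_off = target[n:]
        target = target[:n]
    return target, following, split_off
```

For example, growing `["a", "b"]` to size 4 with `["c", "d", "e"]` behind it yields a target of four vehicles and leaves `["e"]` as the following convoy.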
[0125] S4. Control the convoy passage according to the acceleration generated by MADDPG and IDM.
[0126] The formula for controlling the vehicle's forward movement is as follows:
[0127]
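A common way to advance a vehicle over one control cycle from a commanded acceleration is the constant-acceleration update sketched below. This is an assumption about the form of the update, not the patent's stated formula; position is treated as a scalar distance along the lane, and the names are illustrative.

```python
def advance_vehicle(x, v, a, dt):
    """Advance one vehicle by one control cycle of length dt.

    x  : position along the lane at the current moment (m)
    v  : speed at the current moment (m/s)
    a  : acceleration applied during the cycle (m/s^2)
    dt : duration of one control cycle (s)

    Returns (x_next, v_next) assuming constant acceleration over the cycle.
    """
    x_next = x + v * dt + 0.5 * a * dt ** 2
    v_next = v + a * dt
    return x_next, v_next
```

In this sketch the lead vehicle would use the acceleration produced by MADDPG and each follower the acceleration produced by IDM.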
[0128] S5. Update the unsignalized intersection environment, including the state information and the rewards.
[0129] S6. Evaluation and network update: each agent has one actor-critic network that generates the convoy size and one actor-critic network that generates the acceleration. When the value function (critic) network evaluates an action, it considers not only the action of the target agent but also the actions of the decision reference agents, so that cooperation among the agents is assessed comprehensively. The objective function of the MADDPG algorithm is as follows:
[0130]
[0131] In the formula, $J(\theta_i)$ is the objective function of agent i; $\gamma^t$ is the discount applied at time step t, where the discount factor $\gamma$ weighs the importance of short-term against long-term rewards and takes a value between 0 and 1; the reward term is the reward that agent i receives at time step t (for the network that generates the optimal convoy size, the total reward of that network is used; for the network that generates the optimal acceleration, the total reward value of that network is input); n is the total number of decision reference agents; and the policy term is the policy function of each agent.
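The role of the discount factor can be illustrated by computing the discounted return of one agent over a single episode, the quantity whose expectation a MADDPG-style objective maximizes. This sketch assumes the usual form, the sum over t of gamma^t times the reward at step t; the function name is illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over one episode for a single agent."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

A gamma close to 0 makes the agent weigh immediate rewards almost exclusively, while a gamma close to 1 gives later rewards nearly the same weight as early ones.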
[0132] This invention utilizes Python 3.6 to build a simulation scenario of a two-way six-lane unsignalized intersection, conducts simulation experiments on the proposed unsignalized vehicle traffic control method, and compares the performance of different traffic control methods in terms of safety, traffic efficiency, driving comfort, and training effect under the same simulation conditions.
[0133] (1) Simulation experiment setup.
[0134] The experiment of this invention includes two phases: training and testing. The purpose of the training phase is to obtain the optimal control strategy, and the testing phase introduces the following three methods for comparative analysis:
[0135] ① Uncontrolled Vehicle Passage (UVP) Method: When a vehicle approaches the stop line at the intersection, a 3-second stop observation time is set. For vehicles traveling in the intersection control area, only the IDM model is used to control the vehicle's movement, without using any guidance strategy.
[0136] ② CoMADDPG method: A single-vehicle control method based on cooperative MADDPG, which uses a single vehicle as the control object to control the passage of each vehicle.
[0137] ③ Fixed-condition platooning algorithm-based fleet control (PC-FCPA) method: The fixed-condition platooning algorithm is used for platooning, and the fleet size remains unchanged during the platooning process. This strategy uses a fixed fleet as the control object to control the platooning passage.
[0138] (2) Experimental data.
[0139] The data used for training and testing in this invention are all traffic volume data (vehicles per hour per lane) generated in a simulation environment. The traffic volume data used for testing cover six levels, from low to high: 200, 400, 600, 800, 1000, and 1200 veh/hr/lane.
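Each traffic volume level implies a mean time gap between successive arrivals in a lane, namely 3600 divided by the volume in veh/hr/lane. The sketch below converts a volume to that mean headway and samples arrival times for one lane. The source does not state how arrivals are generated, so the exponential (Poisson) inter-arrival model, the function names, and the seed are assumptions.

```python
import random


def mean_headway_s(volume_veh_per_hr_lane):
    """Mean time between arrivals in one lane, in seconds."""
    return 3600.0 / volume_veh_per_hr_lane


def sample_arrival_times(volume_veh_per_hr_lane, horizon_s, seed=0):
    """Sample arrival times (s) in one lane up to horizon_s.

    Assumes Poisson arrivals, i.e. exponentially distributed headways.
    """
    rng = random.Random(seed)
    rate = volume_veh_per_hr_lane / 3600.0  # arrivals per second
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon_s:
            return times
        times.append(t)
```

At the highest test level of 1200 veh/hr/lane the mean headway is 3 seconds, against 18 seconds at 200 veh/hr/lane.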
[0140] (3) Experimental results.
[0141] The test results are shown in Table 1. As can be seen from Table 1, the traffic control method proposed in this invention can guarantee a safe throughput rate of over 99%. Compared with UVP, CoMADDPG, and PC-FCPA strategies, the PC-APA strategy performs best in terms of traffic efficiency and driving comfort. Traffic efficiency is improved by 56.35%, 10.26%, and 5.55% respectively (average of the percentage increase for each traffic volume), and driving comfort is improved by 80.67%, 52.96%, and 23.93% respectively (average of the percentage increase for each traffic volume).
[0142] Table 1 Comparison of test results for each method
[0143]
[0144]
[0145] Based on the same inventive concept, this invention also proposes a vehicle traffic control system for unsignalized intersections, comprising:
[0146] The acquisition module is used to acquire the current observation states of the target convoy and the decision reference convoy in an unsignalized intersection environment. The observation states include the position, size, acceleration, and lane position of the target convoy and of the decision reference convoy. The decision reference convoy is a convoy whose lane position conflicts with that of the target convoy.
[0147] The action calculation module is used to input the current observation states of the target convoy and the decision reference convoy into the actor network of the MADDPG algorithm to generate the action to be performed by the target convoy at the next moment. The action includes adjusting the size of the target convoy and the acceleration of the lead vehicle of the target convoy. Based on the adjusted size of the target convoy and the acceleration of its lead vehicle, the car-following model IDM is used to generate the accelerations of the vehicles in the target convoy that follow the lead vehicle.
[0148] The optimization module is used to input the current observation state of the target convoy and the decision reference convoy, as well as the action to be performed by the target convoy in the next moment, into the critic network to obtain the estimated Q value. Based on the Q value, the action to be performed by the target convoy in the next moment is evaluated, and the evaluation result is fed back to the actor network and combined with the total reward to continuously iterate and optimize the action to be performed by the target convoy in the next moment, generating a strategy to adjust the optimal convoy size and the optimal acceleration of the lead vehicle of the target convoy.
[0149] The control module is used to coordinate and control the target convoy in an unsignalized intersection environment based on the strategy of adjusting the optimal convoy size and the optimal acceleration of the lead vehicle, as well as the acceleration of vehicles following the lead vehicle. Within one control cycle, all convoys in the unsignalized intersection environment will sequentially become the target convoy and perform actions to complete the passage control of all convoys in the unsignalized intersection.
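The sequencing described for the control module, where every convoy becomes the target convoy in turn within one control cycle, can be sketched as a simple loop. The `Convoy` class, the `conflicts` predicate, and the `policy` callable are hypothetical stand-ins for the acquisition and action calculation modules; a trained actor network would take the place of `policy`.

```python
class Convoy:
    """Minimal convoy record used only for this illustration."""

    def __init__(self, name, lane, size, lead_accel=0.0):
        self.name = name
        self.lane = lane
        self.size = size
        self.lead_accel = lead_accel


def control_cycle(convoys, conflicts, policy):
    """Run one control cycle; return the order in which convoys acted.

    conflicts(a, b) -> bool         : True if b is a decision reference for a
    policy(target, refs) -> (n, a)  : decided convoy size and lead acceleration
    """
    order = []
    for target in convoys:
        # Decision reference convoys: those in conflict with the target.
        refs = [c for c in convoys if c is not target and conflicts(target, c)]
        size, accel = policy(target, refs)
        target.size, target.lead_accel = size, accel
        order.append(target.name)
    return order
```

Because the convoys act one after another, a convoy later in the cycle observes the already-updated state of the convoys that acted before it.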
[0150] Based on the same inventive concept, this invention also proposes a computer device for vehicle traffic control at unsignalized intersections, comprising: a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the vehicle traffic control method at unsignalized intersections.
[0151] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A method for controlling vehicle passage at an unsignalized intersection, characterized in that it comprises the following steps:
acquiring the current observation states of a target convoy and a decision reference convoy in an unsignalized intersection environment, wherein the observation states include the position, size, acceleration, and lane position of the target convoy and of the decision reference convoy, and the decision reference convoy is a convoy whose lane position conflicts with that of the target convoy;
inputting the current observation states of the target convoy and the decision reference convoy into the actor network of the MADDPG algorithm to generate the action to be performed by the target convoy at the next moment, wherein the action includes adjusting the size of the target convoy and the acceleration of the lead vehicle of the target convoy; and, based on the adjusted size of the target convoy and the acceleration of its lead vehicle, generating the accelerations of the vehicles following the lead vehicle in the target convoy using the car-following model IDM;
inputting the current observation states of the target convoy and the decision reference convoy, together with the action to be performed by the target convoy at the next moment, into the critic network to obtain an estimated Q value; evaluating, based on the Q value, the action to be performed by the target convoy at the next moment; feeding the evaluation result back to the actor network and, in combination with the total reward, iteratively updating and optimizing the action to be performed by the target convoy at the next moment, thereby generating a strategy for adjusting the optimal convoy size and the optimal lead-vehicle acceleration of the target convoy;
coordinating and controlling the target convoy in the unsignalized intersection environment based on the strategy for adjusting the optimal convoy size and the optimal lead-vehicle acceleration of the target convoy and on the accelerations of the vehicles following the lead vehicle, wherein, within one control cycle, all convoys in the unsignalized intersection environment become the target convoy in turn and perform their actions, completing the passage control of all convoys at the unsignalized intersection.
2. The method for vehicle traffic control at an unsignalized intersection according to claim 1, characterized in that the current observation states of the target convoy and the decision reference convoy acquired in the unsignalized intersection environment include an observation state for generating the target convoy size and an observation state for generating the acceleration of the lead vehicle of the target convoy;
the observation state for generating the target convoy size includes the state of the target convoy at time t and the state of the decision reference convoys, whose components are: the position of the target convoy, the size of the target convoy, the speed of its lead vehicle, and the number of the lane it occupies; and, for each decision reference convoy surrounding the target convoy, the position of that convoy, its size, the speed of its lead vehicle, and the number of the lane it occupies;
the observation state for generating the acceleration of the lead vehicle of the target convoy includes the state of the target convoy at time t and the state of the decision reference convoys, wherein the state components include the acceleration of the lead vehicle of the target convoy and, for each decision reference convoy surrounding the target convoy, the acceleration of the lead vehicle of that convoy.
3. The method for vehicle traffic control at an unsignalized intersection according to claim 1, characterized in that the total reward includes the total reward for generating the optimal convoy size strategy, with fairness, traffic efficiency, and safety as optimization objectives, and the total reward for generating the optimal lead-vehicle acceleration strategy, with safety, traffic efficiency, and driving comfort as optimization objectives.
4. The method for vehicle traffic control at an unsignalized intersection according to claim 3, characterized in that the total reward for generating the optimal convoy size strategy, with fairness, traffic efficiency, and safety as optimization objectives, is expressed in terms of the fairness reward, the traffic efficiency reward, and the safety reward, each with its own weight;
the fairness reward is measured by the time each convoy stays in the control area: the longer a convoy stays in the control area, the smaller the fairness reward; the quantities involved are the total stay time of the lead vehicle of each convoy, the size of the convoy, the waiting time threshold, the parameter c that controls the upper limit, and the duration of each convoy's stay in the control area;
the traffic efficiency reward is expressed in terms of the average speed of the convoy and the maximum speed: the closer the average speed is to the maximum speed, the higher the traffic efficiency and the greater the traffic efficiency reward fed back to the convoy;
the safety reward is expressed as the ratio of the distance between convoys to the corresponding safe distance: the smaller the spatial or temporal distance, the higher the risk of a collision between convoys and the smaller the safety reward; the quantities involved are the spatial and temporal distances between the two convoys and the spatial and temporal safe distance thresholds for the two convoys.
5. The method for vehicle traffic control at an unsignalized intersection according to claim 3, characterized in that the total reward for generating the optimal lead-vehicle acceleration strategy, with safety, traffic efficiency, and driving comfort as optimization objectives, is determined as follows: if the convoy travels normally, the total reward is a weighted sum of the rewards for each objective; if the convoy collides, the total reward is -10;
the safety reward, the traffic efficiency reward, and the driving comfort reward are defined in terms of the following quantities: $S_d(t)$ and $S_t(t)$ are the spatial and temporal distances between the two convoys, respectively; $(x_i(t), y_i(t))$ is the position of convoy i at time t and $(x_{i-1}(t), y_{i-1}(t))$ is the position of convoy i-1 at time t; $l_{i-1}$ is the length of convoy i-1, the convoy length being calculated as $d_{last} - d_{first}$, where $d_{last}$ is the distance between the last vehicle in the convoy and the stop line and $d_{first}$ is the distance between the lead vehicle in the convoy and the stop line; $(0, d_{p\_thr})$ is the spatial distance range in which a collision risk exists, with $d_{p\_thr} = D_{thr} + l_{i-1}$, where $D_{thr}$ is the safe distance threshold; $v_i(t)$ is the speed of convoy i at time t; $t_r$ is the delay time, including the driver's reaction time and the system's reaction time during braking; $a_{max}$ is the maximum braking acceleration; $(0, t_{p\_thr})$ is the temporal distance range in which a collision risk exists; $v_M$ is the maximum speed limit at the intersection and $v_m$ is the minimum speed limit at the intersection; $a_t$ is the acceleration at the current moment, $a_{t-1}$ is the acceleration at the previous moment, and $\Delta a_{max}$ is the maximum change in acceleration;
the criterion for determining whether a collision has occurred is defined in terms of the distance between convoy p1 and convoy p2 at time t, the positions of convoy p1 and convoy p2 at time t, the lengths of convoy p1 and convoy p2, and the collision gap $d_{gap}$.
6. The method for vehicle traffic control at an unsignalized intersection according to claim 2, characterized in that the actions include adjusting the size of the target convoy and the acceleration of the lead vehicle of the target convoy;
for the size of the target convoy, an action space is defined, and the action of the convoy is the action value calculated from the state according to the policy of the actor network that generates the convoy size, parameterized by the weights of that actor network;
for the acceleration of the lead vehicle of the target convoy, an action space is defined, and the action of the convoy at each time step is produced by the control policy of the corresponding actor network, parameterized by the weights of that actor network.
7. The method for vehicle traffic control at an unsignalized intersection according to claim 1, characterized in that, based on the adjusted target convoy size and the acceleration of its lead vehicle, the car-following model IDM is used to generate the accelerations of the vehicles following the lead vehicle in the target convoy, wherein: $s_j^*$ is the expected following distance of vehicle j in convoy i; $s_0$ is the minimum safe following distance; $v_j$ is the current speed of the j-th vehicle in convoy i; $T$ is the safe following time headway; $\Delta v_j$ is the speed difference between the j-th vehicle in convoy i and the vehicle in front; $a_{max}$ is the maximum acceleration; $a_c$ is the comfortable deceleration; $\delta$ is the acceleration sensitivity parameter; $\dot{v}_j$ is the following acceleration of the j-th vehicle in convoy i; $v_0$ is the initial velocity of the j-th vehicle in convoy i; and $\Delta s_j$ is the distance between the j-th vehicle and the vehicle in front.
8. The method for vehicle traffic control at an unsignalized intersection according to claim 1, characterized in that coordinating and controlling the target convoy in the unsignalized intersection environment, based on the strategy for adjusting the optimal convoy size and the optimal lead-vehicle acceleration of the target convoy and on the accelerations of the vehicles following the lead vehicle, specifically includes the following steps:
if the size of the target convoy is smaller than the optimal convoy size, vehicles of the convoy behind it are added to the target convoy, and the information of the affected convoys is updated at the same time; if the size of the target convoy is greater than the optimal convoy size, the excess vehicles are separated and reorganized into a new convoy;
the passage of each vehicle in the convoy is controlled according to the generated acceleration of the lead vehicle of the target convoy and the accelerations of the following vehicles, using the positions of the convoy at two consecutive moments, the time of one control cycle, the acceleration of the convoy at the current moment, and the speeds of the convoy at the two consecutive moments.
9. A vehicle traffic control system for an unsignalized intersection, characterized in that it comprises:
an acquisition module, used to acquire the current observation states of a target convoy and a decision reference convoy in an unsignalized intersection environment, wherein the observation states include the position, size, acceleration, and lane position of the target convoy and of the decision reference convoy, and the decision reference convoy is a convoy whose lane position conflicts with that of the target convoy;
an action calculation module, used to input the current observation states of the target convoy and the decision reference convoy into the actor network of the MADDPG algorithm to generate the action to be performed by the target convoy at the next moment, wherein the action includes adjusting the size of the target convoy and the acceleration of the lead vehicle of the target convoy, and, based on the adjusted size of the target convoy and the acceleration of its lead vehicle, to generate the accelerations of the vehicles following the lead vehicle in the target convoy using the car-following model IDM;
an optimization module, used to input the current observation states of the target convoy and the decision reference convoy, together with the action to be performed by the target convoy at the next moment, into the critic network to obtain an estimated Q value, to evaluate, based on the Q value, the action to be performed by the target convoy at the next moment, and to feed the evaluation result back to the actor network and, in combination with the total reward, iteratively update and optimize the action to be performed by the target convoy at the next moment, thereby generating a strategy for adjusting the optimal convoy size and the optimal lead-vehicle acceleration of the target convoy;
a control module, used to coordinate and control the target convoy in the unsignalized intersection environment based on the strategy for adjusting the optimal convoy size and the optimal lead-vehicle acceleration of the target convoy and on the accelerations of the vehicles following the lead vehicle, wherein, within one control cycle, all convoys in the unsignalized intersection environment become the target convoy in turn and perform their actions, completing the passage control of all convoys at the unsignalized intersection.
10. A computer device for vehicle traffic control at an unsignalized intersection, characterized in that it comprises a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method for vehicle traffic control at an unsignalized intersection according to any one of claims 1-8.