Traffic organization scheme optimization method based on multi-signal lamp reinforcement learning

A reinforcement learning optimization technology, applied to road-vehicle traffic control systems, machine learning, traffic signal control, and similar fields. It addresses problems such as unstable model convergence and convergence speed, and improves the smooth-flow rate of traffic.

Active Publication Date: 2021-11-09
CHENGDU UNIV OF INFORMATION TECH
5 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

Decentralized communication is more practical and does not require centralized decision-ma...



Examples


Embodiment 1

[0062] This embodiment is a multi-intersection traffic organization scheme optimization method based on multi-signal lamp reinforcement learning. It uses multiple agents, an Actor-Critic network, a Subnet network, and trajectory reconstruction to improve the traffic flow rate of the road network. A multi-agent environment is one in which multiple intelligent entities act at each step; Figure 1 shows the difference between multi-agent and single-agent environments.

[0063] First, construct the Actor networks. The traffic road network contains multiple intersections, and the signal lights at each intersection correspond to one agent, so multiple agents require multiple corresponding Actor networks. Each Actor network includes a state space set and a behavior space set.
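The one-Actor-per-intersection arrangement described in [0063] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented network: the single linear layer, the softmax policy, and the sizes `N_INTERSECTIONS`, `STATE_DIM`, and `N_ACTIONS` are all hypothetical, and plain Python is used only to keep the sketch dependency-free.

```python
import math
import random


class Actor:
    """Softmax policy for one signalized intersection (one agent)."""

    def __init__(self, state_dim, n_actions, seed):
        rng = random.Random(seed)
        # One weight row per action; a single linear layer keeps the sketch short.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                        for _ in range(n_actions)]

    def action_probs(self, state):
        """Map a state vector to a probability for each action."""
        logits = [sum(w * s for w, s in zip(row, state)) for row in self.weights]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - top) for value in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state, rng):
        """Sample one action index from the policy."""
        probs = self.action_probs(state)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical sizes, chosen only for illustration.
N_INTERSECTIONS = 9   # e.g. a 3 x 3 grid of intersections
STATE_DIM = 4         # e.g. one queue measurement per approach
N_ACTIONS = 2         # e.g. keep the current phase or switch

# One Actor per intersection, each with its own parameters.
actors = [Actor(STATE_DIM, N_ACTIONS, seed=i) for i in range(N_INTERSECTIONS)]

rng = random.Random(10)
states = [[rng.random() for _ in range(STATE_DIM)] for _ in range(N_INTERSECTIONS)]
joint_action = [actor.act(state, rng) for actor, state in zip(actors, states)]
```

Each agent here sees only its own state vector and picks its own action, so `joint_action` holds one independent decision per intersection.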

[0064] The program in the traffic lights changes the state of the road, which in effect closes a road for a short time to control traffic. In this embodiment, proceeding from the actual situat...

Embodiment 2

[0072] This embodiment is a traffic organization scheme optimization method based on multi-signal lamp reinforcement learning for a single intersection. The simulation platform used in this embodiment is SUMO, an open-source road traffic simulator. It supports collecting the data required by the simulation experiment, simulating traffic behavior, and building the required road network. Most importantly, it can collect the timing data of traffic lights. The code is written in the PyCharm IDE, and tensorflow-gpu 1.4.0 and NumPy are used to build the reinforcement learning and neural network components. The above extensions need to be improved, and the second most important thing is to implement the SUMO TraCI traffic control interface. TraCI supports dynamic control of traffic lights, can call the SUMO simulation tools, obtain individual vehicle information, and obtain detailed ...
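A control loop over TraCI of the kind described in [0072] might look like the sketch below. It is an illustration under assumptions rather than the embodiment's code: `run_episode`, `choose_phase`, and `make_fake_connection` are hypothetical names, the per-step accumulation of waiting time is only one possible measure, and the stand-in connection exists solely so the sketch runs without a SUMO installation. The TraCI calls used (`trafficlight.getIDList`, `trafficlight.getControlledLanes`, `trafficlight.setPhase`, `lane.getWaitingTime`, `simulation.getMinExpectedNumber`, `simulationStep`) are part of the Python TraCI client.

```python
from types import SimpleNamespace


def waiting_time_at(conn, tls_id):
    """Sum the waiting time (s) over the lanes controlled by one traffic light."""
    # getControlledLanes can list a lane more than once, so deduplicate.
    lanes = set(conn.trafficlight.getControlledLanes(tls_id))
    return sum(conn.lane.getWaitingTime(lane) for lane in lanes)


def run_episode(conn, choose_phase, max_steps=3600):
    """Step the simulation and let choose_phase(tls_id, waiting) pick each phase."""
    accumulated_wait = 0.0
    for _ in range(max_steps):
        if conn.simulation.getMinExpectedNumber() == 0:
            break  # no vehicles left in, or waiting to enter, the network
        for tls_id in conn.trafficlight.getIDList():
            waiting = waiting_time_at(conn, tls_id)
            accumulated_wait += waiting
            conn.trafficlight.setPhase(tls_id, choose_phase(tls_id, waiting))
        conn.simulationStep()
    return accumulated_wait


def make_fake_connection(n_steps):
    """Hypothetical stand-in for the traci module, for dry runs without SUMO."""
    state = {"remaining": n_steps, "phases": {}}

    def step():
        state["remaining"] -= 1

    conn = SimpleNamespace(
        simulation=SimpleNamespace(getMinExpectedNumber=lambda: state["remaining"]),
        trafficlight=SimpleNamespace(
            getIDList=lambda: ("tls0",),
            getControlledLanes=lambda tls_id: ("lane_a", "lane_a", "lane_b"),
            setPhase=lambda tls_id, index: state["phases"].update({tls_id: index}),
        ),
        lane=SimpleNamespace(getWaitingTime=lambda lane_id: 2.0),
        simulationStep=step,
    )
    return conn, state


fake_conn, fake_state = make_fake_connection(n_steps=3)
dry_run_wait = run_episode(fake_conn, lambda tls_id, waiting: 1)

# With a local SUMO installation, the real module can be passed instead:
#   import traci
#   traci.start(["sumo", "-c", "grid.sumocfg"])  # placeholder config file name
#   print(run_episode(traci, lambda tls_id, waiting: 0))
#   traci.close()
```

Passing the connection as an argument keeps the loop testable offline; a learned policy would replace the fixed `choose_phase` lambda.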

Embodiment 9

[0075] In the 9-grid multi-intersection experimental model of this embodiment, each rectangle represents a signalized intersection, and every two adjacent intersections are connected by two lanes.

[0076] In this embodiment, the following parameters must be set in the SUMO simulation software. In the 9-grid environment, a total of 7,000 vehicles enter the simulation system. The model sets the initial number of vehicles to 50, the shortest vehicle driving path to 2, the longest vehicle driving path to 7, and the random seed parameter to 10.

[0077] After the experimental model is built, the action mode of each agent is constructed according to its own behavior mode. Under the original conditions, the total waiting time of cars in this environment is 24732 seconds. There are 21 origin-destination (OD) pairs in this experimental traffic environment. In the original environment, the traffic volume in the lower right area of the 9th grid...


Abstract

The invention discloses a traffic organization scheme optimization method based on multi-signal lamp reinforcement learning, belonging to the field of traffic signal lamp control. First, an Actor network containing a state space set and a behavior space set is constructed. Next, an observed value is introduced, high-dimensional information is compressed into low-dimensional information by a Subnet network, and the behavior deflection probability is calculated. The initial state information, the updated state information, and the behavior deflection probability are then fed into a Critic network for centralized learning. Finally, trajectory reconstruction is carried out. In a multi-intersection traffic environment, multiple agents improve the road network's smooth-flow rate by means of the Actor-Critic algorithm framework. The agents learn centrally and execute in a distributed manner, which combines the advantages of both and greatly improves the convergence speed of the algorithm.
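The data flow in the abstract (a Subnet compresses each observation, each agent computes action probabilities, and one Critic learns from the joint information) can be sketched roughly as below. Everything concrete here is an assumption made for illustration: the linear layers, the `tanh` compression, the sizes, the fixed learning target, and reading the "behavior deflection probability" as a per-agent action probability vector. Trajectory reconstruction is not shown.

```python
import math
import random

rng = random.Random(0)


def linear(n_in, n_out):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def matvec(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes, chosen only for illustration.
N_AGENTS, OBS_DIM, CODE_DIM, N_ACTIONS, STATE_DIM = 3, 12, 4, 2, 6

# Distributed part: every agent owns a Subnet and a policy head.
subnets = [linear(OBS_DIM, CODE_DIM) for _ in range(N_AGENTS)]
heads = [linear(CODE_DIM, N_ACTIONS) for _ in range(N_AGENTS)]


def action_probs(agent, obs):
    """Compress a high-dimensional observation, then score the actions."""
    code = [math.tanh(v) for v in matvec(subnets[agent], obs)]
    return softmax(matvec(heads[agent], code))


# Centralized part: one linear Critic over the joint input.
critic_w = [0.0] * (2 * STATE_DIM + N_AGENTS * N_ACTIONS)


def critic_input(state, next_state, all_probs):
    """Concatenate initial state, updated state, and every agent's probabilities."""
    return state + next_state + [p for probs in all_probs for p in probs]


def critic_update(state, next_state, all_probs, target, lr=0.05):
    """One squared-error gradient step; returns the error before the step."""
    x = critic_input(state, next_state, all_probs)
    error = target - sum(w * xi for w, xi in zip(critic_w, x))
    for i, xi in enumerate(x):
        critic_w[i] += lr * error * xi
    return error


# One illustrative step with random data and an arbitrary target of 1.0.
observations = [[rng.random() for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]
state = [rng.random() for _ in range(STATE_DIM)]
next_state = [rng.random() for _ in range(STATE_DIM)]
all_probs = [action_probs(i, obs) for i, obs in enumerate(observations)]

first_error = critic_update(state, next_state, all_probs, target=1.0)
second_error = critic_update(state, next_state, all_probs, target=1.0)
```

At execution time only `action_probs` is needed per agent, while `critic_update` is used during training with information from all agents, which is the split that "centralized learning and distributed execution" refers to.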

Description

Technical field

[0001] The invention relates to the field of traffic signal lamp control, in particular to a traffic organization scheme optimization method based on multi-signal lamp reinforcement learning.

Background technique

[0002] In the era of science and information technology, human life is becoming more and more abundant. Most families now have their own means of transportation, the car, which leads to various urban traffic problems such as long waiting times and excessive lane occupancy. With the development of artificial intelligence, many intelligent traffic technologies have emerged and have begun to control traffic behavior effectively. Agent reinforcement learning is one of the artificial intelligence technologies currently under development. Reinforcement learning is currently the mainstream of intelligent transportation technology, and includes algorithms such as Q-learning, Sarsa, and TD(λ).

[0003] How to enable agents to learn efficiently in ...


Application Information

IPC(8): G08G1/01, G08G1/07, G06F30/20, G06N20/00
CPC: G08G1/0125, G08G1/07, G06F30/20, G06N20/00, Y02T10/40
Inventor: 郑皎凌, 吴昊昇, 王茂帆
Owner: CHENGDU UNIV OF INFORMATION TECH