[0062] In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments. It is to be understood that the specific embodiments described herein are intended only to explain the present invention and are not intended to limit it.
[0063] In view of the problems in the prior art, the present invention provides a driver control method, system, medium, apparatus, and application for mitigating traffic congestion, which are described in detail below with reference to the accompanying drawings.
[0064] As shown in Figure 1, the driver control method for mitigating traffic congestion provided by the embodiment of the present invention includes the following steps:
[0065] S101, constructing the environment model and the reinforcement learning model respectively;
[0066] S102, adopting centralized learning with decentralized execution, in which each target vehicle node makes a decision at each time step so as to achieve the same given goal shared by all nodes, namely resolving traffic congestion in an orderly manner;
[0067] S103, modeling the communication and information propagation between the nodes with a graph neural network (GNN), using Deep Q-Learning in the decision processor, and issuing the resulting decision information to the driver in each environment in the form of advisory instructions.
[0068] As shown in Figure 2, the embodiment of the present invention provides a driver control system for mitigating traffic congestion, including:
[0069] Environment model building module 1, used to decompose the environment model into a local layer and a global layer according to the spatial positions and relative relationships of the vehicles, and to define the modeling of the environment as an information topology;
[0070] Reinforcement learning model building module 2, used to feed the node features X_t into a fully connected (FCN) layer, feed the output of the FCN together with the adjacency matrix A_t into a graph convolutional network (GCN) layer for parallel computation, filter the vehicle nodes by applying the index matrix M_t to the GCN output, and finally compute the output Q values with the Q network for iterative parameter updates;
[0071] Traffic congestion mitigation module 3, used to adopt centralized learning with decentralized execution, in which each target vehicle node makes a decision at each time step so as to achieve the same given goal shared by all nodes, namely resolving traffic congestion in an orderly manner;
[0072] Decision information forming module, used to model the communication and information propagation between the nodes with a graph neural network (GNN), with the decision processor using Deep Q-Learning, and to issue the resulting decision information to the driver in each environment.
[0073] The technical solutions of the present invention will be further described below in conjunction with specific embodiments.
[0074] The present invention provides a driver control system for mitigating traffic congestion based on a reinforcement learning framework, solving the problem that existing techniques cannot address congestion directly and fundamentally.
[0075] Since multiple vehicles are present at the same time, the vehicle nodes not only interact with the environment but also interact strongly with one another; the whole process can be summarized as a Markov game, as shown in Figure 3.
[0076] The environment modeling used by the reinforcement-learning-based method for mitigating traffic congestion proposed by the present invention is shown in Figure 4. According to the spatial positions and relative relationships of the vehicles, the model is decomposed into a local layer and a global layer: the local network is a "star" graph consisting of the target vehicle and all other vehicles around it; the global network consists of all vehicles in the current environment. The target vehicle acquires local information about nearby vehicles from its on-board sensors, and acquires global information from vehicles in other environments via the cloud connection channel. Within this method, the modeling of the environment is defined as an information topology.
[0077] In a local "star" network, information is passed from the surrounding vehicles to the target vehicle, because the target vehicle has sensing functions. In the global network, all target vehicles can share the local perception information of other vehicles.
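The two-layer information topology described above can be illustrated with a small sketch. It is not taken from the source: the function names, the sensing radius, the use of longitudinal position alone to decide which vehicles are nearby, and the convention that entry (i, j) = 1 means node i receives information from node j are all assumptions made for illustration.

```python
import numpy as np

def local_star_adjacency(positions, target, sensing_range):
    """Adjacency of the local "star" graph centred on one target vehicle.

    positions: 1-D array of longitudinal positions (assumed unit: metres).
    target: index of the target vehicle.
    sensing_range: hypothetical sensor radius; not specified in the source.
    """
    n = len(positions)
    adj = np.zeros((n, n), dtype=int)
    for j in range(n):
        if j != target and abs(positions[j] - positions[target]) <= sensing_range:
            # Information flows from surrounding vehicle j to the target.
            adj[target, j] = 1
    return adj

def global_adjacency(positions, targets, sensing_range):
    """Union of all local star graphs, plus links between target vehicles.

    The target-to-target links stand in for the cloud connection channel.
    """
    n = len(positions)
    adj = np.zeros((n, n), dtype=int)
    for t in targets:
        adj |= local_star_adjacency(positions, t, sensing_range)
    for a in targets:
        for b in targets:
            if a != b:
                adj[a, b] = 1
    return adj

# Four vehicles; vehicles 0 and 3 are target vehicles.
positions = np.array([0.0, 20.0, 45.0, 200.0])
A = global_adjacency(positions, targets=[0, 3], sensing_range=50.0)
```

In this example, vehicle 0 senses vehicles 1 and 2 locally, vehicle 3 senses no one, and the two target vehicles are linked to each other through the assumed cloud channel.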
[0078] The reinforcement-learning-based method for mitigating traffic congestion proposed by the present invention adopts centralized learning with decentralized execution. In this setting, each target vehicle node must make a decision at every time step, and the aim is to achieve the same given goal for all nodes, namely resolving the traffic congestion problem. Communication and information propagation between the nodes are modeled with a graph neural network (GNN), the decision processor uses Deep Q-Learning, and the resulting decision information is issued to the driver in each environment in the form of advisory instructions.
[0079] Reinforcement learning model structure
[0080] At each time step t, the n other vehicles around the target vehicle can be detected, so the model input at each time step t is set to the state S, a tuple consisting of three information modules: the node features X_t, the adjacency matrix A_t, and the index matrix M_t recording the vehicles. The node features X_t comprise, for each vehicle i, its speed V_i, longitudinal position P_i, lateral lane position L_i, and driving intention I_i. The adjacency matrix A_t represents the interaction between the target vehicle and its surrounding vehicles, and the index matrix M_t is used to filter the target vehicles from all nodes.
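As a minimal sketch of how such a state could be assembled, the snippet below packs the three information modules into one tuple. The dictionary keys, the `State` and `build_state` names, and the representation of the index matrix as a 0/1 vector over nodes are assumptions for illustration; the source does not specify a data layout.

```python
import numpy as np
from typing import NamedTuple

class State(NamedTuple):
    X: np.ndarray  # node features, shape (n, 4): speed, position, lane, intention
    A: np.ndarray  # association (adjacency) matrix, shape (n, n)
    M: np.ndarray  # index mask, shape (n,): 1 marks a target vehicle

def build_state(vehicles, target_ids, adjacency):
    """Assemble the state tuple for one time step.

    vehicles: list of dicts with hypothetical keys
              "speed", "pos", "lane", "intent".
    target_ids: indices of the target vehicles among the nodes.
    adjacency: (n, n) array describing interactions between nodes.
    """
    X = np.array(
        [[v["speed"], v["pos"], v["lane"], v["intent"]] for v in vehicles],
        dtype=float,
    )
    M = np.zeros(len(vehicles), dtype=int)
    M[list(target_ids)] = 1
    return State(X=X, A=np.asarray(adjacency), M=M)

vehicles = [
    {"speed": 25.0, "pos": 0.0, "lane": 1, "intent": 0},
    {"speed": 22.0, "pos": 20.0, "lane": 0, "intent": 1},
    {"speed": 27.0, "pos": 45.0, "lane": 1, "intent": 0},
]
S = build_state(vehicles, target_ids=[0], adjacency=np.zeros((3, 3)))
```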
[0081] The overall model structure is shown in Figure 5. First, the node features X_t are fed into the fully connected (FCN) layer; the output of the FCN and the adjacency matrix A_t are then fed together into the graph convolutional network (GCN) layer for parallel computation; the GCN output is filtered with the index matrix M_t to select the vehicle nodes; finally, the Q network computes the output Q values, which are used for iterative parameter updates.
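The forward pass described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the layer sizes, the ReLU activations, the single GCN layer, the symmetric normalization of the adjacency matrix with self-loops, the random weights, and the three-action output are all choices made here for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def normalize_adjacency(A):
    """D^-1/2 (A + I) D^-1/2, a common GCN normalization (assumed here)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def q_values(X, A, M, params):
    """FCN -> GCN -> filter by index mask -> Q head."""
    H = relu(X @ params["W_fcn"] + params["b_fcn"])          # per-node encoding
    G = relu(normalize_adjacency(A) @ H @ params["W_gcn"])   # neighbour aggregation
    G_target = G[M.astype(bool)]                             # keep target vehicles only
    return G_target @ params["W_q"] + params["b_q"]          # one row of Q values per target

rng = np.random.default_rng(0)
n_nodes, n_features, hidden, n_actions = 4, 4, 8, 3  # hypothetical sizes
params = {
    "W_fcn": rng.normal(size=(n_features, hidden)),
    "b_fcn": np.zeros(hidden),
    "W_gcn": rng.normal(size=(hidden, hidden)),
    "W_q": rng.normal(size=(hidden, n_actions)),
    "b_q": np.zeros(n_actions),
}
X = rng.normal(size=(n_nodes, n_features))
A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
M = np.array([1, 0, 0, 1])
Q = q_values(X, A, M, params)  # shape: (number of target vehicles, n_actions)
```

Filtering before the Q head follows the order given in the text; because the Q head here acts on each node independently, filtering afterwards would give the same values.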
[0082] To ensure the stability of model training, the vehicle nodes are allowed to explore the environment fully: the first T time steps before formal training begins are set as a "warm-up phase", which helps the system guarantee the safety of its decisions. Starting from time step T + 1, the model is trained by maximizing the reward and minimizing the loss.
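One plausible reading of the warm-up phase and the subsequent training objective is sketched below. The source does not state how actions are chosen during warm-up or which loss is used; random exploration during the first T steps, epsilon-greedy selection afterwards, and the standard one-step squared temporal-difference loss of Deep Q-Learning are assumptions, as are all function and parameter names.

```python
import random

import numpy as np

def select_action(q_values, step, warmup_steps, epsilon, rng):
    """Random action during warm-up; epsilon-greedy afterwards (assumed policy)."""
    if step <= warmup_steps or rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return int(np.argmax(q_values))

def is_training_step(step, warmup_steps):
    """Parameter updates start at time step T + 1."""
    return step > warmup_steps

def td_loss(q_pred, action, reward, q_next, gamma, done):
    """Squared one-step temporal-difference error for a single transition."""
    target = reward + (0.0 if done else gamma * float(np.max(q_next)))
    return (target - float(q_pred[action])) ** 2

rng = random.Random(0)
greedy = select_action(np.array([0.1, 0.9, 0.3]), step=11, warmup_steps=10,
                       epsilon=0.0, rng=rng)
loss = td_loss(np.array([1.0, 2.0]), action=1, reward=1.0,
               q_next=np.array([0.5, 3.0]), gamma=0.9, done=False)
```

Maximizing reward and minimizing loss are linked through the target term: the network is pushed toward Q values that match the observed reward plus the discounted best future value.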
[0083] The practical application flow of the reinforcement-learning-based method for mitigating traffic congestion is shown in Figure 6. The local information obtained by the perception system and the global information are preprocessed into the data type required by the network input; the resulting data tuple is fed to the trained network to obtain the globally optimal decision output; the decision results are then issued through the driver advisory system to the actual operator of each vehicle node, who completes the final vehicle driving control task, thereby effectively mitigating traffic congestion.
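The last stage of this flow, turning the network output into advisory instructions, might look like the following sketch. The action set and the wording of the suggestions are hypothetical; the source does not enumerate the available actions.

```python
# Hypothetical discrete action set; not specified in the source.
ADVICE = {
    0: "Keep current lane",
    1: "Change to the left lane",
    2: "Change to the right lane",
}

def advise(q_values_by_vehicle):
    """Map each target vehicle's Q values to a textual suggestion.

    q_values_by_vehicle: dict from vehicle id to a sequence of Q values,
    one per action, as produced by the trained network.
    """
    advice = {}
    for vehicle_id, q_values in q_values_by_vehicle.items():
        # Greedy choice: the action with the highest Q value.
        best = max(range(len(q_values)), key=lambda a: q_values[a])
        advice[vehicle_id] = ADVICE[best]
    return advice

suggestions = advise({"veh_1": [0.2, 0.7, 0.1], "veh_2": [0.9, 0.3, 0.4]})
```

Each driver receives only the suggestion for their own vehicle, which matches the decentralized execution described earlier.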
[0084] The embodiment of the present invention includes modeling the relative relationships between the vehicle nodes in the environment in the form of a graph during data acquisition, thereby forming an adjacency matrix that introduces the game relationships between the nodes.
[0085] Embodiments of the present invention include, but are not limited to, implementing the subsequent model learning process using DQN; other functional forms can also be used, such as a convolutional neural network (CNN), deep belief network (DBN), restricted Boltzmann machine (RBM), recurrent neural network (RNN, LSTM, GRU), recursive neural tensor network (RNTN), autoencoder (AutoEncoder), or generative adversarial network (GAN).
[0086] The optimal driving behavior of a single node formed in the present invention can be delivered by means including, but not limited to, the driver advisory system; the optimal instruction can also be transmitted to the vehicle driver through integration into mobile software. For cars with automated driving capability of level L2 and above, instructions can be delivered by the driver assistance system through steering wheel and pedal feedback. For fully self-driving unmanned vehicle platforms, the instructions can be passed directly between the vehicle node and the terminal to control the vehicle's longitudinal acceleration and yaw angular acceleration.
[0087] To demonstrate the planning and decision-making capability of the driver advisory system for mitigating traffic congestion based on the reinforcement learning framework of the present invention, a highway fork scenario can be modeled on the SUMO simulation platform, and the global reward of an existing rule-based planning and decision-making method can be compared with that of the method proposed by the present invention (GCQ for short), as shown in Table 1.
[0088] Table 1
[0089]
[0090] It can be seen that as the vehicle flow into the environment (veh/sec) increases, the global reward obtained by the GCQ algorithm proposed by the present invention is considerably better than that of the rule-based algorithm in terms of mean, median, and standard deviation (Std).
[0091] The above embodiments can be implemented wholly or partly through software, hardware, firmware, or any combination thereof. When implemented wholly or partly in the form of a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions described in accordance with the embodiments of the present invention are generated in whole or in part. The computer can be a general-purpose computer, a dedicated computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that the computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium can be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
[0092] The above describes only specific embodiments of the present invention, but the scope of protection of the invention is not limited thereto. Any modification, equivalent replacement, or improvement made by a person skilled in the art within the technical scope disclosed by the present invention and within its spirit and principles shall be covered by the scope of protection of the invention.