Multi-agent group cooperation strategy automatic generation method

An automatic strategy-generation and intelligent-agent technology in the field of artificial intelligence. It addresses problems such as slow learning and poor algorithm stability, and achieves the effects of improving training efficiency and improving the efficiency of strategy generation and evaluation.

Pending Publication Date: 2021-03-12
Applicant: 厦门渊亭信息科技有限公司

AI Technical Summary

Problems solved by technology

The exploration of each agent may affect the strategies of its companion agents, which makes the algorithm difficult to stabilize and slow to learn.



Examples


Embodiment 1

[0046] The invention discloses a method for automatically generating multi-agent group cooperation strategies based on MADDPG (a multi-agent reinforcement learning framework based on the deep deterministic policy gradient algorithm), hereinafter referred to as the TTL-MADDPG algorithm. On the basis of the original MADDPG algorithm, three major innovations are proposed: trace information, multi-agent cooperative teaming, and life-and-death training. The invention takes the MADDPG algorithm as the main body and adds trace information to its policy network (actor network), changing the action to a_i = μ_θi(o_i, x_i) + N_noise, where x_i represents the trace information of agent i: the learning history of agent i in the environment leaves a trace of its own information in the environment. Through this trace information, an agent can learn from the experience of others and avoid detours. In th...
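The modified actor rule a_i = μ_θi(o_i, x_i) + N_noise can be sketched as follows. This is a hypothetical illustration only: the patent does not specify the network architecture, so a plain linear map with tanh squashing and Gaussian exploration noise stands in for μ_θi.

```python
import numpy as np

class TraceActor:
    """Sketch of a policy (actor) conditioned on trace information.

    Hypothetical: the patent only states a_i = mu_theta_i(o_i, x_i) + N_noise,
    where x_i is the trace information agent i reads from the environment.
    The linear-tanh network and Gaussian noise here are assumptions.
    """

    def __init__(self, obs_dim, trace_dim, act_dim, noise_scale=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        # theta_i: weights mapping the concatenated [o_i; x_i] to an action
        self.W = self.rng.normal(0.0, 0.1, size=(act_dim, obs_dim + trace_dim))
        self.noise_scale = noise_scale

    def act(self, obs, trace):
        # a_i = mu_theta_i(o_i, x_i) + N_noise
        z = np.concatenate([obs, trace])
        action = np.tanh(self.W @ z)
        noise = self.rng.normal(0.0, self.noise_scale, size=action.shape)
        return action + noise

actor = TraceActor(obs_dim=4, trace_dim=2, act_dim=2)
a = actor.act(np.zeros(4), np.ones(2))
```

The only structural change relative to a standard DDPG actor is that the trace vector x_i is concatenated with the local observation before the forward pass.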

Embodiment 2

[0071] Application of the multi-agent group cooperation strategy automatic generation algorithm to traffic-light control.

[0072] Treat the traffic signal at each intersection as an agent, denoted agent_i;

[0073] Input: the set of traffic signals Agents = {agent_0, agent_1, agent_2, ..., agent_i}.

[0074] Input: For each traffic signal agent_i, initialize the policy network π_i(o, θ_πi), the evaluation network Q_i(s, a_1, a_2, ..., a_N, θ_Qi), and the network parameters θ_πi and θ_Qi; where o represents the real-time information of the traffic environment observed by the traffic signal; the policy network π_i represents the i-th traffic signal's control strategy for the traffic lights at each step; the evaluation network Q_i represents the i-th traffic signal's evaluation of that control strategy; s represents the state information of the traffic signal; a represents the traffic control a...
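The initialization step above can be sketched as follows. The intersection count, observation and action dimensions, and the linear parameterization are all illustrative assumptions; the patent only specifies that each agent gets a policy network π_i(o, θ_πi) and a centralized critic Q_i(s, a_1, ..., a_N, θ_Qi).

```python
import numpy as np

N_SIGNALS = 4                      # hypothetical number of intersections
OBS_DIM, ACT_DIM = 8, 3            # assumed per-signal observation/action sizes
STATE_DIM = N_SIGNALS * OBS_DIM    # global state s seen by every critic

def init_params(rng, out_dim, in_dim):
    # Small random initialization for a linear layer (stand-in for a real net)
    return rng.normal(0.0, 0.1, size=(out_dim, in_dim))

rng = np.random.default_rng(0)
agents = []
for i in range(N_SIGNALS):
    # policy pi_i(o; theta_pi_i): local observation -> action
    theta_pi = init_params(rng, ACT_DIM, OBS_DIM)
    # critic Q_i(s, a_1..a_N; theta_Q_i): global state + all actions -> scalar
    theta_q = init_params(rng, 1, STATE_DIM + N_SIGNALS * ACT_DIM)
    agents.append({"id": i, "theta_pi": theta_pi, "theta_q": theta_q})
```

Note the critic input dimension grows with the number of signals, which is the "dimension explosion" concern the background section raises.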

Embodiment 3

[0086] The algorithm used in the multi-agent group cooperation strategy automatic generation method of the present invention is evaluated through a simulation test.



Abstract

The invention relates to the field of artificial intelligence and discloses a multi-agent group cooperation strategy automatic generation method, which defines the agents and their policy networks, evaluation networks, and experience according to a specific application environment, and realizes the automatic generation of a multi-agent cooperation strategy. The adopted algorithm provides three innovations on the basis of the MADDPG algorithm: trace information, multi-agent cooperative teaming, and life-and-death training. An agent's learning history leaves trace information in the environment, through which other agents can learn from its experience and avoid detours; cooperative teaming of multiple agents improves training efficiency; finally, through life-and-death training, agents that learn well in the environment pass all of their information to offspring that continue training, while agents that learn poorly die and return to the starting point to be trained again. The generation and evaluation efficiency of multi-agent cooperation strategies can thereby be greatly improved.
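The life-and-death training idea described above can be sketched as a population update step. This is a hedged reconstruction: the abstract does not give the selection criterion or the exact inheritance mechanism, so selection by episodic return and a fresh random re-initialization for "death" are assumptions.

```python
import numpy as np

def life_death_step(params, returns, rng):
    """Hypothetical sketch of life-and-death training.

    'Death': the worst-performing agent returns to the starting point,
    modelled here as re-initialized parameters (assumed).
    'Birth': the best-performing agent's offspring inherits all of its
    information, modelled here as an appended parameter copy (assumed).
    """
    best = int(np.argmax(returns))
    worst = int(np.argmin(returns))
    params[worst] = rng.normal(0.0, 0.1, size=params[worst].shape)  # death
    params.append(params[best].copy())                               # birth
    return params

params = [np.full((2,), float(i)) for i in range(3)]
returns = [1.0, 5.0, 0.5]
new = life_death_step(params, returns, np.random.default_rng(0))
```

In a full implementation this step would run periodically between episodes of ordinary MADDPG training, with the offspring continuing gradient updates from the inherited parameters.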

Description

technical field

[0001] The invention relates to the field of artificial intelligence, in particular to a method for automatically generating multi-agent group cooperation strategies.

Background technique

[0002] MADDPG is a multi-agent reinforcement learning framework based on the deep deterministic policy gradient algorithm, which can be used for the automatic generation of multi-agent cooperation strategies.

[0003] In a multi-agent system, each agent learns by interacting with the environment to obtain a reward value (reward) and thereby improves its strategy; the process of obtaining the optimal strategy in the environment in this way is multi-agent reinforcement learning.

[0004] In single-agent reinforcement learning, the agent's environment is stationary, but in multi-agent reinforcement learning the environment is complex and dynamic, which brings great difficulty to the learning process.

[0005] Dimension explosion: in single-agent reinforcement learning, state-value funct...
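Standard MADDPG (not the patent's extension) trains each agent's centralized critic toward a bootstrapped target; a minimal numeric sketch of that target, with stand-in scalar values, is:

```python
# Minimal sketch of the standard MADDPG critic target (background material,
# not the patent's TTL-MADDPG variant): agent i's critic Q_i(s, a_1..a_N)
# is regressed toward y_i = r_i + gamma * Q_i'(s', a_1'..a_N'), where the
# primed quantities come from target networks. All values below are
# illustrative stand-ins.
gamma = 0.95    # discount factor (assumed value)
r_i = 1.0       # reward observed by agent i in this transition
q_next = 2.0    # target-critic value Q_i'(s', a_1'..a_N') for the next state
y_i = r_i + gamma * q_next
print(y_i)  # 2.9
```

Because the critic conditions on every agent's action, the environment looks stationary from the critic's point of view, which is MADDPG's answer to the non-stationarity problem paragraph [0004] describes.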

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/08, G06N3/04
CPC: G06N3/08, G06N3/044, G06N3/045
Inventors: 洪万福, 钱智毅, 黄在斌
Owner: 厦门渊亭信息科技有限公司