Multi-agent group cooperation strategy automatic generation method

An automatic strategy-generation and intelligent-agent technology in the field of artificial intelligence. It addresses problems such as slow learning and poor algorithm stability, and achieves the effects of improving training efficiency and improving the efficiency of strategy generation and evaluation.

Pending Publication Date: 2021-03-12
Applicant: 厦门渊亭信息科技有限公司

AI Technical Summary

Problems solved by technology

The exploration of each agent may affect the strategies of its companion agents, which makes the algorithm difficult to stabilize and slow to learn.



Examples


Embodiment 1

[0046] The invention discloses a method for automatically generating multi-agent group cooperation strategies based on MADDPG (a multi-agent reinforcement learning framework based on the deep deterministic policy gradient algorithm), hereinafter referred to as the TTL-MADDPG algorithm. On the basis of the original MADDPG algorithm, three major innovations are proposed: trace information, multi-agent cooperative teaming, and life-and-death training. The invention takes the MADDPG algorithm as the main body and adds trace information to its policy network (actor network), changing the action to a_i = μ_θi(o_i, x_i) + N_noise, where x_i represents the trace information of agent i: the learning history of agent i in the environment leaves a trace of its own information in the environment. Through this trace information, an agent can learn from the experience of others and avoid detours. In th...
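The modified actor rule a_i = μ_θi(o_i, x_i) + N_noise can be sketched as follows. This is a hypothetical illustration only: the patent does not specify the network architecture, so a plain linear map with tanh squashing and Gaussian exploration noise stands in for μ_θi.

```python
import numpy as np

class TraceActor:
    """Sketch of a policy (actor) conditioned on trace information.

    Hypothetical: the patent only states a_i = mu_theta_i(o_i, x_i) + N_noise,
    where x_i is the trace information agent i reads from the environment.
    The linear-tanh network and Gaussian noise here are assumptions.
    """

    def __init__(self, obs_dim, trace_dim, act_dim, noise_scale=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        # theta_i: weights mapping the concatenated [o_i; x_i] to an action
        self.W = self.rng.normal(0.0, 0.1, size=(act_dim, obs_dim + trace_dim))
        self.noise_scale = noise_scale

    def act(self, obs, trace):
        # a_i = mu_theta_i(o_i, x_i) + N_noise
        z = np.concatenate([obs, trace])
        action = np.tanh(self.W @ z)
        noise = self.rng.normal(0.0, self.noise_scale, size=action.shape)
        return action + noise

actor = TraceActor(obs_dim=4, trace_dim=2, act_dim=2)
a = actor.act(np.zeros(4), np.ones(2))
```

The only structural change relative to a standard DDPG actor is that the trace vector x_i is concatenated with the local observation before the forward pass.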

Embodiment 2

[0071] Application of the multi-agent group cooperation strategy automatic generation algorithm to traffic-light control.

[0072] Treat the traffic signal at each intersection as an agent, denoted agent_i;

[0073] Input: the set of traffic signals Agents = {agent_0, agent_1, agent_2, ..., agent_i}.

[0074] Input: For each traffic signal agent_i, initialize the policy network π_i(o, θ_πi), the evaluation network Q_i(s, a_1, a_2, ..., a_N, θ_Qi), and the network parameters θ_πi and θ_Qi; where o represents the real-time information of the traffic environment observed by the traffic signal; the policy network π_i represents the i-th traffic signal's control strategy for the traffic lights at each step; the evaluation network Q_i represents the i-th traffic signal's evaluation of that control strategy; s represents the state information of the traffic signal; a represents the traffic control a...
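The initialization step above can be sketched as follows. The intersection count, observation and action dimensions, and the linear parameterization are all illustrative assumptions; the patent only specifies that each agent gets a policy network π_i(o, θ_πi) and a centralized critic Q_i(s, a_1, ..., a_N, θ_Qi).

```python
import numpy as np

N_SIGNALS = 4                      # hypothetical number of intersections
OBS_DIM, ACT_DIM = 8, 3            # assumed per-signal observation/action sizes
STATE_DIM = N_SIGNALS * OBS_DIM    # global state s seen by every critic

def init_params(rng, out_dim, in_dim):
    # Small random initialization for a linear layer (stand-in for a real net)
    return rng.normal(0.0, 0.1, size=(out_dim, in_dim))

rng = np.random.default_rng(0)
agents = []
for i in range(N_SIGNALS):
    # policy pi_i(o; theta_pi_i): local observation -> action
    theta_pi = init_params(rng, ACT_DIM, OBS_DIM)
    # critic Q_i(s, a_1..a_N; theta_Q_i): global state + all actions -> scalar
    theta_q = init_params(rng, 1, STATE_DIM + N_SIGNALS * ACT_DIM)
    agents.append({"id": i, "theta_pi": theta_pi, "theta_q": theta_q})
```

Note the critic input dimension grows with the number of signals, which is the "dimension explosion" concern the background section raises.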

Embodiment 3

[0086] The algorithm used in the multi-agent group cooperation strategy automatic generation method of the present invention is evaluated through a simulation test.



Abstract

The invention relates to the field of artificial intelligence and discloses a multi-agent group cooperation strategy automatic generation method, which defines the agents and their policy networks, evaluation networks, and experience according to a specific application environment, and realizes the automatic generation of a multi-agent cooperation strategy. The adopted algorithm provides three innovations on the basis of the MADDPG algorithm: trace information, multi-agent cooperative teaming, and life-and-death training. An agent's learning history leaves trace information in the environment, through which other agents can learn from its experience and avoid detours; cooperative teaming of multiple agents improves training efficiency; finally, through life-and-death training, agents that learn well in the environment pass all of their information to offspring that continue training, while agents that learn poorly die and return to the starting point to be trained again. The generation and evaluation efficiency of multi-agent cooperation strategies can thereby be greatly improved.
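The life-and-death training idea described above can be sketched as a population update step. This is a hedged reconstruction: the abstract does not give the selection criterion or the exact inheritance mechanism, so selection by episodic return and a fresh random re-initialization for "death" are assumptions.

```python
import numpy as np

def life_death_step(params, returns, rng):
    """Hypothetical sketch of life-and-death training.

    'Death': the worst-performing agent returns to the starting point,
    modelled here as re-initialized parameters (assumed).
    'Birth': the best-performing agent's offspring inherits all of its
    information, modelled here as an appended parameter copy (assumed).
    """
    best = int(np.argmax(returns))
    worst = int(np.argmin(returns))
    params[worst] = rng.normal(0.0, 0.1, size=params[worst].shape)  # death
    params.append(params[best].copy())                               # birth
    return params

params = [np.full((2,), float(i)) for i in range(3)]
returns = [1.0, 5.0, 0.5]
new = life_death_step(params, returns, np.random.default_rng(0))
```

In a full implementation this step would run periodically between episodes of ordinary MADDPG training, with the offspring continuing gradient updates from the inherited parameters.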

Description

technical field

[0001] The invention relates to the field of artificial intelligence, in particular to a method for automatically generating multi-agent group cooperation strategies.

Background technique

[0002] MADDPG is a multi-agent reinforcement learning framework based on the deep deterministic policy gradient algorithm, which can be used for the automatic generation of multi-agent cooperation strategies.

[0003] In a multi-agent system, each agent learns by interacting with the environment to obtain a reward value (reward) and thereby improves its strategy; the process of obtaining the optimal strategy in the environment in this way is multi-agent reinforcement learning.

[0004] In single-agent reinforcement learning, the agent's environment is stationary, but in multi-agent reinforcement learning the environment is complex and dynamic, which brings great difficulty to the learning process.

[0005] Dimension explosion: in single-agent reinforcement learning, state-value funct...
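Standard MADDPG (not the patent's extension) trains each agent's centralized critic toward a bootstrapped target; a minimal numeric sketch of that target, with stand-in scalar values, is:

```python
# Minimal sketch of the standard MADDPG critic target (background material,
# not the patent's TTL-MADDPG variant): agent i's critic Q_i(s, a_1..a_N)
# is regressed toward y_i = r_i + gamma * Q_i'(s', a_1'..a_N'), where the
# primed quantities come from target networks. All values below are
# illustrative stand-ins.
gamma = 0.95    # discount factor (assumed value)
r_i = 1.0       # reward observed by agent i in this transition
q_next = 2.0    # target-critic value Q_i'(s', a_1'..a_N') for the next state
y_i = r_i + gamma * q_next
print(y_i)  # 2.9
```

Because the critic conditions on every agent's action, the environment looks stationary from the critic's point of view, which is MADDPG's answer to the non-stationarity problem paragraph [0004] describes.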

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/08, G06N3/04
CPC: G06N3/08, G06N3/044, G06N3/045
Inventors: 洪万福, 钱智毅, 黄在斌
Owner: 厦门渊亭信息科技有限公司