Multi-agent reinforcement learning method and system based on hierarchical attention mechanism

A multi-agent reinforcement learning technology, applied to neural learning methods, neural architectures, and biological neural network models. It addresses poor scalability, growth in the size of the state space and action space, and communication requirements that are difficult to meet, with the effect of improving scalability.

Pending Publication Date: 2021-01-15

天津(滨海)人工智能军民融合创新中心

Cites: 0; Cited by: 8


AI Technical Summary

Problems solved by technology

A centralized approach, in which a central controller decides for all agents, is not scalable: the size of its state space and action space grows exponentially with the number of agents.

Moreover, since the central controller must collect the information of each agent and distribute the decision to each agent during the decision-making process, there are very high communication requirements for the agents that execute the decision, which is difficult to achieve in the real world.


Examples


Embodiment 1

[0062] With reference to Figure 1, the present invention provides a multi-agent reinforcement learning method based on a hierarchical attention mechanism, comprising:

[0063] Build a learning environment that includes multiple agents;

[0064] The Critic network calculates an estimated value from the observations and actions of the other agents, which each agent obtains through its hierarchical attention mechanism, and is optimized by minimizing a joint loss function until that loss converges;

[0065] Based on the observations, the actions and the trained estimated value, the estimated Actor network calculates the action-value function and is optimized by maximizing the advantage function until the optimal action-value function is obtained;

[0066] Deterministic actions are performed based on an optimal action-value function.
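The Critic step in [0064] can be illustrated with a minimal sketch. The details of the hierarchical attention mechanism are not given in this excerpt, so the code below uses a single level of scaled dot-product attention as a stand-in. The function names, the use of each agent's own encoding as the query, and the squared-error form of the joint loss are all assumptions made for illustration, not the patent's definitions.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    # Scaled dot-product attention: weight each value by softmax(query . key).
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights

def critic_inputs(encodings):
    # encodings[i] is a hypothetical (observation, action) encoding of agent i.
    # Each agent's critic input is its own encoding plus an attention-weighted
    # summary of the other agents' encodings (assumed design, single level only).
    out = []
    for i, own in enumerate(encodings):
        others = [e for j, e in enumerate(encodings) if j != i]
        context, _ = attend(own, others, others)
        out.append(own + context)
    return out

def joint_loss(q_values, targets):
    # Assumed joint loss: sum over agents of squared error to a target value.
    return sum((q - y) ** 2 for q, y in zip(q_values, targets))
```

Because every agent attends over a summary of the others rather than a concatenation of all observations and actions, the size of each critic input in this sketch stays fixed as agents are added, which is the scalability property the method aims for.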

[0067] Building the learning environment that includes multiple agents comprises:

[0068] Based on th...

Embodiment 2

[0103] With reference to Figure 2, the present invention comprises the following steps:

[0104] Step 1: with reference to Figure 3, build the reinforcement learning environment. It mainly includes three kinds of cooperative environments and a hunting environment (a mixed environment). The environment simulates a real physical world in which elastic forces, resistance, and similar effects exist. The specific steps are as follows:

[0105] 1.1 Cooperative navigation scenario: X agents cooperate to reach L preset target points (L = X). The agents have only physical actions, but each can observe its position relative to the other agents and to the target points. The reward depends on the distance from each target to any agent, so the agents must cover every target to obtain the maximum reward. An agent that collides with another agent is penalized. We try to let the agents learn to cover all targets and avoid ...
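A reward of the kind described in [0105] might be sketched as follows. The text only states that the reward is related to the distance from each target to any agent and that collisions are penalized. The nearest-agent distance, the collision radius, the penalty size, and the function names are assumptions chosen for illustration.

```python
import math

def distance(p, q):
    # Euclidean distance between two 2-D points.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def navigation_reward(agents, targets, collision_radius=0.1, collision_penalty=1.0):
    # Shared reward (assumed form): negative sum, over targets, of the distance
    # to the nearest agent, minus a penalty for each colliding pair of agents.
    coverage = sum(min(distance(a, t) for a in agents) for t in targets)
    collisions = sum(
        1
        for i in range(len(agents))
        for j in range(i + 1, len(agents))
        if distance(agents[i], agents[j]) < collision_radius
    )
    return -coverage - collision_penalty * collisions
```

Under this form the reward is highest when every target has an agent on it and no two agents overlap, which matches the requirement that the agents cover each target while avoiding collisions.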

Embodiment 3

[0172] Learning environment module: build a learning environment, which includes multiple agents;

[0173] Critic module: The Critic network calculates an estimated value from the observations and actions of the other agents, which each agent obtains through its hierarchical attention mechanism, and is optimized by minimizing a joint loss function until that loss converges;

[0174] Actor network module: Based on the observation value, action value and estimated value after training, the estimated Actor network calculates the action-value function and optimizes it by maximizing the advantage function until the optimal action-value function is obtained;

[0175] Execution Action Module: Based on the optimal action-value function, execute deterministic actions.

[0176] The acquisition module includes an estimated value submodule, a minimum joint loss function submodule and an optimization submodule;

[0177] The Critic modu...
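The Actor network module and the Execution Action Module can be illustrated for one agent with discrete actions. The excerpt does not define the advantage function, so this sketch assumes the common form in which the baseline is the policy-weighted average of the action values. The function names are hypothetical.

```python
def advantage(q_values, policy, action):
    # Assumed form: A(a) = Q(a) - sum over a' of pi(a') * Q(a').
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[action] - baseline

def greedy_action(q_values):
    # Deterministic execution: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With this baseline the advantage averages to zero under the current policy, so a positive value marks an action that is better than what the policy currently expects to obtain.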


Abstract

The invention provides a multi-agent reinforcement learning method and system based on a hierarchical attention mechanism. The method comprises the steps of: building a learning environment that comprises a plurality of agents; having a Critic network calculate estimated values from the observations and actions of the other agents, which each agent obtains through its hierarchical attention mechanism, and optimizing by minimizing a joint loss function until that loss converges; calculating an action-value function with the estimated Actor network from the observations, the actions and the trained estimated values, and optimizing by maximizing the advantage function until an optimal action-value function is obtained; and performing a deterministic action based on the optimal action-value function. The hierarchical attention mechanism is combined with an Actor-Critic network framework, so that the scalability of agent learning in the environment is enhanced.

Description

Technical field

[0001] The invention relates to multi-agent systems, and in particular to a multi-agent reinforcement learning method and system based on a hierarchical attention mechanism.

Background

[0002] Deep reinforcement learning has made remarkable progress in many domains, such as Atari games, Go, and complex continuous control tasks related to locomotion. The robot that learns and makes decisions is usually called the agent, and everything outside the agent that interacts with it is called the environment. The agent chooses actions; the environment responds to those actions and presents a new state to the agent. The environment also generates a reward, which the agent seeks to maximize when selecting actions. This series of decision-making steps can be modeled as a Markov decision process (MDP), which is an idealized form of reinforcement learning problems in math...
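The agent-environment loop described in [0002] can be sketched with a toy example. The environment, the policy, and the discount factor gamma below are hypothetical and are not taken from the patent; they only show the cycle of action, new state, and reward.

```python
def step(state, action):
    # Hypothetical toy environment: positions 0..3 on a line.
    # Reaching position 3 yields a reward of 1 and ends the episode.
    next_state = max(0, min(3, state + action))
    reward = 1.0 if next_state == 3 else 0.0
    done = next_state == 3
    return next_state, reward, done

def run_episode(policy, start=0, gamma=0.9, max_steps=20):
    # Agent-environment loop: the agent acts, the environment returns a new
    # state and a reward, and the discounted rewards are accumulated.
    state, ret, discount = start, 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = step(state, action)
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret
```

Calling `run_episode(lambda s: 1)` runs a policy that always moves right, while `run_episode(lambda s: -1)` never reaches the goal and collects no reward.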


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/044, G06N3/045

Inventor: 史殿习, 王雅洁, 张拥军, 薛超, 郝锋, 姜浩, 王功举

Owner 天津(滨海)人工智能军民融合创新中心
