Multi-agent reinforcement learning method and system based on hierarchical attention mechanism

A multi-agent reinforcement learning technology, applied in neural learning methods, neural architectures, and biological neural network models, that addresses the poor scalability of centralized approaches, the growth of the state and action spaces, and communication requirements that are difficult to meet, with the effect of improving scalability.

Pending Publication Date: 2021-01-15
天津(滨海)人工智能军民融合创新中心

AI Technical Summary

Problems solved by technology

A centralized approach, in which a single controller decides for all agents, is not scalable: the size of its state space and action space grows exponentially with the number of agents.
Moreover, since the central controller must collect information from every agent and distribute decisions back to every agent during decision-making, the agents that execute the decisions face very high communication requirements, which are difficult to meet in the real world.


Examples

Embodiment 1

[0062] With reference to Figure 1, the present invention provides a multi-agent reinforcement learning method based on a hierarchical attention mechanism, comprising:

[0063] Build a learning environment that includes multiple agents;

[0064] The Critic network calculates an estimated value based on the observation values and action values of the other agents, which each agent obtains through the hierarchical attention mechanism, and is optimized by minimizing a joint loss function until that loss converges;

[0065] Based on the observation values, the action values, and the trained estimated value, the estimated Actor network calculates the action-value function, which is optimized by maximizing the advantage function until the optimal action-value function is obtained;

[0066] Deterministic actions are performed based on an optimal action-value function.
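The way a Critic might gather information about other agents through attention can be sketched as follows. This is only an illustration under stated assumptions, not the patented mechanism: it uses standard scaled dot-product attention, and the two-level layout (attention within each group of other agents, then across the group summaries) is a hypothetical reading of "hierarchical". The names `attend` and `hierarchical_context`, and the idea that each other agent is represented by one embedding of its observation and action, are assumptions introduced for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, keys, values):
    """Scaled dot-product attention: weight each value by softmax(q.k / sqrt(d))."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights


def hierarchical_context(own_embedding, groups):
    """Assumed two-level attention over other agents' (observation, action) embeddings.

    Level 1: attend within each group of other agents to get one summary per group.
    Level 2: attend across the group summaries to get a single context vector.
    """
    summaries = [attend(own_embedding, group, group)[0] for group in groups]
    return attend(own_embedding, summaries, summaries)


# Hypothetical usage: one agent attending to two groups of other agents.
own = [1.0, 0.0]
groups = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
context, group_weights = hierarchical_context(own, groups)
```

The resulting `context` vector would be concatenated with the agent's own embedding and fed to its Critic head; because the context has a fixed size regardless of how many other agents exist, the Critic input does not grow with the number of agents.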

[0067] Building the learning environment that includes multiple agents comprises:

[0068] Based on th...

Embodiment 2

[0103] With reference to Figure 2, the present invention comprises the following steps:

[0104] Step 1: with reference to Figure 3, build the reinforcement learning environment. It mainly includes three kinds of cooperative environments and a hunting environment (a mixed environment). The environment simulates a real physical world, with elastic forces, resistance, and so on. The specific steps are as follows:

[0105] 1.1 Collaborative navigation scenario: In this scenario, X agents cooperate to reach L preset target points (L=X). The agents take only physical actions, but each can observe its position relative to the other agents and to the target points. The reward depends on the distance from each target to any agent, so the agents must cover every target to obtain the maximum reward. In addition, an agent is penalized when it collides with another agent. We try to let the agent learn to cover all objects and avoid ...
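A reward of the kind described for the collaborative navigation scenario could look like the sketch below. The exact formula is not given in the text, so this assumes a common form: the negative sum, over targets, of the distance to the nearest agent, minus a fixed penalty per colliding pair. The function name `navigation_reward` and the parameters `agent_radius` and `collision_penalty` are hypothetical.

```python
import math


def navigation_reward(agent_positions, target_positions,
                      agent_radius=0.15, collision_penalty=1.0):
    """Shared reward for a cooperative navigation step (assumed form).

    Coverage term: each target contributes the distance to its nearest agent,
    so the reward is highest when every target has an agent on top of it.
    Collision term: each pair of overlapping agents costs `collision_penalty`.
    """
    coverage_cost = sum(
        min(math.dist(target, agent) for agent in agent_positions)
        for target in target_positions
    )

    collisions = 0
    for i in range(len(agent_positions)):
        for j in range(i + 1, len(agent_positions)):
            if math.dist(agent_positions[i], agent_positions[j]) < 2 * agent_radius:
                collisions += 1

    return -coverage_cost - collision_penalty * collisions


# Hypothetical usage: two agents, two targets, agents overlapping near the first target.
reward = navigation_reward([(0.0, 0.0), (0.1, 0.0)], [(0.0, 0.0), (3.0, 0.0)])
```

Under this form, clustering on one target is penalized twice: the uncovered target adds distance to the coverage term, and the overlapping agents trigger the collision penalty.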

Embodiment 3

[0172] Learning environment module: build a learning environment, which includes multiple agents;

[0173] Critic module: The Critic network calculates an estimated value based on the observation values and action values of the other agents, which each agent obtains through the hierarchical attention mechanism, and is optimized by minimizing a joint loss function until that loss converges;

[0174] Actor network module: Based on the observation values, the action values, and the trained estimated value, the estimated Actor network calculates the action-value function and optimizes it by maximizing the advantage function until the optimal action-value function is obtained;

[0175] Execution Action Module: Based on the optimal action-value function, execute deterministic actions.
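The advantage function that the Actor network module maximizes is not spelled out in the text. A minimal sketch, assuming discrete actions and the usual actor-critic definition (the value of the chosen action minus the policy-weighted average value of all actions), is shown below; the function name `advantage` and its arguments are hypothetical.

```python
def advantage(q_values, policy_probs, action):
    """Advantage of `action` under an assumed discrete-action actor-critic setup.

    q_values[a]     : Critic estimate for taking action a in the current state
    policy_probs[a] : probability the Actor assigns to action a
    The baseline is the expected Q-value under the current policy.
    """
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[action] - baseline


# Hypothetical usage: two actions, uniform policy.
adv = advantage([1.0, 3.0], [0.5, 0.5], action=1)
```

A positive advantage means the chosen action did better than the policy's average, so the Actor update would raise its probability; the expected advantage under the policy itself is zero.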

[0176] The acquisition module includes an estimated value submodule, a minimum joint loss function submodule and an optimization submodule;
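The joint loss minimized by the Critic could take the following shape. This is a sketch under assumptions, since the text does not give the formula: it sums each agent's squared temporal-difference error, with a one-step bootstrapped target. The names `td_target` and `joint_critic_loss` and the discount factor `gamma` are introduced here for illustration.

```python
def td_target(reward, next_q, gamma=0.95, done=False):
    """One-step bootstrapped target: r + gamma * Q(next), or just r at episode end."""
    return reward + (0.0 if done else gamma * next_q)


def joint_critic_loss(q_estimates, rewards, next_q_estimates, gamma=0.95):
    """Assumed joint loss: sum over agents of the squared TD error.

    All three lists are indexed by agent; summing them into one scalar lets
    a Critic with shared parameters be updated with a single gradient step.
    """
    loss = 0.0
    for q, r, next_q in zip(q_estimates, rewards, next_q_estimates):
        loss += (q - td_target(r, next_q, gamma)) ** 2
    return loss


# Hypothetical usage: two agents, one transition each.
loss = joint_critic_loss([1.0, 2.0], [0.0, 1.0], [2.0, 0.0], gamma=0.5)
```

In practice the values would come from a batch sampled from a replay buffer and the loss would be averaged over the batch; that part is omitted to keep the sketch short.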

[0177] The Critic modu...

Abstract

The invention provides a multi-agent reinforcement learning method and system based on a hierarchical attention mechanism. The method comprises: building a learning environment that comprises a plurality of agents; having a Critic network calculate estimated values based on the observation values and action values of the other agents, which each agent obtains through the hierarchical attention mechanism, and optimizing the Critic network by minimizing a joint loss function until that loss converges; calculating an action-value function with an estimated Actor network based on the observation values, the action values, and the trained estimated values, and optimizing it by maximizing an advantage function until an optimal action-value function is obtained; and performing a deterministic action based on the optimal action-value function. Combining the hierarchical attention mechanism with an Actor-Critic network framework improves the scalability of agent learning in the environment.

Description

Technical field

[0001] The invention relates to multi-agent systems, in particular to a multi-agent reinforcement learning method and system based on a hierarchical attention mechanism.

Background art

[0002] Deep reinforcement learning has made remarkable progress in many domains, such as Atari games, Go, and complex continuous control tasks related to locomotion. We usually refer to the robot that learns and makes decisions as the agent, and everything outside the agent that interacts with it is called the environment. The agent chooses actions, and the environment responds to these actions and presents a new state to the agent. At the same time, the environment also generates a return (i.e., a reward), which the agent seeks to maximize when selecting actions. This series of decision-making processes can be modeled as a Markov decision process (MDP), which is an idealized form of reinforcement learning problems in math...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/044, G06N3/045
Inventor: 史殿习, 王雅洁, 张拥军, 薛超, 郝锋, 姜浩, 王功举
Owner: 天津(滨海)人工智能军民融合创新中心