Multi-agent cooperation model based on deep reinforcement learning

A multi-agent reinforcement learning technology, applied in computing models and machine learning, that addresses problems of existing methods such as low efficiency, slow convergence, and poor stability. It aims to ensure consistency, improve adaptability, and refine the update rules.

Pending Publication Date: 2021-11-02
DALIAN UNIV

AI Technical Summary

Problems solved by technology

[0005] To address the low efficiency, slow convergence, and poor stability of existing multi-agent reinforcement learning methods, this application provides a multi-agent cooperation model based on deep reinforcement learning. The model ensures the consistency of the global optimal action and the local optimal action, thereby improving the efficiency of multi-agent exploration in continuous action spaces.

Examples

Embodiment 1

[0038] This embodiment adopts the basic structure of CCDA. The distributed Actor networks support distributed execution by the agents: each interacts with the environment to generate state-action information, which is stored in the experience buffer. To counter the non-stationarity of the environment, the centralized Critic network takes the global state-action information as input, uses a global reward R designed around the task of the cooperative multi-agent system, and learns a global action value Qtot from the TD error. To ensure consistency between each single agent's optimal action and the global optimal action, the present invention introduces the idea of value decomposition: it adds a Q value decomposition network (QDN) that decomposes the global action value Qtot into an action value Qi for each single agent, which realizes implicit credit assignment and allows the contribution of a single agent to the team to be expressed; in addit...
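The consistency property mentioned in this embodiment can be illustrated with a small sketch. The excerpt does not say how QDN performs the decomposition, so the code below assumes a simple additive form, Qtot = sum of Qi, over a handful of discretized actions; the function names and the numeric values are hypothetical and serve only as an illustration. Under that assumption, each agent choosing its own best action yields the same joint action as a brute-force search over Qtot.

```python
from itertools import product


def greedy_local(q_i):
    """Index of the action that maximizes one agent's value Qi."""
    return max(range(len(q_i)), key=lambda a: q_i[a])


def greedy_joint(q_values):
    """Brute-force joint action maximizing Qtot, ASSUMING Qtot = sum_i Qi(a_i)."""
    action_ranges = [range(len(q_i)) for q_i in q_values]
    return max(
        product(*action_ranges),
        key=lambda joint: sum(q_i[a] for q_i, a in zip(q_values, joint)),
    )


# Hypothetical per-agent action values for two agents with three actions each.
q_values = [
    [0.1, 0.7, 0.3],  # agent 0
    [0.5, 0.2, 0.9],  # agent 1
]

local_choice = tuple(greedy_local(q_i) for q_i in q_values)
joint_choice = greedy_joint(q_values)
print(local_choice, joint_choice)
```

With an additive decomposition the two choices coincide, which is why each Actor can be updated from its own Qi without searching the joint action space. A continuous action space would replace the enumeration with gradient-based action selection.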

Abstract

The invention discloses a multi-agent cooperation model based on deep reinforcement learning, which comprises a centralized Critic network, a plurality of distributed Actor networks, and a Q value decomposition network. Each Actor network interacts with the environment to generate state-action information, which is stored in an experience buffer. The Critic network samples from the experience buffer and takes all state-action information as input; a global reward R is designed with the task of the cooperative multi-agent system as the goal, and a global action value Qtot is learned using the TD error. The Q value decomposition network decomposes the global action value Qtot into an action value Qi for each single agent, and the gradient update of each Actor network depends on the decomposed action value Qi of the corresponding agent. The method ensures the consistency of the global optimal action and the local optimal action, thereby improving the exploration efficiency of multiple agents in a continuous action space.
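As a rough illustration of the data flow in the abstract, the sketch below shows an experience buffer that stores joint transitions and a one-step TD error for the global action value. The class and function names, the buffer capacity, and the discount factor `gamma` are assumptions made for this example; the abstract does not specify them, and in the actual model the Qtot values would come from the Critic network.

```python
import random
from collections import deque


class ExperienceBuffer:
    """Fixed-capacity store of joint transitions (hypothetical layout)."""

    def __init__(self, capacity):
        # The oldest transitions are dropped once capacity is reached.
        self.data = deque(maxlen=capacity)

    def store(self, states, actions, reward, next_states):
        self.data.append((states, actions, reward, next_states))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        return random.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)


def td_error(reward, q_tot, q_tot_next, gamma=0.95):
    """One-step TD error for the global value: R + gamma * Qtot' - Qtot."""
    return reward + gamma * q_tot_next - q_tot


buffer = ExperienceBuffer(capacity=2)
for step in range(3):
    # Two agents; states and actions are placeholder scalars here.
    buffer.store(states=(step, step), actions=(0.1, -0.1),
                 reward=1.0, next_states=(step + 1, step + 1))

batch = buffer.sample(batch_size=4)
delta = td_error(reward=1.0, q_tot=0.5, q_tot_next=2.0, gamma=0.5)
print(len(buffer), len(batch), delta)
```

The Critic would minimize the squared TD error over sampled batches, and the resulting Qtot would then be passed to the decomposition network.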

Description

Technical field

[0001] The invention relates to the technical field of multi-agent reinforcement learning, in particular to a multi-agent cooperation model based on deep reinforcement learning.

Background

[0002] A multi-agent system (MAS) is a distributed decision-making system composed of multiple agents interacting with the environment. Since the 1970s, MAS has been the subject of extensive research aimed at establishing swarm intelligence systems with a specific level of autonomy and the ability to learn autonomously. The information sharing, distributed computing, and collaborative execution that characterize MAS are in wide demand in real-world applications, especially in fields such as the military, industry, and transportation. In decision-making optimization problems, reinforcement learning shows a great advantage in online learning and is more consistent with the learning mechanisms of biological groups. With the upsurge of reinforcement learning led by ...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8) G06N20/00
CPC: G06N20/00
Inventor: 邹启杰, 蒋亚军, 高兵, 秦静, 李丹, 李文雪
Owner: DALIAN UNIV