
Multi-agent reinforcement learning method and system based on hierarchical consistency learning

A hierarchical-consistency reinforcement learning technique, applied to neural learning methods, neural architectures, and biological neural network models. It addresses obstacles to efficient agent exploration and collaboration, such as communication protocols that are difficult to learn, and improves time efficiency and task completion.

Pending Publication Date: 2022-03-01
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

Existing algorithms model agents as independent individuals in the centralized training phase. Although these agents can communicate explicitly or implicitly to achieve cooperation, it is difficult to learn an effective communication protocol through end-to-end training. This hinders the efficient exploration and collaboration of agents in large-scale multi-agent systems. A cluster of sorting robots that must cooperate to complete sorting tasks is a typical example of such a system.



Examples


Embodiment 1

[0027] This embodiment applies the multi-agent reinforcement learning method based on hierarchical consistency learning described above to large-scale automatic sorting by a cluster of sorting robots in warehouse automation, as shown in Figure 6.

[0028] In the sorting task of this embodiment, 12 sorting robots are located in the central area of a square map, and 32 shelves are evenly distributed over its four corner areas. A sorting robot is rewarded for navigating to a shelf. The local observation of each sorting robot covers everything within a square area centered on the robot with a radius of 7 unit lengths, that is, the two-dimensional coordinates of the other sorting robots and of the shelves in that area. The actions available to a sorting robot are: move up, move down, move left, move right, stay still, and lift the shelf, and the moving d...
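The local observation and action set described in [0028] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `ACTIONS`, `OBS_RADIUS`, `in_window` and `local_observation` are hypothetical, and the sketch assumes integer grid coordinates, an inclusive window boundary, and absolute (not robot-relative) coordinates in the returned observation.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

# The six discrete actions named in the embodiment (labels are illustrative).
ACTIONS = ("up", "down", "left", "right", "stay", "lift")

# Half-width of the square observation window, in unit lengths.
OBS_RADIUS = 7


def in_window(center: Coord, other: Coord, radius: int = OBS_RADIUS) -> bool:
    """True if `other` lies in the square window centered on `center`.

    Assumption: the boundary of the window is inclusive.
    """
    return (abs(other[0] - center[0]) <= radius
            and abs(other[1] - center[1]) <= radius)


def local_observation(idx: int, robots: List[Coord], shelves: List[Coord]) -> dict:
    """Coordinates of the other robots and the shelves visible to robot `idx`."""
    me = robots[idx]
    return {
        "robots": [r for i, r in enumerate(robots)
                   if i != idx and in_window(me, r)],
        "shelves": [s for s in shelves if in_window(me, s)],
    }
```

A square window is checked per axis (Chebyshev distance), which is why a shelf exactly 7 units away on both axes still counts as visible under the inclusive-boundary assumption.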

Embodiment 2

[0053] This embodiment builds on Embodiment 1 by introducing disturbance factors into the sorting environment, such as observation noise, decision-making delay, and mistaken targets, all of which affect the behavior of the sorting robots. These real-world disturbances are modeled in a general, abstract way by adding interference agents. Specifically, compared with Embodiment 1, the automatic sorting task handled by this embodiment, shown schematically in Figure 8, includes 16 interference robots, 16 faster-moving sorting robots, and 32 shelves. A sorting robot is rewarded only for successfully reaching a shelf position and lifting the shelf, while an interference robot is rewarded for arriving at the location of a sorting robot. The actions of all agents include moving up, moving down, moving left, moving right, and staying still; the sorting robot has an a...
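The two reward rules stated in [0053] can be illustrated with a small sketch. The function names, the `"lift"` action label, and the reward magnitude `bonus` are assumptions made for illustration; the source does not give reward values.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def sorting_reward(pos: Coord, action: str, shelves: List[Coord],
                   bonus: float = 1.0) -> float:
    """Reward a sorting robot only when it lifts while standing on a shelf cell.

    `bonus` is a hypothetical magnitude; the source gives no numeric value.
    """
    return bonus if action == "lift" and pos in shelves else 0.0


def interference_reward(pos: Coord, sorters: List[Coord],
                        bonus: float = 1.0) -> float:
    """Reward an interference robot when it occupies a sorting robot's cell."""
    return bonus if pos in sorters else 0.0
```

Under these rules, reaching a shelf without lifting yields nothing for a sorting robot, which is what distinguishes this task from the navigation-only reward of Embodiment 1.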

Embodiment 3

[0058] As shown in Figure 8, a multi-agent reinforcement learning system based on hierarchical consistency that implements the above method includes an initialization unit 510, a heuristic grouping unit 520, a team intention unit 530, an individual intention unit 540, a multi-agent decision-making unit 550, and a model optimization unit 560, wherein: the initialization unit 510 constructs and randomly initializes the team intention network, the individual intention network, and the multi-agent decision-making network; the heuristic grouping unit 520 uses a hierarchical clustering algorithm to cluster all agents into teams; the team intention unit 530 uses the team intention network to generate a team intention for each team from the joint observations of all agents in that team, uses the team intention network to calculate an unsupervised contrastive loss from the different team intentions, and, from the joint observation of all agents, calculates the ...
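The grouping step performed by the heuristic grouping unit 520 can be sketched with a small agglomerative (bottom-up hierarchical) clustering routine. The source only says that a hierarchical clustering algorithm is used. Clustering on 2D positions, the single-linkage criterion, the fixed team count `n_teams`, and the name `group_agents` are all assumptions made for this illustration.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def _sq_dist(a: Coord, b: Coord) -> float:
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def group_agents(positions: List[Coord], n_teams: int) -> List[List[int]]:
    """Single-linkage agglomerative clustering; returns agent indices per team.

    Assumptions: agents are clustered by position, and merging stops once
    `n_teams` clusters remain.
    """
    if not 1 <= n_teams <= len(positions):
        raise ValueError("n_teams must be between 1 and the number of agents")
    clusters = [[i] for i in range(len(positions))]
    while len(clusters) > n_teams:
        best = None  # (distance, index_a, index_b) of the closest cluster pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(_sq_dist(positions[i], positions[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]
```

This brute-force version is cubic in the number of agents and is meant only to show the idea. For larger clusters of robots, a library routine such as SciPy's hierarchical clustering would be the more practical choice.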


Abstract

The invention discloses a multi-agent reinforcement learning method and system based on hierarchical consistency learning. The method combines a deep set network with a variational autoencoder and introduces them into multi-agent reinforcement learning to regulate the group behavior of a sorting robot cluster in a large-scale automatic sorting task. By introducing an auxiliary unsupervised learning task and a self-supervised learning task, efficient team intention and individual intention representations are obtained. Applying diversity constraints to team intentions and consistency constraints to individual intentions ensures close cooperation within teams and diverse exploration among teams. As a result, the exploration efficiency and cooperation efficiency of a large-scale multi-agent system completing a cooperative task can be effectively improved.
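To make the two constraints concrete, the sketch below shows one simple way such terms could be written. It is not the patent's loss formulation, which is not given in this excerpt. It assumes intentions are plain real-valued vectors, measures consistency as the spread of individual intentions around their team mean, and measures diversity as the negative mean pairwise distance between team intentions. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def consistency_penalty(individual: List[Vector]) -> float:
    """Mean squared distance of each member's intention to the team mean.

    Lower values mean the individual intentions within a team agree more.
    """
    dim = len(individual[0])
    mean = [sum(v[k] for v in individual) / len(individual) for k in range(dim)]
    return sum(_sq_dist(v, mean) for v in individual) / len(individual)


def diversity_penalty(team: List[Vector]) -> float:
    """Negative mean pairwise squared distance between team intentions.

    Minimizing this value pushes the team intentions apart.
    """
    pairs = [(i, j) for i in range(len(team)) for j in range(i + 1, len(team))]
    if not pairs:
        return 0.0
    return -sum(_sq_dist(team[i], team[j]) for i, j in pairs) / len(pairs)
```

Both terms are written as quantities to minimize, so in a training loop they could be added to the reinforcement learning objective with weighting coefficients chosen by the practitioner.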

Description

Technical field

[0001] The present invention relates to warehousing automation, specifically to a multi-agent reinforcement learning method and system based on hierarchical consistency learning for solving large-scale automatic sorting tasks of sorting robot clusters.

Background

[0002] Most existing multi-agent reinforcement learning algorithms follow the centralized training, decentralized execution framework. In the centralized training phase, agents learn decentralized policies by sharing local observations, parameters, or gradients. However, existing algorithms model agents as independent individuals in the centralized training phase. Although these agents can communicate explicitly or implicitly to achieve cooperation, it is difficult to learn an effective communication protocol through end-to-end training, thus hindering the efficient exploration and collaboration of ...


Application Information

IPC(8): G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045
Inventors: 金博, 李文浩, 王祥丰, 朱骏
Owner: EAST CHINA NORMAL UNIV