Value decomposition multi-agent reinforcement learning training method using attention network

A reinforcement learning and multi-agent technology, applied in the field of reinforcement learning training. It addresses problems such as dimension explosion, credit assignment, and environmental instability, and improves performance.

Pending Publication Date: 2022-08-05

NANJING UNIV



AI Technical Summary

Problems solved by technology

[0003] Multi-agent environments pose problems that do not arise in single-agent environments, such as environmental instability, dimension explosion, and credit assignment.
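The dimension explosion mentioned in [0003] can be made concrete with a little arithmetic. The sketch below assumes every agent chooses from the same number of discrete actions; the function name and the figure of 5 actions per agent are illustrative and do not come from the patent.

```python
def joint_action_space_size(actions_per_agent: int, num_agents: int) -> int:
    """Count joint actions when each agent independently picks one action.

    Assumption (not from the patent): all agents share the same number of
    discrete actions, so the joint space has actions_per_agent ** num_agents
    elements.
    """
    return actions_per_agent ** num_agents


# The joint action space grows exponentially with the number of agents.
for n in (1, 2, 5, 10):
    print(n, joint_action_space_size(5, n))
```

A single value function over the joint action would need one output per joint action, which is why value decomposition methods work with per-agent value functions instead.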


Examples


Embodiment 2

[0113] The value decomposition multi-agent reinforcement learning training method using the attention network of the present invention can be applied to many fields such as robot control, traffic coordination, and manufacturing control.

[0114] The value decomposition multi-agent reinforcement learning training method using the attention network described in the present invention is used for robot control, coordinating the work of the robot's modules. Each agent represents one module or part of the robot, including the power module, the arm module, the leg module, and the head module. The policy of an agent is its current action: the power module controls the magnitude of the output, and the other actuated modules control the direction and magnitude of their actions. All these actions are controlled by the agent value function network, and the reward represents the distance of the entire robot's action. Each module can only see its own working status, but ca...


Abstract

The invention provides a value decomposition multi-agent reinforcement learning training method using an attention network. The method comprises experience collection using an exploration strategy and calculation of the joint action value function. Two techniques improve the joint action value function calculation: an attention network and a separated value decomposition structure. The attention network takes the global state as input, so that the network can attend to the important parts of the global state and compute the weights more accurately. The separated value decomposition structure adopts two layers of weighting networks, each using a different encoding of the global state, which makes the hypernetwork easier to learn. A hypernetwork fuses the independent action value functions of the agents and outputs the joint action value function. Together, the attention network and the separated value decomposition structure improve learning efficiency, accelerate model convergence, and improve the resulting multi-agent cooperation strategy.
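The fusion step in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not in the patent text: the global state is given as a list of entity vectors, each layer has its own hypothetical attention query that produces its state encoding, the hypernetworks are single linear maps with random untrained parameters, and the mixing weights are made non-negative with `abs()` as in QMIX-style mixers. All names (`Mixer`, `attend`, `joint_q`) are illustrative.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, entities):
    """Attention-pooled encoding of the global state (a list of entity vectors)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * e for q, e in zip(query, ent)) / scale for ent in entities]
    weights = softmax(scores)
    dim = len(entities[0])
    return [sum(w * ent[k] for w, ent in zip(weights, entities)) for k in range(dim)]


def linear(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


class Mixer:
    """Two-layer mixer; each layer reads its own encoding of the global state."""

    def __init__(self, n_agents, state_dim, hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_agents, self.hidden = n_agents, hidden
        # Hypothetical: one attention query per layer gives each layer a
        # different state encoding.
        self.query1 = [rng.uniform(-1, 1) for _ in range(state_dim)]
        self.query2 = [rng.uniform(-1, 1) for _ in range(state_dim)]
        # Hypernetworks (linear here) map a state encoding to mixing parameters.
        self.hyper_w1 = rand_matrix(rng, hidden * n_agents, state_dim)
        self.hyper_b1 = rand_matrix(rng, hidden, state_dim)
        self.hyper_w2 = rand_matrix(rng, hidden, state_dim)
        self.hyper_b2 = rand_matrix(rng, 1, state_dim)

    def joint_q(self, agent_qs, entities):
        code1 = attend(self.query1, entities)  # state encoding for layer 1
        code2 = attend(self.query2, entities)  # state encoding for layer 2
        # abs() keeps mixing weights non-negative (QMIX-style assumption).
        w1 = [abs(w) for w in linear(self.hyper_w1, code1)]
        b1 = linear(self.hyper_b1, code1)
        hidden = []
        for j in range(self.hidden):
            row = w1[j * self.n_agents:(j + 1) * self.n_agents]
            pre = sum(w * q for w, q in zip(row, agent_qs)) + b1[j]
            hidden.append(max(pre, 0.0))  # ReLU is non-decreasing
        w2 = [abs(w) for w in linear(self.hyper_w2, code2)]
        b2 = linear(self.hyper_b2, code2)[0]
        return sum(w * h for w, h in zip(w2, hidden)) + b2


mixer = Mixer(n_agents=3, state_dim=2)
entities = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
print(mixer.joint_q([1.0, 0.5, -0.2], entities))
```

With non-negative weights and a non-decreasing activation, raising any single agent's value never lowers the joint value, so each agent can act greedily on its own value function. Whether the patented method imposes exactly this constraint is not stated in the text shown here.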

Description

Technical field

[0001] The invention relates to a reinforcement learning training method, in particular to a value decomposition multi-agent reinforcement learning training method using an attention network.

Background technique

[0002] Reinforcement learning (RL) is a branch of machine learning mainly used to solve sequential decision-making problems. By interacting continuously with the environment, the agent explores and optimizes its policy, and ultimately maximizes its return or reaches a set goal. In multi-agent reinforcement learning, multiple agents must work together to accomplish a shared goal. Because of this, multi-agent reinforcement learning is closer to real-world problems and has more practical significance.

[0003] In the multi-agent environment, there are some problems that do not exist in the single-agent environment, such as environmental instability, dimensional explosion and credit a...
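The exploration described in [0002] is commonly implemented with an epsilon-greedy rule. The text shown here does not name the exploration strategy the invention uses, so the rule below is an assumption chosen for illustration, and `epsilon_greedy` is a hypothetical helper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action.

    Assumption: epsilon-greedy stands in for the unspecified exploration
    strategy; q_values holds one agent's action values.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


# With epsilon = 0 the agent always exploits its current value estimates.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

In practice epsilon is usually decayed over training so that the agent explores early and exploits its learned policy later.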


Application Information


IPC(8): G06N20/00, G06N3/04, G06N3/08

CPC: G06N20/00, G06N3/08, G06N3/044

Inventor: 杨育彬, 顾志浩

Owner: NANJING UNIV
