Reinforcement learning training optimization method and device for multi-agent confrontation

A multi-agent reinforcement learning technique, applied in the field of machine learning, that addresses the problem of low training efficiency and achieves more efficient training.

Active Publication Date: 2020-04-10
NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI

AI Technical Summary

Problems solved by technology

However, these methods have significant limitations. The main problem is that training efficiency drops as the number of agents increases: the size of the action-state space of the multi-agent system grows exponentially with the number of agents, so trial-and-error exploration takes more and more time.
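As a rough illustration of this growth, assume (hypothetically, since the source gives no figures) that each of n agents chooses independently from the same set of |A| actions. The joint action space then has |A|^n elements:

```latex
\[
  |\mathcal{A}_{\text{joint}}| = |A|^{n},
  \qquad
  |A| = 5:\quad
  n = 2 \;\Rightarrow\; 5^{2} = 25,
  \qquad
  n = 10 \;\Rightarrow\; 5^{10} = 9\,765\,625 .
\]
```

Under the same assumption, the joint state space grows in the same way, which is why undirected exploration becomes expensive as agents are added.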



Examples


Embodiment Construction

[0028] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0029] Figure 1 is a flow chart of the multi-agent confrontation-oriented reinforcement learning training optimization method provided by an embodiment of the present invention. As shown in Figure 1, the method includes:

[0030] Step 101, the rule coupling algorithm training process, including: for each training step, obtain the initia...



Abstract

The embodiments of the invention provide a reinforcement learning training optimization method and device for multi-agent confrontation. The method includes a rule-coupled algorithm training process: for each training step, an initial first state result set of the red-party multi-agent is acquired; if this set satisfies a preset action rule, a decision-making behavior result set is obtained according to the preset action rule; otherwise, the decision-making behavior result set is obtained according to a preset reinforcement learning algorithm. Reinforcement learning training is then performed on the red-party multi-agent using training samples formed from the decision-making behavior result set and other preset parameters. Throughout training, the preset action rules guide the agents' actions and avoid invalid actions, which addresses the excessive invalid exploration and slow training speed of the prior art and significantly improves training efficiency.
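The rule-coupled selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the state layout, the rule contents, the action names, the per-agent application of the rule, and the epsilon-greedy Q-table fallback are all assumptions introduced here, since the source names neither the rules nor the learning algorithm.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical state for one red-party agent: (distance_to_enemy, ammo).
State = Tuple[int, int]
Action = str
ACTIONS: List[Action] = ["advance", "fire", "retreat"]  # assumed action set


def rule_action(state: State) -> Optional[Action]:
    """Assumed preset action rule: return an action if a rule matches, else None."""
    distance, ammo = state
    if ammo == 0:
        return "retreat"   # out of ammunition: do not explore further
    if distance <= 1:
        return "fire"      # enemy in range: act directly
    return None            # no rule applies; defer to the learned policy


def learned_action(state: State, q_table: Dict[Tuple[State, Action], float],
                   epsilon: float, rng: random.Random) -> Action:
    """Assumed fallback: epsilon-greedy choice over a tabular Q-function."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))


def select_actions(states: List[State],
                   q_table: Dict[Tuple[State, Action], float],
                   epsilon: float,
                   rng: random.Random) -> List[Tuple[Action, str]]:
    """For each agent, use the rule if it matches, otherwise the learned policy."""
    decisions = []
    for state in states:
        action = rule_action(state)
        source = "rule"
        if action is None:
            action = learned_action(state, q_table, epsilon, rng)
            source = "policy"
        decisions.append((action, source))
    return decisions


if __name__ == "__main__":
    states = [(5, 0), (1, 3), (4, 2)]
    q_table = {((4, 2), "advance"): 1.0}
    print(select_actions(states, q_table, epsilon=0.0, rng=random.Random(0)))
```

In a full training loop, each selected action would be combined with the remaining quantities the method calls "other preset parameters" (not specified in this excerpt) to form a training sample. Those are omitted here rather than guessed.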

Description

Technical field

[0001] The invention relates to the technical field of machine learning, in particular to a multi-agent confrontation-oriented reinforcement learning training optimization method and device.

Background

[0002] Artificial intelligence is a technical science that researches and develops theories, methods, technologies and applications for simulating and extending human intelligence. One of the main goals of artificial intelligence research is to have intelligent agents simulate human decision-making, so that they can handle complex tasks that would otherwise require human intelligence. The limited ability of a single agent to cope with complex tasks has motivated the concept of multi-agent systems. A multi-agent system is composed of multiple agents that make independent decisions and interact with each other. They share the same environment and have perception and execution mechanisms. At present, multi-agent systems have become a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/00
CPC: G06N3/008, G06F18/214
Inventor: 徐新海, 李渊, 戴华东, 王之元, 张冠宇, 宋菲菲
Owner: NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI