Reinforcement learning intelligent agent training method based on PPO algorithm

A reinforcement learning training technology, applied in the field of agent training, that addresses the inability of existing approaches to solve the reinforcement learning problem of maximizing the reward signal, and achieves a reasonable design.

Status: Inactive. Publication date: 2021-08-13

中国人民解放军军事科学院评估论证研究中心

0 Cites, 1 Cited by

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

Uncovering structure in an agent's experience is certainly useful in reinforcement learning, but by itself it does not solve the reinforcement learning problem of maximizing the reward signal.

[0004] Reinforcement learning poses a unique challenge: the trade-off between exploration (trying new actions) and exploitation (using actions already known to pay off).
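The exploration versus exploitation trade-off can be illustrated with an epsilon-greedy rule on a multi-armed bandit. This is a minimal sketch for illustration only; it is not part of the patent, and the function names `epsilon_greedy` and `run_bandit` and the Gaussian reward model are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: action with the highest current value estimate
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, epsilon, steps, seed=0):
    """Run a Gaussian bandit (assumed reward model) and return estimates and pull counts."""
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy(q_values, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # incremental sample-average update of the value estimate
        q_values[action] += (reward - q_values[action]) / counts[action]
    return q_values, counts
```

With `epsilon` set to 0 the agent only exploits its current estimates and may never try a better arm; with `epsilon` set to 1 it only explores and never uses what it has learned.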

Method used

Examples

Embodiment 1

[0125] Referring to Figure 1, a PPO algorithm-based reinforcement learning agent training method comprises the following steps:

[0126] Step 1: Simulation configuration management. Configure the IP address, port, and interval of the simulation platform, and start the simulation system using the configured IP address and port.

[0127] Step 2: Training environment configuration. The training environment includes the combat information the agent requires, such as the scenario, the environment, and the army type. Configuring the training environment constructs different combat scenarios, and the agent responds differently depending on the scenario and environment.

[0128] Step 3: PPO algorithm hyperparameter selection. Use techniques such as Bayesian optimization and population-based training (PBT) to select hyperparameters for the PPO reinforcement learning algorithm, and adjust the hyperparameters automatically during training. At the same time, it supports manual configuration of algorithms and...
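Step 3 names PBT as one way to adjust hyperparameters during training. The following is a minimal sketch of a single PBT exploit/explore step, assuming a population stored as plain dictionaries; the function name `pbt_step`, the dictionary keys, and the perturbation factors are illustrative assumptions, not details from the patent.

```python
import copy
import random


def pbt_step(population, rng, perturb=(0.8, 1.2), bottom_frac=0.25):
    """One exploit/explore step of population-based training.

    population: list of dicts with keys "hparams" (dict of floats),
    "weights" (any copyable object) and "score" (float, higher is better).
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    n_cut = max(1, int(len(ranked) * bottom_frac))
    top, bottom = ranked[:n_cut], ranked[-n_cut:]
    for member in bottom:
        donor = rng.choice(top)
        # exploit: copy the weights of a better-performing member
        member["weights"] = copy.deepcopy(donor["weights"])
        # explore: perturb each copied hyperparameter by a random factor
        member["hparams"] = {
            name: value * rng.choice(perturb)
            for name, value in donor["hparams"].items()
        }
    # scores of replaced members are stale until they are trained and evaluated again
    return population
```

In a full training loop, each member would train for a fixed number of PPO updates, be re-scored, and then pass through this step again.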

Embodiment 2

[0133] Referring to Figure 2, a PPO algorithm-based reinforcement learning agent training method comprises the following steps:

[0134] Step 1: Simulation configuration management. Configure the IP address, port, and interval of the simulation platform, and start the blue army combat operation simulation system using the configured IP address and port.

[0135] Step 2: Training environment configuration. The training environment includes the combat information the agent requires, such as the scenario, the environment, and the army type. Configuring the training environment constructs different combat scenarios, and the agent responds differently depending on the scenario and environment.

[0136] Step 3: PPO algorithm hyperparameter selection. Use techniques such as Bayesian optimization and population-based training (PBT) to select hyperparameters for the PPO reinforcement learning algorithm, and adjust the hyperparameters automatically during training. At the same time,...
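Both embodiments train the agent with PPO, but the excerpt does not reproduce the objective. For orientation, here is a minimal sketch of the standard clipped surrogate objective from the PPO literature; the function name, argument names, and the default `clip_eps` of 0.2 are assumptions and may differ from what the patented platform uses.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # take the more pessimistic of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the probability ratio outside the interval from `1 - clip_eps` to `1 + clip_eps`, which keeps each policy update close to the policy that collected the data. The clip range is one of the hyperparameters a selection step like Step 3 could tune.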

Abstract

The invention discloses a reinforcement learning agent training method based on the PPO algorithm. The method comprises simulation configuration management, training environment configuration, PPO algorithm hyperparameter selection, agent training, training result analysis, and model generation. The beneficial effects are that the platform analyzes key information about the environment and provides automatic model generation: based on the characteristics and data types of each agent's state space and action space, and on the "scores" supplied by the simulation environment, it automatically selects a deep network model built into the system to represent the policy model and the value function model. Input data types including discrete, continuous, image, and variable-length lists are supported, as are discrete and continuous action types. The platform generates the model automatically, without manual adjustment or configuration.
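The abstract describes choosing a built-in network from the data types of the state and action spaces. One way such a lookup could work is sketched below; this is a guess at the idea, not the platform's actual logic. `SpaceSpec`, `select_model`, and both mapping tables are hypothetical, and in particular the pairing of each input type with an encoder family (for example a recurrent encoder for variable-length lists) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceSpec:
    """Hypothetical description of one observation or action space."""
    kind: str    # "discrete", "continuous", "image", or "variable_list"
    shape: tuple


# Hypothetical mapping from input data type to an encoder family.
ENCODERS = {
    "discrete": "embedding",
    "continuous": "mlp",
    "image": "cnn",
    "variable_list": "rnn",
}

# Hypothetical mapping from action type to a policy output head.
POLICY_HEADS = {"discrete": "categorical", "continuous": "gaussian"}


def select_model(observation: SpaceSpec, action: SpaceSpec) -> dict:
    """Choose an encoder and output heads for the policy and value models."""
    if observation.kind not in ENCODERS:
        raise ValueError(f"unsupported observation type: {observation.kind}")
    if action.kind not in POLICY_HEADS:
        raise ValueError(f"unsupported action type: {action.kind}")
    encoder = ENCODERS[observation.kind]
    return {
        "policy": {"encoder": encoder, "head": POLICY_HEADS[action.kind]},
        "value": {"encoder": encoder, "head": "scalar"},
    }
```

For example, `select_model(SpaceSpec("image", (84, 84, 3)), SpaceSpec("discrete", (4,)))` pairs a convolutional encoder with a categorical policy head and a scalar value head.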

Description

Technical field

[0001] The invention relates to an agent training method, in particular to a PPO algorithm-based reinforcement learning agent training method, and belongs to the technical field of agent training.

Background

[0002] Reinforcement learning is about learning "what to do" so as to maximize a numerical reward signal. The learner must discover, through its own attempts during exploration, which actions yield the greatest reward. In practice, an action often affects not only the immediate reward but also the next situation, and through it all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are two of the most important and salient features of reinforcement learning.

[0003] Reinforcement learning differs from supervised learning and unsupervised learning, both widely used in the field of machine learning. Supervised learning learns from a labeled training set provided by an ext...
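The delayed-reward property described in the background can be made concrete with a discounted return, where the value of a step includes rewards that arrive later. This is a minimal sketch; the function name and the discount factor `gamma` are assumptions that do not appear in the excerpt.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # walk backwards so each step can reuse the return of the step after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For an episode with rewards `[0.0, 0.0, 1.0]` and `gamma=0.5`, the returns are `[0.25, 0.5, 1.0]`: the first two actions earn no immediate reward but still receive credit for the reward that follows.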

Claims

Application Information

IPC(8): G06N3/04, G06N3/08, G06N7/00, G06F30/27

CPC: G06N3/08, G06F30/27, G06N7/01, G06N3/044, G06N3/045

Inventor 伊山燕玉林刘晓光王锐华路越李禾杨洲

Owner 中国人民解放军军事科学院评估论证研究中心
