Reinforcement learning intelligent agent training method based on PPO algorithm

A reinforcement learning training method applied in the field of agent training. It addresses the problem that existing approaches do not solve the reinforcement learning problem of maximizing the reward signal, and it achieves the effect of a reasonable design.

Status: Inactive. Publication date: 2021-08-13
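Evaluation and Demonstration Research Center, Academy of Military Sciences, Chinese People's Liberation Army (中国人民解放军军事科学院评估论证研究中心)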

AI Technical Summary

Problems solved by technology

Uncovering structure in an agent's experience is certainly useful in reinforcement learning, but by itself it does not solve the reinforcement learning problem of maximizing the reward signal.
[0004] Reinforcement learning poses a unique challenge: the trade-off between exploration and exploitation.
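
The trade-off named above can be illustrated with an epsilon-greedy rule on a multi-armed bandit. This is a minimal sketch and not part of the patent text: the function names, the Gaussian reward model, and the value of `epsilon` are assumptions chosen only for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: try a uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the highest current value estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate action values on a Gaussian bandit (assumed reward model)."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)      # running value estimate per action
    counts = [0] * len(true_means)   # number of times each action was tried
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]  # incremental sample average
    return q, counts
```

With `epsilon=0` the agent only exploits its current estimates and may never discover a better action; with `epsilon=1` it only explores and never uses what it has learned. Values in between balance the two.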

Examples

Embodiment 1

[0125] Referring to Figure 1, a PPO algorithm-based reinforcement learning agent training method comprises the following steps:

[0126] Step 1: Simulation configuration management. Configure the IP address, port, and interval of the simulation platform, and start the simulation system at the configured IP address and port.

[0127] Step 2: Training environment configuration. The training environment contains the combat information the agent requires, such as the scenario, the environment, and the army type. Configuring the training environment constructs different combat scenarios, and the agent responds differently depending on the scenario and environment.

[0128] Step 3: PPO hyperparameter selection. PPO is selected as the reinforcement learning algorithm, and techniques such as Bayesian optimization and PBT (population-based training) are used to adjust its hyperparameters automatically during training. The method also supports manual configuration of algorithms and...
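
The excerpt above does not spell out the PPO objective. For orientation, the sketch below computes the standard clipped surrogate objective from the PPO literature on a batch of samples. It is an illustration under stated assumptions, not the patent's implementation: the function name and the default `clip_eps=0.2` are hypothetical, and `clip_eps` stands in for the kind of hyperparameter that Step 3 would tune.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized) over a batch.

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    clip_eps is an assumed default, not a value taken from the patent.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Clipping removes the incentive to move the probability ratio outside `[1 - clip_eps, 1 + clip_eps]`, which keeps each policy update close to the policy that collected the data.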

Embodiment 2

[0133] Referring to Figure 2, a PPO algorithm-based reinforcement learning agent training method comprises the following steps:

[0134] Step 1: Simulation configuration management. Configure the IP address, port, and interval of the simulation platform, so that the blue army combat operation simulation system can be started at the configured IP address and port.

[0135] Step 2: Training environment configuration. The training environment contains the combat information the agent requires, such as the scenario, the environment, and the army type. Configuring the training environment constructs different combat scenarios, and the agent responds differently depending on the scenario and environment.

[0136] Step 3: PPO hyperparameter selection. PPO is selected as the reinforcement learning algorithm, and techniques such as Bayesian optimization and PBT (population-based training) are used to adjust its hyperparameters automatically during training. At the same time,...
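
Step 3 names PBT as one way to adjust hyperparameters during training. The sketch below shows one generic exploit/explore round of population-based training. It is a hedged illustration only: the member layout (`hparams`, `weights`, `score`), the perturbation factors, and the bottom fraction are assumptions, and the patent excerpt does not describe its own PBT procedure.

```python
import copy
import random


def pbt_step(population, rng, perturb=(0.8, 1.2), bottom_frac=0.25):
    """One exploit/explore round of population-based training.

    Each member is a dict with 'hparams' (dict of floats), 'weights'
    (any copyable object) and 'score' (float, higher is better).
    All names and defaults here are hypothetical.
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cut = max(1, int(len(ranked) * bottom_frac))
    top, bottom = ranked[:cut], ranked[-cut:]
    for member in bottom:
        donor = rng.choice(top)
        if donor is member:
            continue  # a population of one has nothing to copy from
        # Exploit: copy the weights of a better-performing member.
        member["weights"] = copy.deepcopy(donor["weights"])
        # Explore: perturb the donor's hyperparameters.
        member["hparams"] = {
            name: value * rng.choice(perturb)
            for name, value in donor["hparams"].items()
        }
    return population
```

In use, each member would train for a fixed interval, report its score, and then `pbt_step` would be called before the next interval, so hyperparameters change while training is still running.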

Abstract

The invention discloses a reinforcement learning agent training method based on the PPO algorithm. The method comprises simulation configuration management, training environment configuration, PPO hyperparameter selection, agent training, training result analysis, and model generation. Its beneficial effects are as follows. The platform analyzes key information about the environment and provides automatic model generation based on the characteristics and data types of each agent's state space and action space and on the "scores" provided by the simulation environment. A deep network model built into the system is selected automatically to represent the policy model and the value function model. Input data types such as discrete, continuous, image, and variable-length lists are supported, as are discrete and continuous action types. The platform generates the model automatically, without manual adjustment or configuration.
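
One way to picture automatic model selection from the state and action space types is a lookup from data type to network component. The sketch below is purely illustrative: `SpaceSpec`, `build_model_plan`, and the encoder and head names are hypothetical, and the abstract does not say which architectures the platform actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceSpec:
    kind: str     # "discrete", "continuous", "image", or "var_list"
    shape: tuple  # illustrative only, e.g. (n,) or (H, W, C)


# Hypothetical mapping; the patent does not name concrete architectures.
ENCODERS = {
    "discrete": "embedding",
    "continuous": "mlp",
    "image": "cnn",
    "var_list": "rnn",
}
POLICY_HEADS = {"discrete": "categorical", "continuous": "gaussian"}


def build_model_plan(obs_specs, action_spec):
    """Choose an encoder per observation field and a policy head per action type."""
    if action_spec.kind not in POLICY_HEADS:
        raise ValueError(f"unsupported action type: {action_spec.kind}")
    encoders = []
    for name, spec in obs_specs.items():
        if spec.kind not in ENCODERS:
            raise ValueError(f"unsupported input type: {spec.kind}")
        encoders.append((name, ENCODERS[spec.kind]))
    return {
        "encoders": encoders,
        "policy_head": POLICY_HEADS[action_spec.kind],
        "value_head": "scalar",  # value function model outputs one number
    }
```

For example, an observation with an image field and a variable-length unit list, paired with a discrete action space, would yield a plan with a `cnn` encoder, an `rnn` encoder, and a `categorical` policy head under these assumed names.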

Description

Technical field

[0001] The invention relates to an agent training method, in particular to a PPO algorithm-based reinforcement learning agent training method, and belongs to the technical field of agent training.

Background

[0002] Reinforcement learning is about learning "what to do" so as to maximize a numerical reward signal. The learner must discover, by trying actions during exploration, which actions yield the greatest reward. In practice, an action often affects not only the immediate reward but also the next situation, and through it all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning.

[0003] Reinforcement learning differs from supervised learning and unsupervised learning, which are widely used in machine learning. Supervised learning learns from a labeled training set provided by an ext...
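
The delayed-payoff property described in paragraph [0002] is commonly formalized as a discounted return, where an action is credited with the rewards that follow it. The sketch below is a minimal illustration; the discount factor `gamma` and the function name are assumptions and do not appear in the patent text.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    gamma is an assumed discount factor in [0, 1].
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For the reward sequence `[0, 0, 1]` with `gamma=0.5`, the first action receives a return of 0.25 even though its immediate reward is zero, which is how delayed payoff reaches earlier decisions.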

Application Information

IPC(8): G06N3/04, G06N3/08, G06N7/00, G06F30/27
CPC: G06N3/08, G06F30/27, G06N7/01, G06N3/044, G06N3/045
Inventors: 伊山, 燕玉林, 刘晓光, 王锐华, 路越, 李禾, 杨洲
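Owner: Evaluation and Demonstration Research Center, Academy of Military Sciences, Chinese People's Liberation Army (中国人民解放军军事科学院评估论证研究中心)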