
GP world model using strategy model to assist in training and training method thereof


Inactive Publication Date: 2022-05-13
NANHU LAB

AI Technical Summary

Problems solved by technology

The applicant has found through long-term research that the world model training effect of deep reinforcement learning implemented in this way is poor, and no suitable solution has previously existed.



Examples


Embodiment 2

[0065] In this embodiment, the solution is applied in the Dyna-PPO framework designed by the applicant to realize continuous action decision-making. In this framework, the policy model uses the PPO algorithm, so the second loss function in this embodiment is the loss function of the PPO algorithm.

[0066] The PPO algorithm is a recent policy gradient (Policy Gradient, PG) algorithm. PPO encourages exploration while limiting policy changes, so that policy updates remain gradual; it integrates several optimization ideas into policy optimization and can be used to handle continuous-action problems. PPO updates the objective function in small mini-batches over multiple training steps, which solves the difficulty of choosing the step size in the traditional policy gradient algorithm. In each iteration step it attempts to compute a new policy, and can achieve a new balance between ease of implementation, sampling...
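The clipping behavior described above can be sketched as the standard PPO clipped surrogate objective. This is a minimal illustrative sketch of the textbook loss, not the applicant's exact implementation; the function name and sample values are assumptions.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: estimated advantage of each action
    eps:       clipping range; limits how far the policy may move per update
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum keeps the update pessimistic,
    # so large policy changes are never rewarded.
    return -np.mean(np.minimum(unclipped, clipped))

# Small mini-batch example: ratios near 1 keep the policy update gradual.
ratios = np.array([0.9, 1.0, 1.3])
advs = np.array([1.0, -0.5, 2.0])
loss = ppo_clip_loss(ratios, advs)   # the 1.3 ratio is clipped to 1.2
```

Because the minimum of the clipped and unclipped terms is taken, increasing the ratio beyond 1 + eps yields no further gain, which is what keeps the policy update slow.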



Abstract

The invention discloses a GP world model that uses a strategy (policy) model to assist training, and a training method thereof. The GP world model comprises loss functions used for training the world model: a first loss function, which is the GP world model's own loss function, and a second loss function, which is the loss function of the strategy model. The training method comprises the following steps: S1, the world model updates its model parameters using the loss function; S2, the strategy model updates its model parameters using its loss function, and the value of each step during training is stored; S3, the average of the stored values is used for the next round of training of the world model. The method provides a training mechanism in which the strategy model assists in training the GP world model; the stability of the strategy training can be used to modulate the training of the world model, thereby improving the training effect and performance of the world model.
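The S1–S3 scheme in the abstract can be sketched with toy stand-ins. Everything concrete here is an illustrative assumption: the scalar "world model" and "policy" below are placeholders (the patent's world model is a GP and its second loss is PPO's), and how the averaged value enters the next round is our reading of the abstract.

```python
import numpy as np

class WorldModel:
    """Toy world-model stand-in: predicts next state as w * state."""
    def __init__(self):
        self.w = 0.5
        self.second_loss = 0.0   # averaged policy loss from the last round (S3)

    def update(self, states, next_states, lr=0.1):
        # S1: one gradient step on the model's own squared-error loss.
        grad = np.mean(2 * (self.w * states - next_states) * states)
        self.w -= lr * grad

class Policy:
    """Toy policy stand-in: scalar parameter with a quadratic loss."""
    def __init__(self):
        self.theta = 1.0

    def loss(self):
        return (self.theta - 0.2) ** 2

    def update(self, lr=0.1):
        self.theta -= lr * 2 * (self.theta - 0.2)

def train_round(wm, pi, states, next_states, n_policy_steps=4):
    # S1: the world model updates its parameters using its loss function.
    wm.update(states, next_states)
    # S2: the policy updates using its loss; each step's loss value is stored.
    stored = []
    for _ in range(n_policy_steps):
        stored.append(pi.loss())
        pi.update()
    # S3: the average of the stored values feeds the world model's
    # next round of training as its second loss term.
    wm.second_loss = float(np.mean(stored))
    return wm.second_loss

wm, pi = WorldModel(), Policy()
avg = train_round(wm, pi, np.array([1.0, 2.0]), np.array([0.6, 1.2]))
```

The point of the scheme is the feedback path in S3: a stably decreasing averaged policy loss signals that the world model's simulated experience is useful, and that signal modulates the world model's subsequent training.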

Description

technical field
[0001] The invention belongs to the technical field of world models, and in particular relates to a GP world model that uses a strategy model for auxiliary training, and a training method thereof.
Background technique
[0002] The deep reinforcement learning framework can effectively solve the problem of limited sample data. It mainly includes two parts: the policy model and the world model. The policy model is trained using the experience in the experience pool, while the world model imitates the environment by learning state transitions and rewards. The experience generated by the world model is also stored in the experience pool to provide more training data for the policy model, so the framework can overcome the problem of insufficient sample data.
[0003] At present, the policy model and the world model of deep reinforcement learning are trained separately: the simulated experience generated by the world mod...
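The data flow of paragraph [0002] can be sketched as a Dyna-style loop in which real and simulated transitions share one experience pool. The function names and the stand-in environment and model below are illustrative assumptions, not the patent's implementation.

```python
import random

def dyna_loop(real_env_step, world_model_step, pool, n_real=10, n_sim=30):
    """Fill a shared experience pool with real and simulated transitions.

    real_env_step()   -> (state, reward, next_state) from the environment
    world_model_step(s) -> (state, reward, next_state) imitated by the model
    """
    # Real interaction: a few genuine environment transitions.
    for _ in range(n_real):
        pool.append(real_env_step())
    # Simulated interaction: the world model imitates the environment by
    # generating learned state transitions and rewards from seen states.
    for _ in range(n_sim):
        s, _, _ = random.choice(pool)
        pool.append(world_model_step(s))
    return pool

# Illustrative stand-ins: a trivial environment and a learned model.
real_step = lambda: (0.0, 1.0, 0.1)        # (state, reward, next_state)
model_step = lambda s: (s, 0.9, s + 0.1)   # imitated transition
experience_pool = dyna_loop(real_step, model_step, [])
```

The policy model would then sample its training batches from `experience_pool`; because simulated transitions outnumber real ones here (30 vs. 10), the policy sees far more data than the real environment alone provides, which is how the framework mitigates insufficient sample data.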


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F30/27 G06N20/00 G06Q50/30
CPC: G06F30/27 G06N20/00 G06Q50/40
Inventor 葛品吴冠霖方文其平洋栾绍童缪正元戴迎枫沈源源金新竹
Owner NANHU LAB