GP world model using strategy model to assist in training and training method thereof

Inactive Publication Date: 2022-05-13

NANHU LAB

8 Cites, 0 Cited by

Problems solved by technology

In long-term research, the applicant found that world models in deep reinforcement learning train poorly when implemented in this way, that is, with the policy model and the world model trained separately, and no suitable solution had previously been available.

Examples

Embodiment 2

[0065] In this embodiment, the solution is applied in the Dyna-PPO framework designed by the applicant to realize continuous action decision-making. In this framework, the policy model includes the PPO algorithm, so the second loss function in this embodiment is the loss function of the PPO algorithm.

[0066] The PPO algorithm is a new policy gradient (PG) algorithm. It encourages exploration while limiting how much the policy changes at each update, so that the policy is updated slowly. It is a method that integrates intelligent optimization and policy optimization and can be used to handle continuous-action problems. PPO proposes an objective function that can be updated in small batches over multiple training steps, which solves the problem that the step size is difficult to determine in traditional policy gradient algorithms. It tries to compute a new policy at each iteration step, and can strike a new balance between ease of implementation, sampling...
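The idea of limiting policy changes can be illustrated with the clipped surrogate loss commonly associated with PPO. The sketch below is a minimal illustration, not the applicant's implementation: the function name `ppo_clip_loss`, its parameters, and the default `clip_eps=0.2` are assumptions, and the source text is truncated before it specifies the exact form of the loss used in Dyna-PPO.

```python
import math

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss (negated objective, so lower is better).

    All names and the clip_eps default are illustrative assumptions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - eps, 1 + eps] to limit how far the policy moves.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

# Usage: one sample whose action became twice as likely under the new policy.
loss = ppo_clip_loss([math.log(2.0)], [0.0], [1.0])
```

With a positive advantage, the ratio of 2 is clipped to 1.2, so the update gains nothing from moving the policy further than the clip range allows.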

Abstract

The invention discloses a GP world model that uses a strategy model to assist training, and a training method thereof. The GP world model comprises loss functions used for training the world model, namely a first loss function and a second loss function. The first loss function is the GP world model's own loss function, and the second loss function is the loss function of the strategy model. The training method comprises the following steps: S1, the world model updates its model parameters using the loss functions; S2, the strategy model updates its model parameters using its loss function, and the value from each training step is stored; and S3, the average of the stored values is carried over to the next training of the world model. The method provides a training mechanism in which the strategy model assists the training of the GP world model: the stability of strategy training is used to modulate the training of the world model, which improves the training effect and performance of the world model.
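Steps S1 to S3 can be sketched as a single training round. This is a hedged toy sketch under stated assumptions: the abstract does not say how the two loss functions are combined (a plain sum is assumed here), it is assumed that the stored per-step quantity is the strategy model's loss, and `train_round`, `policy_step`, and the other names are hypothetical. The actual parameter updates are left out.

```python
from statistics import mean
from typing import Callable, Tuple

def train_round(
    first_loss: float,
    carried_second_loss: float,
    policy_step: Callable[[int], float],
    n_policy_steps: int,
) -> Tuple[float, float]:
    """One round of S1-S3; returns (world-model loss, next carried loss)."""
    # S1: the world model is trained on its own loss plus the second loss
    # carried over from the strategy model (summation is an assumption).
    world_loss = first_loss + carried_second_loss
    # S2: the strategy model trains; the loss of every step is stored.
    step_losses = [policy_step(step) for step in range(n_policy_steps)]
    # S3: the average of the stored losses is handed to the next round.
    return world_loss, mean(step_losses)

# Usage: three rounds with a placeholder strategy loss that shrinks per step.
carried = 0.0
for _ in range(3):
    world_loss, carried = train_round(1.0, carried, lambda s: 0.5 / (s + 1), 4)
```

Averaging over the stored steps, rather than using the last step alone, is what lets the relatively stable strategy training smooth the signal passed to the world model.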

Description

Technical field

[0001] The invention belongs to the technical field of world models, and in particular relates to a GP world model that uses a strategy model to assist training, and a training method thereof.

Background

[0002] The deep reinforcement learning framework can effectively address the problem of limited sample data. It mainly consists of two parts: the policy model and the world model. The policy model is trained on the experience in the experience pool. The world model imitates the environment by learning state transitions and rewards. The simulated experience that the world model generates after learning the environment is also stored in the experience pool, providing more training data for the policy model and thereby overcoming the problem of insufficient sample data.

[0003] At present, the policy model and the world model of deep reinforcement learning are trained separately: the simulated experience generated by the world mod...
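The role of the experience pool described in [0002] can be sketched as follows. This is a minimal illustration under assumptions, not the applicant's code: `ExperiencePool`, `toy_world_model`, and `fill_with_simulated` are hypothetical names, and the toy model is a hand-written stand-in for a learned world model.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-size pool holding both real and simulated transitions."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted

    def add(self, state, action, reward, next_state, simulated):
        self.buffer.append((state, action, reward, next_state, simulated))

    def sample(self, batch_size):
        # Draw a training batch for the policy model.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

def toy_world_model(state, action):
    """Hypothetical stand-in for a learned model of transitions and rewards."""
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward

def fill_with_simulated(pool, world_model, start_state, actions):
    """Roll the world model forward and store the simulated experience."""
    state = start_state
    for action in actions:
        next_state, reward = world_model(state, action)
        pool.add(state, action, reward, next_state, simulated=True)
        state = next_state

# Usage: one real transition plus three simulated ones in the same pool.
pool = ExperiencePool(capacity=100)
pool.add(0.0, 1.0, -1.0, 1.0, simulated=False)
fill_with_simulated(pool, toy_world_model, start_state=1.0, actions=[1.0, -1.0, 0.5])
batch = pool.sample(2)
```

Because real and simulated transitions share one pool, the policy model draws from both when it samples a batch, which is how the simulated data supplements scarce real samples.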

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F30/27, G06N20/00, G06Q50/30

CPC: G06F30/27, G06N20/00, G06Q50/40

Inventor: 葛品, 吴冠霖, 方文其, 平洋, 栾绍童, 缪正元, 戴迎枫, 沈源源, 金新竹

Owner: NANHU LAB
