Intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO

An intelligent decision-making technology for continuous actions, applied in knowledge-based computer systems, dynamic trees, dynamic search techniques, and related fields. It addresses the difficulty of selecting appropriate experience, poor generalization ability, and unsatisfactory efficiency and accuracy, with the effect of improving training efficiency and performance and reducing dependence

Pending Publication Date: 2022-06-21

NANHU LAB

3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Rule-based methods are usually designed by hand and can be implemented quickly, but they generalize poorly to unknown situations and cannot cope with highly variable scenarios.

Learning-based methods rely mainly on DRL (Deep Reinforcement Learning), which uses a deep neural network to map the perceived state to a vehicle action. Past DRL research on autonomous driving falls into model-free methods and model-based methods. Model-free DRL usually needs a long training time, learns from experience through trial and error, and has very low learning efficiency. Model-based DRL struggles to achieve the desired effect if it cannot learn a sufficiently accurate model from the data, and the data recorded while the algorithm interacts with an unknown environment often contain a lot of useless information, so it is difficult to select appropriate experience. In addition, creating and verifying the dynamics model depends on expert knowledge.

[0004] The Dyna-Q framework combines the advantages of the two methods above and is a feasible approach. However, the Dyna-Q framework itself cannot handle continuous action problems well, and it is limited to integrating learning and planning at the data level.
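To make the combination of learning and planning concrete, here is a minimal sketch of textbook tabular Dyna-Q, not the applicant's implementation. The five-state chain environment `chain_step`, the hyperparameters, and all function names are hypothetical and chosen only for illustration. The `greedy` helper takes a maximum over a finite action set, which is the step that has no direct counterpart when actions are continuous.

```python
import random
from collections import defaultdict

def dyna_q(step, n_actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Textbook tabular Dyna-Q: real updates plus replayed model updates."""
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(state, action)]
    model = {}               # learned model: (s, a) -> (reward, next_state, done)

    def greedy(s):
        # Max over a finite action set, with random tie-breaking.
        best = max(q[(s, a)] for a in range(n_actions))
        return rng.choice([a for a in range(n_actions) if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        future = 0.0 if done else gamma * max(q[(s2, b)] for b in range(n_actions))
        q[(s, a)] += alpha * (r + future - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learning from real experience
            model[(s, a)] = (r, s2, done)    # model learning (deterministic env assumed)
            for _ in range(planning_steps):  # planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q, greedy

def chain_step(s, a, goal=4):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    done = s2 == goal
    return (1.0 if done else 0.0), s2, done

q, greedy = dyna_q(chain_step, n_actions=2)
print([greedy(s) for s in range(4)])  # greedy action in each non-terminal state
```

The real and simulated transitions feed the same update rule, which is one way to read the statement that Dyna-Q integrates learning and planning at the data level.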

In addition, because the experience pool contains low-quality data, a large number of planning steps hurts learning once training is sufficient. Although this can be avoided by designing a discriminator module, the result is still unsatisfactory in terms of efficiency and accuracy.

Abstract

The invention discloses an intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO. The system comprises a world model, a strategy model, and an experience pool. The world model is GP-based, and the simulated experience it generates is stored in the experience pool. The strategy model comprises a PPO algorithm, which performs reinforcement learning on the simulated experience in the experience pool. In this GP-based Dyna-PPO method, the DQN algorithm in the Dyna-Q framework is replaced with an optimized PPO algorithm. The improved framework has the advantages of both the model-free and the model-based DRL schemes and can solve decision-making problems with continuous actions, so that continuous action decision-making is achieved within the Dyna framework.
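The data flow in the abstract can be sketched roughly as follows. This is an illustration under several assumptions, not the patented method: GP is taken to mean a Gaussian process (the excerpt does not spell it out), the kernel is a squared-exponential one, only the posterior mean is used, and the toy dynamics `true_step`, the toy reward, and all class and function names are hypothetical. `ppo_clip_objective` is the standard PPO clipped surrogate, not the "optimized" variant the abstract mentions, whose details are not given here.

```python
import math
import random
from collections import deque

def rbf(x1, x2, length=1.0):
    """Squared-exponential kernel on scalars (an assumed kernel choice)."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

class GPWorldModel:
    """GP regression from (state, action) to next state, posterior mean only."""
    def __init__(self, noise=1e-4):
        self.noise, self.inputs, self.weights = noise, [], []

    def _k(self, p, q):
        return rbf(p[0], q[0]) * rbf(p[1], q[1])

    def fit(self, transitions):
        self.inputs = [(s, a) for s, a, _ in transitions]
        gram = [[self._k(p, q) + (self.noise if i == j else 0.0)
                 for j, q in enumerate(self.inputs)]
                for i, p in enumerate(self.inputs)]
        self.weights = solve(gram, [s2 for _, _, s2 in transitions])

    def predict(self, s, a):
        return sum(w * self._k((s, a), p) for w, p in zip(self.weights, self.inputs))

def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Standard PPO surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = [min(r * adv, min(max(r, 1.0 - eps), 1.0 + eps) * adv)
             for r, adv in zip(ratios, advantages)]
    return sum(terms) / len(terms)

def true_step(s, a):
    """Hypothetical real environment, used only to collect training data."""
    return s + 0.1 * a

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
world = GPWorldModel()
world.fit([(s, a, true_step(s, a)) for s in grid for a in grid])

rng = random.Random(0)
pool = deque(maxlen=1000)              # experience pool of simulated transitions
for _ in range(100):
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    s2 = world.predict(s, a)           # simulated next state from the world model
    pool.append((s, a, -s2 ** 2, s2))  # reward -s'^2 is an assumed toy reward

print(len(pool), world.predict(0.25, 0.25), true_step(0.25, 0.25))
print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

In a full training loop the policy would be updated by maximizing the clipped objective over batches drawn from `pool`; that step is omitted here because it needs a policy network and an advantage estimator.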

Description

Technical field

[0001] The invention belongs to the field of intelligent decision-making, and in particular relates to an intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO.

Background art

[0002] Decision-making scenarios are complex and changeable. They include not only discrete action problems such as gesture commands, but also continuous action problems such as aircraft and vehicle control decisions. At present, however, most research is limited to discrete action spaces. The applicant's previous research (patents applied for: CN113392956B, CN112989017B, CN112989016B) described a GP-based Dyna-Q method, which is likewise largely limited to discrete action spaces. Although a continuous action space can be discretized, once the dimension of the discrete space increases, convergence difficulties arise easily, which will significantly destroy ...
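A small worked example may help show why discretization scales badly. With k bins per action dimension and d dimensions, the joint discrete action set has k^d entries, while a diagonal Gaussian policy, a common choice for continuous control with PPO but not something this excerpt specifies, needs only 2d outputs (a mean and a standard deviation per dimension). The bin count of 11, the two-dimensional steering/throttle action, and the function names below are assumptions for illustration.

```python
import math
import random

def joint_action_count(bins_per_dim, action_dims):
    """Number of joint actions after discretizing every dimension into bins."""
    return bins_per_dim ** action_dims

def gaussian_sample(mean, std, rng):
    """Draw one continuous action from a diagonal Gaussian policy."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]

def gaussian_log_prob(action, mean, std):
    """Log-density of an action under the diagonal Gaussian policy."""
    return sum(-0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
               for a, m, s in zip(action, mean, std))

# Columns: action dimensions, discrete joint actions, Gaussian policy outputs.
for dims in (1, 2, 4, 6):
    print(dims, joint_action_count(11, dims), 2 * dims)

rng = random.Random(0)
mean, std = [0.0, 0.5], [0.1, 0.2]         # hypothetical steering and throttle
action = gaussian_sample(mean, std, rng)
print(action, gaussian_log_prob(action, mean, std))
```

The log-probability is what a PPO-style update would use to form the ratio between the new and old policy for a sampled continuous action.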

Application Information

IPC(8): G06F30/27, G06N5/00

CPC: G06F30/27, G06N5/01, Y02T10/40

Inventor 方文其吴冠霖葛品平洋栾绍童戴迎枫缪正元沈源源金新竹

Owner NANHU LAB
