Intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO

An intelligent decision-making technology for continuous actions, applied in knowledge-based computer systems, dynamic trees, dynamic search techniques, etc. It addresses the difficulty of selecting appropriate experience, poor generalization ability, and unsatisfactory efficiency and accuracy, with the effect of improving training efficiency and performance and reducing dependence

Pending Publication Date: 2022-06-21
NANHU LAB

AI Technical Summary

Problems solved by technology

Rule-based methods are usually designed manually and can be implemented quickly, but they generalize poorly to unknown situations and cannot cope with highly variable scenarios.
Learning-based methods mainly rely on DRL (Deep Reinforcement Learning), which uses a deep neural network to map the perceived state to the vehicle action. Past DRL research on autonomous driving falls mainly into model-free and model-based methods. Model-free DRL usually needs a long training time, learns through trial and error, and has very low learning efficiency. Model-based DRL struggles to achieve the desired effect if it cannot learn a sufficiently accurate model from the data, and the data recorded while the algorithm interacts with an unknown environment often contain a lot of useless information, so it is difficult to select appropriate experience. In addition, creating and verifying the dynamics model depends on professional knowledge.
[0004] The Dyna-Q framework combines the advantages of the above two methods and is a feasible approach. However, the Dyna-Q framework itself cannot handle continuous-action problems well, and it integrates learning and planning only at the data level.
In addition, because of the low-quality data in the experience pool, a large number of planning steps hurts learning after sufficient training. Although this can be avoided by designing a discriminative module, the result is still unsatisfactory in terms of efficiency and accuracy.
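For readers unfamiliar with Dyna-Q, the following is a minimal tabular sketch of the textbook algorithm, not the applicant's GP-based variant. The five-state chain environment, the function names and all hyperparameters are assumptions made for illustration. It shows the two points raised above: real and simulated (planning) updates share one value table, and action selection is an argmax over a finite set of actions, which is why the framework does not extend directly to continuous actions.

```python
import numpy as np

# Hypothetical chain environment: action 0 = left, 1 = right, goal at state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Deterministic toy dynamics: reward 1 on reaching the goal state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    return nxt, float(nxt == GOAL), nxt == GOAL

def dyna_q(episodes=200, planning_steps=10, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    model = {}  # learned world model: (s, a) -> (r, s', done)

    def update(s, a, r, s2, done):
        # One-step Q-learning backup, used for real and simulated experience alike.
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])

    for _ in range(episodes):
        s = 0
        for _ in range(200):  # step cap per episode
            if rng.random() < eps:
                a = int(rng.integers(N_ACTIONS))
            else:
                # Greedy choice over a finite action set, ties broken at random.
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # direct RL from real experience
            model[(s, a)] = (r, s2, done)    # model learning
            keys = list(model)
            for _ in range(planning_steps):  # planning from simulated experience
                ps, pa = keys[rng.integers(len(keys))]
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
            if done:
                break
    return Q

Q = dyna_q()
greedy = Q[:GOAL].argmax(axis=1)  # greedy action in each non-terminal state
```

In this toy setting the model is exact, so extra planning steps only help. The concern described above arises when the stored or simulated experience is of low quality, which this sketch does not reproduce.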

Abstract

The invention discloses an intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO. The system comprises a world model, a policy model and an experience pool. The world model is GP-based, and the simulated experience it generates is stored in the experience pool. The policy model comprises a PPO algorithm, which uses the simulated experience in the experience pool for reinforcement learning. In this GP-based Dyna-PPO method, the DQN algorithm in the Dyna-Q framework is replaced with an optimized PPO algorithm. The improved framework has the advantages of both model-free and model-based DRL schemes and can solve decision-making problems with continuous actions, thereby achieving continuous action decision-making based on the Dyna framework.
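
The data flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: GP is taken to mean Gaussian process regression, the one-dimensional dynamics `true_step`, the reward, the kernel settings and all names are hypothetical, and only the standard PPO clipped surrogate objective is shown, since the abstract does not say how the PPO algorithm is "optimized".

```python
import numpy as np
from collections import deque

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

class GPWorldModel:
    """GP regression (posterior mean only) from (state, action) to state change."""
    def __init__(self, length_scale=1.0, noise=1e-4):
        self.length_scale, self.noise = length_scale, noise

    def fit(self, X, Y):
        self.X = X
        K = rbf_kernel(X, X, self.length_scale) + self.noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, Y)
        return self

    def predict(self, Xq):
        return rbf_kernel(Xq, self.X, self.length_scale) @ self.alpha

def true_step(s, a):
    """Hypothetical 1-D dynamics standing in for the real environment."""
    return s + 0.1 * a

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))

rng = np.random.default_rng(0)

# 1) Fit the world model on a small batch of real transitions.
S = rng.uniform(-1, 1, size=(50, 1))
A = rng.uniform(-1, 1, size=(50, 1))
world_model = GPWorldModel().fit(np.hstack([S, A]), true_step(S, A) - S)

# 2) Let the world model generate simulated experience for the experience pool.
experience_pool = deque(maxlen=1000)
for _ in range(200):
    s, a = rng.uniform(-0.8, 0.8), rng.uniform(-0.8, 0.8)
    s_next = s + world_model.predict(np.array([[s, a]]))[0, 0]
    reward = -s_next**2  # hypothetical reward: stay near the origin
    experience_pool.append((s, a, reward, s_next))

# 3) A PPO learner would then draw batches from experience_pool and maximize
#    ppo_clip_objective over the policy parameters (actor-critic parts omitted).
```

Predicting the state change rather than the next state is a common modeling choice for learned dynamics and is assumed here; the source does not specify the GP's inputs or outputs.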

Description

Technical field

[0001] The invention belongs to the field of intelligent decision-making, and in particular relates to an intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO.

Background technique

[0002] Decision-making scenarios are complex and changeable. They include not only discrete-action problems such as gesture commands but also continuous-action problems such as aircraft and vehicle control decisions. At present, however, most research is limited to discrete action spaces. The applicant's previous research (patents applied for: CN113392956B, CN112989017B, CN112989016B) described a GP-based Dyna-Q method, which is likewise limited to discrete action spaces. Although a continuous action space can be discretized, once the dimension of the discrete space increases it is easy to run into convergence difficulties, which will significantly destroy ...
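
To make the discretization problem concrete, the short sketch below counts the joint actions of a uniformly discretized action space and contrasts it with sampling from a diagonal Gaussian policy, a common choice for continuous control with PPO. The bin count, the function names and the Gaussian parameterization are assumptions for illustration; the source does not state how the policy is parameterized.

```python
import numpy as np

def discrete_action_count(bins_per_dim, action_dims):
    """Number of joint actions when each dimension is split into equal bins."""
    return bins_per_dim ** action_dims

def sample_gaussian_action(mean, log_std, rng):
    """Sample a continuous action from a diagonal Gaussian policy."""
    mean = np.asarray(mean, dtype=float)
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)

# A value-based learner needs one output per joint action, so the table grows
# exponentially with the number of action dimensions.
counts = {d: discrete_action_count(11, d) for d in (1, 2, 4, 6)}

# A Gaussian policy only outputs a mean (and spread) per action dimension.
rng = np.random.default_rng(0)
action = sample_gaussian_action(mean=[0.1, -0.3], log_std=-1.0, rng=rng)
```

With 11 bins per dimension, two action dimensions already give 121 joint actions and six give 1,771,561, while the Gaussian policy's output size grows only linearly with the number of dimensions.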

Application Information

IPC(8): G06F30/27, G06N5/00
CPC: G06F30/27, G06N5/01, Y02T10/40
Inventor 方文其, 吴冠霖, 葛品, 平洋, 栾绍童, 戴迎枫, 缪正元, 沈源源, 金新竹
Owner NANHU LAB