Trust region strategy optimization method and device based on post-event experience and related equipment

A trust region optimization method, applied to machine learning for intelligent robots, that addresses slow learning speed, low exploration efficiency, and difficult reward function design, and that improves accuracy, speeds up convergence, and reduces variance.

Pending Publication Date: 2020-12-18

XI AN JIAOTONG UNIV

Cites: 1; Cited by: 0



Problems solved by technology

Reinforcement learning currently suffers from slow learning speed, difficult reward function design, and low exploration efficiency, which makes it hard to apply to complex real-world tasks.



Embodiment Construction

[0039] To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that persons of ordinary skill in the art obtain from these embodiments without creative effort fall within the protection scope of the present invention.

[0040] The present invention is a trust region policy optimization method based on post-event experience. Its principle is to take the experience data of robot actions collected during goal-conditioned policy training, and to use the target point actually reached in that experience data as a virtual...
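The relabeling idea in paragraph [0040] can be illustrated with a minimal sketch. It is not taken from the patent: the `Transition` record, the `sparse_reward` function, and the choice of the final reached point as the virtual target are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One step of goal-conditioned experience (hypothetical layout)."""
    state: Point
    action: int
    achieved: Point  # point actually reached after the action
    goal: Point      # target the policy was conditioned on
    reward: float


def sparse_reward(achieved: Point, goal: Point, tol: float = 1e-6) -> float:
    """Assumed reward: 0 when the target is reached, -1 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_reached_target(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, swapping the original target for the reached one.

    The last reached point becomes the virtual target, and rewards are
    recomputed against it, so a failed episode yields a successful one.
    """
    virtual_goal = episode[-1].achieved
    return [
        replace(t, goal=virtual_goal,
                reward=sparse_reward(t.achieved, virtual_goal))
        for t in episode
    ]


# A failed 1-D episode: the target was 5.0 but the agent only reached 3.0.
original = [
    Transition((0.0,), 1, (1.0,), (5.0,), -1.0),
    Transition((1.0,), 1, (2.0,), (5.0,), -1.0),
    Transition((2.0,), 1, (3.0,), (5.0,), -1.0),
]
virtual = relabel_with_reached_target(original)
```

In this sketch the original episode never earns a non-negative reward, while the relabeled copy ends with a reward of 0, which is what gives a simply designed sparse reward a usable learning signal.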


Abstract

The invention discloses a trust region strategy optimization method and device based on post-event experience, and related equipment. The method comprises the following steps: S100, taking a reached target point in the experience data as a virtual target point and generating virtual post-event experience data; S200, filtering the virtual targets with a post-event target filtering algorithm to obtain the corresponding training data; S300, based on the virtual experience data, correcting the distribution deviation between the virtual experience data and the original experience data through weighted importance sampling; S400, correcting the same distribution deviation through weighted importance sampling so as to estimate the KL divergence between strategies; and S500, correcting the strategy gradient direction through the KL divergence, and calculating and updating the strategy step length from the maximum KL divergence step length. With this method, an intelligent agent can explore the environment and the task effectively from a small amount of interaction data and a simply designed reward function, and can efficiently learn and update its behavior strategy.
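Two of the ingredients named in the abstract, weighted importance sampling and a step length bounded by a maximum KL divergence, can be sketched as follows. This is an illustrative sketch and not the patented procedure. It assumes the importance weights are already computed, and it approximates the KL divergence by a quadratic form with a diagonal matrix `fisher_diag` so that no linear solver is needed. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_importance_estimate(values: Sequence[float],
                                 weights: Sequence[float]) -> float:
    """Self-normalized (weighted) importance sampling estimate.

    Dividing by the sum of the weights instead of the sample count
    trades a small bias for lower variance.
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("importance weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, values)) / total


def max_kl_step(grad: Sequence[float],
                fisher_diag: Sequence[float],
                max_kl: float) -> List[float]:
    """Parameter step along F^-1 g, scaled to sit on the KL boundary.

    Assumes KL(step) is approximately 0.5 * step^T F step with F diagonal
    and positive, so the largest allowed scale is sqrt(2*max_kl / g^T F^-1 g).
    """
    direction = [g / f for g, f in zip(grad, fisher_diag)]
    quad = sum(g * d for g, d in zip(grad, direction))  # g^T F^-1 g
    if quad <= 0.0:
        return [0.0] * len(grad)
    scale = math.sqrt(2.0 * max_kl / quad)
    return [scale * d for d in direction]


def quadratic_kl(step: Sequence[float], fisher_diag: Sequence[float]) -> float:
    """Second-order KL approximation used to check the step above."""
    return 0.5 * sum(f * s * s for f, s in zip(fisher_diag, step))


estimate = weighted_importance_estimate([1.0, 3.0], [1.0, 3.0])
step = max_kl_step([1.0, 2.0], [1.0, 4.0], max_kl=0.01)
kl_used = quadratic_kl(step, [1.0, 4.0])
```

In practice the full curvature matrix is usually not diagonal, and the direction is typically obtained with an iterative solver such as conjugate gradient. The diagonal form here only keeps the example short.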

Description

Technical field

[0001] The invention belongs to the field of machine learning for intelligent robots, and in particular relates to a trust region policy optimization method, device and related equipment based on post-event experience.

Background technique

[0002] Artificial intelligence technology has developed rapidly and, through intelligent and automated information processing, has come to prominence in many industries. However, the mainstream deep learning methods in the field mostly rely on large-scale human-labeled data. How to obtain data and complete the learning process through autonomous interaction between robots or agents and the environment is a major difficulty in artificial intelligence. As an important branch of artificial intelligence, reinforcement learning can help robots explore and learn while interacting autonomously with the environment. However, reinforcement learning is cu...


Application Information


IPC(8): G06N20/00

CPC: G06N20/00

Inventors: 兰旭光 (Lan Xuguang), 张翰博 (Zhang Hanbo), 柏思特 (Bai Site), 郑南宁 (Zheng Nanning)

Owner: XI AN JIAOTONG UNIV
