Robot learning control method based on policy gradient

A robot learning control technology, classified under program-controlled manipulators, manipulators, manufacturing tools, etc. It addresses problems such as discontinuous changes that hinder algorithm convergence, the curse of dimensionality, and the inability to apply learned policies directly in the physical environment, so as to improve learning ability and intelligence, shorten learning time, and simplify controller design.

Inactive Publication Date: 2017-08-08

CHONGQING UNIV

0 Cites, 15 Cited by


AI Technical Summary

Problems solved by technology

Value-function-based methods such as TD learning and Q-learning are effective mainly for discrete state-action spaces. Many problems arise when they are applied to continuous state-action problems.

In a continuous space, reliably estimating the value function often requires collecting a large amount of data across that space, which is difficult in a real, complex robot system.

As the robot's degrees of freedom increase, the "curse of dimensionality" problem arises.
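The growth behind the "curse of dimensionality" can be illustrated with simple arithmetic. The sketch below is not from the patent: it assumes each joint contributes a position and a velocity to the state, and that each quantity is discretized into 10 bins (both numbers are hypothetical).

```python
def grid_cells(bins_per_dim: int, num_dims: int) -> int:
    """Number of cells in a uniform grid over a num_dims-dimensional space."""
    return bins_per_dim ** num_dims


# Hypothetical setup: 2 state variables per degree of freedom
# (position and velocity), each split into BINS intervals.
BINS = 10
table_sizes = {dof: grid_cells(BINS, 2 * dof) for dof in (1, 2, 3, 6)}

for dof, cells in table_sizes.items():
    print(f"{dof} DoF -> {cells} state cells")
```

Under these assumptions the number of cells a tabular value function would have to cover grows exponentially with the degrees of freedom, which is why collecting enough data per cell becomes impractical.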

[0003] In addition, methods based on value function approximation face other problems: 1. They are typically used to obtain deterministic policies and handle stochastic policies poorly, yet the best policy is often stochastic; 2. A small random change in the estimated value of an action can cause that action to no longer be executed, and this discontinuous change has been identified as a key obstacle to guaranteeing algorithm convergence; 3. They cannot guarantee that the commands sent to the robot during learning are safe and reliable.
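Problem 2 can be made concrete with a small sketch. It compares a greedy (deterministic) action choice with a softmax (stochastic) policy when one value estimate shifts by 0.02. The function names and the numbers are illustrative assumptions, not taken from the patent.

```python
import math


def greedy_policy(values):
    """Deterministic policy: all probability on the highest-valued action."""
    best = max(range(len(values)), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(len(values))]


def softmax_policy(values, temperature=1.0):
    """Stochastic policy: probabilities vary smoothly with the values."""
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical value estimates before and after a small perturbation
# of the second action's estimate.
before = [1.00, 0.99]
after = [1.00, 1.01]

greedy_before, greedy_after = greedy_policy(before), greedy_policy(after)
soft_before, soft_after = softmax_policy(before), softmax_policy(after)
print("greedy:", greedy_before, "->", greedy_after)
print("softmax:", soft_before, "->", soft_after)
```

The greedy choice flips entirely from the first action to the second, while the softmax probabilities move only slightly. This is the kind of discontinuity that a parameterized stochastic policy avoids.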

Therefore, most methods first search for a policy in a simulation environment and cannot be applied directly in the actual physical environment.



Embodiment Construction

[0020] The method of the present invention is described in further detail below with reference to the accompanying drawings. Figure 1 is a structural block diagram of the robot learning control method based on policy gradient provided by the present invention; Figure 2 is a schematic diagram of the method. As shown in the figures, the robot learning control method based on policy gradient provided by the present invention includes the following steps:

[0021] S1: Input the robot's state information data during motion and the perception information data from its interaction with the environment;

[0022] S2: From the state information data and the environment perception information data obtained by the robot, calculate the immediate reward and an approximate estimation model of the value function;

[0023] S3: According to the obtained cumu...
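Steps S1 and S2 can be sketched as follows. The patent text shown here does not specify how the value function is approximated, so this sketch swaps in TD(0) with a linear model as one common choice. The features, reward, dynamics, step size, and discount factor are all hypothetical.

```python
import random

GAMMA = 0.9  # discount factor (assumed)


def features(state):
    """Hypothetical feature vector for a scalar state."""
    return [state, state * state]


def value(weights, state):
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features(state)))


def immediate_reward(state):
    """Toy reward: penalize distance from the origin."""
    return -state * state


def step(state):
    """Toy dynamics: the state halves at every step."""
    return 0.5 * state


def td0_update(weights, state, reward, next_state, alpha=0.1):
    """One TD(0) step: move V(state) toward reward + GAMMA * V(next_state)."""
    td_error = reward + GAMMA * value(weights, next_state) - value(weights, state)
    return [w + alpha * td_error * f for w, f in zip(weights, features(state))]


rng = random.Random(0)
weights = [0.0, 0.0]
for _ in range(5000):
    s = rng.uniform(-1.0, 1.0)  # S1: observed state (sampled here)
    weights = td0_update(weights, s, immediate_reward(s), step(s))  # S2
print(weights)
```

For this toy problem the true value function is V(s) = -s^2 / (1 - 0.25 * GAMMA), so the learned weight on the squared feature can be checked against -1 / 0.775.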


Abstract

The invention discloses a policy gradient method suitable for robot learning control, relating to robot learning control technology. It includes: a data acquisition module, which acquires information data during robot operation; a value function approximation module, which takes the observed state information and the immediate reward obtained as input to produce an approximate estimation model of the value function; a policy gradient optimization module, which parameterizes the robot learning control policy and adjusts and optimizes the parameters so that the robot reaches an ideal operating state; and an action execution module, which maps the action output by the controller to the action command actually executed by the robot. The proposed method can be used for different types of robots, especially multi-degree-of-freedom robots, and can learn complex actions and solve for stochastic policies, thereby improving the robot's intelligence, reducing danger during learning, shortening the robot's learning time, and simplifying controller design.
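The policy gradient optimization and action execution modules can be illustrated with a minimal sketch. It is not the patent's algorithm: it assumes a one-parameter Gaussian policy, a plain REINFORCE-style update driven by the raw reward (no value-function baseline), and a hypothetical toy task in which the ideal command is twice the state. All names and constants are illustrative.

```python
import random

SIGMA = 0.5          # fixed exploration noise of the policy (assumed)
COMMAND_LIMIT = 5.0  # hypothetical actuator limit


def sample_action(theta, state, rng):
    """Gaussian policy: action ~ N(theta * state, SIGMA^2)."""
    return rng.gauss(theta * state, SIGMA)


def grad_log_policy(theta, state, action):
    """Derivative of log N(action; theta * state, SIGMA^2) w.r.t. theta."""
    return (action - theta * state) * state / SIGMA ** 2


def map_to_command(action):
    """Action execution: clamp the controller output to the actuator range."""
    return max(-COMMAND_LIMIT, min(COMMAND_LIMIT, action))


def reward_fn(state, command):
    """Toy reward, highest when command == 2 * state (hypothetical task)."""
    return -(command - 2.0 * state) ** 2


def train(steps=10000, alpha=0.005, seed=0):
    """Stochastic gradient ascent on expected reward."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        state = rng.uniform(-1.0, 1.0)
        action = sample_action(theta, state, rng)
        reward = reward_fn(state, map_to_command(action))
        theta += alpha * reward * grad_log_policy(theta, state, action)
    return theta


theta = train()
print(theta)
```

Because the policy stays stochastic throughout, the parameter drifts smoothly toward the gain the toy task rewards (about 2) instead of switching abruptly between actions. In a fuller design the raw reward would typically be replaced by an estimate from the value function approximation module to reduce variance.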

Description

Technical field

[0001] The invention relates to robot learning control technology, in particular to a robot learning control method capable of parameterizing control policies.

Background

[0002] Several technical methods already exist in the field of robot learning control; the most commonly used are based on value function approximation. To obtain the value of a state-action pair, the TD (Temporal Difference) learning algorithm and the Q-learning algorithm are usually used. While these methods are effective for discrete state-action spaces, many problems arise when solving continuous state-action problems. In a continuous space, reliably estimating the value function often requires collecting a large amount of data in the corresponding space, which is difficult in an actual complex robot system. And as the robot's degrees of freedom increase, the "curse of dimensionality" problem appears. ...
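For reference, the state-action value estimate mentioned in the background can be sketched as a standard tabular Q-learning update. The states, actions, learning rate, and discount factor below are hypothetical and serve only to show the update rule on a discrete space.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """One Q-learning step on a table q keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


q = defaultdict(float)       # unseen pairs default to 0.0
actions = ("left", "right")  # hypothetical discrete action set
td = q_learning_update(q, state=0, action="right", reward=1.0,
                       next_state=1, actions=actions)
print(td, q[(0, "right")])
```

The table needs one entry per state-action pair, which is workable for small discrete problems but does not carry over directly to continuous state-action spaces.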


Application Information


IPC(8): B25J9/16

CPC: B25J9/163

Inventors: 李军, 沈广田, 陈剑斌, 高杨建, 许阳

Owner: CHONGQING UNIV
