Robot learning control method based on policy gradient

A technology in the field of learning control and robotics, applied to program-controlled manipulators, manipulators, manufacturing tools, etc. It addresses problems such as discontinuous algorithm convergence, the curse of dimensionality, and methods that cannot be applied directly, with the effects of improving learning ability and intelligence, shortening learning time, and simplifying controller design.

Inactive; Publication Date: 2017-08-08
CHONGQING UNIV
0 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

However, these methods are effective mainly for discrete state-action spaces; many problems arise when they are applied to continuous state-action problems.
In a continuous space, a reliable estimate of the value function often requires collecting a large amount of data across the corresponding space, which is difficult to achieve in a real, complex robot system.
Moreover, as the robot's degrees of freedom increase, the "curse of dimensionality" appears.
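To make the scaling concrete, here is a minimal sketch of how a tabular value function grows with the number of joints. The discretization is an assumption made only for illustration (each joint contributes a position, a velocity, and one action dimension, each split into 10 bins); the function name `table_cells` is hypothetical and not part of the patent.

```python
def table_cells(dof: int, bins: int = 10) -> int:
    """Number of cells in a tabular state-action value function.

    Assumption (not from the patent): every joint adds three dimensions,
    position and velocity for the state plus one action dimension.
    """
    dims = 3 * dof
    return bins ** dims


if __name__ == "__main__":
    # Growth is exponential in the number of degrees of freedom.
    for dof in (1, 2, 6):
        print(f"{dof} DOF -> {table_cells(dof)} cells")
```

Under these assumptions a single joint needs a thousand cells, while a six-joint arm needs 10^18, which is why collecting enough data to fill such a table is impractical.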
[0003] In addition, the method based on value function approximation also faces other problems: 1. This method is often used ...



Examples


Embodiment Construction

[0020] The method of the present invention is described in further detail below with reference to the accompanying drawings. Figure 1 is a structural block diagram of the policy-gradient-based robot learning control method provided by the present invention; Figure 2 is a schematic diagram of the method. As shown in the figures, the method includes the following steps:

[0021] S1: Input the robot's state information data during motion and the perception information data from its interaction with the environment;

[0022] S2: From the robot's state information data and the environmental perception information data, compute the immediate reward and an approximate estimation model of the value function;

[0023] S3: According to the obtained cumu...
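Steps S1 and S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the value function is taken to be linear in a feature vector, it is updated with the standard TD(0) rule, and the reward is a hypothetical negative squared distance to a target state. The names `immediate_reward`, `value`, and `td0_update` are invented for this example.

```python
def immediate_reward(state, target):
    """Hypothetical reward: negative squared distance to a target state."""
    return -sum((s - t) ** 2 for s, t in zip(state, target))


def value(weights, features):
    """Linear value estimate V(s) = w . phi(s) (an assumed model form)."""
    return sum(w * f for w, f in zip(weights, features))


def td0_update(weights, features, next_features, reward, alpha=0.1, gamma=0.95):
    """One TD(0) step: move the weights along the TD error times the features."""
    delta = reward + gamma * value(weights, next_features) - value(weights, features)
    new_weights = [w + alpha * delta * f for w, f in zip(weights, features)]
    return new_weights, delta


if __name__ == "__main__":
    w = [0.0, 0.0]
    r = immediate_reward([1.0, 2.0], [0.0, 0.0])
    w, delta = td0_update(w, [1.0, 0.0], [0.0, 1.0], r)
    print(w, delta)
```

The TD error `delta` returned here is the quantity a policy-gradient step would typically consume as its learning signal.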



Abstract

The invention discloses a policy gradient method suitable for robot learning control and relates to robot learning control technology. It includes a data acquisition module, which acquires information data during the operation of the robot; a value function approximation module, which takes the observed state information and the obtained immediate reward as input to obtain an approximate estimation model of the value function; a policy gradient optimization module, which parameterizes the robot learning control policy and adjusts and optimizes the parameters so that the robot reaches an ideal operating state; and an action execution module, which maps the action output by the controller to the action command actually executed by the robot. The proposed method can be used for different types of robots, especially multi-degree-of-freedom robots. It can learn complex actions and solve for stochastic policies, thereby improving the intelligence of the robot, reducing danger during learning, shortening the robot's learning time, and simplifying controller design.
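As an illustration of how the policy gradient optimization and action execution modules could fit together, the sketch below assumes a Gaussian policy whose mean is linear in a feature vector, and a TD error supplied by the value function module. The abstract does not specify these choices; the names `sample_action`, `actor_update`, and `execute`, the step size `beta`, and the actuator limit are all hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_action(theta, features, sigma, rng):
    """Draw an action from the assumed Gaussian policy N(theta . phi, sigma^2)."""
    return rng.gauss(dot(theta, features), sigma)


def actor_update(theta, features, action, delta, sigma, beta=0.01):
    """Policy-gradient step: theta += beta * delta * grad log pi(a | s).

    For a Gaussian policy with linear mean, the gradient of log pi with
    respect to theta is (a - mean) / sigma^2 * phi.
    """
    scale = (action - dot(theta, features)) / sigma ** 2
    return [t + beta * delta * scale * f for t, f in zip(theta, features)]


def execute(action, limit=1.0):
    """Action execution: clip the controller output to an assumed actuator range."""
    return max(-limit, min(limit, action))


if __name__ == "__main__":
    rng = random.Random(0)
    theta, phi = [0.0, 0.0], [2.0, 1.0]
    a = sample_action(theta, phi, sigma=0.5, rng=rng)
    command = execute(a)                      # command actually sent to the robot
    theta = actor_update(theta, phi, a, delta=0.5, sigma=0.5)
    print(command, theta)
```

A positive TD error makes the sampled action more likely in that state and a negative one makes it less likely, which is how parameter adjustment can steer the robot toward the desired operating state.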

Description

Technical field

[0001] The invention relates to robot learning control technology, in particular to a robot learning control method capable of parameterizing control policies.

Background

[0002] Several technical methods already exist in the field of robot learning control; the most commonly used are based on value function approximation. To obtain the value of a state-action pair, the TD (Temporal Difference) learning algorithm and the Q-learning algorithm are usually used. While these methods are effective for discrete state-action spaces, many problems arise when solving continuous state-action problems. In a continuous space, a reliable estimate of the value function often requires collecting a large amount of data across the corresponding space, which is difficult to achieve in a real, complex robot system. And as the robot's degrees of freedom increase, the "curse of dimensionality" appears. ...
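For reference, the value-based approach mentioned in the background can be sketched with the standard tabular Q-learning update on a discrete state-action space. The state and action labels and the parameter values are placeholders chosen for illustration; the function name `q_learning_update` is hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Standard Q-learning step on a table keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q = defaultdict(float)            # unseen pairs default to 0.0
    actions = ("left", "right")       # placeholder discrete actions
    q[("s1", "right")] = 2.0
    print(q_learning_update(q, "s0", "right", 1.0, "s1", actions))
```

The table needs one entry per state-action pair, which works for small discrete problems but does not extend directly to continuous joint positions and torques.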


Application Information

IPC(8): B25J9/16
CPC: B25J9/163
Inventor: 李军沈广田陈剑斌高杨建许阳
Owner: CHONGQING UNIV