Actor-Critic neural network continuous control-based fast learning algorithm

A fast learning algorithm based on Actor-Critic neural network continuous control, in the field of learning algorithms and neural network technology, which addresses problems such as the low learning efficiency caused by random sampling.

Inactive Publication Date: 2018-05-15
HUBEI UNIV OF TECH
0 Cites, 35 Cited by

AI Technical Summary

Problems solved by technology

Prioritizing samples based only on the magnitude of the temporal-difference error (TD_error) sometimes performs worse than random sampling, resulting in low learning efficiency.



Embodiment Construction

[0055] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0056] Figure 1 is a schematic flow chart of the fast learning algorithm based on Actor-Critic neural network continuous control, which includes the following steps:

[0057] Step 1) Initialize

[0058] 1.1) Experience pool initialization: set the experience pool as a two-dimensional matrix with m rows and n columns, and initialize every element of the matrix to 0, where m is the number of samples the pool can hold and n is the number of values stored in each sample, n = 2×state_dim + action_dim + 3; state_dim is the dimension of the state and action_dim is the dimension of the action. Space is also reserved in the experience pool for storing the reward information, usage traces and time difference errors: the 3 in the formula n = 2×state_dim + action_dim + 3 is the reserved space for storing reward information, usage...
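The layout described in step 1.1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `init_experience_pool` and `pack_sample` are hypothetical, and the column order (state, action, reward, next state, usage trace, TD error) is an assumption, since the excerpt only specifies the total width n.

```python
def init_experience_pool(m, state_dim, action_dim):
    """Return an m x n matrix of zeros, with n = 2*state_dim + action_dim + 3."""
    n = 2 * state_dim + action_dim + 3
    return [[0.0] * n for _ in range(m)]


def pack_sample(state, action, reward, next_state, usage_trace=0.0, td_error=0.0):
    """Flatten one transition into a single row of the pool.

    The column order used here is an assumption for illustration only.
    The three extra slots hold the reward, the usage trace and the TD error.
    """
    return list(state) + list(action) + [reward] + list(next_state) + [usage_trace, td_error]


# Example: 3-dimensional state, 1-dimensional action, capacity of 1000 samples.
pool = init_experience_pool(m=1000, state_dim=3, action_dim=1)
row = pack_sample([0.1, 0.2, 0.3], [0.5], 1.0, [0.2, 0.3, 0.4])
pool[0] = row
```

With state_dim = 3 and action_dim = 1, each row holds 2×3 + 1 + 3 = 10 values, which matches the length of the packed sample.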


Abstract

The invention relates to a fast learning algorithm based on Actor-Critic neural network continuous control. The algorithm comprises the following steps: initializing an experience pool, initializing a neural network, constructing output interference, accumulating the experience pool, and sampling and training a deep reinforcement learning neural network according to a priority number prop. Sampling is optimized using the priority number prop, which is calculated from TD-diff, sigmoid-TD and the usage trace UT; this improves the convergence speed of the TD-error and therefore the learning speed of the algorithm.
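As a rough illustration of sampling according to a priority number, the Python sketch below draws sample indices with probability proportional to a precomputed priority value. This is an assumption made for illustration: the excerpt does not give the formula that combines TD-diff, sigmoid-TD and UT into prop, nor does it state that sampling is strictly proportional, so the priorities are treated here as given inputs and the function name `sample_by_priority` is hypothetical.

```python
import random


def sample_by_priority(priorities, batch_size, rng=random):
    """Draw batch_size indices, each with probability proportional to its priority.

    Assumes priorities are non-negative numbers that were computed elsewhere.
    Falls back to uniform sampling when all priorities are zero.
    """
    total = sum(priorities)
    if total <= 0:
        return [rng.randrange(len(priorities)) for _ in range(batch_size)]
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


# Example: the third sample carries all of the priority mass.
rng = random.Random(0)
batch = sample_by_priority([0.0, 0.0, 1.0], batch_size=5, rng=rng)
uniform_batch = sample_by_priority([0.0, 0.0, 0.0], batch_size=4, rng=rng)
```

Sampling is done with replacement here, which keeps the sketch short; a training loop would use the returned indices to select rows from the experience pool.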

Description

Technical field

[0001] The invention belongs to the technical field of reinforcement learning algorithms, and in particular relates to a fast learning algorithm based on Actor-Critic neural network continuous control.

Background technique

[0002] In recent years, deep reinforcement learning has achieved remarkable results. The Go program AlphaGo, developed by Google, defeated the world's top Go player Lee Sedol, setting off a wave of interest in artificial intelligence worldwide. The success of AlphaGo is attributed to deep reinforcement learning algorithms. Most current deep reinforcement learning algorithms use memory replay. The concept of memory replay was proposed in 1993, and since the introduction of the DQN algorithm in 2013 it has been widely used throughout deep reinforcement learning. However, because memory replay often adopts random sampling, the neural network repeatedly learns some states, and the priority state cannot be learned, a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/08, G06N3/04
CPC: G06N3/08, G06N3/04
Inventor: 柯丰恺, 周唯倜, 赵大兴, 孙国栋, 许万, 丁国龙, 吴震宇, 赵迪
Owner: HUBEI UNIV OF TECH