Actor-Critic neural network continuous control-based fast learning algorithm

A learning-algorithm and neural-network technique, applied to fast learning based on Actor-Critic neural network continuous control, that addresses problems such as inefficient random sampling and low learning efficiency.

Status: Inactive; Publication Date: 2018-05-15

HUBEI UNIV OF TECH

0 Cites, 35 Cited by


AI Technical Summary

Problems solved by technology

Prioritized sampling based only on the magnitude of the temporal-difference error (TD_error) sometimes performs worse than random sampling, resulting in low learning efficiency.



Embodiment Construction

[0055] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0056] Figure 1 is a schematic flow chart of the fast learning algorithm based on Actor-Critic neural network continuous control, which includes the following steps:

[0057] Step 1) Initialization

[0058] 1.1) Experience pool initialization: set up the experience pool as a two-dimensional matrix with m rows and n columns, and initialize every element of the matrix to 0, where m is the number of samples the pool holds and n is the number of values stored per sample, n = 2×state_dim + action_dim + 3; state_dim is the dimension of the state and action_dim is the dimension of the action. Space is also reserved in the experience pool for storing the reward information, the usage trace, and the temporal-difference error: the 3 in the formula n = 2×state_dim + action_dim + 3 is the reserved space for storing reward information, usage...
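The layout in 1.1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function name `init_experience_pool`, the example dimensions, and the choice of NumPy are assumptions, and the excerpt does not specify the column order within a row. The reading that 2×state_dim covers the current state and the next state is also an assumption.

```python
import numpy as np


def init_experience_pool(m: int, state_dim: int, action_dim: int) -> np.ndarray:
    """Allocate a zero-initialized experience pool with m rows and n columns.

    n = 2*state_dim + action_dim + 3, where the trailing 3 slots are the
    space reserved for the reward, the usage trace, and the TD error.
    (Hypothetical helper; the column order is not given in the source.)
    """
    n = 2 * state_dim + action_dim + 3
    return np.zeros((m, n), dtype=np.float32)


# Example with assumed sizes: 3-dimensional state, 1-dimensional action.
pool = init_experience_pool(m=10000, state_dim=3, action_dim=1)
print(pool.shape)  # (10000, 10)
```

With state_dim = 3 and action_dim = 1, each row has 2×3 + 1 + 3 = 10 entries.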


Abstract

The invention relates to a fast learning algorithm based on Actor-Critic neural network continuous control. The algorithm comprises the following steps: initializing an experience pool, initializing a neural network, constructing output interference, accumulating the experience pool, and sampling and training a deep reinforcement learning neural network according to a priority number prop. Sampling is optimized according to the priority number prop, which is calculated from TD-diff, sigmoid-TD, and a usage trace UT; this improves the convergence speed of TD-error and therefore the learning speed of the algorithm.
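The sampling step can be illustrated with a short sketch. The abstract says prop is calculated from TD-diff, sigmoid-TD, and the usage trace UT, but this excerpt does not give the formula, so the code below takes prop as an already-computed array. Drawing indices with probability proportional to prop, and the name `sample_by_priority`, are assumptions made for illustration rather than the patent's stated method.

```python
import numpy as np


def sample_by_priority(prop, batch_size, rng):
    """Draw row indices with probability proportional to prop.

    Hypothetical helper: proportional sampling is an assumed reading of
    "sampling according to a priority number prop".
    """
    prop = np.asarray(prop, dtype=np.float64)
    if np.any(prop < 0) or prop.sum() <= 0:
        raise ValueError("prop must be non-negative with a positive sum")
    probs = prop / prop.sum()  # normalize priorities into probabilities
    return rng.choice(len(prop), size=batch_size, replace=True, p=probs)


rng = np.random.default_rng(0)
prop = np.array([0.1, 0.0, 2.0, 0.5])  # made-up priorities for four samples
idx = sample_by_priority(prop, batch_size=8, rng=rng)
print(idx)  # indices into the experience pool; index 1 is never drawn
```

Samples with a larger prop are drawn more often, and a sample whose prop is 0 is never selected under this scheme.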

Description

Technical field

[0001] The invention belongs to the technical field of reinforcement learning algorithms, and in particular relates to a fast learning algorithm based on Actor-Critic neural network continuous control.

Background

[0002] In recent years, deep reinforcement learning has achieved remarkable results. The Go program AlphaGo, developed by Google, defeated the world's top Go player Lee Sedol, setting off a worldwide wave of interest in artificial intelligence. The success of AlphaGo is attributed to deep reinforcement learning algorithms. Most current deep reinforcement learning algorithms use memory replay. The concept of memory replay was proposed in 1993, and since the introduction of the DQN algorithm in 2013 it has been widely used across deep reinforcement learning. However, because memory replay usually samples at random, the neural network repeatedly learns some states while high-priority states may not be learned, a...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N3/08, G06N3/04

CPC: G06N3/08, G06N3/04

Inventor: 柯丰恺, 周唯倜, 赵大兴, 孙国栋, 许万, 丁国龙, 吴震宇, 赵迪

Owner: HUBEI UNIV OF TECH
