Strategy collaborative selection method based on deep reinforcement learning DDPG algorithm framework

A strategy (policy) selection technique in the field of reinforcement learning. It addresses problems such as excessive fluctuation of the strategy network, and aims to improve generalization while mitigating overestimation and overfitting.

Pending Publication Date: 2021-06-04

UNIV OF ELECTRONICS SCI & TECH OF CHINA

0 Cites, 2 Cited by


Problems solved by technology

[0003] The purpose of the present invention is to provide a strategy collaborative selection method based on the deep reinforcement learning DDPG algorithm framework. The method changes the network structure of DDPG, effectively alleviates DDPG's overestimation problem, and avoids excessive fluctuation of the strategy network.


Embodiment Construction

[0024] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0025] In describing the present invention, it should be understood that the orientations or positional relationships indicated by terms such as "length", "width", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only for convenience in describing the present invention and simplifying the description, rather than indicating or implying that a referenced device or element...


Abstract

The invention discloses a strategy collaborative selection method based on the deep reinforcement learning DDPG algorithm framework. The output action is selected through strategy cooperation: a pair of strategy (actor) networks each output an action, the actions are evaluated, and the resulting Q values are used as weights for choosing between the actions probabilistically. Strategy cooperation can reduce the chance of settling in a local optimum, mitigate overfitting, reduce strategy fluctuation, and increase stability. In addition, dropout is added to the actor network, which reduces coupling, improves generalization, and speeds up training. Finally, following the idea of the TD3 algorithm, noise is added after the actor target network selects an action in order to reduce error. Together, these changes to the DDPG network structure effectively alleviate DDPG's overestimation problem and avoid excessive fluctuation of the strategy network.
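The cooperative selection step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract only says that Q values serve as weights for a probabilistic choice, so the softmax weighting, the `temperature` parameter, and the names `cooperative_select`, `actor_a`, `actor_b`, and `critic` are all assumptions, and the toy lambdas stand in for trained networks.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def cooperative_select(state, actors, critic, rng, temperature=1.0):
    """Pick one candidate action, weighting candidates by their Q values.

    ASSUMPTION: softmax over Q values is used to turn weights into
    probabilities; the source does not specify the exact mapping.
    """
    # Each actor (strategy network) proposes one action for the state.
    candidates = [actor(state) for actor in actors]
    # The critic evaluates every proposed action.
    q_values = np.array([critic(state, a) for a in candidates], dtype=float)
    # Higher Q value -> higher probability of being selected.
    probs = softmax(q_values / temperature)
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx], probs

# Toy stand-ins for a pair of actor networks and one critic (hypothetical).
rng = np.random.default_rng(0)
actor_a = lambda s: np.tanh(0.5 * s)
actor_b = lambda s: np.tanh(-0.5 * s)
preferred = np.array([0.3, -0.3])
critic = lambda s, a: -float(np.sum((a - preferred) ** 2))

state = np.array([1.0, -1.0])
action, probs = cooperative_select(state, [actor_a, actor_b], critic, rng)
print("selection probabilities:", probs)
print("selected action:", action)
```

Sampling rather than always taking the higher-Q action keeps both networks in play, which matches the abstract's stated goal of reducing the chance of settling in a local optimum.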

Description

Technical Field

[0001] The present invention relates to the technical field of reinforcement learning, and in particular to a strategy collaborative selection method based on the deep reinforcement learning DDPG algorithm framework.

Background

[0002] Reinforcement learning studies how an agent can find a strategy that maximizes the reward it obtains in a complex and uncertain environment. Lillicrap et al. proposed the DDPG (deep deterministic policy gradient) algorithm in 2015, a deep reinforcement learning algorithm built on the actor-critic framework (Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning.). DDPG is the first reinforcement learning algorithm to efficiently solve many high-dimensional continuous control tasks. It is also a deterministic policy gradient algorithm based on actor-critic architecture. Contains actor current ...
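For orientation, the TD3-style idea mentioned in the abstract (adding noise after the actor target network selects an action) can be sketched on top of a DDPG-style bootstrapped target. This is a hedged illustration only: the function names, the Gaussian noise with clipping, and the values of `sigma`, `noise_clip`, `gamma`, and the action bounds are assumptions chosen for the example, and the lambdas are hypothetical stand-ins for the target networks.

```python
import numpy as np

def smoothed_target_action(target_actor, next_state, rng,
                           sigma=0.2, noise_clip=0.5, low=-1.0, high=1.0):
    """Target action plus clipped Gaussian noise (TD3-style smoothing)."""
    action = np.asarray(target_actor(next_state), dtype=float)
    # Clip the noise so the perturbed action stays close to the original.
    noise = np.clip(rng.normal(0.0, sigma, size=action.shape),
                    -noise_clip, noise_clip)
    # Clip again so the result remains a valid action.
    return np.clip(action + noise, low, high)

def td_target(reward, done, next_state, target_actor, target_critic, rng,
              gamma=0.99):
    """Bootstrapped target y = r + gamma * (1 - done) * Q'(s', a')."""
    next_action = smoothed_target_action(target_actor, next_state, rng)
    return reward + gamma * (1.0 - done) * target_critic(next_state, next_action)

# Toy stand-ins for the actor and critic target networks (hypothetical).
rng = np.random.default_rng(0)
target_actor = lambda s: np.tanh(s)
target_critic = lambda s, a: -float(np.sum(a ** 2))

next_state = np.array([0.2, -0.4])
smoothed = smoothed_target_action(target_actor, next_state, rng)
y = td_target(1.0, 0.0, next_state, target_actor, target_critic, rng)
print("smoothed target action:", smoothed)
print("TD target:", y)
```

Averaging the critic over a small neighborhood of the target action in this way makes the target less sensitive to sharp, possibly overestimated peaks in the learned Q function.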


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N3/08

CPC: G06N3/08

Inventors: 钟颖嘉, 朱清新

Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA
