
Improved deep reinforcement learning method and system based on Double DQN

A deep reinforcement learning technology, applied in the field of reinforcement learning, which can solve problems such as failure to converge

Status: Inactive; Publication date: 2020-07-28
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

However, this method is greatly affected by noise, which may cause the results to fail to converge.

Method used



Examples


Embodiment Construction

[0059] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

[0060] In one embodiment, with reference to figure 1, an improved deep reinforcement learning method based on Double Deep Q-Learning Network is provided, which includes the following steps:

[0061] Step 1, initialize the environment and DQN network parameters;

[0062] Here, the environment includes the state space, the action space, and the reward function r; the DQN network parameters include the current value neural network parameters, the target value neural network parameters, the DQN error function, and the playback memory unit. Among them, the neural network parameters include the number of network...
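For illustration only, a minimal Python sketch of Step 1 is given below. The gymnasium environment, layer widths, learning rate, and memory capacity are assumptions made for the example and are not specified by the patent.

```python
# Illustrative sketch of Step 1 only. The environment, layer widths,
# learning rate, and memory capacity are assumptions, not taken from the patent.
from collections import deque

import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")                  # example environment with a state space, action space, and reward r
state_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

def make_q_net() -> nn.Module:
    # The current and target value networks share exactly the same structure.
    return nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )

q_eval = make_q_net()                          # current value network (Q-eval)
q_target = make_q_net()                        # target value network (Q-target)
q_target.load_state_dict(q_eval.state_dict())  # start from identical parameters

loss_fn = nn.MSELoss()                         # stands in for the DQN error function
optimizer = torch.optim.Adam(q_eval.parameters(), lr=1e-3)
replay_memory = deque(maxlen=10_000)           # playback memory unit for (s, a, s', r, done) items
```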



Abstract

The invention discloses an improved deep reinforcement learning method and system based on Double DQN, belonging to the field of reinforcement learning. The method comprises the following steps: initializing the environment and the DQN network parameters; accumulating experience based on an epsilon-greedy strategy and storing the experience in a playback memory unit; and training and optimizing the DQN network with samples from the playback memory unit to obtain a decision network. The method can increase the convergence speed of the Double Deep Q-Learning Network, improve the final convergence value, reduce the interference of noise on the effectiveness of the DQN algorithm, improve the application effect of deep reinforcement learning in actual production and life, and expand the application range of deep reinforcement learning.
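As a non-authoritative sketch of the abstract's second step (experience accumulation with an epsilon-greedy strategy and storage of (s, a, s', r) items in the playback memory unit), the fragment below reuses the objects defined in the initialization sketch earlier on this page; the epsilon value and step count are assumptions.

```python
# Sketch of experience accumulation with an epsilon-greedy strategy; reuses
# env, q_eval, and replay_memory from the initialization sketch above.
# EPSILON and WARMUP_STEPS are assumed values chosen only for illustration.
import random
import torch

EPSILON, WARMUP_STEPS = 0.1, 1_000

state, _ = env.reset()
for _ in range(WARMUP_STEPS):
    # Explore with probability EPSILON, otherwise act greedily on Q-eval.
    if random.random() < EPSILON:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = int(q_eval(torch.as_tensor(state, dtype=torch.float32)).argmax())
    next_state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    # Store the experience item (s, a, s', r) in the playback memory unit.
    replay_memory.append((state, action, next_state, reward, float(done)))
    if done:
        state, _ = env.reset()
    else:
        state = next_state
```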

Description

Technical field

[0001] The invention belongs to the field of reinforcement learning, and in particular relates to an improved deep reinforcement learning method and system based on Double DQN.

Background technique

[0002] Double Deep Q-Learning Network is one of the most common frameworks in deep reinforcement learning and performs well in practice. DQN consists of three parts: the environment, the playback memory unit, and the neural network. The agent interacts with the environment to obtain the current state s, and obtains the next state s' and reward r after taking an action a. The playback memory unit stores each item (s, a, s', r); after a certain amount has been stored, part of the data is extracted according to a certain sampling method and input to the neural network for training. There are two neural networks with exactly the same network structure, namely the current value network (Q-eval) and the target value network (Q-target). The input of the current value network...
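To make the interaction between the two networks concrete, the sketch below shows one training step in which Q-eval selects the next action and Q-target evaluates it. This is the standard Double DQN update, not the patent's specific improvement, and it reuses q_eval, q_target, replay_memory, loss_fn, and optimizer from the sketches above; the batch size, discount factor, and synchronization interval are illustrative assumptions.

```python
# Sketch of one standard Double DQN training step on a batch drawn from the
# playback memory unit; reuses q_eval, q_target, replay_memory, loss_fn, and
# optimizer from the sketches above. BATCH, GAMMA, and SYNC_EVERY are assumed.
import random
import torch

BATCH, GAMMA, SYNC_EVERY = 64, 0.99, 200

def train_step(step: int) -> None:
    if len(replay_memory) < BATCH:
        return
    batch = random.sample(replay_memory, BATCH)            # uniform sampling as the extraction method
    states, actions, next_states, rewards, dones = zip(*batch)
    s = torch.as_tensor(states, dtype=torch.float32)
    s2 = torch.as_tensor(next_states, dtype=torch.float32)
    a = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    r = torch.as_tensor(rewards, dtype=torch.float32)
    done = torch.as_tensor(dones, dtype=torch.float32)

    with torch.no_grad():
        best_a = q_eval(s2).argmax(dim=1, keepdim=True)    # Q-eval selects the next action
        q_next = q_target(s2).gather(1, best_a).squeeze(1) # Q-target evaluates that action
        target = r + GAMMA * (1.0 - done) * q_next
    pred = q_eval(s).gather(1, a).squeeze(1)
    loss = loss_fn(pred, target)                           # DQN error between prediction and target

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % SYNC_EVERY == 0:                             # periodically copy Q-eval into Q-target
        q_target.load_state_dict(q_eval.state_dict())
```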

Claims


Application Information

IPC(8): G06N3/08, G06N3/04
CPC: G06N3/08, G06N3/045
Inventor 奚思遥王力立肖强林高尚杜万年闫晓黄成单梁张永
Owner NANJING UNIV OF SCI & TECH