Dynamic epsilon deep reinforcement learning method based on epsilon-greedy

A deep reinforcement learning technique applied to neural network learning methods, machine learning, and coin-freed or similar apparatus

Pending Publication Date: 2022-06-07
NANJING UNIV OF INFORMATION SCI & TECH

AI Technical Summary

Problems solved by technology

[0003] In the classic multi-armed bandit problem, traditional epsilon-greedy is highly random. As interaction continues, the value of epsilon becomes smaller and smaller in order to obtain a higher return, and action selection becomes biased toward the actions that have yielded large returns. If there are many arms, the policy can therefore become trapped in a local optimum.
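The behaviour described in [0003] can be illustrated with a minimal sketch of conventional epsilon-greedy with a decaying epsilon on a Bernoulli bandit. This is not the patented method; the function names, the multiplicative decay schedule, and all parameter values are assumptions chosen for illustration only.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(probs, steps=2000, eps_start=1.0, eps_decay=0.995,
               eps_min=0.01, seed=0):
    """Run decaying epsilon-greedy on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    q = [0.0] * len(probs)       # running mean reward per arm
    counts = [0] * len(probs)    # number of pulls per arm
    epsilon = eps_start
    total = 0.0
    for _ in range(steps):
        arm = epsilon_greedy_action(q, epsilon, rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]   # incremental mean
        total += reward
        # Assumed schedule: epsilon only ever shrinks, so late in the run
        # an under-sampled arm is rarely revisited.
        epsilon = max(eps_min, epsilon * eps_decay)
    return q, counts, total


if __name__ == "__main__":
    q, counts, total = run_bandit([0.2, 0.5, 0.55, 0.6])
    print("estimates:", [round(v, 3) for v in q])
    print("pull counts:", counts)
```

Because epsilon in this sketch depends only on the step count, an arm whose early samples were unlucky may stay under-explored, which is the local-optimum risk the paragraph describes.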



Examples


Specific Embodiment

[0064] The simulation experiment was carried out under Python 3.8 with the TensorFlow 2.3 framework. The multi-armed bandit was defined in a class that holds the number of arms and the reward brought by each arm. TensorFlow was then used to build a Dueling DQN deep reinforcement learning framework: the obtained state set is input to the Dueling DQN, and the resulting Q value is the sum of the state value function and the advantage function. According to the TD-error of the action-value function, the method judges whether to explore or exploit at that moment and updates the value of epsilon.
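As a rough illustration of the two building blocks named in [0064], the sketch below defines a bandit class and the dueling aggregation of a state value and per-action advantages. It uses plain Python rather than TensorFlow so that it stays self-contained; the class name, the Gaussian reward noise, and the `subtract_mean` option are assumptions, not details taken from the patent. The plain sum is what the text describes; subtracting the mean advantage is the variant commonly used in Dueling DQN implementations.

```python
import random


class MultiArmedBandit:
    """Hypothetical bandit: stores the number of arms and each arm's mean reward."""

    def __init__(self, arm_rewards, noise_std=1.0, seed=0):
        self.arm_rewards = list(arm_rewards)
        self.noise_std = noise_std          # assumed Gaussian reward noise
        self._rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.arm_rewards)

    def pull(self, arm):
        """Return a noisy reward for the chosen arm."""
        return self._rng.gauss(self.arm_rewards[arm], self.noise_std)


def dueling_q_values(state_value, advantages, subtract_mean=False):
    """Combine V(s) and A(s, a) into Q(s, a).

    subtract_mean=False gives Q = V + A, as stated in the text.
    subtract_mean=True gives Q = V + (A - mean(A)).
    """
    if subtract_mean:
        mean_adv = sum(advantages) / len(advantages)
        advantages = [a - mean_adv for a in advantages]
    return [state_value + a for a in advantages]


if __name__ == "__main__":
    bandit = MultiArmedBandit([0.1, 0.5, 0.9])
    print("arms:", bandit.n_arms, "sample reward:", bandit.pull(2))
    print("Q values:", dueling_q_values(1.0, [0.5, -0.5, 0.0]))
```

In a network implementation the state value and the advantages would be the outputs of two separate heads sharing a common feature extractor; here they are passed in as plain numbers.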

[0065] Figure 3 shows the final result, where DEG is the abbreviation of the algorithm of the present invention (Dynamic Epsilon Greedy). The solid line represents the expectation curve, the horizontal axis represents elapsed time, and the vertical axis represents the regret rate. ...
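The text does not define the regret rate plotted in Figure 3. One common reading, assumed here, is the cumulative regret divided by the number of steps taken so far, where the regret of a step is the gap between the best arm's mean reward and the chosen arm's mean reward. The function name and arguments below are hypothetical.

```python
def regret_rate_curve(arm_means, chosen_arms):
    """Return cumulative regret divided by elapsed steps (assumed definition)."""
    best = max(arm_means)
    cumulative = 0.0
    curve = []
    for t, arm in enumerate(chosen_arms, start=1):
        cumulative += best - arm_means[arm]   # per-step regret
        curve.append(cumulative / t)
    return curve


if __name__ == "__main__":
    # Two arms with mean rewards 0.5 and 1.0; the agent picks arms 0, 1, 1, 0.
    print(regret_rate_curve([0.5, 1.0], [0, 1, 1, 0]))
```

Under this definition, a curve that falls toward zero indicates that the agent increasingly selects the best arm.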


Abstract

The invention discloses a dynamic epsilon deep reinforcement learning method based on epsilon-greedy. The method relates to the exploration-exploitation dilemma and comprises the following steps: preprocessing the data of the multi-armed bandit; judging whether to update epsilon according to the difference between the instant reward r_{t+1} and the average reward R_average; constructing a deep reinforcement learning framework using the Dueling DQN algorithm; judging, according to the TD-error of the action-value function, whether to explore or exploit at that moment, and updating the value of epsilon; and importing the TD-error in the experience pool into the Dueling DQN reinforcement learning framework, then training and updating the network. The invention advances research on the exploration-exploitation dilemma and allocates exploration time and exploitation time reasonably. The final result is that epsilon is updated dynamically, which provides an effective theoretical basis for the development of dynamic epsilon.
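The abstract names two signals, the gap between the instant reward and the average reward and the TD-error, but this excerpt does not give the actual update rule. The sketch below is therefore only one hypothetical way such a rule could be wired together: the tolerance, the threshold, the step size, and the direction of each adjustment are all assumptions and should not be read as the patented method. Only the one-step TD-error formula itself is standard.

```python
def td_error(reward, q_sa, next_q_values, gamma=0.9):
    """Standard one-step TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    return reward + gamma * max(next_q_values) - q_sa


def update_epsilon(epsilon, reward, avg_reward, delta,
                   reward_tol=0.1, td_threshold=0.5, step=0.05,
                   eps_min=0.01, eps_max=1.0):
    """Hypothetical dynamic-epsilon rule (assumed, not from the patent).

    1. If the instant reward is close to the running average, keep epsilon.
    2. Otherwise a large |TD error| is treated as a sign that the value
       estimates are still poor, so epsilon is raised (explore more);
       a small |TD error| lowers epsilon (exploit more).
    """
    if abs(reward - avg_reward) <= reward_tol:
        return epsilon
    if abs(delta) > td_threshold:
        return min(eps_max, epsilon + step)
    return max(eps_min, epsilon - step)


if __name__ == "__main__":
    delta = td_error(reward=1.0, q_sa=0.5, next_q_values=[0.0, 1.0], gamma=0.5)
    print("TD error:", delta)
    print("new epsilon:", update_epsilon(0.5, reward=2.0, avg_reward=1.0, delta=delta))
```

Clipping epsilon to the range [eps_min, eps_max] keeps the exploration probability valid regardless of how many consecutive adjustments occur.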

Description

Technical field

[0001] The invention belongs to the field of deep reinforcement learning and the exploration-exploitation dilemma, and relates to a dynamic epsilon method for deep reinforcement learning, in particular to a dynamic epsilon deep reinforcement learning method based on epsilon-greedy.

Background

[0002] Reinforcement learning learns continuously through interaction with the environment, and the quality of the data obtained during interaction largely determines the quality of the policy the agent can learn. Currently, reinforcement learning (including deep reinforcement learning, DRL, and multi-agent reinforcement learning, MARL) performs excellently in fields such as games and robotics. Nonetheless, to reach the same level of performance, the sample size (number of interactions) required by reinforcement learning is still far larger than that required by humans. This need for a large number of interaction samples seriously hinders the application of re...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/08, G06N20/00, G07F17/34
CPC: G06N3/08, G06N20/00, G07F17/34
Inventors: 孔燕, 曹俊豪
Owner: NANJING UNIV OF INFORMATION SCI & TECH