Dynamic epsilon deep reinforcement learning method based on epsilon-greedy

Pending Publication Date: 2022-06-07

NANJING UNIV OF INFORMATION SCI & TECH

Problems solved by technology

[0003] In the classic multi-armed bandit problem, traditional epsilon-greedy selection is highly random. As interaction continues, the value of epsilon is made smaller and smaller in order to obtain a higher return, so the choice becomes biased toward the actions that have already yielded a large return. If there are many arms, the method can become trapped in a local optimum.
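
The behaviour described in [0003] can be illustrated with a minimal sketch of the traditional baseline, not the patented method. The Gaussian rewards, the multiplicative decay schedule, and every name and default value here (`epsilon_greedy_bandit`, `true_means`, `decay`) are assumptions made for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=1.0, decay=0.995, seed=0):
    """Run decaying epsilon-greedy on a Gaussian bandit (hypothetical setup)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        epsilon *= decay   # fixed schedule: epsilon only ever shrinks
    return counts, estimates, epsilon


counts, estimates, final_epsilon = epsilon_greedy_bandit([0.1, 0.5, 0.9])
print(counts, final_epsilon)
```

Because epsilon shrinks on a fixed schedule regardless of what has been learned, an arm whose early estimate is unlucky may never be revisited once exploration has faded, which is the local-optimum risk the paragraph points to.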

Examples

Specific Embodiment

[0064] The simulation experiment was carried out in Python 3.8 using the TensorFlow 2.3 framework. The multi-armed bandit is defined in a class that holds the number of arms (levers) and the reward returned by each arm. TensorFlow is then used to build the Dueling DQN deep reinforcement learning framework: the obtained state set is fed into the Dueling DQN, and the resulting Q value is the sum of the state value function and the advantage function. Based on the TD-error of the action-value function, the method decides whether to explore or exploit at that moment, and updates the value of epsilon.
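
The two building blocks named in [0064] can be sketched as follows. This is plain Python rather than the TensorFlow code of the experiment, which the source does not show. The class and function names, the `noise` parameter, and the optional `center` flag are assumptions; the source states only that Q is the sum of the state value and the advantage.

```python
import random


class MultiArmedBandit:
    """Bandit described by the number of arms and the payout of each arm."""

    def __init__(self, payouts, noise=0.0, seed=0):
        self.payouts = list(payouts)     # expected reward of each arm
        self.n_arms = len(self.payouts)  # number of arms (levers)
        self.noise = noise               # assumed Gaussian noise level
        self._rng = random.Random(seed)

    def pull(self, arm):
        """Return the reward for pulling the given arm."""
        return self.payouts[arm] + self._rng.gauss(0.0, self.noise)


def dueling_q(state_value, advantages, center=False):
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    center=False gives the plain sum V + A, as stated in the text.
    center=True subtracts the mean advantage, a common Dueling DQN variant.
    """
    offset = sum(advantages) / len(advantages) if center else 0.0
    return [state_value + a - offset for a in advantages]


bandit = MultiArmedBandit([0.1, 0.5, 0.9])
print(bandit.n_arms, bandit.pull(2))
print(dueling_q(1.0, [0.0, 2.0]))
```

In a network, `state_value` and `advantages` would be the outputs of two separate heads on a shared trunk; the aggregation step itself is the same.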

[0065] As shown in Figure 3, this is the final result, where DEG is the abbreviation of the algorithm of the present invention, short for Dynamic Epsilon Greedy; the solid line represents the expectation curve, the horizontal axis represents the elapsed time, and the vertical axis represents the regret rate; ...
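
The source does not define how the regret rate on the vertical axis is computed. As a rough sketch under one common definition (cumulative regret divided by the number of steps so far, which may differ from the patent's), it could be calculated like this; `regret_rate`, `true_means`, and `chosen_arms` are hypothetical names.

```python
def regret_rate(true_means, chosen_arms):
    """Average per-step regret after each step (one common definition)."""
    best = max(true_means)   # expected reward of the best arm
    cumulative = 0.0
    rates = []
    for t, arm in enumerate(chosen_arms, start=1):
        cumulative += best - true_means[arm]   # expected reward lost this step
        rates.append(cumulative / t)
    return rates


print(regret_rate([0.0, 1.0], [0, 1, 1, 1]))
```

Under this definition, a curve that falls toward zero over time indicates that the policy is choosing the best arm more and more often.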

Abstract

The invention discloses a dynamic-epsilon deep reinforcement learning method based on epsilon-greedy. The method relates to the exploration-exploitation dilemma and comprises the following steps: preprocessing the data of the multi-armed bandit; deciding whether to update epsilon according to the difference between the immediate reward r_{t+1} and the average reward R_average; constructing a deep reinforcement learning framework using the Dueling DQN algorithm; deciding, according to the TD-error of the action-value function, whether to explore or exploit at that moment, and updating the value of epsilon; and importing the TD-errors stored in the experience pool into the Dueling DQN reinforcement learning framework to train and update the network. The invention advances research on the exploration-exploitation dilemma and allocates exploration time and exploitation time reasonably. The final result is that epsilon is updated dynamically, which provides an effective theoretical basis for the development of dynamic epsilon.
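
The abstract names the two signals that drive the epsilon update (the gap between r_{t+1} and the average reward, and the TD-error) but does not give the update rule itself. The sketch below shows the standard one-step TD-error together with a purely hypothetical update rule. The direction of the adjustment, and the `step`, `threshold`, `eps_min`, and `eps_max` parameters, are assumptions and are not taken from the patent.

```python
def td_error(reward, q_current, q_next_max, gamma=0.9):
    """One-step TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    return reward + gamma * q_next_max - q_current


def update_epsilon(epsilon, reward, avg_reward, delta,
                   step=0.05, threshold=0.1, eps_min=0.01, eps_max=1.0):
    """Hypothetical rule; the source does not give the exact update."""
    if reward < avg_reward or abs(delta) > threshold:
        epsilon += step   # estimates look unreliable: explore more
    else:
        epsilon -= step   # estimates look settled: exploit more
    return min(eps_max, max(eps_min, epsilon))   # keep epsilon in bounds


delta = td_error(reward=1.0, q_current=0.5, q_next_max=2.0, gamma=0.5)
print(delta, update_epsilon(0.5, reward=1.0, avg_reward=0.8, delta=delta))
```

Unlike a fixed decay schedule, a rule of this kind lets epsilon rise again when rewards fall below average or when the value estimates are still changing.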

Description

Technical field

[0001] The invention belongs to the field of deep reinforcement learning and the exploration-exploitation dilemma, and relates to a dynamic-epsilon method for deep reinforcement learning, in particular to a dynamic-epsilon deep reinforcement learning method based on epsilon-greedy.

Background

[0002] Reinforcement learning learns continuously through interaction with the environment, and the quality of the data obtained in that interaction largely determines the quality of the policy the agent can learn. Currently, reinforcement learning (including deep reinforcement learning, DRL, and multi-agent reinforcement learning, MARL) performs excellently in fields such as games and robotics. Nonetheless, to reach the same level of performance, the number of samples (interactions) required by reinforcement learning is still far greater than that required by humans. This need for a large number of interaction samples seriously hinders the application of re...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06N3/08, G06N20/00, G07F17/34

CPC: G06N3/08, G06N20/00, G07F17/34

Inventor: 孔燕, 曹俊豪

Owner: NANJING UNIV OF INFORMATION SCI & TECH
