Experience playback sampling reinforcement learning method and system based on confidence upper bound thought

A reinforcement learning technology concerning experience replay, applied in the field of reinforcement learning. It addresses problems that limit the scope of application of existing sampling techniques, and improves sampling efficiency, sample utilization, and exploration ability.

Pending Publication Date: 2021-04-30

SHANDONG UNIV

Cites: 3; Cited by: 8


AI Technical Summary

Problems solved by technology

[0010] From this perspective, existing sampling techniques have many problems in their design principles and generalization ability, which limit their range of application.

Methods based on experience replay still leave considerable room for improvement. It is therefore necessary to improve experience replay sampling so as to address these problems and raise the sampling efficiency and application potential of deep reinforcement learning algorithms.



Examples


Embodiment 1

[0077] This embodiment discloses an experience replay sampling reinforcement learning method based on the confidence upper bound idea, comprising the following steps: collecting the experience obtained from the interaction between the agent and the environment, and storing the experience data in the experience replay pool; when updating the current training policy, randomly selecting λ·K experiences from the experience replay pool according to their priority probabilities to generate a candidate training sample set; selecting the training sample set according to the confidence upper bound value of each candidate training sample; and updating, with the training sample data, the parameters of the neural network used for function approximation.
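The two-stage selection in [0077] can be illustrated with a short Python sketch. The excerpt does not give the exact priority or confidence-bound formulas, so everything below is an assumption: candidates are drawn with probability proportional to p_i^α (as in prioritized experience replay), without replacement via Efraimidis-Spirakis keys, and each candidate's upper confidence bound is taken to be its normalized priority plus a UCB1-style bonus c·sqrt(ln(t+1)/(n_i+1)), where n_i counts how often experience i has already been replayed. The function name and the parameters `alpha`, `c`, and `replay_counts` are hypothetical.

```python
import math
import random


def select_training_batch(priorities, replay_counts, total_updates, k, lam,
                          alpha=0.6, c=1.0, rng=None):
    """Hypothetical sketch: pick K training samples from the replay pool.

    priorities    -- priority p_i > 0 of every stored experience (assumed, e.g. |TD error| + eps)
    replay_counts -- n_i, how often experience i has already been used for training (assumed)
    total_updates -- number of network updates performed so far
    k, lam        -- batch size K and candidate multiplier lambda (lam >= 1)
    """
    rng = rng or random.Random()
    m = min(len(priorities), int(lam * k))  # candidate set size, lambda * K
    if m == 0:
        return []

    # Step 1: draw m distinct candidates with probability proportional to p_i^alpha
    # (PER-style priority probability), using Efraimidis-Spirakis keys u^(1/w).
    weights = [p ** alpha for p in priorities]
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    candidates = [i for _, i in sorted(keys, reverse=True)[:m]]

    # Step 2: assumed UCB1-style confidence upper bound per candidate:
    # normalised priority (exploitation) + bonus for rarely replayed samples (exploration).
    max_p = max(priorities[i] for i in candidates)
    log_t = math.log(total_updates + 1)

    def upper_bound(i):
        return priorities[i] / max_p + c * math.sqrt(log_t / (replay_counts[i] + 1))

    # Step 3: keep the K candidates with the largest upper confidence bound.
    return sorted(candidates, key=upper_bound, reverse=True)[:k]


# Usage: 8 stored experiences, batch size K = 2, lambda = 2 -> 4 candidates.
prios = [0.1, 2.0, 0.5, 1.5, 0.05, 3.0, 0.8, 1.0]
counts = [0, 5, 1, 0, 0, 9, 2, 0]
batch = select_training_batch(prios, counts, total_updates=100, k=2, lam=2,
                              rng=random.Random(0))
print(batch)  # two distinct indices into the replay pool
```

In this reading, the priority draw first narrows the pool to λ·K plausible samples, and the confidence bound then favors candidates that are both high-priority and rarely replayed, which is one way to account for the claimed gain in exploration ability. With `total_updates=0` the bonus vanishes and selection reduces to picking the highest-priority candidates.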

[0078] In specific implementation examples, the purpose of the present invention is achieved through the following technical solutions:

[0079] As shown in Figure 2, an experience replay sampling reinforcement learning strategy based on the confidence upper...

Embodiment 3

[0133] The purpose of this embodiment is to provide a computing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the above method are implemented.

Embodiment 4

[0135] The purpose of this embodiment is to provide a computer-readable storage medium.

[0136] A computer-readable storage medium stores a computer program which, when executed by a processor, performs the specific steps of the methods in the above embodiments.


Abstract

The invention provides an experience replay sampling reinforcement learning method and system based on the confidence upper bound idea. The method comprises the following steps: collecting the experience obtained through the interaction between an agent and an environment, and storing the experience data in an experience replay pool; when the current training policy is updated, randomly selecting experiences from the experience replay pool according to their priority probabilities to generate a candidate training sample set; selecting a training sample set according to the confidence upper bound value of each candidate training sample; and updating the parameters of a neural network used for function approximation according to the training sample data. The disclosed technical scheme can be combined with any offline RL algorithm. It alleviates, to a certain extent, the problems of insufficient sample utilization and low learning efficiency of the update algorithms in related technologies, effectively improves sampling efficiency, and further improves the generalization ability of algorithm updates.
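The loop summarized in the abstract (store experience, select a batch, update the network, repeat) needs a replay pool that tracks per-sample statistics for a confidence-bound sampler. The Python sketch below shows only that bookkeeping, under stated assumptions: the `ReplayPool` class, the rule that new experiences receive the current maximum priority (a common prioritized-replay convention), and the requirement that `update_fn` return one TD error per sample are illustrative choices, not taken from the patent. Passing the RL algorithm's update step in as a callback reflects the abstract's point that the sampler can be combined with other RL algorithms.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence


@dataclass
class ReplayPool:
    """Hypothetical experience replay pool keeping, per stored experience,
    a priority p_i and a replay count n_i for a confidence-bound sampler."""
    capacity: int
    eps: float = 1e-6
    data: List[Any] = field(default_factory=list)
    priorities: List[float] = field(default_factory=list)
    replay_counts: List[int] = field(default_factory=list)
    total_updates: int = 0
    _next: int = field(default=0, init=False)  # ring-buffer write position

    def add(self, transition: Any) -> None:
        # New experience gets the current maximum priority so that it is
        # likely to be drawn at least once (assumed convention).
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
            self.replay_counts.append(0)
        else:  # pool full: overwrite the oldest entry
            i = self._next
            self.data[i], self.priorities[i], self.replay_counts[i] = transition, p, 0
        self._next = (self._next + 1) % self.capacity

    def train_step(self, indices: Sequence[int],
                   update_fn: Callable[[List[Any]], Sequence[float]]) -> None:
        """Run one update of an RL algorithm on the selected samples.

        update_fn stands in for the algorithm's own update (e.g. a DQN
        gradient step) and is assumed to return one TD error per sample."""
        batch = [self.data[i] for i in indices]
        td_errors = update_fn(batch)
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps  # refresh priority
            self.replay_counts[i] += 1                  # sample i replayed once more
        self.total_updates += 1


# Usage: capacity 3, four transitions (s, a, r, s'); the fourth overwrites slot 0.
pool = ReplayPool(capacity=3)
for t in range(4):
    pool.add((f"s{t}", 0, 0.0, f"s{t + 1}"))
# Stand-in for an RL algorithm's update that reports TD errors for slots 0 and 2.
pool.train_step([0, 2], lambda batch: [0.5, -2.0])
print(pool.replay_counts)  # [1, 0, 1]
```

The priorities, replay counts, and update counter maintained here are exactly the quantities a UCB-style selection rule would read before each update.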

Description

Technical field

[0001] The disclosure belongs to the technical field of reinforcement learning, and in particular relates to an experience replay sampling reinforcement learning method and system based on the confidence upper bound idea.

Background

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] Deep reinforcement learning is an important research direction in artificial intelligence. Through continuous interaction with the environment, an agent autonomously learns an optimal action policy that maximizes its cumulative reward. Deep reinforcement learning methods have achieved great success in a variety of domains and tasks, including video games, the game of Go, and robot control. Because the potential of deep reinforcement learning has not yet been fully exploited, many works have been devoted to studying its...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N3/04; G06N3/08

CPC: G06N3/049; G06N3/084; G06N3/047

Inventors: 刘帅 (Liu Shuai), 韩思源 (Han Siyuan), 王小文 (Wang Xiaowen)

Owner: SHANDONG UNIV
