Experience replay sampling reinforcement learning method and system based on the upper confidence bound concept

A reinforcement learning and experience replay technology, applied in the field of reinforcement learning, which addresses problems such as the limited scope of application of existing sampling techniques, and achieves the effects of improved sampling efficiency, better sample utilization, and enhanced exploration ability.

Pending Publication Date: 2021-04-30
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

[0010] In view of this, existing sampling techniques have many problems in their design principles and generalization ability, which limit their scope of application. Methods based on experience replay technology still leave considerable room for improvement. It is therefore necessary to improve experience replay sampling for these problems, so as to increase the sampling efficiency and application potential of deep reinforcement learning algorithms.



Examples


Embodiment 1

[0077] This embodiment discloses an experience replay sampling reinforcement learning method based on the upper confidence bound concept, which includes the following steps: collect the experience obtained from the interaction between the agent and the environment, and store the experience data in the experience replay pool; when updating the current training strategy, randomly select λ·K pieces of experience from the experience replay pool according to the priority probability to generate a candidate training sample set; select the training sample set according to the upper confidence bound value of each candidate training sample; and update the parameters of the neural network used for function approximation according to the training sample data.
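The two-stage sampling described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the `priority` and `visits` fields on each experience, and the exact form of the UCB score (priority plus a count-based exploration bonus), are assumptions, since the embodiment text does not specify them.

```python
import math
import random

def sample_training_batch(replay_pool, batch_size, lam=2.0, c=1.0, total_updates=1):
    """Two-stage sampling sketch.

    Stage 1 draws lam * batch_size candidates with probability proportional
    to each sample's priority; stage 2 keeps the batch_size candidates with
    the highest upper-confidence-bound (UCB) scores.  The 'priority' and
    'visits' fields and the exact UCB formula are illustrative assumptions.
    """
    k = int(lam * batch_size)
    weights = [e["priority"] for e in replay_pool]
    # Stage 1: priority-proportional candidate draw (with replacement).
    candidates = random.choices(replay_pool, weights=weights, k=k)

    # Stage 2: UCB score = exploitation term (the priority) plus an
    # exploration bonus that grows for samples replayed less often.
    def ucb(e):
        return e["priority"] + c * math.sqrt(
            math.log(total_updates + 1) / (e["visits"] + 1))

    batch = sorted(candidates, key=ucb, reverse=True)[:batch_size]
    for e in batch:
        e["visits"] += 1  # record that the sample was used for training
    return batch
```

The chosen batch would then be used to update the function-approximation network, after which per-sample priorities (e.g. absolute TD errors) are refreshed.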

[0078] In specific implementations, the purpose of the present invention is achieved through the following technical solutions:

[0079] As shown in Figure 2, an experience replay sampling reinforcement learning strategy based on the upper...

Embodiment 3

[0133] The purpose of this embodiment is to provide a computing device, including a memory, a processor, and a computer program stored in the memory and operable on the processor; when the processor executes the program, the steps of the method described above are implemented.

Embodiment 4

[0135] The purpose of this embodiment is to provide a computer-readable storage medium.

[0136] A computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the methods in the above implementation examples are executed.



Abstract

The invention provides an experience replay sampling reinforcement learning method and system based on the upper confidence bound concept. The method comprises the steps of: collecting the experience obtained through interaction between an agent and an environment, and storing the experience data in an experience replay pool; when the current training strategy is updated, randomly selecting experience from the experience replay pool according to the priority probability to generate a candidate training sample set; selecting the training sample set according to the upper confidence bound value of each candidate training sample; and updating the parameters of a neural network used for function approximation according to the training sample data. The technical scheme disclosed by the invention can be combined with any offline RL algorithm, so the problems of insufficient sample utilization and low learning efficiency of update algorithms in related technologies are alleviated to a certain extent, the sampling efficiency is effectively improved, and the generalization ability of algorithm updates is further improved.
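The abstract does not spell out how the upper confidence bound value of a candidate sample is computed. As an illustration only, a bandit-style UCB score over replay samples could combine a sample's priority with an exploration bonus based on how often it has been replayed:

\[
\mathrm{UCB}_i = p_i + c \sqrt{\frac{\ln N}{n_i + 1}}
\]

where $p_i$ is the priority of sample $i$ (e.g. its absolute TD error), $N$ is the total number of training updates, $n_i$ is the number of times sample $i$ has been selected, and $c$ is a trade-off coefficient. Rarely replayed samples receive a larger bonus, which matches the stated goal of improving exploration ability. These symbols are assumptions for illustration, not taken from the patent text.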

Description

Technical Field

[0001] The disclosure belongs to the technical field of reinforcement learning, and in particular relates to an experience replay sampling reinforcement learning method and system based on the upper confidence bound concept.

Background Technique

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] Deep reinforcement learning is an important research direction in the field of artificial intelligence. Through continuous interaction with the environment, agents autonomously learn the optimal action-selection strategy to maximize their cumulative rewards. Deep reinforcement learning methods have achieved great success in a variety of domains and tasks, including video games, the game of Go, and robot control. Since the huge potential of deep reinforcement learning has not been fully exploited, many works have been devoted to studying its...

Claims


Application Information

Patent Type & Authority Applications(China)
IPC(8): G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/084, G06N3/047
Inventor: 刘帅, 韩思源, 王小文
Owner SHANDONG UNIV