Deep reinforcement learning method and device based on environment state prediction

A technology combining reinforcement learning with environment state prediction, applied in the field of artificial intelligence. It addresses the problems that existing algorithms cannot obtain a good strategy and lack generality and transferability, and it achieves rapid transfer and improved learning efficiency.

Active Publication Date: 2018-07-17
TSINGHUA UNIV
7 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

On the one hand, reinforcement learning algorithms rest on the Markov property, and a necessary condition for the Markov property to hold is that the state be fully observed. For partially observable Markov decision problems, current algorithms therefore usually cannot obtain a good strategy. On the other hand, most current reinforcement learning algorithms can solve only one task in an environment; when the task is switched, the network often needs to be retrained, so they lack generality and transferability. These problems urgently need to be solved.

Embodiment Construction

[0033] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0034] The following describes the deep reinforcement learning method and device based on environment state prediction according to the embodiments of the present invention with reference to the accompanying drawings. The method is described first.

[0035] Figure 1 is a flowchart of a deep reinforcement learning method based on environment state prediction...

Abstract

The invention discloses a deep reinforcement learning method and device based on environment state prediction. The method comprises the following steps: establishing a deep reinforcement learning network based on environment prediction, and selecting a suitable strategy decision method according to the characteristics of the task; initializing the network parameters, and establishing a storage area that meets a storage condition as an experience replay area; according to the output of the strategy decision network, selecting a suitable strategy to interact with the environment, and continuously storing the interaction information into the experience replay area; sampling a first sample sequence from the experience replay area, training the environment prediction part with a supervised learning method, and repeating this a first preset number of times; sampling a second sample sequence from the experience replay area, holding the parameters of the environment prediction part fixed, training the strategy decision part with a reinforcement learning method, and repeating this a second preset number of times; and, when network convergence meets a preset condition, obtaining the reinforcement learning network. The method can effectively improve learning efficiency.
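
The alternating training schedule described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy chain environment (`env_step`), the linear `Predictor`, the value-table `Policy`, and every hyperparameter are hypothetical. Individual transitions are sampled instead of sample sequences, and a fixed iteration count stands in for the convergence condition.

```python
import random
from collections import deque

N_STATES, GAMMA = 6, 0.9  # hypothetical toy settings

def env_step(state, action):
    """Hypothetical chain environment: action 1 moves right, 0 moves left."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

class Predictor:
    """Environment prediction part: next_state ~ w_s*s + w_a*a + b."""
    def __init__(self):
        self.w_s, self.w_a, self.b = 0.0, 0.0, 0.0

    def predict(self, s, a):
        return self.w_s * s + self.w_a * a + self.b

    def train(self, batch, lr=0.01):
        # Supervised learning: SGD on the squared next-state error.
        for s, a, _, s2, _ in batch:
            err = self.predict(s, a) - s2
            self.w_s -= lr * err * s
            self.w_a -= lr * err * a
            self.b -= lr * err

class Policy:
    """Strategy decision part: a value table indexed by the predicted next state."""
    def __init__(self, predictor):
        self.predictor = predictor
        self.v = [0.0] * N_STATES

    def _idx(self, s, a):
        return max(0, min(N_STATES - 1, round(self.predictor.predict(s, a))))

    def q(self, s, a):
        return self.v[self._idx(s, a)]

    def act(self, s, eps):
        if random.random() < eps:
            return random.randrange(2)
        qs = [self.q(s, a) for a in (0, 1)]
        best = max(qs)
        return random.choice([a for a in (0, 1) if qs[a] == best])

    def train(self, batch, lr=0.1):
        # Reinforcement learning: TD update; predictor parameters are only read.
        for s, a, r, s2, done in batch:
            target = r if done else r + GAMMA * max(self.q(s2, b) for b in (0, 1))
            i = self._idx(s, a)
            self.v[i] += lr * (target - self.v[i])

def train(iterations=200, capacity=500, n_pred=5, n_policy=5, batch_size=16, seed=0):
    random.seed(seed)
    predictor = Predictor()
    policy = Policy(predictor)
    replay = deque(maxlen=capacity)  # experience replay area
    state = 0
    for _ in range(iterations):  # fixed count stands in for the convergence test
        for _ in range(20):  # interact with the environment and store transitions
            action = policy.act(state, eps=0.2)
            nxt, reward, done = env_step(state, action)
            replay.append((state, action, reward, nxt, done))
            state = 0 if done else nxt
        for _ in range(n_pred):  # first preset number of supervised updates
            predictor.train(random.sample(list(replay), min(batch_size, len(replay))))
        for _ in range(n_policy):  # second preset number of RL updates
            policy.train(random.sample(list(replay), min(batch_size, len(replay))))
    return predictor, policy, replay

if __name__ == "__main__":
    predictor, policy, replay = train()
    print("value table:", [round(x, 2) for x in policy.v])
```

In this sketch the predictor is held fixed during the second phase simply because `Policy.train` never writes to it; in a neural-network implementation the same effect would typically require excluding the prediction part's parameters from the optimizer or stopping gradients through it.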

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a deep reinforcement learning method and device based on environment state prediction.

Background

[0002] Learning through interaction with the environment is a research hotspot in the field of artificial intelligence, and reinforcement learning is an important mathematical and theoretical tool for solving such problems. By solving a Markov decision process, reinforcement learning can learn a goal-oriented behavior strategy through interaction with an unknown environment. Moreover, since reinforcement learning does not require explicit supervisory signals, and its learning process is similar to the learning behavior of animals, it is also considered a promising direction for solving general artificial intelligence problems.

[0003] Reinforcement learning algorithms in related technologies are limited by time, space, and sample...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/08
CPC: G06N3/08
Inventor: 陈峰, 陈达贵, 闫琪
Owner: TSINGHUA UNIV