Deep reinforcement learning method and device based on environment state prediction

A technology relating to reinforcement learning and environment state prediction, applied in the field of artificial intelligence. It addresses the problems that existing algorithms cannot obtain a good strategy and lack generality and transferability, and it achieves rapid transfer between tasks and improved learning efficiency.

Active Publication Date: 2018-07-17

TSINGHUA UNIV

Cites: 7. Cited by: 27.


Problems solved by technology

On the one hand, reinforcement learning algorithms are founded on the Markov property, and a necessary condition for satisfying it is that the state be fully observed. Therefore, for partially observable Markov decision problems, current algorithms usually cannot obtain a good strategy. On the other hand, most current reinforcement learning algorithms can solve only one task in an environment. When the task is switched, the network often needs to be retrained, so these algorithms lack generality and transferability, a problem that urgently needs to be solved.



Embodiment Construction

[0033] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0034] The following describes the deep reinforcement learning method and device based on environmental state prediction according to the embodiments of the present invention with reference to the accompanying drawings. First, the deep reinforcement learning method based on environmental state prediction according to the embodiments of the present invention will be described with reference to the accompanying drawings.

[0035] Figure 1 is a flowchart of a deep reinforcement learning method based on environment state prediction...


Abstract

The invention discloses a deep reinforcement learning method and device based on environment state prediction. The method comprises the following steps: establishing a deep reinforcement learning network based on environment prediction, and selecting a suitable strategy decision method according to the characteristics of the task; initializing the network parameters, and establishing a storage area that meets a storage condition as an experience replay area; selecting a suitable strategy according to the output of the strategy decision network to interact with the environment, and continuously storing the interaction information from the interaction process in the experience replay area; sampling a first sample sequence from the experience replay area, training the environment prediction part with a supervised learning method, and repeating this a first preset number of times; sampling a second sample sequence from the experience replay area, holding the parameters of the environment prediction part fixed, training the strategy decision part with a reinforcement learning method, and repeating this a second preset number of times; and, when network convergence meets a preset condition, obtaining the reinforcement learning network. The method can effectively improve learning efficiency.
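The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy chain environment, the linear next-state predictor, the tabular Q-learning policy, uniform minibatch sampling, and every name and hyperparameter (`env_step`, `Predictor`, `train_policy`, `run`, `n_pred`, `n_policy`, `loss_tol`) are assumptions chosen only to show the alternating schedule: interact and store, train the prediction part by supervised learning, then train the decision part by reinforcement learning while the predictor is held fixed.

```python
import random
from collections import deque

N_STATES, GOAL, ACTIONS = 6, 5, (-1, 1)  # hypothetical toy chain world

def env_step(state, action):
    """Toy environment: move along a chain; reward 1.0 on reaching GOAL."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

class Predictor:
    """Stand-in for the environment prediction part: linear next-state model."""
    def __init__(self):
        self.w_s, self.w_a, self.b = 0.0, 0.0, 0.0

    def predict(self, s, a):
        return self.w_s * s + self.w_a * a + self.b

    def train(self, batch, lr=0.01):
        """One supervised SGD pass on squared error; returns the mean loss."""
        loss = 0.0
        for s, a, _, s2, _ in batch:
            err = self.predict(s, a) - s2
            self.w_s -= lr * err * s
            self.w_a -= lr * err * a
            self.b -= lr * err
            loss += err * err
        return loss / len(batch)

def act(q, s, eps):
    """Epsilon-greedy action choice with random tie-breaking."""
    if random.random() < eps or q[s][0] == q[s][1]:
        return random.choice(ACTIONS)
    return ACTIONS[0] if q[s][0] > q[s][1] else ACTIONS[1]

def train_policy(q, predictor, batch, alpha=0.1, gamma=0.9):
    """Q-learning update; the predictor is only read here, never modified."""
    for s, a, r, _, done in batch:
        s_hat = min(max(round(predictor.predict(s, a)), 0), N_STATES - 1)
        target = r if done else r + gamma * max(q[s_hat])
        i = ACTIONS.index(a)
        q[s][i] += alpha * (target - q[s][i])

def run(iterations=200, steps=20, n_pred=5, n_policy=5,
        batch_size=16, capacity=500, loss_tol=0.01, seed=0):
    random.seed(seed)
    predictor = Predictor()                      # initialize parameters
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    replay = deque(maxlen=capacity)              # experience replay area
    state, loss = 0, float("inf")
    for _ in range(iterations):
        # Interact with the environment and store each transition.
        for _ in range(steps):
            a = act(q, state, eps=0.2)
            nxt, r, done = env_step(state, a)
            replay.append((state, a, r, nxt, done))
            state = 0 if done else nxt
        pool = list(replay)
        k = min(batch_size, len(pool))
        # Supervised training of the prediction part, repeated n_pred times.
        for _ in range(n_pred):
            loss = predictor.train(random.sample(pool, k))
        # RL training of the decision part with the predictor held fixed.
        for _ in range(n_policy):
            train_policy(q, predictor, random.sample(pool, k))
        # Stop once the (assumed) convergence condition is met.
        if loss < loss_tol:
            break
    return predictor, q, replay, loss
```

Calling `run()` returns the trained predictor, the Q-table, the replay buffer, and the last prediction loss. In this sketch the policy update bootstraps from the predictor's estimated next state, which is one simple way to make the decision part depend on a frozen prediction part; the source does not say how the two parts are coupled.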

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a deep reinforcement learning method and device based on environmental state prediction.

Background art

[0002] Learning in the process of interacting with the environment is a research hotspot in the field of artificial intelligence, and reinforcement learning is an important mathematical and theoretical tool for solving such problems. By solving the Markov decision process, reinforcement learning can learn a goal-oriented behavior strategy through interaction with an unknown environment. Moreover, since reinforcement learning does not require explicit supervisory signals, and its learning process is similar to the learning behavior of animals, reinforcement learning is also considered a promising direction for solving general artificial intelligence problems.

[0003] Reinforcement learning algorithms in related technologies are limited by time, space, and sample...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N3/08

CPC: G06N3/08

Inventors: 陈峰, 陈达贵, 闫琪

Owner: TSINGHUA UNIV
