Reinforcement learning exploration method and device based on generative adversarial mechanism

A reinforcement learning technology based on a generative adversarial mechanism, applied in the field of machine learning, that addresses problems such as fluctuating value estimates for the agent and the resulting loss of training stability.

Active Publication Date: 2020-12-08

TSINGHUA UNIV

4 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0005] The exploration methods above still have deficiencies. Exploration based on randomness cannot provide sufficient exploration for the agent. Exploration based on a designed intrinsic incentive (reward) function causes the agent's value estimates to fluctuate, because the intrinsic incentive decays over time, and this harms training stability.
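The fluctuation described in [0005] can be illustrated with a small sketch. The text does not name a specific intrinsic incentive, so the count-based bonus `beta / sqrt(N(s))` used here is an assumption chosen for illustration, as are the function names `count_bonus` and `td_targets_for_repeated_visits`. The sketch shows that the one-step learning target for the same state keeps changing as the bonus decays, even when the extrinsic reward and the next-state value are fixed.

```python
import math


def count_bonus(visits: int, beta: float = 1.0) -> float:
    """Assumed intrinsic bonus beta / sqrt(N(s)); not taken from the patent text."""
    return beta / math.sqrt(visits)


def td_targets_for_repeated_visits(extrinsic_reward, next_value, gamma, num_visits, beta=1.0):
    """One-step target r_ext + r_int + gamma * V(s') for each repeat visit to one state."""
    targets = []
    for n in range(1, num_visits + 1):
        # Only the intrinsic term changes between visits; it shrinks as n grows.
        targets.append(extrinsic_reward + count_bonus(n, beta) + gamma * next_value)
    return targets


targets = td_targets_for_repeated_visits(
    extrinsic_reward=0.0, next_value=0.0, gamma=0.99, num_visits=4
)
print([round(t, 3) for t in targets])  # [1.0, 0.707, 0.577, 0.5]
```

Because the target for an unchanged state drifts from 1.0 toward 0, a value network trained on these targets has to keep re-fitting the same input, which is one way the instability mentioned above can arise.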

Embodiment Construction

[0025] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0026] The reinforcement learning exploration method and device based on a generative adversarial mechanism, proposed according to the embodiments of the present invention, are described below with reference to the accompanying drawings.

[0027] First, the reinforcement learning exploration method based on a generative adversarial mechanism, proposed according to an embodiment of the present invention, is described with reference to the accompanying drawings.

[0028] Figure 1 is a flow chart of the reinforcement le...

Abstract

The invention discloses a reinforcement learning exploration method and device based on a generative adversarial mechanism. The method comprises: constructing a first action value network, a second action value network, a state value network, a target state value network, a policy network, a density model network, and a discriminator network; updating these seven networks based on the generative adversarial mechanism and the learning process of an offline reinforcement learning algorithm; and generating an updated policy model from the updated networks and testing the policy model. The method thereby provides an exploration algorithm that uses the correct decisions made during exploration to accelerate and stabilize reinforcement learning training.
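The construction step in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `TinyMLP`, `Networks`, `build_networks`, and `soft_update` are hypothetical names; the input and output sizes of the density model and discriminator (identification) networks are guesses; and initializing the target state value network as a copy that tracks the state value network through a soft update with rate `tau` is a common convention that the abstract does not confirm. The policy network here corresponds to the strategy network named in the abstract.

```python
import copy
import random
from dataclasses import dataclass


class TinyMLP:
    """One-hidden-layer perceptron in pure Python; a stand-in for a deep network."""

    def __init__(self, in_dim, out_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(out_dim)]

    def __call__(self, x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]

    def parameters(self):
        return self.w1 + self.w2  # weight rows; each row is a mutable list


@dataclass
class Networks:
    q1: TinyMLP             # first action value network, Q1(s, a)
    q2: TinyMLP             # second action value network, Q2(s, a)
    v: TinyMLP              # state value network, V(s)
    v_target: TinyMLP       # target state value network
    policy: TinyMLP         # policy (strategy) network
    density: TinyMLP        # density model network (input/output shape assumed)
    discriminator: TinyMLP  # discriminator network (input/output shape assumed)


def build_networks(state_dim, action_dim):
    sa_dim = state_dim + action_dim
    v = TinyMLP(state_dim, 1, seed=2)
    return Networks(
        q1=TinyMLP(sa_dim, 1, seed=0),
        q2=TinyMLP(sa_dim, 1, seed=1),
        v=v,
        v_target=copy.deepcopy(v),  # assumption: target starts as a copy of V
        policy=TinyMLP(state_dim, action_dim, seed=3),
        density=TinyMLP(sa_dim, 1, seed=4),
        discriminator=TinyMLP(sa_dim, 1, seed=5),
    )


def soft_update(target, source, tau=0.005):
    """Move target weights toward source weights: t <- (1 - tau) * t + tau * s."""
    for t_row, s_row in zip(target.parameters(), source.parameters()):
        for i, s in enumerate(s_row):
            t_row[i] = (1.0 - tau) * t_row[i] + tau * s


nets = build_networks(state_dim=3, action_dim=2)
state, action = [0.1, -0.2, 0.3], [0.5, -0.5]
policy_out = nets.policy(state)        # one output per action dimension
q1_out = nets.q1(state + action)       # scalar estimate wrapped in a list
soft_update(nets.v_target, nets.v)     # target slowly tracks the state value network
```

Keeping the seven networks in one container makes it explicit which components a single training step would have to update; the update rules themselves are not given in the visible text and are therefore left out.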

Description

Technical field

[0001] The invention relates to the technical field of machine learning, and in particular to a reinforcement learning exploration method and device based on a generative adversarial mechanism.

Background

[0002] In recent years, computers have made huge leaps in computing and storage performance, triggering the rapid rise of deep learning. Deep learning has not only made great progress in image classification, speech recognition, and natural language processing, but has also made it practical to approximate value functions and to represent agent behavior policies in reinforcement learning. Reinforcement learning combined with neural networks can process large-scale simulated data and learn iteratively through successive gradient updates, which made possible AlphaGo, the program that defeated the world's top Go players.

[0003] However, reinforcement learning still faces many problems, one of which...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/045

Inventor: 杨君, 袁凯钊, 马骁腾, 芦维宁, 陈章, 梁斌

Owner: TSINGHUA UNIV
