Reinforcement learning exploration method and device based on generative adversarial mechanism

A reinforcement learning technology based on a generative adversarial mechanism, applied in the field of machine learning, that addresses problems such as fluctuating value estimates of the agent and the resulting loss of training stability.

Active Publication Date: 2020-12-08
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0005] The above exploration methods still have deficiencies. Exploration methods based on randomness cannot provide the agent with sufficient exploration. Exploration methods that design an intrinsic incentive (reward) function cause the agent's value estimates to fluctuate, because the intrinsic incentive decays over time, and this harms training stability.
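The second deficiency can be illustrated with a small sketch. It assumes a count-based intrinsic bonus of the form beta / sqrt(N), which is one common choice and is not specified in this document; the function names and the value of beta are hypothetical. Because the bonus shrinks as the visit count N grows, the reward target for the same transition keeps shifting, which is one way the value estimate can be made to fluctuate.

```python
import math


def intrinsic_bonus(visit_count: int, beta: float = 1.0) -> float:
    """Hypothetical count-based bonus beta / sqrt(N); an assumed form, not taken from the patent."""
    return beta / math.sqrt(visit_count)


def total_reward(extrinsic: float, visit_count: int, beta: float = 1.0) -> float:
    """Reward seen by the learner: extrinsic reward plus the decaying intrinsic bonus."""
    return extrinsic + intrinsic_bonus(visit_count, beta)


# The same transition (extrinsic reward 1.0) produces a different learning
# target depending on how often the state has already been visited.
visit_counts = (1, 4, 16, 100)
targets = [total_reward(1.0, n) for n in visit_counts]
for n, target in zip(visit_counts, targets):
    print(f"visits={n:3d}  target={target:.3f}")
```

The targets decrease monotonically toward the extrinsic reward alone, so a value function fitted early in training is chasing a target that later disappears.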




Detailed Description of Embodiments

[0025] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0026] The reinforcement learning exploration method and device based on a generative adversarial mechanism proposed according to the embodiments of the present invention are described below with reference to the accompanying drawings.

[0027] First, the reinforcement learning exploration method based on a generative adversarial mechanism proposed according to an embodiment of the present invention is described with reference to the accompanying drawings.

[0028] Figure 1 is a flow chart of the reinforcement le...


Abstract

The invention discloses a reinforcement learning exploration method and device based on a generative adversarial mechanism. The method comprises the steps of: constructing a first action value network, a second action value network, a state value network, a target state value network, a policy network, a density model network and a discriminator network; updating these seven networks based on the generative adversarial mechanism and the learning process of an offline reinforcement learning algorithm; and generating an updated policy model from the updated networks and testing the policy model. The method provides an exploration algorithm that uses the correct decisions made during exploration to accelerate and stabilize the reinforcement learning training process.
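The construction step listed in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: each network is replaced by a tiny linear map stored as nested lists, the input dimensions of the density model and identification (discriminator) networks are assumed to be state-action pairs, and the Polyak-style `soft_update` for the target state value network is an assumed update rule. All names (`Networks`, `build_networks`, `soft_update`, `tau`) are hypothetical.

```python
import random
from dataclasses import dataclass


def make_linear(in_dim, out_dim, rng):
    """Tiny stand-in for a neural network: an out_dim x in_dim weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(weights, x):
    """Apply the linear map to input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


@dataclass
class Networks:
    q1: list             # first action value network
    q2: list             # second action value network
    v: list              # state value network
    v_target: list       # target state value network
    policy: list         # strategy (policy) network
    density: list        # density model network (input shape assumed)
    discriminator: list  # identification network (input shape assumed)


def build_networks(state_dim, action_dim, seed=0):
    """Construct the seven networks named in the abstract."""
    rng = random.Random(seed)
    sa_dim = state_dim + action_dim
    v = make_linear(state_dim, 1, rng)
    return Networks(
        q1=make_linear(sa_dim, 1, rng),
        q2=make_linear(sa_dim, 1, rng),
        v=v,
        v_target=[row[:] for row in v],  # assumption: target starts as a copy
        policy=make_linear(state_dim, action_dim, rng),
        density=make_linear(sa_dim, 1, rng),
        discriminator=make_linear(sa_dim, 1, rng),
    )


def soft_update(target, source, tau=0.005):
    """Assumed Polyak averaging: target <- (1 - tau) * target + tau * source."""
    for t_row, s_row in zip(target, source):
        for i, s in enumerate(s_row):
            t_row[i] = (1.0 - tau) * t_row[i] + tau * s


nets = build_networks(state_dim=3, action_dim=2)
action = forward(nets.policy, [0.1, 0.2, 0.3])
soft_update(nets.v_target, nets.v)
```

Keeping the seven networks in one container makes it explicit which parameters each update step touches; the actual losses and update order are not given in this excerpt and are deliberately left out.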

Description

Technical field

[0001] The invention relates to the technical field of machine learning, in particular to a reinforcement learning exploration method and device based on a generative adversarial mechanism.

Background

[0002] In recent years, computers have made huge leaps in computing and storage performance, triggering the rapid rise of deep learning. Deep learning has not only made great progress in image classification, speech recognition, and natural language processing, but has also made it practical to approximate value functions in reinforcement learning and to represent agent behavior policies. Reinforcement learning combined with neural networks can process large-scale simulated data and learn iteratively through successive gradient updates, which made possible AlphaGo, the program that defeated the world's top Go players.

[0003] However, reinforcement learning still faces many problems, one of which...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045
Inventors: 杨君, 袁凯钊, 马骁腾, 芦维宁, 陈章, 梁斌
Owner: TSINGHUA UNIV