Modeling strategy searching learning method based on condition generation confrontation network

Conditional generation and policy search technology, applied in the field of machine learning. It addresses the inability of existing methods to solve practical problems effectively, and achieves a reasonable design capable of solving complex decision-making problems.

Inactive | Publication Date: 2018-04-13
TIANJIN UNIV OF SCI & TECH
Cites: 3 | Cited by: 26

AI Technical Summary

Problems solved by technology

[0008] In summary, although model-based policy search learning methods have produced some research results, they still cannot effectively solve the problems that arise in practical applications.

Embodiment Construction

[0026] Embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0027] In the implementation of the present invention, the interaction between the agent and the environment is modeled as a Markov decision process (MDP), represented by the tuple $(S, A, P_T, P_I, r, \gamma)$, where $S$ is the continuous state space; $A$ is the continuous action space; $P_T(s_{t+1} \mid s_t, a_t)$ is the state transition probability density of moving to the next state $s_{t+1}$ after taking action $a_t$ in the current state $s_t$; $P_I(s_1)$ is the probability density of the agent's initial state; $r(s_t, a_t, s_{t+1})$ is the immediate reward the agent receives for the state transition caused by its action; and $\gamma \in [0, 1]$ is the discount factor. The specific process of the MDP is as follows: the agent is in the currently perceived state; next, according to the stochastic policy functi...
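To make the roles of $P_I$, $P_T$, $r$ and $\gamma$ concrete, here is a minimal Python sketch of sampling one path from such an MDP and computing its discounted return. The `MDP` container, the `rollout` and `discounted_return` helpers, the one-dimensional toy dynamics, the reward and the Gaussian policy are all hypothetical illustrations, not taken from the patent; the densities are represented only through samplers.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Transition = Tuple[float, float, float, float]  # (s_t, a_t, r_t, s_{t+1})


@dataclass
class MDP:
    """Hypothetical container for (S, A, P_T, P_I, r, gamma) with S = A = R."""
    sample_transition: Callable[[float, float, random.Random], float]  # s_{t+1} ~ P_T(. | s_t, a_t)
    sample_initial: Callable[[random.Random], float]                   # s_1 ~ P_I(.)
    reward: Callable[[float, float, float], float]                     # r(s_t, a_t, s_{t+1})
    gamma: float                                                       # discount factor in [0, 1]


def rollout(mdp: MDP, policy: Callable[[float, random.Random], float],
            horizon: int, rng: random.Random) -> List[Transition]:
    """Sample one path: s_1 ~ P_I, then a_t ~ pi(. | s_t) and s_{t+1} ~ P_T(. | s_t, a_t)."""
    s = mdp.sample_initial(rng)
    path: List[Transition] = []
    for _ in range(horizon):
        a = policy(s, rng)
        s_next = mdp.sample_transition(s, a, rng)
        path.append((s, a, mdp.reward(s, a, s_next), s_next))
        s = s_next
    return path


def discounted_return(path: List[Transition], gamma: float) -> float:
    """Sum over t = 1..T of gamma^(t-1) * r_t (enumerate starts at 0, i.e. t - 1)."""
    return sum(gamma ** t * r for t, (_, _, r, _) in enumerate(path))


def gaussian_policy(s: float, rng: random.Random) -> float:
    # Hypothetical stochastic policy: a ~ N(-s, 0.2^2).
    return rng.gauss(-s, 0.2)


# Hypothetical 1-D system, for illustration only.
toy = MDP(
    sample_transition=lambda s, a, rng: 0.9 * s + 0.5 * a + 0.1 * rng.gauss(0.0, 1.0),
    sample_initial=lambda rng: rng.uniform(-1.0, 1.0),
    reward=lambda s, a, s_next: -(s_next ** 2) - 0.1 * a ** 2,
    gamma=0.95,
)
path = rollout(toy, gaussian_policy, horizon=20, rng=random.Random(0))
print(len(path), round(discounted_return(path, toy.gamma), 3))
```

Samplers stand in for the densities because generating paths only requires drawing $s_1$ and $s_{t+1}$, never evaluating $P_I$ or $P_T$.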

Abstract

The invention relates to a model-based policy search learning method based on a conditional generative adversarial network. The method comprises the following steps: collecting real state transition samples from an environment; constructing a conditional generative adversarial network model comprising a generator and a discriminator; training the conditional generative adversarial network model on the real state transition samples until convergence, so that the trained generator serves as the environment's state transition prediction model; generating a sufficient number of path samples; and using the path samples to update the policy, searching the parameters of the policy model in the reinforcement learning algorithm until they converge. The method is reasonably designed: once the generator of the environment model has been obtained, policy learning incurs no additional cost for collecting samples, and complex decision problems in large-scale environments can be solved systematically and effectively.
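The five steps of the abstract can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the patented implementation: the one-dimensional "real" environment `true_step`, the linear-Gaussian generator, the logistic discriminator over the hand-picked `features`, the reward inside `policy_return`, and the grid search over a linear policy gain (standing in for the unspecified reinforcement-learning policy-search algorithm) are all hypothetical choices. Training runs for a fixed step budget rather than testing for convergence.

```python
import math
import random


def true_step(s, a, rng):
    # Hypothetical "real" environment; queried only in step 1 to collect real samples.
    return 0.9 * s + 0.5 * a + 0.1 * rng.gauss(0.0, 1.0)


def sigmoid(u):
    # Numerically safe logistic function.
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def features(s, a, x):
    # Hypothetical discriminator features of (condition (s, a), next state x): they let a
    # linear-logistic critic compare the first two moments of x and its correlation with s and a.
    return [x, x * x, x * s, x * a, 1.0]


class ConditionalGAN:
    """Toy CGAN. Generator G(z | s, a) = w_s*s + w_a*a + b + exp(log_sd)*z, z ~ N(0, 1).
    Discriminator D(x | s, a) = sigmoid(theta . features(s, a, x))."""

    def __init__(self, rng):
        self.rng = rng
        self.g = {"w_s": 0.0, "w_a": 0.0, "b": 0.0, "log_sd": 0.0}
        self.theta = [0.0] * 5

    def generate(self, s, a, z=None):
        # Predicted next state for condition (s, a); z is drawn if not supplied.
        z = self.rng.gauss(0.0, 1.0) if z is None else z
        g = self.g
        return g["w_s"] * s + g["w_a"] * a + g["b"] + math.exp(g["log_sd"]) * z

    def critic(self, s, a, x):
        return sigmoid(sum(t * f for t, f in zip(self.theta, features(s, a, x))))

    def train_step(self, batch, lr=0.05):
        n, t = len(batch), self.theta
        d_theta, d_g = [0.0] * len(t), dict.fromkeys(self.g, 0.0)
        sd = math.exp(self.g["log_sd"])
        for s, a, x_real in batch:
            z = self.rng.gauss(0.0, 1.0)
            x_fake = self.generate(s, a, z)
            p_real, p_fake = self.critic(s, a, x_real), self.critic(s, a, x_fake)
            # Discriminator loss -log D(real) - log(1 - D(fake)), gradient w.r.t. theta.
            for i, (f_r, f_f) in enumerate(zip(features(s, a, x_real), features(s, a, x_fake))):
                d_theta[i] += ((p_real - 1.0) * f_r + p_fake * f_f) / n
            # Non-saturating generator loss -log D(fake), chained through x_fake.
            coef = (p_fake - 1.0) * (t[0] + 2.0 * t[1] * x_fake + t[2] * s + t[3] * a) / n
            for key, dx in (("w_s", s), ("w_a", a), ("b", 1.0), ("log_sd", sd * z)):
                d_g[key] += coef * dx
        # Simultaneous gradient-descent step for both players.
        self.theta = [w - lr * d for w, d in zip(t, d_theta)]
        for key in self.g:
            self.g[key] -= lr * d_g[key]


def policy_return(model, gain, rng, n_paths=50, horizon=15, gamma=0.95):
    # Mean discounted return of the hypothetical policy a = clip(gain * s, -1, 1),
    # estimated from paths generated by the learned model only.
    total = 0.0
    for _ in range(n_paths):
        s = rng.uniform(-1.0, 1.0)
        for t in range(horizon):
            a = max(-1.0, min(1.0, gain * s))
            s_next = model.generate(s, a, rng.gauss(0.0, 1.0))
            total += gamma ** t * (-(s_next ** 2) - 0.1 * a ** 2)
            s = s_next
    return total / n_paths


rng = random.Random(0)
# Step 1: collect real state-transition samples (s, a, s') using random actions.
real = []
for _ in range(500):
    s, a = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    real.append((s, a, true_step(s, a, rng)))
# Steps 2-3: build the CGAN and train it on the real samples (fixed step budget).
cgan = ConditionalGAN(rng)
for _ in range(2000):
    cgan.train_step(rng.sample(real, 32))
# Steps 4-5: search the policy parameter on model-generated paths only (grid search as a
# stand-in for the policy-search algorithm; seed 1 gives every candidate the same noise).
gains = [k / 10.0 for k in range(-20, 1)]
best_gain = max(gains, key=lambda k: policy_return(cgan, k, random.Random(1)))
print({k: round(v, 3) for k, v in cgan.g.items()}, best_gain)
```

Only `true_step` touches the "real" environment, and only in step 1; every path used by the search comes from `cgan.generate`, which is the saving the abstract describes. A fixed step count replaces the convergence test here because simultaneous GAN updates can oscillate.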

Description

Technical Field

[0001] The invention belongs to the technical field of machine learning and relates to reinforcement learning algorithms, in particular to a model-based policy search learning method based on a conditional generative adversarial network.

Background Art

[0002] Reinforcement learning (RL) is an important learning method in the field of machine learning. It mainly studies how an agent makes better decisions according to its current environment. It is one of the research areas of interest.

[0003] Reinforcement learning describes the process by which an agent continuously makes decisions and exerts control to accomplish a task. Unlike supervised learning, it requires neither prior knowledge nor accurate reference standards provided by experts; instead, it acquires knowledge by interacting with the environment, selects actions autonomously, and finally finds an optimal action-selection policy suited to the current state, obtaining the maximum cumulative ...
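The paragraph is cut off before stating the objective. In the standard policy-search formulation (an assumption here, not text recovered from the patent), the cumulative quantity being maximized is the expected discounted return over paths of the MDP in [0027]. Writing $\pi_\theta(a \mid s)$ for a policy with parameters $\theta$, $T$ for the horizon, and $h = (s_1, a_1, \ldots, s_T, a_T, s_{T+1})$ for a path (all notation introduced here):

$$
J(\theta) = \int p(h \mid \theta)\, R(h)\, \mathrm{d}h,
\qquad
R(h) = \sum_{t=1}^{T} \gamma^{t-1}\, r(s_t, a_t, s_{t+1}),
\qquad
p(h \mid \theta) = P_I(s_1) \prod_{t=1}^{T} P_T(s_{t+1} \mid s_t, a_t)\, \pi_\theta(a_t \mid s_t).
$$

A model-based method replaces $P_T$ inside $p(h \mid \theta)$ with a learned model; in the abstract's terms, that model is the trained CGAN generator, so paths can be sampled without further interaction with the real environment.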

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N99/00
CPC: G06N20/00
Inventors: 赵婷婷, 孔乐, 杨巨成, 胡志强, 任德化
Owner: TIANJIN UNIV OF SCI & TECH