Model-based policy search learning method based on a conditional generative adversarial network
A conditional generative adversarial network and policy search technique, applied in the field of machine learning, that addresses the inability of existing methods to effectively solve practical problems and achieves a reasonable design capable of solving complex decision-making problems.
Embodiment Construction
[0026] Embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings.
[0027] In implementing the present invention, the interaction between the agent and the environment is modeled as a Markov decision process (MDP), represented by the tuple $(S, A, P_T, P_I, r, \gamma)$, where $S$ is the continuous state space; $A$ is the continuous action space; $P_T(s_{t+1} \mid s_t, a_t)$ is the state transition probability density function, giving the density of moving to the next state $s_{t+1}$ after taking action $a_t$ in the current state $s_t$; $P_I(s_1)$ is the probability density function of the agent's initial state; $r(s_t, a_t, s_{t+1})$ is the immediate reward the agent receives for the state transition caused by taking the action; and $\gamma \in [0, 1]$ is the discount factor. The MDP proceeds as follows: in the currently perceived state $s_t$, the agent, according to the stochastic policy functi...
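The MDP tuple described in [0027] can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the one-dimensional state and action, the toy dynamics and reward, the Gaussian policy `toy_policy`, and the names `MDP`, `rollout`, and `discounted_return`. The discounted sum of rewards is the standard textbook quantity associated with $\gamma$; the source text is truncated before it states its own objective.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Step = Tuple[float, float, float, float]  # (s_t, a_t, s_{t+1}, reward)


@dataclass
class MDP:
    """Holds the samplers and functions of (S, A, P_T, P_I, r, gamma).

    S and A are implicit here: states and actions are plain floats.
    """
    sample_initial: Callable[[random.Random], float]                    # s_1 ~ P_I
    sample_transition: Callable[[float, float, random.Random], float]  # s_{t+1} ~ P_T(. | s_t, a_t)
    reward: Callable[[float, float, float], float]                     # r(s_t, a_t, s_{t+1})
    gamma: float                                                        # discount factor in [0, 1]


def rollout(mdp: MDP, policy: Callable[[float, random.Random], float],
            horizon: int, rng: random.Random) -> List[Step]:
    """Run the agent-environment loop for `horizon` steps."""
    s = mdp.sample_initial(rng)
    path: List[Step] = []
    for _ in range(horizon):
        a = policy(s, rng)                      # draw an action from the stochastic policy
        s_next = mdp.sample_transition(s, a, rng)
        path.append((s, a, s_next, mdp.reward(s, a, s_next)))
        s = s_next
    return path


def discounted_return(mdp: MDP, path: List[Step]) -> float:
    """Standard discounted sum of rewards (0-based t, so the first reward is undiscounted)."""
    return sum(mdp.gamma ** t * step[3] for t, step in enumerate(path))


# Hypothetical toy problem: drive a noisy 1-D state toward zero.
toy = MDP(
    sample_initial=lambda rng: rng.gauss(0.0, 1.0),
    sample_transition=lambda s, a, rng: s + a + rng.gauss(0.0, 0.1),
    reward=lambda s, a, s_next: -(s_next ** 2),
    gamma=0.95,
)


def toy_policy(s: float, rng: random.Random) -> float:
    """Assumed Gaussian policy: linear feedback plus exploration noise."""
    return -0.5 * s + rng.gauss(0.0, 0.2)


path = rollout(toy, toy_policy, horizon=20, rng=random.Random(0))
ret = discounted_return(toy, path)
```

Passing a seeded `random.Random` instance to every sampler keeps a trajectory reproducible, which makes it easier to compare policies on identical noise.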

