Guidance-type policy search reinforcement learning algorithm
A reinforcement learning and policy search technique, applied in the field of machine learning. It addresses the large sample demand of policy search by reducing the number of samples required while keeping the policy search accurate.
Embodiment Construction
[0025] The present invention is described in further detail below through specific embodiments, in conjunction with the accompanying drawings. The following embodiments are descriptive only, not restrictive, and do not limit the scope of protection of the present invention.
[0026] A guided policy search reinforcement learning algorithm first selects high-quality learning samples according to the definition of guided learning samples, then uses the selected samples to estimate the gradient of the objective function constructed in the present invention, and updates the policy parameters according to the policy update rule until convergence. The specific steps are as follows:
[0027] (1) Sample collection: Under the framework of the Markov decision process, the agent in the current state s chooses an action a according to the current policy function π(a|s, θ), then transitions to the next state s′ and receives an immediate reward r(s, a, s′). The agent collects state, act...
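The collect, select, estimate, and update loop described in [0026] and [0027] can be illustrated with a minimal Python sketch. This excerpt gives neither the definition of guided learning samples nor the objective function of the invention, so everything specific here is an assumption: the five-state chain environment, the tabular softmax policy, the rule of keeping the highest-return trajectories as the "guiding" samples, and the return-weighted likelihood-ratio gradient used in place of the patent's own objective. All function names and hyperparameters are hypothetical.

```python
import math
import random

N_STATES, N_ACTIONS, HORIZON = 5, 2, 10  # hypothetical toy chain MDP


def policy_probs(theta, s):
    """Tabular softmax policy pi(a | s, theta)."""
    m = max(theta[s])
    exps = [math.exp(p - m) for p in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def collect_trajectory(theta, rng):
    """Step (1): roll out the current policy, recording (s, a, r, s') tuples."""
    s, transitions = 0, []
    for _ in range(HORIZON):
        a = rng.choices(range(N_ACTIONS), weights=policy_probs(theta, s))[0]
        # Action 1 moves right, action 0 moves left; both clamp at the ends.
        s_next = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0  # immediate reward r(s, a, s')
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions


def trajectory_return(transitions):
    """Undiscounted sum of the immediate rewards along one trajectory."""
    return sum(r for _, _, r, _ in transitions)


def select_guiding_samples(trajectories, keep_fraction):
    """Assumed selection rule: keep the highest-return trajectories."""
    ranked = sorted(trajectories, key=trajectory_return, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]


def gradient_estimate(theta, selected):
    """Return-weighted likelihood-ratio gradient over the selected samples."""
    grad = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for transitions in selected:
        weight = trajectory_return(transitions) / len(selected)
        for s, a, _, _ in transitions:
            probs = policy_probs(theta, s)
            for b in range(N_ACTIONS):
                # d/d theta[s][b] of log pi(a | s, theta) for a softmax policy.
                grad[s][b] += weight * ((1.0 if b == a else 0.0) - probs[b])
    return grad


def train(iterations=50, batch=50, keep_fraction=0.2, lr=0.5, tol=1e-4, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    history = []  # mean return of each batch, for monitoring
    for _ in range(iterations):
        batch_trajs = [collect_trajectory(theta, rng) for _ in range(batch)]
        history.append(sum(map(trajectory_return, batch_trajs)) / batch)
        selected = select_guiding_samples(batch_trajs, keep_fraction)
        grad = gradient_estimate(theta, selected)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                theta[s][a] += lr * grad[s][a]  # gradient ascent step
        if max(abs(g) for row in grad for g in row) < tol:
            break  # treat a vanishing gradient as convergence
    return theta, history


theta, history = train()
print(f"mean return: first batch {history[0]:.2f}, last batch {history[-1]:.2f}")
```

Selecting by return before estimating the gradient biases the estimate toward the retained trajectories. The sketch accepts that bias for simplicity; the invention's own sample definition and objective may handle it differently.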