Trust region strategy optimization method and device based on post-event experience and related equipment
An optimization method based on trust region technology, applied to machine learning for intelligent robots. It addresses problems such as slow learning speed, low exploration efficiency, and difficulty in designing reward functions, and achieves higher accuracy, faster convergence, and reduced variance.
Example Embodiment
[0039] To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments, without creative work, shall fall within the protection scope of the present invention.
[0040] The present invention is a trust region strategy optimization method based on post-event experience. The principle is to use the experience data of robot actions collected during strategy training under target conditions, and to use the target points actually reached in that experience data as virtual targets. Poi...
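The excerpt above breaks off before the procedure is described, so the following is only a minimal sketch of the one idea it does state: treating a target point the robot actually reached as a virtual target. It resembles what the reinforcement learning literature commonly calls hindsight relabeling. The `Step` record, the 2-D point representation, the sparse 0/-1 reward, the tolerance value, and the choice of the final reached point as the virtual target are all assumptions made for illustration and are not taken from the patent text.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical 2-D position of the robot


@dataclass(frozen=True)
class Step:
    """One recorded step of robot experience (illustrative structure)."""
    state: Point     # position before the action
    action: Point    # displacement that was applied
    achieved: Point  # position actually reached after the action
    goal: Point      # target the strategy was conditioned on
    reward: float    # sparse reward measured against `goal`


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.05) -> float:
    """Assumed reward: 0.0 within `tol` of the goal, otherwise -1.0."""
    dist = math.hypot(achieved[0] - goal[0], achieved[1] - goal[1])
    return 0.0 if dist <= tol else -1.0


def relabel_with_virtual_goal(trajectory: List[Step], tol: float = 0.05) -> List[Step]:
    """Return a copy of the trajectory whose goal is a point it really reached.

    Assumption: the virtual target is the last achieved point. Rewards are
    recomputed against that virtual target; states and actions are unchanged.
    """
    if not trajectory:
        return []
    virtual_goal = trajectory[-1].achieved
    return [
        replace(
            step,
            goal=virtual_goal,
            reward=sparse_reward(step.achieved, virtual_goal, tol),
        )
        for step in trajectory
    ]
```

With a sparse reward, a trajectory that never reaches its original target carries no positive learning signal. After relabeling, the same trajectory counts as a success for the virtual target, which is one plausible reading of how the method addresses the reward design and exploration problems named in the summary.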