Reinforcement learning model optimization method and device, storage medium and electronic equipment

An optimization method for reinforcement learning models, applied in the field of deep reinforcement learning. It can solve problems such as lack of generalization ability, difficulty in reproducing results, and insufficient utilization of experience, and achieves improved learning quality, good usability, and good generalization.

Pending Publication Date: 2021-09-24
JILIN UNIV
View PDF · 0 Cites · Cited by 8
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0003] Although optimization methods for agent reinforcement learning models are a novel and widely studied direction, by their nature they suffer from low utilization efficiency of sampled experience and excessive dependence on environment modeling.
The former problem arises because the historical data supplied to the Actor-Critic network for updating is typically discarded as soon as the on-policy algorithm updates, wasting a great deal of experience; moreover, interaction with the environment generates large amounts of data, and simple random sampling alone cannot quickly and accurately let the agent learn the effective information it contains.
The latter problem is due to the lack of generalization ability in existing deep reinforcement learning: an agent that performs excellently in one environment is difficult to reproduce in another.
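To make the experience-utilization problem concrete, the following is a minimal sketch (not part of the patent's claimed method) of the baseline "simple random sampling" scheme the passage criticizes: a uniform-sampling replay buffer in which every stored transition is equally likely to be drawn, regardless of how informative it is.

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform-sampling experience store. Illustrates the baseline
    'simple random sampling' scheme: every transition is equally
    likely to be drawn, however informative it is."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling without replacement
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Hypothetical usage with dummy integer states
buf = ReplayBuffer(capacity=100)
for i in range(10):
    buf.push(i, 0, 1.0, i + 1, False)
batch = buf.sample(4)
```

Because sampling is uniform, rare but highly informative transitions are drawn no more often than routine ones, which is exactly the inefficiency the passage points to.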

Method used



Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0026] The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.

[0027] In the description of the present application, it should be understood that the terms "first", "second", and so on are used for descriptive purposes only and should not be understood as indicating or implying relative importance. It should also be noted that, unless otherwise specified and limited, "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system,...



Abstract

The embodiment of the invention discloses a reinforcement learning model optimization method and device, a storage medium and electronic equipment, relating to the field of deep reinforcement learning. The method comprises: obtaining a variational inference network based on historical data generated by interaction between an actor-critic network and the environment, and training the actor-critic network through the variational inference network; generating an initial actor double-critic network based on the trained actor-critic network; and replacing the advantage function in the initial actor double-critic network with a generalized advantage function to obtain the actor double-critic network corresponding to the initial one. By adopting the embodiment of the invention, the utilization rate of historical data can be improved, the generalization of the deep reinforcement learning model is enhanced, and training time is reduced.
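The "generalized dominant function" (generalized advantage function) the abstract refers to most likely means generalized advantage estimation (GAE); the patent does not give a formula here, so the following is only a hedged sketch of how GAE is typically computed, with illustrative function names and trajectory values that are not from the patent.

```python
import numpy as np

def generalized_advantage_estimates(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    `values` holds one more entry than `rewards`: the final element is
    the bootstrap value of the terminal/next state.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Accumulate discounted TD residuals backwards through the trajectory
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Illustrative trajectory: three steps of reward 1.0, constant value estimates
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
adv = generalized_advantage_estimates(rewards, values)
```

The lambda parameter trades off bias (small lambda, close to one-step TD) against variance (lambda near 1, close to Monte Carlo returns).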

Description

Technical Field

[0001] The present application relates to the field of deep reinforcement learning, and in particular to an optimization method, device, storage medium and electronic equipment for a reinforcement learning model.

Background Technology

[0002] Reinforcement learning is an unsupervised active learning method in which an agent (Agent) obtains rewards through frequent interaction with the environment, with the goal of maximizing the ultimate return. The quality of the agent's actions is determined entirely by the environment's feedback signal, which serves to guide the reinforcement learning model; through exploration, the agent seeks strategies that obtain greater value. An excellent reinforcement learning algorithm balances exploration (Exploration) and exploitation (Exploitation) well, exploring the environment further after maximally exploiting the known feedback information.

[0003] Although the direction of ...
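The exploration/exploitation balance described in [0002] is commonly illustrated with an epsilon-greedy rule. This is a generic sketch, not the patent's method; the function name and value estimates are illustrative.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon take a uniformly random action
    (exploration); otherwise take the action with the highest
    estimated value (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Illustrative value estimates for three actions
q = [0.1, 0.9, 0.3]
greedy = epsilon_greedy_action(q, epsilon=0.0)  # epsilon=0: pure exploitation
explore = epsilon_greedy_action(q, epsilon=1.0)  # epsilon=1: pure exploration
```

With epsilon = 0 the agent always exploits the best-known action; annealing epsilon from near 1 toward 0 over training is one simple way to shift from exploration toward exploitation.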

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06N20/00, G06N5/04, G06F17/15
CPC: G06N20/00, G06N5/04, G06F17/15
Inventor 张棋杨博陈贺昌孙智孝朴海音詹光常毅
Owner JILIN UNIV