Reinforcement learning method, device, and storage medium based on short-time access mechanism

A reinforcement learning technology based on a short-time access mechanism, applied in neural learning methods, neural architectures, and biological neural network models. It addresses the low learning efficiency of action strategies, reducing sample complexity and improving both exploration efficiency and learning efficiency.

Pending Publication Date: 2020-11-06

TSINGHUA UNIV

Problems solved by technology

[0003] In view of this, the present disclosure proposes a reinforcement learning technical solution based on a short-term access mechanism, to solve the problem of low learning efficiency of the agent's action strategy during reinforcement learning.

Embodiment Construction

[0046] Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures indicate functionally identical or similar elements. While various aspects of the embodiments are shown in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

[0047] The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or better than other embodiments.

[0048] In addition, in order to better illustrate the present disclosure, numerous specific details are given in the following specific implementation manners. It will be understood by those skilled in the art that the present disclosure may be practiced without some of the specific details. In some instances, methods, means, componen...

Abstract

The invention relates to a reinforcement learning method and device based on a short-time access mechanism, and a storage medium. The method comprises: configuring a state cache list, which stores state increment information obtained from changes in the current environment state of an agent when the agent satisfies a preset short-time access mechanism; inputting all actions available to the agent at the next moment into an environment state transition probability model, which outputs the environment states at the next moment corresponding to those actions; comparing these environment states with the state increment information in the state cache list, and determining the action whose environment state differs most from it as a first alternative action to be executed by the agent at the next moment; and executing the exploration operation of reinforcement learning according to the first alternative action. The state cache list avoids repeated exploration of already explored environment states, while the environment state transition probability model strengthens and guides the agent's exploration of unknown states, effectively improving learning efficiency.
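The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation, and several parts are assumptions made only for illustration: the cache holds recently visited states rather than state increment information, a deterministic `toy_transition_model` stands in for the environment state transition probability model, and "difference" is taken to be the smallest Euclidean distance to any cached entry. The names `StateCache`, `novelty`, and `select_exploratory_action` are hypothetical.

```python
import math
from collections import deque


class StateCache:
    """Bounded cache of recently visited states.

    Hypothetical stand-in for the state cache list: it stores plain states,
    not the state increment information the abstract describes.
    """

    def __init__(self, capacity=16):
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self._states = deque(maxlen=capacity)

    def add(self, state):
        self._states.append(tuple(state))

    def novelty(self, state):
        """Distance from `state` to the nearest cached state (assumed metric)."""
        if not self._states:
            return math.inf
        return min(math.dist(state, cached) for cached in self._states)


def toy_transition_model(state, action):
    """Assumed deterministic model: the next state is state + action."""
    return tuple(s + a for s, a in zip(state, action))


def select_exploratory_action(state, actions, transition_model, cache):
    """Return the action whose predicted next state differs most from the cache."""
    predicted = {action: transition_model(state, action) for action in actions}
    # max() keeps the first action on ties, so the order of `actions` matters.
    return max(actions, key=lambda action: cache.novelty(predicted[action]))


cache = StateCache(capacity=16)
for visited in [(0, 0), (1, 0), (2, 0)]:
    cache.add(visited)

ACTIONS = [(-1, 0), (1, 0), (0, 1)]  # left, right, up on a toy 2-D grid
chosen = select_exploratory_action((1, 0), ACTIONS, toy_transition_model, cache)
print(chosen)
```

In this toy run, moving left or right from `(1, 0)` leads to a state already in the cache, so the sketch picks the upward move. A learned probabilistic model and a different difference measure could be substituted without changing the selection function's shape.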

Description

Technical field

[0001] The present disclosure relates to the technical field of artificial intelligence, and in particular to a reinforcement learning method, device, and storage medium based on a short-term access mechanism.

Background technique

[0002] Reinforcement learning is a learning method in which an agent interacts with an environment so as to obtain the maximum reward in that environment. One of the most important problems in reinforcement learning is how to trade off "exploration" and "exploitation": relying too much on "exploration" reduces the learning efficiency of the agent's action strategy, while relying too much on "exploitation" prevents the agent from learning more effective action strategies; relying solely on either one cannot complete reinforcement learning tasks well. Traditional solutions usually use a count-based method, that is, maintain a global s...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/045

Inventors: 季向阳, 张宏昌

Owner: TSINGHUA UNIV
