Deep Q network reinforcement learning method and device for cognitive behavior model acceleration

A deep Q network technique, applied in the field of reinforcement learning, addresses the lack of learning ability and the weak generalization and adaptability of cognitive models, as well as the long training time of agents, and alleviates the influence of a huge state space and sparse rewards on learning efficiency.

Pending Publication Date: 2021-10-26

NAT UNIV OF DEFENSE TECH

10 Cites, 2 Cited by

Problems solved by technology

On the one hand, cognitive knowledge is typically constructed by hand through engineering-style programming, so it generalizes and adapts poorly and has no learning ability.

On the other hand, although existing deep reinforcement learning algorithms have succeeded in many applications, they still suffer from long agent training times, large computing power requirements, and slow model convergence.

Embodiment Construction

[0023] In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0024] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure shall have the ordinary meanings understood by those skilled in the art to which the present disclosure belongs. "First", "second" and similar words used in the embodiments of the present disclosure do not indicate any sequence, quantity or importance, but are only used to distinguish different components. "Comprising", "including" and similar words mean that the elements or items appearing before the word cover the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not lim...

Abstract

The invention provides a deep Q network reinforcement learning method and device accelerated by a cognitive behavior model. The method comprises: obtaining state information from an environment with a cognitive behavior model, deriving cognitive behavior knowledge from the state information, and transmitting the cognitive behavior knowledge to a heuristic strategy network; acquiring state information from the environment with a deep reinforcement learning model; using the heuristic strategy network to obtain a heuristic strategy value from the state information and the cognitive behavior knowledge, and sending the heuristic strategy value to a deep Q network; using the deep Q network to select and execute an action according to the state information and the heuristic strategy value; using the deep reinforcement learning model to acquire a return from the environment and iteratively updating the heuristic strategy network and the deep Q network; and repeating the above operations, ending reinforcement learning once the deep Q network is determined to have converged. The cognitive behavior model and the heuristic strategy network accelerate convergence of the deep Q network and effectively relieve the influence of a huge state space and sparse rewards on learning efficiency.
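The loop described in the abstract can be illustrated with a small tabular sketch. This is not the patented implementation: the chain environment, the rule inside `cognitive_model`, the use of lookup tables in place of the heuristic strategy network and the deep Q network, the trust-style update of `h`, and every name and hyperparameter below are assumptions chosen only to make the data flow concrete.

```python
import random

N_STATES, GOAL, ACTIONS = 8, 7, (0, 1)  # chain world: action 0 = left, 1 = right

def env_step(state, action):
    """Hypothetical sparse-reward environment: reward 1 only on reaching GOAL."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def cognitive_model(state):
    """Stand-in for the cognitive behavior model: a hand-written rule that
    turns state information into knowledge (here, a suggested action)."""
    return 1  # assumed rule: "the goal lies to the right"

def select_action(q_row, h_value, advice, eps, rng):
    """Epsilon-greedy choice over Q-values plus the heuristic strategy value."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    scores = [q_row[a] + (h_value if a == advice else 0.0) for a in ACTIONS]
    return max(ACTIONS, key=lambda a: scores[a])

def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.1, eta=0.1, seed=0):
    rng = random.Random(seed)
    # q stands in for the deep Q network (a table instead of a neural net).
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    # h stands in for the heuristic strategy network: one bonus per state
    # that is added to the action advised by the cognitive model.
    h = [0.5] * N_STATES
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            advice = cognitive_model(state)            # knowledge from state
            action = select_action(q[state], h[state], advice, eps, rng)
            nxt, reward, done = env_step(state, action)  # return from env
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            td_error = target - q[state][action]
            q[state][action] += alpha * td_error       # Q-learning update
            if action == advice:
                # Assumed update rule: raise or lower the bonus according to
                # the TD error of the advised action, clipped to [0, 1].
                h[state] = min(1.0, max(0.0, h[state] + eta * td_error))
            state, steps = nxt, steps + 1
    return q, h
```

Calling `q, h = train()` returns the learned Q-table and the per-state heuristic bonuses. A fixed episode budget replaces the convergence check for brevity. In this toy setting the bonus steers early exploration toward the sparse reward, which is the intuition behind the acceleration the abstract claims; a real system would replace both tables with neural networks.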

Description

Technical field

[0001] The present disclosure relates to the technical field of reinforcement learning, and in particular to a deep Q network reinforcement learning method and device accelerated by a cognitive behavior model.

Background

[0002] Sample efficiency has always restricted the application of reinforcement learning algorithms to complex problems. In reinforcement learning applications, the agent learns by interacting with the environment through trial and error, so a large number of interaction samples are often required to fully explore the state-action space and converge to the optimal strategy. The low sample efficiency of reinforcement learning agents is particularly prominent in complex tasks (such as those with high-dimensional, continuous state spaces or sparse environment rewards).

[0003] Utilizing appropriate prior knowledge or transferring the learned policy model is an effective mean...

Application Information

IPC(8): G06N3/08, G06N3/04

CPC: G06N3/08, G06N3/045

Inventor: 黄健, 李嘉祥, 陈浩, 刘权, 张中杰, 付可, 韩润海

Owner: NAT UNIV OF DEFENSE TECH
