Deep Q-network reinforcement learning method and device accelerated by a cognitive behavior model

A reinforcement learning technique that addresses an agent's lack of learning ability, weak generalization and adaptability, and long training time, and that alleviates the impact of a huge state space and sparse rewards on learning efficiency.
CN113554166A · Pending · Publication Date: 2021-10-26 · NAT UNIV OF DEFENSE TECH

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
NAT UNIV OF DEFENSE TECH
Publication Date
2021-10-26


Abstract

The invention provides a deep Q-network reinforcement learning method and device accelerated by a cognitive behavior model. The method comprises: using a cognitive behavior model to obtain state information from the environment, derive cognitive behavior knowledge from that state information, and transmit the knowledge to a heuristic strategy network; using a deep reinforcement learning model to acquire state information from the environment; using the heuristic strategy network to obtain a heuristic strategy value from the state information and the cognitive behavior knowledge, and sending that value to a deep Q network; using the deep Q network to select and execute an action according to the state information and the heuristic strategy value; using the deep reinforcement learning model to acquire a reward from the environment and iteratively updating the heuristic strategy network and the deep Q network; and repeating these operations until the deep Q network is determined to have converged, at which point reinforcement learning ends. The cognitive behavior model and the heuristic strategy network accelerate convergence of the deep Q network and effectively mitigate the impact of a huge state space and sparse rewards on learning efficiency.
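The loop described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: two NumPy tables stand in for the deep Q network and the heuristic strategy network, the corridor environment, the rule inside `cognitive_knowledge`, the weight `xi`, and the additive combination `Q + H` used for action selection are all assumptions made for illustration, and the heuristic table is simply overwritten from the rule rather than trained.

```python
import numpy as np

# Hypothetical toy task: a 1-D corridor with states 0..5; action 1 moves right,
# action 0 moves left. The only reward is given on reaching GOAL (sparse reward).
N_STATES, N_ACTIONS, GOAL = 6, 2, 5


def cognitive_knowledge(state):
    """Stand-in for the cognitive behavior model: suggest an action for a state."""
    return 1  # assumed rule: "move toward the goal"


def step(state, action):
    """Toy environment transition; returns (next_state, reward, done)."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.1, xi=1.0, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))  # stands in for the deep Q network
    h = np.zeros((N_STATES, N_ACTIONS))  # stands in for the heuristic strategy network
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            # Heuristic strategy value from the state and the cognitive knowledge
            # (assumption: a fixed bonus xi on the suggested action).
            h[s] = 0.0
            h[s, cognitive_knowledge(s)] = xi
            # Epsilon-greedy selection over Q plus the heuristic value (assumed form).
            if rng.random() < eps:
                a = int(rng.integers(N_ACTIONS))
            else:
                a = int(np.argmax(q[s] + h[s]))
            s2, r, done = step(s, a)
            # Standard one-step Q-learning update using the environment reward.
            target = r + (0.0 if done else gamma * q[s2].max())
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train()
    print(np.argmax(q_table[:GOAL], axis=1))  # greedy action per non-terminal state
```

In this sketch the heuristic term only biases which action is tried; the Q update itself uses the plain environment reward, so the learned values are not distorted by the prior knowledge. A fixed number of episodes replaces the convergence check described in the abstract.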

Description

Technical field

[0001] The present disclosure relates to the technical field of reinforcement learning, in particular to a deep Q-network reinforcement learning method and device accelerated by a cognitive behavior model.

Background art

[0002] Low sample efficiency has long restricted the application of reinforcement learning algorithms to complex problems. In reinforcement learning, the agent learns by interacting with the environment through trial and error, so a large number of interaction samples is often required to fully explore the state-action space and converge to the optimal policy. The problem is especially pronounced for complex tasks, such as those with high-dimensional or continuous state spaces or sparse environment rewards.

[0003] Utilizing appropriate prior knowledge or transferring the learned policy model is an effective mean...

Claims
