A deep reinforcement learning method and equipment based on a plurality of historical optimal Q networks

This reinforcement learning and Q-network technique, applied in the field of deep reinforcement learning, addresses problems such as low interaction evaluation scores, and achieves the effects of reducing the amount of calculation, stabilizing the training process, and improving the efficiency of use.

Status: Inactive
Publication Date: 2019-06-21
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI
Cites: 6; Cited by: 17

AI Technical Summary

Problems solved by technology

Based on the above-mentioned inventive concept, and aimed at existing intelligent robot interactive systems, the present invention proposes a deep reinforcement learning method based on multiple historical best Q-networks. The method makes the training process more stable and solves the technical problem in the prior art that the interaction evaluation score drops in the later stages of training.

Examples

Embodiment Construction

[0048] In the following, specific embodiments of the present invention will be described in detail in conjunction with the examples and accompanying drawings. The embodiments depicted here are only used to illustrate and explain the present invention, but not to limit the present invention.

[0049] For an intelligent robot interactive system that includes an agent, the deep reinforcement learning method based on multiple historical best Q-networks proposed by the present invention, as shown in Figure 1, mainly includes the following steps. First, define the attributes and rules of the single agent, specify the state space and action space of the agent, and construct or call the single-agent motion interaction environment, that is, observe the environment o and, according to the policy π, the agent ... Then, select the best Q-networks from all historical Q-networks according to the interaction evaluation score of each round;...
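The selection step described above (keeping the best Q-networks according to each round's interaction evaluation score) can be sketched as a bounded pool. This is a minimal illustration, not the patent's implementation: the class name `BestSnapshotPool`, the parameter `k`, and the use of opaque `params` objects for network snapshots are all assumptions introduced here, and the source does not state how ties are handled.

```python
import heapq
from itertools import count


class BestSnapshotPool:
    """Keep the k highest-scoring Q-network snapshots seen so far.

    Hypothetical helper: `params` is any object standing in for a
    saved copy of the Q-network parameters after one round.
    """

    def __init__(self, k):
        self.k = k
        self._heap = []       # min-heap of (score, tie_breaker, params)
        self._tie = count()   # unique counter, so params are never compared

    def offer(self, score, params):
        """Submit the snapshot from one round with its evaluation score."""
        entry = (score, next(self._tie), params)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            # Strictly better than the worst retained snapshot: swap it in.
            heapq.heapreplace(self._heap, entry)

    def best(self):
        """Return retained (score, params) pairs, highest score first."""
        return [(s, p) for s, _, p in sorted(self._heap, reverse=True)]


pool = BestSnapshotPool(k=2)
for round_index, score in enumerate([10, 50, 30, 20]):
    pool.offer(score, f"round{round_index}")
print(pool.best())
```

A min-heap keeps the worst retained snapshot at the root, so each round costs O(log k) instead of re-sorting the full history.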

Abstract

The invention provides a deep reinforcement learning method and device based on multiple historical optimal Q-networks for an intelligent robot interaction system comprising an agent. The method comprises the following steps: defining the attributes and rules of the agent, defining the state space and action space of the agent, and constructing or calling an agent motion environment; selecting a plurality of optimal Q-networks from all historical Q-networks according to their interaction evaluation scores; and combining the plurality of historical optimal Q-networks with the current Q-network by using a maximum operation to guide the agent in selecting an action policy, training the parameters of the learning model, and autonomously carrying out the next decision action according to the environment in which the agent is located. With this method, a reasonable motion environment can be constructed according to actual needs, and the optimal Q-networks generated during training are used to better guide the agent's decisions. This achieves the purpose of intelligent policy optimization and has a positive effect on the development of robots and unmanned systems in China.
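One possible reading of the "maximum operation" is an element-wise maximum over the per-action values produced by the current Q-network and each historical optimal Q-network, followed by a greedy choice. The sketch below assumes that reading; the excerpt does not say whether the maximum is applied during action selection, target computation, or both. The functions `combined_q_values`, `greedy_action`, and `make_linear_q` are hypothetical names, and the linear Q-functions are toy stand-ins for trained networks.

```python
def combined_q_values(state, current_q, best_qs):
    """Element-wise maximum over the current and historical best Q-functions.

    Each Q-function maps a state to a list of per-action values.
    """
    per_network = [q(state) for q in (current_q, *best_qs)]
    return [max(values) for values in zip(*per_network)]


def greedy_action(state, current_q, best_qs):
    """Pick the action index with the largest combined Q-value."""
    q = combined_q_values(state, current_q, best_qs)
    return max(range(len(q)), key=q.__getitem__)


def make_linear_q(weights):
    """Toy stand-in for a Q-network (assumption): Q(s, a) = weights[a] . s"""
    def q(state):
        return [sum(w * s for w, s in zip(row, state)) for row in weights]
    return q


state = [1.0, 2.0]
current_q = make_linear_q([[1.0, 0.0], [0.0, 1.0]])   # Q(state) = [1.0, 2.0]
best_qs = [
    make_linear_q([[2.0, 0.0], [0.0, 0.0]]),          # Q(state) = [2.0, 0.0]
    make_linear_q([[0.0, 0.0], [1.0, 1.0]]),          # Q(state) = [0.0, 3.0]
]
print(combined_q_values(state, current_q, best_qs))   # [2.0, 3.0]
print(greedy_action(state, current_q, best_qs))       # 1
```

With an empty `best_qs`, the functions reduce to ordinary greedy selection on the current Q-network, which makes the combination easy to switch on only once snapshots exist.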

Description

Technical field

[0001] The invention belongs to the technical field of computer artificial intelligence, and in particular relates to a deep reinforcement learning method and equipment based on multiple historical best Q-networks in an intelligent robot interactive system.

Background

[0002] In recent years, reinforcement learning has been widely used in artificial intelligence because of its excellent decision-making ability, and it is therefore often used in the interactive systems of intelligent robots. In reinforcement learning (RL), an agent seeks an optimal policy for a sequential decision problem by optimizing the accumulated future reward signal (see [1]). Over time, many popular reinforcement learning algorithms have been proposed, including Q-learning (see [2]), temporal difference learning (see [3]), and policy gradient methods (see [4]). However, these early reinforcement learning algorithms mainly rely on manual feature extraction, and th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00
Inventor: 王瑞, 俞文武, 李瑞英, 胡晓惠
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI