A deep reinforcement learning method and equipment based on a plurality of historical optimal Q networks

A reinforcement learning and network technology applied in the field of deep reinforcement learning. It addresses problems such as low interaction evaluation scores, and achieves the effects of reducing the amount of computation, stabilizing the training process, and improving utilization efficiency.

Inactive Publication Date: 2019-06-21

INST OF SOFTWARE - CHINESE ACAD OF SCI

Cites 6 documents; cited by 17

AI Technical Summary

Problems solved by technology

Based on the above inventive concept, and aimed at existing intelligent robot interactive systems, the present invention proposes a deep reinforcement learning method based on multiple historical best Q-networks. The method makes the training process more stable and solves the technical problem in the prior art that the interaction evaluation score becomes lower in the later stages of training.

Embodiment Construction

[0048] Specific embodiments of the present invention are described in detail below with reference to the examples and the accompanying drawings. The embodiments described here are intended only to illustrate and explain the present invention, not to limit it.

[0049] For an intelligent robot interactive system comprising agents, the deep reinforcement learning method based on multiple historical best Q-networks proposed by the present invention, as shown in Figure 1, mainly comprises the following steps. First, define the attributes and rules of the single agent, specify the state space and action space of the agent, and construct or call a single-agent motion interaction environment; that is, the environment o is observed and, according to the policy π, the agent … Then, select the best Q-networks from all historical Q-networks according to the interaction evaluation score of each round;...
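The step of selecting the best Q-networks from all historical Q-networks by per-round interaction evaluation score can be sketched as a fixed-size pool that keeps the k highest-scoring snapshots. This is a minimal sketch under stated assumptions: the class name `HistoricalBestPool`, the pool size `k`, the use of a min-heap, and storing a Q-network as a deep-copied parameter object are all hypothetical illustrations, not details given in the patent text.

```python
import copy
import heapq
import itertools
from typing import Any, List, Tuple


class HistoricalBestPool:
    """Hypothetical helper: keeps the k highest-scoring Q-network snapshots seen so far."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # Min-heap of (score, insertion_id, snapshot): the lowest kept score sits at index 0.
        self._heap: List[Tuple[float, int, Any]] = []
        # Monotonic counter breaks score ties so snapshots themselves are never compared.
        self._counter = itertools.count()

    def offer(self, score: float, q_params: Any) -> bool:
        """Consider the Q-network from one round; return True if it enters the pool."""
        if len(self._heap) >= self.k and score <= self._heap[0][0]:
            return False  # not better than the worst network already kept
        # Copy only when kept, so later training steps cannot mutate the stored snapshot.
        entry = (score, next(self._counter), copy.deepcopy(q_params))
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        else:
            heapq.heapreplace(self._heap, entry)  # evict the lowest-scoring snapshot
        return True

    def best(self) -> List[Any]:
        """Kept snapshots, highest score first."""
        ordered = sorted(self._heap, key=lambda e: (e[0], e[1]), reverse=True)
        return [snapshot for _, _, snapshot in ordered]

    def scores(self) -> List[float]:
        """Kept scores, highest first."""
        return sorted((s for s, _, _ in self._heap), reverse=True)


if __name__ == "__main__":
    pool = HistoricalBestPool(k=3)
    rounds = [(10.0, {"w": 1}), (25.0, {"w": 2}), (5.0, {"w": 3}),
              (30.0, {"w": 4}), (12.0, {"w": 5})]
    for round_score, params in rounds:
        pool.offer(round_score, params)
    print(pool.scores())  # [30.0, 25.0, 12.0]
```

A heap keeps each round's update at O(log k), so scoring every round against the whole history stays cheap even for long training runs.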

Abstract

The invention provides a deep reinforcement learning method and device based on multiple historical optimal Q networks for an intelligent robot interaction system comprising an agent. The method comprises the following steps: defining the attributes and rules of the agent, defining the state space and action space of the agent, and constructing or calling an agent motion environment; selecting the optimal plurality of Q networks from all historical Q networks according to their interaction evaluation scores; and combining the historical optimal Q networks with the current Q network by a maximum operation to guide the agent's choice of action strategy, training the parameters of the learning model, and autonomously making the next decision according to the environment the agent is in. With this method, a suitable motion environment can be constructed according to actual needs, and the optimal Q networks generated during training are used to better guide the agent's decisions, achieving intelligent policy optimization. The method has a positive effect on the development of robots and unmanned systems in China.
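The combination step described in the abstract, taking a maximum over the current Q network and the historical optimal Q networks to guide action selection, can be illustrated as an element-wise maximum over each network's action values. This is a minimal sketch, assuming each Q network is any callable that maps a state to one value per action; the function names are hypothetical, and the epsilon-greedy exploration rule is an assumption swapped in for illustration, since the excerpt does not say how exploration is handled.

```python
import random
from typing import Callable, List, Optional, Sequence

# A Q-network is modelled as any callable mapping a state to one value per action.
QFunction = Callable[[object], Sequence[float]]


def combined_q_values(state: object, current_q: QFunction,
                      historical_best: List[QFunction]) -> List[float]:
    """Element-wise maximum over the current Q-network and the historical best ones."""
    outputs = [list(current_q(state))] + [list(q(state)) for q in historical_best]
    n_actions = len(outputs[0])
    if any(len(values) != n_actions for values in outputs):
        raise ValueError("all Q-networks must score the same action set")
    return [max(values[a] for values in outputs) for a in range(n_actions)]


def select_action(state: object, current_q: QFunction, historical_best: List[QFunction],
                  epsilon: float = 0.1, rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy choice over the combined values (exploration rule is an assumption)."""
    if rng is None:
        rng = random.Random()
    q = combined_q_values(state, current_q, historical_best)
    if rng.random() < epsilon:
        return rng.randrange(len(q))  # explore: uniform random action
    return max(range(len(q)), key=lambda a: q[a])  # exploit: highest combined value


if __name__ == "__main__":
    def current(state: object) -> List[float]:
        return [1.0, 0.5, 0.2]

    history = [lambda s: [0.3, 2.0, 0.1], lambda s: [0.9, 0.4, 1.5]]
    print(combined_q_values(None, current, history))           # [1.0, 2.0, 1.5]
    print(select_action(None, current, history, epsilon=0.0))  # 1
```

With the maximum, an action is rated by whichever network (current or historical best) values it most, so a good behaviour learned earlier in training can still steer the agent if the current network's estimate for it has degraded.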

Description

Technical field

[0001] The invention belongs to the technical field of computer artificial intelligence, and in particular relates to a deep reinforcement learning method and equipment based on multiple historical best Q-networks in an intelligent robot interactive system.

Background art

[0002] In recent years, reinforcement learning has been widely used in artificial intelligence because of its strong decision-making ability, and it is therefore often used in the interactive systems of intelligent robots. In reinforcement learning (RL), an agent seeks an optimal policy for a sequential decision problem by maximizing the accumulated future reward signal (see [1]). Over time, many popular reinforcement learning algorithms have been proposed, including Q-learning (see [2]), temporal difference learning (see [3]), and policy gradient methods (see [4]). However, these early RL algorithms mainly rely on manual feature extraction, and th...
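For reference, the accumulated future reward that the agent optimizes and the one-step Q-learning update are commonly written as follows. These are the standard textbook forms, with discount factor γ and learning rate α as assumed symbols; the excerpt above does not state them, and the patent's own training objective may differ.

```latex
\begin{aligned}
G_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1 \\
Q(s_t, a_t) &\leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
\end{aligned}
```

A Q-network replaces the table Q(s, a) with a neural network that outputs one value per action, which is what allows the historical networks above to be compared and combined action by action.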

Application Information

Patent type & authority: Application (China)

IPC (8th edition): G06N20/00

Inventors: 王瑞, 俞文武, 李瑞英, 胡晓惠

Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI