Advantage estimation method and device, electronic equipment and storage medium

Technology for an advantage estimation model, applied in the field of reinforcement learning. It addresses poor automatic decision-making and aims to improve the performance and accuracy of advantage estimation, and thereby the performance of decision-making.

Active Publication Date: 2021-08-10

INST OF AUTOMATION CHINESE ACAD OF SCI

7 Cites, 0 Cited by

Problems solved by technology

[0006] The present invention provides an advantage estimation method, device, electronic equipment and storage medium to remedy the poor automatic decision-making performance of the prior art in complex scenarios.

Embodiment Construction

[0044] To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments that persons of ordinary skill in the art obtain from these embodiments without creative effort fall within the protection scope of the present invention.

[0045] Figure 1 is a schematic flow chart of the advantage estimation method provided by the embodiment of the present invention. As shown in Figure 1, the method includes:

[0046] Step 110, obtaining the current environment state.

[0047] Specifically, the environment state of the current decision-making scenario is acquired first. Here, the environment state is a set of values...

Abstract

The invention provides an advantage estimation method and device, electronic equipment and a storage medium. The method comprises: acquiring a current environment state; and inputting the current environment state into an advantage estimation model to obtain the advantage action that the model produces by performing advantage estimation on the current environment state. The advantage estimation model is trained based on a teaching data set and a behavior cloning model; the teaching data set comprises sample environment states and the sample action corresponding to each sample environment state, and the behavior cloning model is itself trained on the teaching data set. Because the advantage estimation model is trained with both the teaching data set and the adaptive behavior cloning model, the teaching data are fully utilized, expert experience in the historical teaching data is mined automatically, and the adverse effects that incomplete teaching data might bring are avoided. This enhances the advantage estimation performance of the model and improves advantage estimation accuracy in complex scenes.
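The pipeline in the abstract (teaching data set, behavior cloning model, advantage estimation, advantage action) can be illustrated with a small tabular sketch. Everything named here is hypothetical: the states, actions, `train_behavior_cloning`, and especially `stand_in_advantage`, which is only a placeholder score (cloned probability minus a uniform baseline). The abstract does not state how the advantage estimation model is actually trained from the behavior cloning model, so this should be read as an assumption-laden illustration rather than the patented method.

```python
from collections import Counter, defaultdict

# Hypothetical teaching data set: (sample environment state, sample action) pairs.
TEACHING_DATA = [
    ("low_fuel", "refuel"),
    ("low_fuel", "refuel"),
    ("low_fuel", "continue"),
    ("clear_road", "continue"),
    ("clear_road", "continue"),
]
ACTIONS = ["continue", "refuel"]


def train_behavior_cloning(data, actions, smoothing=1.0):
    """Fit a tabular behavior cloning policy: smoothed action frequencies per state."""
    counts = defaultdict(Counter)
    for state, action in data:
        counts[state][action] += 1
    policy = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values()) + smoothing * len(actions)
        policy[state] = {a: (action_counts[a] + smoothing) / total for a in actions}
    return policy


def stand_in_advantage(policy, state, actions):
    """Placeholder advantage score (an assumption, not the patent's training rule):
    each action's cloned probability minus the uniform baseline 1/len(actions)."""
    uniform = 1.0 / len(actions)
    probs = policy.get(state, {a: uniform for a in actions})
    return {a: probs[a] - uniform for a in actions}


def advantage_action(policy, state, actions):
    """Return the action with the highest stand-in advantage for the given state."""
    scores = stand_in_advantage(policy, state, actions)
    return max(actions, key=lambda a: scores[a])


bc_policy = train_behavior_cloning(TEACHING_DATA, ACTIONS)
chosen = advantage_action(bc_policy, "low_fuel", ACTIONS)
```

In this toy data the demonstrations favor `refuel` in the `low_fuel` state, so that action receives the higher stand-in score. A state absent from the teaching data falls back to a uniform distribution, which gives every action a score of zero; handling such gaps in the demonstrations is the kind of incompleteness the abstract says the invention is meant to address.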

Description

Technical field

[0001] The present invention relates to the technical field of reinforcement learning, in particular to an advantage estimation method, device, electronic equipment and storage medium.

Background

[0002] Deep Reinforcement Learning (DRL) has made great progress in recent years and is widely used in decision-making scenarios such as video games and board games. With the help of deep learning's powerful feature extraction and function fitting capabilities, reinforcement learning agents can extract and learn feature knowledge directly from raw input data (such as game images), and then use traditional reinforcement learning algorithms to learn decision-making control strategies from the extracted feature information, without manually extracting features or learning them from rules and heuristics.

[0003] But currently, deep reinforcement learning techniques are still not practical for applications in solving complex decision-making control...

Application Information

IPC(8): G06N5/00, G06N3/04, G06N3/08

CPC: G06N5/00, G06N3/084, G06N3/044, G06N3/045, Y04S10/50

Inventor: 李小双, 王晓, 黄梓铭, 王飞跃

Owner: INST OF AUTOMATION CHINESE ACAD OF SCI
