Advantage estimation method and device, electronic equipment and storage medium

An advantage estimation technique based on an advantage estimation model, applied in the field of reinforcement learning. It addresses poor automatic decision-making performance by improving the accuracy of advantage estimation and, in turn, the quality of the resulting decisions.

Active Publication Date: 2021-08-10
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0006] The present invention provides an advantage estimation method, device, electronic equipment and



Embodiment Construction

[0044] In order to make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions in the present invention will be clearly and completely described below in conjunction with the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

[0045] Figure 1 is a schematic flow chart of the advantage estimation method provided by an embodiment of the present invention. As shown in Figure 1, the method includes:

[0046] Step 110, obtaining the current environment state.

[0047] Specifically, the environment state of the current decision-making scenario is acquired first. Here, the environment state is a set of values...
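The step above, together with the model query summarized in the abstract, can be illustrated with a minimal sketch. It assumes, purely for illustration, that the environment state is a flat list of floats, that the action space is discrete, and that the "advantage action" is the action with the highest estimated advantage; the source does not confirm any of these details. The names `select_advantage_action`, `toy_advantage_model` and `WEIGHTS` are hypothetical.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def select_advantage_action(
    state: State,
    advantage_model: Callable[[State], Sequence[float]],
) -> int:
    """Return the index of the action with the highest estimated advantage.

    Assumption: the model returns one advantage value per discrete action.
    """
    advantages = advantage_model(state)
    return max(range(len(advantages)), key=lambda a: advantages[a])


# Hypothetical stand-in for a trained advantage estimation model:
# a fixed linear map from a 3-value state to advantages for 2 actions.
WEIGHTS: List[List[float]] = [
    [0.5, -1.0, 0.0],   # action 0
    [-0.5, 1.0, 0.25],  # action 1
]


def toy_advantage_model(state: State) -> List[float]:
    return [sum(w * s for w, s in zip(row, state)) for row in WEIGHTS]


# Step 110 (sketch): the current environment state as a set of values.
current_state = [1.0, 0.2, 0.0]
action = select_advantage_action(current_state, toy_advantage_model)
```

In a real system the toy linear map would be replaced by the trained advantage estimation model; only the calling pattern is meant to carry over.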


Abstract

The invention provides an advantage estimation method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a current environment state; and inputting the current environment state into an advantage estimation model to obtain an advantage action, which the model produces by performing advantage estimation on the current environment state. The advantage estimation model is trained based on a teaching data set and a behavior cloning model; the teaching data set comprises sample environment states and the sample actions corresponding to them, and the behavior cloning model is itself trained on the teaching data set. Because the advantage estimation model is trained with both the teaching data set and the adaptive behavior cloning model, the teaching data are fully utilized, expert experience in the historical teaching data is mined automatically, and the adverse effects that incomplete teaching data might bring are avoided. This enhances the performance of the advantage estimation model and improves the accuracy of advantage estimation in complex scenarios.
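As a rough illustration of the behavior cloning component mentioned above, the following sketch fits a policy to (sample environment state, sample action) pairs by supervised learning. It is not the patent's implementation: the linear softmax policy, the cross-entropy objective, the hyperparameters and the toy `teaching_data` are all assumptions made for the example. How the advantage estimation model then uses the cloned policy is not described in the text available here, so it is not shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_behavior_cloning(dataset, n_actions, lr=0.5, epochs=200, seed=0):
    """Fit a linear softmax policy to (state, action) pairs.

    Minimizes cross-entropy between the policy and the demonstrated
    actions with plain stochastic gradient descent (an assumed choice).
    """
    rng = random.Random(seed)
    dim = len(dataset[0][0])
    weights = [[0.0] * dim for _ in range(n_actions)]
    bias = [0.0] * n_actions

    def logits(state):
        return [
            sum(w * s for w, s in zip(weights[a], state)) + bias[a]
            for a in range(n_actions)
        ]

    data = list(dataset)
    for _ in range(epochs):
        rng.shuffle(data)
        for state, action in data:
            probs = softmax(logits(state))
            for a in range(n_actions):
                # Gradient of cross-entropy w.r.t. logit a.
                grad = probs[a] - (1.0 if a == action else 0.0)
                bias[a] -= lr * grad
                for i in range(dim):
                    weights[a][i] -= lr * grad * state[i]

    def policy(state):
        """Return action probabilities for a state."""
        return softmax(logits(state))

    return policy


# Hypothetical teaching data: the expert picks action 0 when the first
# state value dominates and action 1 otherwise.
teaching_data = [
    ([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.8, 0.3], 0),
    ([0.0, 1.0], 1), ([0.2, 0.9], 1), ([0.3, 0.7], 1),
]
bc_policy = train_behavior_cloning(teaching_data, n_actions=2)
```

Calling `bc_policy(state)` returns a probability for each action, which gives a simple measure of how closely a candidate action matches the demonstrated behavior.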

Description

Technical field

[0001] The present invention relates to the technical field of reinforcement learning, in particular to an advantage estimation method, device, electronic equipment and storage medium.

Background

[0002] Deep Reinforcement Learning (DRL) has made great progress in recent years and is widely used in decision-making scenarios such as video games and board games. With the help of deep learning's powerful feature extraction and function fitting capabilities, a reinforcement learning agent can extract and learn features directly from raw input data (such as game images), and then use traditional reinforcement learning algorithms to learn a decision-making control strategy based on the extracted features, without features having to be extracted manually or learned from rules and heuristics.

[0003] Currently, however, deep reinforcement learning techniques are still not practical for applications in solving complex decision-making control...


Application Information

IPC(8): G06N5/00, G06N3/04, G06N3/08
CPC: G06N5/00, G06N3/084, G06N3/044, G06N3/045, Y04S10/50
Inventor: 李小双, 王晓, 黄梓铭, 王飞跃
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI