Method, device and storage medium for continuous space action planning of intelligent agent

A technology for action planning of intelligent agents, applied in the fields of neural learning methods, computing arrangements based on specific mathematical models, and program-controlled manipulators. It addresses problems such as control delay and the inability to use information from interaction with the environment, with the effects of reducing control delay and estimating state values accurately.

Active Publication Date: 2021-01-26

FUDAN UNIV



Problems solved by technology

[0024] Although KR-UCT improves the simulation efficiency of Monte Carlo tree search, it still cannot use information from interaction with the external environment: at step t, KR-UCT cannot use the information obtained from interacting with the environment during the previous t-1 steps.

KR-UCT is memoryless: when performing the Monte Carlo tree search at step t+1, it cannot use the information from the interactions with the environment during the previous t steps. As a result, to choose the best action in a high-dimensional continuous action space, KR-UCT has to spend a lot of time at every step simulating the future, which causes control delays.


Examples


Embodiment 1

[0065] As shown in Figure 1 and Figure 4, this embodiment provides a continuous space action planning method for an agent, which includes the following steps:

[0066] S1. Combine the state observations in the agent's continuous space action process into a vector to form the state s_t, and combine the drive control quantities in the agent's continuous space action process into a vector to form the action a_t;

[0067] S2. Construct and train a neural network model comprising a strategy (policy) network module and a value network module. The strategy network module outputs the action probability distribution in a given state, and the value network module estimates the value of a given state. S2 is repeated to train and update the neural network according to the data from interaction with the environment;

[0068] S3. KR-PV-UCT simulation, which consists of the following four processes in order (as shown in Figure 3):
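As a minimal sketch of S1, assume the agent is a two-joint arm whose observations are joint angles, joint velocities and an end-effector position, and whose drive control quantities are joint torques. These quantities and the function names `build_state` and `build_action` are illustrative assumptions, not taken from the patent.

```python
from typing import List, Sequence


def build_state(*observations: Sequence[float]) -> List[float]:
    """Concatenate groups of observations into one flat state vector s_t."""
    return [float(x) for group in observations for x in group]


def build_action(controls: Sequence[float]) -> List[float]:
    """Collect the drive control quantities into one action vector a_t."""
    return [float(u) for u in controls]


# Hypothetical observations for a two-joint arm (assumed, not from the patent).
joint_angles = [0.1, -0.4]
joint_velocities = [0.0, 0.2]
end_effector_pos = [0.3, 0.5, 0.1]

s_t = build_state(joint_angles, joint_velocities, end_effector_pos)
a_t = build_action([1.5, -0.7])  # hypothetical joint torques
```

Keeping both vectors flat means the same state can be fed to a network and the same action can be compared with other actions by a distance or kernel function.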
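The following is a rough sketch of what the two modules in S2 might look like, not the patent's architecture. It assumes a single hidden layer shared by both heads, a diagonal Gaussian as the action probability distribution, and a value squashed into [-1, 1]; the class name `PolicyValueNet` and all sizes are hypothetical. Training is omitted; only the forward pass is shown.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def _linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute w @ x + b for plain Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def _init(rows: int, cols: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weights and zero biases."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return w, [0.0] * rows


class PolicyValueNet:
    """Shared hidden layer with a policy head and a value head (assumed layout)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.action_dim = action_dim
        self.w1, self.b1 = _init(hidden, state_dim, rng)
        # Policy head outputs a mean and a log standard deviation per action dimension.
        self.wp, self.bp = _init(2 * action_dim, hidden, rng)
        self.wv, self.bv = _init(1, hidden, rng)

    def forward(self, state: Vector) -> Tuple[Vector, Vector, float]:
        """Return (mean, std) of the action distribution and the state value."""
        h = [math.tanh(z) for z in _linear(self.w1, self.b1, state)]
        out = _linear(self.wp, self.bp, h)
        mean = out[: self.action_dim]
        std = [math.exp(z) for z in out[self.action_dim:]]
        value = math.tanh(_linear(self.wv, self.bv, h)[0])
        return mean, std, value

    def sample_action(self, state: Vector, rng: random.Random) -> Vector:
        """Draw one continuous action from the policy head's Gaussian."""
        mean, std, _ = self.forward(state)
        return [rng.gauss(m, s) for m, s in zip(mean, std)]
```

A tree search can call `sample_action` to propose candidate actions in a state and use the returned value in place of a long rollout, which is one way such a network can shorten planning time.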

[0069] Selection process: s...
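The selection paragraph is truncated in this listing, so the patent's KR-PV-UCT selection rule is not reproduced here. For orientation only, the sketch below follows the kernel-regression upper confidence bound as it is commonly described for plain KR-UCT: each child action's value is estimated as a kernel-weighted average over its sibling actions, and the exploration bonus uses a kernel-weighted visit density instead of a raw visit count. The Gaussian kernel, the bandwidth, the constant `c` and the function names are assumptions.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], bandwidth: float = 0.5) -> float:
    """Similarity between two continuous actions."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def kr_ucb_select(
    actions: List[Sequence[float]],
    visits: List[int],
    mean_values: List[float],
    c: float = 1.0,
    bandwidth: float = 0.5,
) -> int:
    """Return the index of the child action with the highest kernel-regression UCB."""
    if not actions or any(n < 1 for n in visits):
        raise ValueError("every child action must have been visited at least once")

    density = []   # W(a): kernel-weighted visit count around each action
    estimate = []  # E[v | a]: kernel-weighted average value around each action
    for a in actions:
        weights = [gaussian_kernel(a, b, bandwidth) * n for b, n in zip(actions, visits)]
        w = sum(weights)
        density.append(w)
        estimate.append(sum(wi * v for wi, v in zip(weights, mean_values)) / w)

    total = sum(density)
    scores = [e + c * math.sqrt(math.log(total) / w) for e, w in zip(estimate, density)]
    return max(range(len(actions)), key=scores.__getitem__)
```

Because nearby actions share statistics through the kernel, an action that has been tried only once can still borrow value information from well-explored neighbours, which is what makes this kind of rule usable in a continuous action space.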

Embodiment 2

[0118] This embodiment provides a device for continuous space action planning of an agent, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the steps of the method for continuous space action planning of an agent as in Embodiment 1. The method is the same as in Embodiment 1 and is not repeated here.

Embodiment 3

[0120] This embodiment provides a storage medium for continuous space action planning of an agent, on which a computer program is stored. When the computer program is executed by a processor, the steps of the method for continuous space action planning of an agent as in Embodiment 1 are implemented. The method is the same as in Embodiment 1 and is not repeated here.


Abstract

The invention relates to a method, a device and a storage medium for continuous space action planning of an intelligent agent. The method comprises the following steps: S1, combining the state observations in the continuous space action process of the intelligent agent into a vector to form a state s_t, and combining the drive control quantities in that process into a vector to form an action a_t; S2, constructing and training a neural network model, and training and updating the neural network model at set intervals according to data from interaction with the environment; S3, performing KR-PV-UCT simulation on the basis of the neural network model, wherein the KR-PV-UCT simulation comprises a selection process, an expansion process, an evaluation process and a backpropagation process; and S4, selecting the optimal action under the current root node to interact with the environment so that the intelligent agent reaches the next state, and repeating steps S3 to S4. Compared with the prior art, the method fuses KR-UCT with a neural network and applies it to high-dimensional continuous action spaces, reducing control delay while preserving performance and achieving efficient planning of the agent's continuous space actions.
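The outer loop formed by steps S2 to S4 can be sketched as follows. This is only an illustration of the plan, act, observe, and periodically retrain cycle: the planner here is a stand-in that scores a fixed grid of candidate actions one step ahead, not KR-PV-UCT, and the one-dimensional environment, the reward and every name in the block are assumptions.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]
Transition = Tuple[State, Action, float, State]


def run_episode(
    env_step: Callable[[State, Action], Tuple[State, float]],
    plan: Callable[[State], Action],
    update_model: Callable[[List[Transition]], None],
    s0: State,
    steps: int,
    train_every: int,
) -> Tuple[State, List[Transition]]:
    """Plan from the current state, act, store the transition, retrain at set intervals."""
    buffer: List[Transition] = []
    s = s0
    for t in range(steps):
        a = plan(s)                  # S3: simulate from the current root, pick an action
        s_next, r = env_step(s, a)   # S4: interact with the environment
        buffer.append((s, a, r, s_next))
        if (t + 1) % train_every == 0:
            update_model(buffer)     # S2: periodic update from interaction data
        s = s_next
    return s, buffer


# Toy 1-D environment (assumed): the action is a displacement, reward is -|position|.
def toy_env_step(s: State, a: Action) -> Tuple[State, float]:
    s_next = [s[0] + a[0]]
    return s_next, -abs(s_next[0])


# Stand-in planner (assumed): evaluate a fixed grid of candidates one step ahead.
def toy_plan(s: State) -> Action:
    candidates = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]
    return max(candidates, key=lambda a: toy_env_step(s, a)[1])


update_calls: List[int] = []
final_state, transitions = run_episode(
    toy_env_step, toy_plan, lambda buf: update_calls.append(len(buf)),
    s0=[3.0], steps=6, train_every=2,
)
```

Passing the planner and the update step in as callables keeps the loop independent of how the search is implemented, so a tree-search planner could replace `toy_plan` without changing `run_episode`.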

Description

Technical field

[0001] The present invention relates to an action planning method for intelligent agents, in particular to a method, device and storage medium for continuous space action planning of an intelligent agent.

Background

[0002] The real world is a four-dimensional continuous space, and most scenarios involve high-dimensional continuous action planning. For example, the human body is driven by hundreds of muscles, and to complete even a basic task the brain needs to find the best way to activate them; likewise, an organization is made up of different people, and its leader needs to find the best way to direct these people to accomplish a given goal. Because action planning tasks in high-dimensional continuous action spaces are so common, solving them is of great practical significance. The action planning task in the high-dimensional continuous action space involves...


Application Information


IPC(8): B25J9/16, G06N3/00, G06N3/04, G06N3/08, G06N7/00

CPC: B25J9/161, B25J9/1661, B25J9/1664, G06N3/008, G06N3/084, G06N3/082, G06N3/047, G06N7/01

Inventors: 李伟, 刘天星, 甘中学, 田小禾

Owner: FUDAN UNIV
