A multi-agent cross-modal deep deterministic policy gradient training method based on image input

A multi-agent, image-input technique applied in the field of reinforcement learning algorithms. It addresses problems such as weak gradient guidance, reduced exploration efficiency of the actor network, and the effect of the actor's exploration efficiency on training.

Pending Publication Date: 2019-06-28
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

However, when the environment to be explored is very large, for example when a 1920x1024x4 color-depth image is used as the actor's input, convergence of the critic cannot be guaranteed.
An excessively large exploration space greatly reduces the exploration efficiency of the actor network, and the reduced efficiency means that effective training samples cannot be obtained. Since the actor and critic in DDPG share the same set of training samples, this leads to the critic's training being explored by the act...
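To make the scale of the problem concrete, the following minimal sketch computes the number of values in one 1920x1024x4 observation and shows a single replay buffer feeding both the critic and the actor update, as in standard DDPG. The names `flat_size`, `ReplayBuffer`, and `fill_and_sample`, the buffer capacity, and the placeholder transitions are illustrative assumptions and do not come from the patent text.

```python
import random
from collections import deque

IMAGE_SHAPE = (1920, 1024, 4)  # color-depth image size quoted in the text


def flat_size(shape):
    """Number of scalar values in an observation of the given shape."""
    size = 1
    for dim in shape:
        size *= dim
    return size


class ReplayBuffer:
    """One buffer of transitions, filled by the actor's exploration."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def fill_and_sample(num_steps=100, batch_size=8, seed=0):
    """Fill the buffer with placeholder transitions and draw one minibatch."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity=1000)  # capacity is an arbitrary assumption
    for step in range(num_steps):
        # Placeholder (state, action, reward, next_state); a real agent would
        # store simulator observations here.
        buffer.add((step, rng.random(), 0.0, step + 1))
    batch = buffer.sample(batch_size, rng)
    # The critic update and the actor update both consume the same minibatch,
    # so poor exploration by the actor also starves the critic of useful data.
    critic_batch = batch
    actor_batch = batch
    return buffer, critic_batch, actor_batch


values_per_image = flat_size(IMAGE_SHAPE)
```

Here `values_per_image` is the product 1920 x 1024 x 4, which is the input dimensionality the actor would face per observation if it consumed the raw image directly.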

Examples

Example Embodiment

[0056] Example 1:

[0057] As shown in Figure 1, a multi-agent cross-modal deep deterministic policy gradient training method based on image input includes the following steps:

[0058] Step 1. Build an experimental platform in the simulator, define the types of interactive objects and robotic arms, define the final goal and the rewards and penalties of the robotic arm control task, and specify the state space and action space of the two agents;

[0059] The specific steps include:

[0060] S11. Build the experimental environment with the open source simulation platform V-REP. The physics engine used is the Vortex open source physics engine, and the robotic arm is a UR5 robot with 6 joints;

[0061] S12. Set the robotic arm control task to be a grasping task. In this task, multiple irregular objects of different sizes, shapes, and colors are placed at the same height level as the robotic arm. The agent needs to control the ro...

Abstract

The invention relates to a multi-agent cross-modal deep deterministic policy gradient training method based on image input. First, a robotic arm training environment is constructed in a simulation platform. Then, a teacher agent and a student agent that take inputs of different modalities are constructed. Next, based on the deep deterministic policy gradient algorithm, the actor module and critic module of the teacher and the actor module of the student are trained, which yields a cross-modal deep reinforcement learning training algorithm for robotic arms based on image input. Once the overall training is finished, the student's actor network can be used alone: it receives high-dimensional image input and outputs actions that complete the task. The method is also well suited to transfer to a real environment. A real environment cannot provide full-state modal information, whereas image modal information is relatively easy to obtain, so after the student's actor network has been trained, full-state modal information is no longer required and a relatively good output policy can be obtained directly from image input.
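The teacher and student arrangement described above can be illustrated with a toy sketch. The visible text does not state how the student's actor is trained, so this example assumes a plain imitation (behavior cloning) loss that pulls the student's image-based action toward the teacher's full-state action. That loss, the linear policies, the fixed `render` matrix standing in for a camera, and all names and sizes below are hypothetical and are not the patent's method.

```python
import random

STATE_DIM, IMAGE_DIM, ACTION_DIM = 3, 8, 2  # toy sizes, hypothetical


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def imitation_step(student_w, image_obs, teacher_action, lr):
    """One gradient step on 0.5 * ||student(image) - teacher_action||^2.

    Updates student_w in place and returns the loss before the update.
    """
    pred = linear(student_w, image_obs)
    err = [p - t for p, t in zip(pred, teacher_action)]
    for i, row in enumerate(student_w):
        for j in range(len(row)):
            row[j] -= lr * err[i] * image_obs[j]
    return 0.5 * sum(e * e for e in err)


def run_demo(seed=0, steps=2000, lr=0.02):
    rng = random.Random(seed)
    # Stand-in for rendering: maps the full state to an "image" vector.
    render = random_matrix(IMAGE_DIM, STATE_DIM, rng)
    # Stand-in for a teacher actor already trained on the full state.
    teacher_w = random_matrix(ACTION_DIM, STATE_DIM, rng)
    # The student actor only ever sees the image modality.
    student_w = [[0.0] * IMAGE_DIM for _ in range(ACTION_DIM)]
    losses = []
    for _ in range(steps):
        state = [rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
        image_obs = linear(render, state)
        teacher_action = linear(teacher_w, state)
        losses.append(imitation_step(student_w, image_obs, teacher_action, lr))
    first_avg = sum(losses[:50]) / 50
    last_avg = sum(losses[-50:]) / 50
    return first_avg, last_avg, student_w, render
```

After training, only `student_w` is needed at run time: an action is obtained with `linear(student_w, image_obs)`, with no access to the full state. This mirrors the abstract's point that the full-state modality can be dropped once the student's actor has been trained.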

Description

Technical field

[0001] The invention belongs to reinforcement learning algorithms in the fields of artificial intelligence and robotics, and more specifically relates to a multi-agent cross-modal deep deterministic policy gradient training method based on image input.

Background

[0002] In recent years, owing to the rapid growth of computing resources and the development of deep learning networks, training supervised learning models on large amounts of data has achieved very good results in many fields. At present, learning-based methods in the field of robot control fall into two main categories: self-supervised learning and reinforcement learning. In self-supervised learning, the robot directly or indirectly collects task data and labels it, and the deep neural network is then trained on a large amount of labeled data.

[0003] Compared with the method of self-supervise...

Application Information

IPC(8): G06K9/62, G06K9/66, G06N3/00, G06N3/04, G06N3/08
Inventor: 成慧, 杨凯, 吴华栋, 张东
Owner SUN YAT SEN UNIV