Disordered grabbing multi-objective optimization method and system based on deep reinforcement learning

A multi-objective optimization and reinforcement learning technology, applied to disordered grasping based on deep reinforcement learning, that further optimizes the selection of grabbing points

Active Publication Date: 2021-09-03
常州唯实智能物联创新中心有限公司

AI Technical Summary

Problems solved by technology

[0004] The reward value functions accepted by existing Q networks are all discrete: the results of executing an action are divided into cases according to thresholds, and each case is given a different reward. Such reward feedback suits situations that are defined in advance. In the process of target grasping, m...


Examples


Example Embodiment

[0060] Example 1

[0061] As shown in Figure 1, Example 1 provides a disordered grabbing multi-objective optimization method based on deep reinforcement learning. Two parallel, independent Q networks process the same scene at the same moment; the mechanical arm executes a grab at the grabbing point chosen by each network and returns parameters such as the execution path and the grabbing power consumption. The two Q networks are then compared on these grabbing effects (execution path, grabbing power consumption, and so on) to decide which performed better, and corresponding reward values are generated. Each Q network thus receives feedback from both an internal and an external reward function. This removes the limitation that the reward value function of a single Q network can only be discrete data: continuous data such as the execution path and grabbing power consumption are added to the reward value function, which further optimizes the selection of grabbing points.

[0062] Specifically, the disordered grabbing multi-objective optimization method based on deep reinforcement learning includes:
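
The double reward described above can be illustrated with a minimal sketch. It is not the patented implementation: the `GraspResult` fields, the equal weights, the relative-difference formula, and the choice to simply add the two rewards are all assumptions made for illustration. The only points taken from the text are that one reward is discrete (grab outcome) and the other is continuous and comes from comparing the two networks on execution path and power consumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GraspResult:
    """Parameters returned by the arm for one network's grabbing point (hypothetical)."""
    success: bool        # whether the grab succeeded
    path_length: float   # execution path length, assumed non-negative
    power: float         # grabbing power consumption, assumed non-negative


def external_reward(result: GraspResult) -> float:
    """Discrete reward: 1.0 for a successful grab, 0.0 otherwise (assumed values)."""
    return 1.0 if result.success else 0.0


def relative_advantage(mine: float, theirs: float) -> float:
    """Continuous score in [-1, 1]; positive when `mine` is the smaller cost."""
    total = mine + theirs
    return 0.0 if total == 0 else (theirs - mine) / total


def internal_reward(own: GraspResult, other: GraspResult,
                    w_path: float = 0.5, w_power: float = 0.5) -> float:
    """Continuous reward from comparing one network's execution with the other's."""
    return (w_path * relative_advantage(own.path_length, other.path_length)
            + w_power * relative_advantage(own.power, other.power))


def dual_rewards(a: GraspResult, b: GraspResult) -> Tuple[float, float]:
    """Total reward per network: discrete external part plus continuous internal part."""
    reward_a = external_reward(a) + internal_reward(a, b)
    reward_b = external_reward(b) + internal_reward(b, a)
    return reward_a, reward_b


# Both grabs succeed, but the first network's path is shorter.
first = GraspResult(success=True, path_length=1.0, power=2.0)
second = GraspResult(success=True, path_length=3.0, power=2.0)
rewards = dual_rewards(first, second)
```

In this sketch the internal reward is antisymmetric, so whatever one network gains from the comparison the other loses; a different weighting or normalization would serve equally well as an illustration.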

[0063] S110: Constructing a virtu...

Example Embodiment

[0101] Example 2

[0102] As shown in Figure 2, this embodiment provides a disordered grabbing multi-objective optimization system based on deep reinforcement learning. The system comprises a virtual scene construction module, a task establishment module, a virtual shooting module, an output module, an execution module, a calculation module, a feedback module, and a prediction model generation module.

[0103] The virtual scene construction module is adapted to build a virtual scene of a mechanical arm.

[0104] The task establishment module is adapted to establish two parallel, independent deep reinforcement learning networks that handle the disordered grabbing multi-objective task. Specifically, the task establishment module performs the following steps:

[0105] S121: Establish two parallel, independent deep reinforcement learning networks, namely the first network and the second network, where the first network and the second network have the same network structure;

[0106] S122: The network stru...
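
Step S121 can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the network structure here (S122 is truncated), so a one-layer linear Q-function stands in for the deep network, and `TinyQNetwork`, `build_parallel_networks`, the seeds, and the input and action sizes are hypothetical names and values. The sketch shows only the two properties stated in S121: the networks share a structure, and they are independent of each other.

```python
import random
from typing import List, Tuple


class TinyQNetwork:
    """Stand-in Q-function: Q(s)[a] = sum_i W[a][i] * s[i] + b[a] (not a deep network)."""

    def __init__(self, n_inputs: int, n_actions: int, seed: int) -> None:
        rng = random.Random(seed)  # private RNG, so the networks share no state
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                        for _ in range(n_actions)]
        self.biases = [0.0] * n_actions

    def q_values(self, state: List[float]) -> List[float]:
        """Return one Q value per candidate action for the given state."""
        return [sum(w * x for w, x in zip(row, state)) + b
                for row, b in zip(self.weights, self.biases)]

    def best_action(self, state: List[float]) -> int:
        """Index of the action (here: candidate grabbing point) with the highest Q value."""
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)


def build_parallel_networks(n_inputs: int,
                            n_actions: int) -> Tuple[TinyQNetwork, TinyQNetwork]:
    """Create a first and a second network with the same structure but separate parameters."""
    first = TinyQNetwork(n_inputs, n_actions, seed=0)
    second = TinyQNetwork(n_inputs, n_actions, seed=1)
    return first, second


first_net, second_net = build_parallel_networks(n_inputs=4, n_actions=3)
scene = [0.2, -0.5, 1.0, 0.3]  # the same scene features are fed to both networks
grasp_points = (first_net.best_action(scene), second_net.best_action(scene))
```

Because each network owns its parameters, updating one from its reward leaves the other unchanged, which is what lets the two be compared against each other on the same scene.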

Example Embodiment

[0135] Example 3

[0136] This embodiment provides a computer-readable storage medium storing at least one instruction; when executed by a processor, the instruction implements the disordered grabbing multi-objective optimization method based on deep reinforcement learning provided by Example 1.

[0137] In the disordered grabbing multi-objective optimization method based on deep reinforcement learning, two parallel, independent Q networks are used; the mechanical arm executes a grab at the grabbing point chosen by each of the two networks and returns parameters such as the execution path and the grabbing power consumption. The two Q networks are compared on grabbing effects such as execution path and grabbing power consumption, and corresponding reward values are generated. Each Q network receives feedback from both an internal and an external reward function, which removes the limitation that the reward value function of a single Q network can only be discrete data; the execution path, grabbing power consumption, etc.,...


Abstract

The invention belongs to the field of disordered grabbing by mechanical arms, and particularly relates to a disordered grabbing multi-objective optimization method and system based on deep reinforcement learning. In the method, two parallel, independent Q networks process the same scene at the same moment; the mechanical arm executes a grab at the grabbing point chosen by each network and returns parameters such as the execution path and the grabbing power consumption. The two Q networks are compared on grabbing effects such as execution path and grabbing power consumption to decide which performed better, and corresponding reward values are generated. Each Q network receives feedback from both an internal and an external reward function. This solves the problem that the reward value function of a single Q network can only be discrete data: continuous data such as the execution path and grabbing power consumption are added to the reward value function, so that the selection of grabbing points is further optimized.

Description

Technical field

[0001] The invention relates to the field of disordered grasping by a robotic arm, and in particular to a multi-objective optimization method and system for disordered grasping based on deep reinforcement learning.

Background

[0002] With the development of robot technology, the application scenarios of robotic disordered grasping continue to expand. Reinforcement learning methods that use only the grasping success rate as the network training target cannot effectively meet the differentiated, multi-index requirements of disordered grasping in different application scenarios. Efficient multi-objective optimization of a robot's disordered grasping behavior therefore has important practical significance for improving the robot's customized work ability and expanding its application scenarios.

[0003] The deep reinforcement learning algorithm has obvious intelligence and robustness, based on the feedback of the environment, t...


Application Information

IPC(8): G06Q10/04, G06Q10/06, G06K9/00, G06N3/08, G06N3/04
CPC: G06Q10/04, G06Q10/067, G06N3/08, G06N3/045
Inventor: 肖利民张华梁何智涛秦广军韩萌杨钰杰王良孙锦涛
Owner: 常州唯实智能物联创新中心有限公司