
Disordered-grasping multi-objective optimization method and system based on deep reinforcement learning

A technology of multi-objective optimization and reinforcement learning, applied in the field of disordered-grasping multi-objective optimization based on deep reinforcement learning, to achieve the effect of optimal grasp-point selection

Active Publication Date: 2021-09-03
常州唯实智能物联创新中心有限公司

AI Technical Summary

Problems solved by technology

[0004] The reward value functions accepted by existing Q-networks are all discrete: action execution results are divided into different cases according to thresholds, and each case is assigned a different reward. Such reward feedback is suitable for predefined situations. In target grasping, however, many factors that affect grasping quality are continuously varying quantities, such as the grasping path and the power consumption of the robotic arm. The effect of these variables is difficult to predict in advance, so it is impossible to predetermine how the reward value should change for each situation.
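To make the limitation concrete, the discrete-plus-continuous distinction above can be sketched as a hybrid reward. This is a minimal illustration, not the patent's actual formula: the function name, the exponential penalty shape, and the scale constants are all assumptions.

```python
import math

def hybrid_reward(success: bool, path_length: float, energy_j: float,
                  path_scale: float = 1.0, energy_scale: float = 50.0) -> float:
    """Combine a discrete success signal with continuous cost terms.

    A purely discrete scheme would return e.g. +1 / -1 by threshold; here
    the execution path length and the energy consumed shrink the reward
    smoothly, so cheaper grasps are preferred among successful ones.
    """
    if not success:
        return -1.0                                      # discrete failure case
    path_bonus = math.exp(-path_length / path_scale)     # in (0, 1]
    energy_bonus = math.exp(-energy_j / energy_scale)    # in (0, 1]
    return 0.5 * (path_bonus + energy_bonus)             # continuous reward

# A short, low-energy grasp earns strictly more reward than a long, costly one.
r_good = hybrid_reward(True, path_length=0.2, energy_j=5.0)
r_poor = hybrid_reward(True, path_length=1.5, energy_j=40.0)
```

The point of the sketch is only that the continuous telemetry enters the reward directly, rather than being bucketed by a threshold first.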



Examples


Embodiment 1

[0061] As shown in Figure 1, Embodiment 1 provides a multi-objective optimization method for disordered grasping based on deep reinforcement learning. Two parallel and independent Q-networks process the same scene at the same time; the robotic arm executes grasps at the grasp points selected by each network and returns parameters such as the execution path and grasping power consumption. The Q-networks compare the two results in terms of execution path, power consumption, etc., and generate corresponding reward values. Each Q-network accepts both internal and external reward-function feedback, which solves the problem that the reward value function of a single Q-network can only be discrete data, and adds continuous data such as execution path and power consumption to the reward value function, further optimizing the selection of grasp points.
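The comparison step in the embodiment above can be sketched as follows. This is an illustrative stand-in, not the patent's implementation: `grasp_outcome` is a hypothetical placeholder for executing a grasp and reading back the arm's telemetry, and the zero-sum ±1 internal reward scheme is an assumption.

```python
import random

def grasp_outcome(grasp_point: int) -> dict:
    """Hypothetical stand-in for executing a grasp at `grasp_point` and
    reading back the execution path length and energy from the arm."""
    rng = random.Random(grasp_point)  # deterministic per grasp point
    return {"path": rng.uniform(0.1, 2.0), "energy": rng.uniform(1.0, 60.0)}

def internal_rewards(outcome_a: dict, outcome_b: dict,
                     w_path: float = 0.5, w_energy: float = 0.5):
    """Compare the two networks' executions on the continuous metrics;
    the better attempt earns the higher internal reward (zero-sum here)."""
    def cost(o: dict) -> float:
        return w_path * o["path"] + w_energy * o["energy"]
    return (1.0, -1.0) if cost(outcome_a) < cost(outcome_b) else (-1.0, 1.0)

# Each network proposed its own grasp point for the same scene; compare them.
ra, rb = internal_rewards(grasp_outcome(1), grasp_outcome(2))
```

These internal rewards would be combined with an external (environment) reward before being fed back to each network, per the embodiment.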

[0062] Specifically, the multi-objective optimization me...

Embodiment 2

[0102] Referring to Figure 2, this embodiment provides a disordered-grasping multi-objective optimization system based on deep reinforcement learning. The system includes: a virtual scene construction module, a task establishment module, a virtual shooting module, an output module, an execution module, a calculation module, a feedback module and a predictive model generation module.

[0103] The virtual scene construction module is configured to construct a virtual scene in which a robotic arm grasps multiple objects.

[0104] The task establishment module is configured to establish two parallel and independent deep reinforcement learning networks to handle the task of grasping multiple targets out of order. Specifically, the task establishment module performs the following steps:

[0105] S121: Establish two parallel and independent deep reinforcement learning networks, namely a first network and a second network, wherein the first network and the second network have the sam...
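Step S121 (two parallel, independent networks with identical structure) might be sketched as below. The tiny NumPy MLP, the layer sizes, and the seeds are illustrative assumptions, not the patent's architecture; the only point carried over from the text is that the two networks share a structure but are initialized independently.

```python
import numpy as np

def make_qnet(in_dim: int, hidden: int, out_dim: int, seed: int):
    """Build a tiny MLP Q-network; weights differ per seed, structure is identical."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
    W2 = rng.normal(0.0, 0.1, (hidden, out_dim))
    def forward(x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ W1, 0.0)   # ReLU hidden layer
        return h @ W2                 # Q-values over candidate grasp points
    return forward

# Identical architecture, independently initialized (per step S121).
first_net = make_qnet(64, 32, 16, seed=0)
second_net = make_qnet(64, 32, 16, seed=1)

# Both networks process the same scene observation and pick their own grasp point.
scene = np.random.default_rng(42).normal(size=64)
q1, q2 = first_net(scene), second_net(scene)
point_1, point_2 = int(np.argmax(q1)), int(np.argmax(q2))
```

Because the initializations differ, the two networks generally propose different grasp points for the same scene, which is what makes the pairwise comparison in the rest of the embodiment meaningful.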

Embodiment 3

[0136] This embodiment provides a computer-readable storage medium in which at least one instruction is stored. When the instruction is executed by a processor, the disordered-grasping multi-objective optimization method based on deep reinforcement learning provided in Embodiment 1 is realized.

[0137] The disordered-grasping multi-objective optimization method based on deep reinforcement learning uses two parallel and independent Q-networks to process the same scene at the same time. The robotic arm executes grasps at the grasp points selected by each network and returns parameters such as the execution path and grasping power consumption. The Q-networks compare the two results in terms of execution path, power consumption, etc., and generate corresponding reward values. Each Q-network accepts both internal and external reward-function feedback, which solves the problem ...



Abstract

The invention belongs to the field of disordered grasping by mechanical arms, and particularly relates to a disordered-grasping multi-objective optimization method and system based on deep reinforcement learning. The method processes the same scene at the same moment through two parallel and independent Q-networks; the mechanical arm executes grasps at the grasp points selected by each network and returns parameters such as the execution path and grasping power consumption. The two networks' results are compared on grasping effects such as execution path and grasping power consumption, and corresponding reward values are generated. Each Q-network receives both internal and external reward-function feedback, which solves the problem that the reward value function of a single Q-network can only be discrete data; continuous data such as execution path and grasping power consumption are added to the reward value function, so that the selection of grasp points is further optimized.

Description

Technical field

[0001] The invention relates to the field of disordered grasping by a robotic arm, and in particular to a multi-objective optimization method and system for disordered grasping based on deep reinforcement learning.

Background technique

[0002] With the development of robot technology, the application scenarios of existing robot disordered-grasping technology continue to expand, and reinforcement learning methods that train networks solely on grasp success rate cannot effectively meet the differentiated multi-index requirements of robots' disordered grasping in different application scenarios. Efficient multi-objective optimization of a robot's disordered grasping behavior has important practical significance for improving the robot's customized work capability and expanding its application scenarios.

[0003] The deep reinforcement learning algorithm has obvious intelligence and robustness; based on the feedback of the environment, t...


Application Information

IPC(8): G06Q10/04; G06Q10/06; G06K9/00; G06N3/08; G06N3/04
CPC: G06Q10/04; G06Q10/067; G06N3/08; G06N3/045
Inventors: 肖利民, 张华梁, 何智涛, 秦广军, 韩萌, 杨钰杰, 王良, 孙锦涛
Owner 常州唯实智能物联创新中心有限公司