Reinforcement learning reward self-learning method in discrete manufacturing scene

A self-learning method based on reinforcement learning technology, applied in the fields of reinforcement learning and reward learning

Active Publication Date: 2020-06-05
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] In existing discrete manufacturing production lines, the adjustment of the control parameters of each product manufacturing process depends mainly on the work experience of the staff. To solve this problem, the invention provides a reinforcement learning reward self-learning method for discrete manufacturing scenes, using deep reinforcement learning to learn the control parameters of production line equipment.


Examples


Embodiment 1

[0068] The core concept in reinforcement learning is the reward function. During learning, the reward function gives the agent feedback on the actions it takes in the current state, so specifying the reward function is effectively specifying the learning task. However, in reinforcement learning problems the reward usually has to be set manually for each scenario to achieve the best results, which means the same set of algorithms applies poorly across different scenarios.
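As a hypothetical illustration (not taken from the patent), the sketch below shows the kind of hand-tuned reward that such manual design produces: the tolerance window and penalty weight are scenario-specific choices that would have to be re-tuned by hand for every other process, which is the transfer problem motivating reward self-learning.

```python
# Hypothetical hand-crafted reward for a single production-line process.
# The tolerance and the linear penalty are engineering guesses specific to
# this one scenario, which is exactly what limits transfer to other processes.
def handcrafted_reward(measured_value: float, target_value: float) -> float:
    error = abs(measured_value - target_value)
    if error < 0.01:      # tolerance window chosen by hand
        return 1.0        # bonus for landing inside the target window
    return -error         # otherwise penalize the deviation linearly
```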

[0069] Therefore, this embodiment proposes a reinforcement learning reward self-learning method for discrete manufacturing scenarios, as shown in Figure 1. It introduces a model-based reinforcement learning method, that is, existing data are used to learn a model of the environment p(s_{t+1} | s_t, a_t), corresponding to the GPR module; the GPR module first learns the state difference, from which the distribution of the next state s_{t+1} is derived. Through the weak int...
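A minimal sketch of this model-based step is given below, assuming scikit-learn's GaussianProcessRegressor stands in for the GPR module (the patent does not name a library); the function names, array shapes, and kernel choice are illustrative assumptions. The regressor is fitted on (s_t, a_t) inputs against the state difference, and the next state is recovered as s_t plus the predicted difference.

```python
# Sketch of the GPR-based transition model: learn the state difference
# delta = s_{t+1} - s_t from (s_t, a_t), then reconstruct s_{t+1}.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_transition_model(states, actions, next_states):
    """states, actions, next_states: arrays of shape (T, state_dim/action_dim)."""
    X = np.hstack([states, actions])           # inputs: (s_t, a_t)
    dY = next_states - states                  # targets: state differences
    gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gpr.fit(X, dY)
    return gpr

def predict_next_state(gpr, state, action):
    x = np.hstack([state, action]).reshape(1, -1)
    delta_mean, delta_std = gpr.predict(x, return_std=True)
    return state + delta_mean[0], delta_std    # mean of s_{t+1} plus uncertainty
```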


Abstract

The invention discloses a reinforcement learning reward self-learning method in a discrete manufacturing scene. The method comprises the following steps: 1) refine the processes of the current production line into sub-targets g ∈ G = {g1, g2, ..., gN}, where the agent reaching a preset target g is recorded as an interaction sequence (episode); 2) according to the initial parameters, obtain multiple episodes with g1 as the target, and input the state-action pairs in those episodes, together with the state difference Δ, as a training data set to the GPR module, obtaining a system state transition model based on state differences; 3) let the agent continue to interact with the environment to obtain a new state s_t, with the Reward network outputting r(s_t), the Actor network outputting a(s_t), the Critic network outputting V(s_t), and the GPR module outputting the value function V_g as the update direction of the whole; 4) when |V_g - V(s_t)| < ε, reward-function learning under the current process is considered complete, and the parameters of the Reward network are saved; 5) continue the interaction and generate episodes with the next sub-target g_{n+1} as the update direction for updating the GPR; 6) when all targets in G = {g1, g2, ..., gN} have been achieved in sequence, the process learning of the production line is finished.
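The control flow described in the abstract might be outlined as below. This is a minimal sketch under assumed interfaces: env, reward_net, actor_net, critic_net, gpr_module and their methods are placeholders, not components specified by the patent; only the structure of per-sub-target interaction, the |V_g - V(s_t)| < ε stopping test, and the advance to the next sub-target follows the text.

```python
# Minimal control-flow sketch of the method in the abstract. All objects and
# method names (env, reward_net, actor_net, critic_net, gpr_module, ...) are
# assumed placeholder interfaces; only the loop structure follows the text.
def learn_production_line(env, reward_net, actor_net, critic_net, gpr_module,
                          goals, epsilon=1e-2, max_steps=10_000):
    for goal in goals:                              # G = {g1, ..., gN}, in order
        for _ in range(max_steps):
            s_t = env.observe()                     # new state from interaction
            r_t = reward_net(s_t)                   # Reward network output r(s_t)
            a_t = actor_net(s_t)                    # Actor network output a(s_t)
            v_t = critic_net(s_t)                   # Critic network output V(s_t)
            v_g = gpr_module.value(s_t, goal)       # GPR-based value V_g (update target)
            env.step(a_t)                           # r_t would feed the policy update,
                                                    #   omitted in this sketch
            if abs(v_g - v_t) < epsilon:            # reward learning for this
                reward_net.save_parameters(goal)    #   process is considered done
                break
        gpr_module.update(env.episodes(goal))       # refresh GPR with new episodes
```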

Description

technical field
[0001] The present invention relates to the technical field of deep reinforcement learning, and more specifically to a reinforcement learning reward learning method in discrete manufacturing scenarios.
Background technique
[0002] The manufacturing industry can generally be divided into process manufacturing and discrete manufacturing according to the characteristics of the product manufacturing process. Compared with process manufacturing, discretely manufactured products are typically processed and assembled from multiple parts through a series of discontinuous processes, mainly in machining and assembly industries such as machining and machine tools.
[0003] In the processing and production of discrete manufacturing enterprises, the whole production process is often decomposed into many processing tasks. Each individual task does not require many processing resources, but the parts are often processed with different types and requir...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/00; G06K9/62; G06N20/00
CPC: G06N20/00; G06V20/41; G06F18/2136
Inventors: 吴宗泽, 赖家伦, 刘亚强, 梁泽逍, 曾德宇
Owner: GUANGDONG UNIV OF TECH