Reinforcement learning reward self-learning method in discrete manufacturing scene

A self-learning method based on reinforcement learning technology, applied in the fields of reinforcement learning and reward learning

Active Publication Date: 2020-06-05
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] In existing discrete manufacturing production lines, the adjustment of the control parameters of each product manufacturing process depends mainly on the work experience of the staff. To solve this problem, the invention provides a reinforcement learning reward self-learning method for discrete manufacturing scenes, using deep reinforcement learning to learn the control parameters of production line equipment.


Examples


Embodiment 1

[0068] The core concept in reinforcement learning is the reward function. During learning, the reward function gives the agent feedback on the actions it takes in the current state, so specifying the reward function is effectively specifying the learning task. However, in reinforcement learning problems the reward usually has to be set manually for each scenario to achieve the best results, which means the same set of algorithms applies poorly across different scenarios.
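As a hypothetical illustration (not taken from the patent), the sketch below shows the kind of hand-tuned reward that such manual design produces: the tolerance window and penalty weight are scenario-specific choices that would have to be re-tuned by hand for every other process, which is the transfer problem motivating reward self-learning.

```python
# Hypothetical hand-crafted reward for a single production-line process.
# The tolerance and the linear penalty are engineering guesses specific to
# this one scenario, which is exactly what limits transfer to other processes.
def handcrafted_reward(measured_value: float, target_value: float) -> float:
    error = abs(measured_value - target_value)
    if error < 0.01:      # tolerance window chosen by hand
        return 1.0        # bonus for landing inside the target window
    return -error         # otherwise penalize the deviation linearly
```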

[0069] Therefore, this embodiment proposes a reinforcement learning reward self-learning method for discrete manufacturing scenarios, as shown in Figure 1. It introduces a model-based reinforcement learning method, that is, existing data are used to learn a model of the environment p(s_{t+1} | s_t, a_t), corresponding to the GPR module; the GPR module first learns the state difference, from which the distribution of the next state s_{t+1} is derived. Through the weak int...
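A minimal sketch of this model-based step is given below, assuming scikit-learn's GaussianProcessRegressor stands in for the GPR module (the patent does not name a library); the function names, array shapes, and kernel choice are illustrative assumptions. The regressor is fitted on (s_t, a_t) inputs against the state difference, and the next state is recovered as s_t plus the predicted difference.

```python
# Sketch of the GPR-based transition model: learn the state difference
# delta = s_{t+1} - s_t from (s_t, a_t), then reconstruct s_{t+1}.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_transition_model(states, actions, next_states):
    """states, actions, next_states: arrays of shape (T, state_dim/action_dim)."""
    X = np.hstack([states, actions])           # inputs: (s_t, a_t)
    dY = next_states - states                  # targets: state differences
    gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gpr.fit(X, dY)
    return gpr

def predict_next_state(gpr, state, action):
    x = np.hstack([state, action]).reshape(1, -1)
    delta_mean, delta_std = gpr.predict(x, return_std=True)
    return state + delta_mean[0], delta_std    # mean of s_{t+1} plus uncertainty
```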


Abstract

The invention discloses a reinforcement learning reward self-learning method in a discrete manufacturing scene. The method comprises the following steps: 1) refine the processes of the current production line into sub-targets g ∈ G = {g1, g2, ..., gN}, where the agent reaching a preset target g is recorded as an interaction sequence (episode); 2) according to the initial parameters, obtain multiple episodes with g1 as the target, and input the state-action pairs in those episodes, together with the state difference Δ, as a training data set to the GPR module, obtaining a system state transition model based on state differences; 3) let the agent continue to interact with the environment to obtain a new state s_t, with the Reward network outputting r(s_t), the Actor network outputting a(s_t), the Critic network outputting V(s_t), and the GPR module outputting the value function V_g as the update direction of the whole; 4) when |V_g - V(s_t)| < ε, reward-function learning under the current process is considered complete, and the parameters of the Reward network are saved; 5) continue the interaction and generate episodes with the next sub-target g_{n+1} as the update direction for updating the GPR; 6) when all targets in G = {g1, g2, ..., gN} have been achieved in sequence, the process learning of the production line is finished.
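The control flow described in the abstract might be outlined as below. This is a minimal sketch under assumed interfaces: env, reward_net, actor_net, critic_net, gpr_module and their methods are placeholders, not components specified by the patent; only the structure of per-sub-target interaction, the |V_g - V(s_t)| < ε stopping test, and the advance to the next sub-target follows the text.

```python
# Minimal control-flow sketch of the method in the abstract. All objects and
# method names (env, reward_net, actor_net, critic_net, gpr_module, ...) are
# assumed placeholder interfaces; only the loop structure follows the text.
def learn_production_line(env, reward_net, actor_net, critic_net, gpr_module,
                          goals, epsilon=1e-2, max_steps=10_000):
    for goal in goals:                              # G = {g1, ..., gN}, in order
        for _ in range(max_steps):
            s_t = env.observe()                     # new state from interaction
            r_t = reward_net(s_t)                   # Reward network output r(s_t)
            a_t = actor_net(s_t)                    # Actor network output a(s_t)
            v_t = critic_net(s_t)                   # Critic network output V(s_t)
            v_g = gpr_module.value(s_t, goal)       # GPR-based value V_g (update target)
            env.step(a_t)                           # r_t would feed the policy update,
                                                    #   omitted in this sketch
            if abs(v_g - v_t) < epsilon:            # reward learning for this
                reward_net.save_parameters(goal)    #   process is considered done
                break
        gpr_module.update(env.episodes(goal))       # refresh GPR with new episodes
```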

Description

technical field
[0001] The present invention relates to the technical field of deep reinforcement learning, and more specifically to a reinforcement learning reward learning method in discrete manufacturing scenarios.
Background technique
[0002] The manufacturing industry can generally be divided into process manufacturing and discrete manufacturing according to the characteristics of the product manufacturing process. Compared with process manufacturing, discretely manufactured products are typically processed and assembled from multiple parts through a series of discontinuous processes, mainly in machining and assembly industries such as machining and machine tools.
[0003] In the processing and production of discrete manufacturing enterprises, the whole production process is often decomposed into many processing tasks. Each individual task does not require many processing resources, but the parts are often processed with different types and requir...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/00; G06K9/62; G06N20/00
CPC: G06N20/00; G06V20/41; G06F18/2136
Inventors: 吴宗泽, 赖家伦, 刘亚强, 梁泽逍, 曾德宇
Owner: GUANGDONG UNIV OF TECH