Deep reinforcement learning method for flexible manufacturing cell job scheduling based on dual-path graph attention

By combining a dual-path graph attention network with deep reinforcement learning, the problem of improper resource allocation in complex production environments caused by traditional scheduling methods is solved, thereby minimizing workpiece processing time and improving production efficiency.

CN119960401B · Active · Publication Date: 2026-06-12 · BEIJING UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING UNIV OF TECH
Filing Date
2025-01-26
Publication Date
2026-06-12

Abstract

The present application relates to a deep reinforcement learning method for flexible manufacturing workshop job scheduling based on dual-path graph attention, designed to improve the efficiency and accuracy of production scheduling. The present application proposes a dual-path graph attention network that integrates a sequence dependency analysis module, a resource compatibility analysis module, a hybrid feature extraction module, and a strategy optimization module to optimize task scheduling and resource allocation in complex industrial environments. The sequence dependency analysis module ensures the correct order of task execution by analyzing the dependencies between tasks; the resource compatibility analysis module evaluates the degree of matching between operations and machines to optimize resource utilization; the hybrid feature extraction module extracts key information from static and historical data to assist the decision-making process; and the strategy optimization module uses the extracted features to optimize the scheduling strategy. The present application can thereby significantly improve production efficiency and provide an efficient and accurate scheduling solution.

Description

Technical Field

[0001] This invention establishes a flexible shop floor scheduling method based on graph attention networks and deep reinforcement learning. Using a dual-path graph attention network and policy optimization, it formulates a scheduling strategy that minimizes the maximum processing time of workpieces in complex scheduling environments. The method provides a comprehensive solution for complex and dynamic production environments by integrating functional modules for sequence dependency feature embedding, resource compatibility feature embedding, and dynamic scheduling strategies. This invention is particularly suitable for industrial applications requiring highly dynamic scheduling, such as manufacturing, logistics, supply chain management, and other related industrial automation fields, aiming to improve production efficiency, optimize resource allocation, and reduce production interruptions.

Background Technology

[0002] In modern manufacturing and logistics management, the efficiency of scheduling systems directly impacts production costs, delivery speed, and resource utilization. With the rapid development of the global market, the diversification of customer demands and the increasing complexity of production processes have brought new challenges. Especially in complex environments, traditional scheduling methods are insufficient to meet the demands for high efficiency and accuracy. Traditional scheduling methods typically rely on predetermined processes and fixed resource allocation schemes. This static planning often leads to improper resource allocation, underutilization of production capacity, and low scheduling efficiency when facing demand fluctuations and market changes.

[0003] Advances in Industry 4.0 and smart manufacturing have brought together advanced technologies, particularly artificial intelligence and machine learning, which offer new possibilities for scheduling systems to predict future trends and optimize resource allocation. Nevertheless, most existing scheduling solutions still rely on outdated static planning models. Although these models are relatively stable in predetermined environments, they lack the flexibility and adaptability required in today's rapidly changing markets, making it difficult to maximize productivity and reduce operating costs.

[0004] This invention aims to provide a solution specifically designed for complex production environments requiring high precision and resource optimization. By incorporating a dual-path graph attention network, the system not only analyzes the sequence of operations and the compatibility of machine resources within a job, but also optimizes the allocation of these resources. It includes a hybrid feature extraction module that integrates key information from static and historical data to support decision-making and optimize scheduling strategies. Furthermore, a strategy optimization module uses advanced algorithms to precisely adjust resource allocation based on static data, thereby improving scheduling accuracy and production efficiency.

[0005] The development of this system fills the gaps in existing technologies, providing a highly efficient and intelligent scheduling solution suitable for complex production environments. This system is particularly suitable for manufacturing, supply chain management, and other fields that heavily rely on precise resource allocation, significantly improving operational efficiency and economic benefits while reducing resource waste caused by improper scheduling.

Summary of the Invention

[0006] This invention provides a deep reinforcement learning method for flexible manufacturing workshop job scheduling based on dual-path graph attention. The method combines a dual-path graph attention network with policy optimization, aiming to minimize the maximum processing time of workpieces in complex scheduling environments, thereby improving production efficiency and optimizing resource utilization.

[0007] Specifically, the present invention includes the following steps:

[0008] (1) Constructing a disjunctive graph model of the job shop scheduling environment

[0009] The disjunctive graph of the job shop scheduling environment is defined as a four-tuple G = (O, M, D, E), where O represents the set of operations, M represents the set of machines, D represents the set of directed edges, and E represents the set of undirected edges. Directed edges represent processing constraints between operations, and undirected edges represent executable constraints between operations and machines.

[0010] Define six attributes for an operation: operation processing status, number of machines that can process it, average processing time, start time, completion time, and percentage of completed operations. A machine has three attributes: machine utilization, number of operations it can process, and next available time. An undirected edge has one attribute: the processing time of the operation on the machine.
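The graph definition in step (1) can be pictured as a small data structure. The following is a minimal sketch, not the patented implementation: the class names, field names, the `add_job` helper, and the 0/1 encoding of the processing status are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Operation node carrying the six attributes listed in step (1)."""
    status: int             # processing status (0 = unscheduled; encoding assumed)
    num_machines: int       # number of machines that can process the operation
    avg_proc_time: float    # average processing time over compatible machines
    start_time: float
    completion_time: float
    job_progress: float     # percentage of completed operations


@dataclass
class Machine:
    """Machine node carrying the three attributes listed in step (1)."""
    utilization: float
    num_operations: int     # number of operations the machine can process
    next_available: float


@dataclass
class DisjunctiveGraph:
    operations: dict = field(default_factory=dict)  # O: (job, index) -> Operation
    machines: dict = field(default_factory=dict)    # M: machine id -> Machine
    directed: set = field(default_factory=set)      # D: precedence edges
    undirected: dict = field(default_factory=dict)  # E: (operation, machine) -> time

    def add_job(self, job, proc_times):
        """proc_times holds one {machine_id: processing_time} dict per operation."""
        for idx, options in enumerate(proc_times):
            key = (job, idx)
            self.operations[key] = Operation(
                status=0,
                num_machines=len(options),
                avg_proc_time=sum(options.values()) / len(options),
                start_time=0.0,
                completion_time=0.0,
                job_progress=0.0,
            )
            for k, t in options.items():
                self.machines.setdefault(k, Machine(0.0, 0, 0.0))
                self.machines[k].num_operations += 1
                self.undirected[(key, k)] = float(t)
            if idx > 0:
                # Directed edge from the previous operation of the same job.
                self.directed.add(((job, idx - 1), key))


graph = DisjunctiveGraph()
graph.add_job(0, [{0: 3, 1: 5}, {1: 2}])
graph.add_job(1, [{0: 4}, {0: 6, 1: 4}])
```

Here two jobs with two operations each yield four operation nodes, two directed precedence edges, and six undirected operation-machine edges whose single attribute is the processing time.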

[0011] (2) Design and construct a feature extraction module based on dual-path graph attention

[0012] Two independent paths are established in the dual-path graph attention network: a sequence dependency embedding module and a resource-compatible embedding module, to handle the operation order and machine resource allocation issues in the job. Then, the two types of features are integrated through path-level feature embedding. The feature extraction method is described by the following sub-steps:

[0013] ① Embedding sequence dependency features

[0014] Initialize the embedding vectors of the operation node and of its neighboring nodes; each vector has a dimension of 8. A parameterized linear transformation W is applied to the embedding vector of each operation node. W is defined as a three-layer fully connected network with an input dimension of 6, an output dimension of 8, and a hidden layer dimension of 128. Attention scores are calculated for each pair of sequentially adjacent operation nodes O_ij and O_jp:

[0015]

[0016] where a^T is the learned attention vector, defined as a 3-dimensional vector, and || represents the vector concatenation operation. The attention scores are normalized using the softmax activation function to obtain the attention weight of each edge:

[0017]

[0018] where exp represents the exponential operation. The embedding of operation node O_ij is then updated by aggregating the weighted embeddings of all neighboring nodes:

[0019]

[0020] where σ is the Sigmoid activation function. This yields the sequence dependency features of the operation, defined as an 8-dimensional vector;
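Sub-step ① follows the general shape of a graph attention layer: transform, score, normalize with softmax, aggregate, and apply Sigmoid. The sketch below illustrates that shape under stated assumptions and is not the formula of the patent, which is not reproduced in this text. In particular, the LeakyReLU on the score, the use of a single 6-to-8 matrix in place of the three-layer network W, and an attention vector sized to the 16-element concatenation (the text states 3 dimensions) are all assumptions, as are the function names.

```python
import math
import random


def linear(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sequence_embedding(h_self, h_neighbors, W, a):
    """Attention-weighted update of one operation node from its sequence neighbors."""
    z_self = linear(W, h_self)
    z_nbrs = [linear(W, h) for h in h_neighbors]
    # Score each neighbor from the concatenation [z_self || z_neighbor].
    scores = [
        leaky_relu(sum(ai * ci for ai, ci in zip(a, z_self + z)))
        for z in z_nbrs
    ]
    alpha = softmax(scores)
    # Aggregate the weighted neighbor embeddings, then apply Sigmoid.
    out = [
        sigmoid(sum(w * z[d] for w, z in zip(alpha, z_nbrs)))
        for d in range(len(z_self))
    ]
    return out, alpha


random.seed(0)
W = [[random.uniform(-0.5, 0.5) for _ in range(6)] for _ in range(8)]  # 6 -> 8
a = [random.uniform(-0.5, 0.5) for _ in range(16)]  # sized to the concatenation
h_op = [0, 2, 4.0, 0.0, 0.0, 0.0]      # six operation attributes (toy values)
h_prev = [1, 1, 3.0, 0.0, 3.0, 0.5]
h_next = [0, 1, 2.0, 0.0, 0.0, 0.0]
embedding, weights = sequence_embedding(h_op, [h_prev, h_next], W, a)
```

The result is an 8-dimensional embedding with every entry in (0, 1), and the attention weights over the two sequence neighbors sum to 1.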

[0021] ② Embedding resource compatibility features

[0022] Initialize the embedding vectors of each operation node O_ij and its compatible machine set M_{j,i}, apply a parameterized linear transformation to each operation-machine pair (O_ij, M_k), and calculate the attention score e_ik:

[0023]

[0024] where the machine features are defined as a 3-dimensional vector; h_ijk denotes the edge feature, defined as a 1-dimensional vector representing the processing time of the operation on that machine; and a is the learned attention vector, defined as a 5-dimensional vector. The attention scores of all machines are normalized using the softmax activation function to obtain the attention weight α_ik of each machine:

[0025]

[0026] where exp represents the exponential operation. The embedding of the operation node is then updated by aggregating the weighted embeddings of all compatible machines:

[0027]

[0028] where σ is the Sigmoid activation function. This yields the resource compatibility features of the operation, defined as an 8-dimensional vector;
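Sub-step ② differs from sub-step ① in that attention runs over compatible machines and the score also sees the edge feature (the processing time). The following is a hedged sketch of one way this could look. The concatenation order [operation || machine || edge], the LeakyReLU, the single-matrix transformations, and the 17-element attention vector (the text states 5 dimensions) are assumptions, and `resource_embedding` is a hypothetical name.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def resource_embedding(h_op, machines, W_op, W_mach, a):
    """machines: list of (machine_features, processing_time) for compatible machines."""
    z_op = matvec(W_op, h_op)
    z_machs = [matvec(W_mach, feats) for feats, _ in machines]
    scores = []
    for z_k, (_, proc_time) in zip(z_machs, machines):
        concat = z_op + z_k + [proc_time]       # [operation || machine || edge]
        s = sum(ai * ci for ai, ci in zip(a, concat))
        scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU (assumed)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]           # softmax over compatible machines
    # Aggregate the weighted machine embeddings, then apply Sigmoid.
    out = [
        1.0 / (1.0 + math.exp(-sum(w * z[d] for w, z in zip(alpha, z_machs))))
        for d in range(len(z_op))
    ]
    return out, alpha


# Deterministic toy weights; a real model would learn these.
W_op = [[0.1 * ((r + c) % 3 - 1) for c in range(6)] for r in range(8)]
W_mach = [[0.1 * ((r * c) % 3 - 1) for c in range(3)] for r in range(8)]
a = [0.05] * 16 + [-0.3]  # negative weight on the processing-time entry
h_op = [0, 2, 4.0, 0.0, 0.0, 0.0]
machines = [([0.5, 3, 2.0], 3.0), ([0.2, 2, 0.0], 5.0)]
embedding, alpha = resource_embedding(h_op, machines, W_op, W_mach, a)
```

With the toy weights above, the negative weight on processing time makes the faster machine (3.0 time units) receive the larger attention weight. In a trained model that preference would be learned rather than fixed by hand.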

[0029] ③ Embedding path-level features

[0030] Integrate path-level feature pairs obtained from the sequence dependency embedding module and the resource-compatible embedding module, calculate the path-level attention weights from the output of each module, and then integrate the outputs of the two modules:

[0031]

[0032] where exp represents the exponential operation, and q, W_m, and b_m are learned parameters: q is the attention weight over the different paths, represented as a 2-dimensional vector; W_m is the weight for path-level operations, represented as a 128-dimensional vector; and b_m is the bias, represented as a 1-dimensional vector. S is defined as the sequence dependency feature embedding, R as the resource compatibility feature embedding, and tanh is the activation function. The calculated attention weight β_m, a 2-dimensional vector, is used to weight the outputs of the sequence dependency embedding module (SDEM) and the resource-compatible embedding module (RCEM) and to compute the final embedding of the operation node:

[0033]

[0034] The final embedding is defined as an 8-dimensional vector;
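The path-level fusion in sub-step ③ can be read as a softmax over two path scores followed by a weighted sum of the two embeddings. The sketch below assumes each path score has the form q_m · tanh(W_m · z_m + b_m), with W_m a vector matched to the 8-dimensional embeddings (the text states 128 dimensions). That form and the name `fuse_paths` are assumptions for illustration, since the formula is not reproduced in this text.

```python
import math


def fuse_paths(S, R, q, W, b):
    """Combine two path embeddings using softmax attention weights over the paths."""
    scores = []
    for m, z in enumerate((S, R)):
        pre = sum(w * v for w, v in zip(W[m], z)) + b[m]
        scores.append(q[m] * math.tanh(pre))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]  # one weight per path
    fused = [beta[0] * s + beta[1] * r for s, r in zip(S, R)]
    return fused, beta


S = [0.6, 0.4, 0.5, 0.7, 0.3, 0.5, 0.6, 0.4]  # sequence dependency embedding (toy)
R = [0.2, 0.8, 0.5, 0.1, 0.9, 0.5, 0.4, 0.6]  # resource compatibility embedding (toy)
q = [1.0, 1.0]
W = [[0.2] * 8, [0.1] * 8]
b = [0.0, 0.0]
final_embedding, beta = fuse_paths(S, R, q, W, b)
```

Because the weights sum to 1, every entry of the fused embedding lies between the corresponding entries of S and R, so the fusion keeps the 8-dimensional shape of its inputs.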

[0035] (3) Model training and optimization

[0036] Define the state space as a disjunctive graph model G = (O, M, D, E);

[0037] Define the action space as a two-dimensional vector a_t = {(O_ij, M_k)}, where O_ij represents the j-th operation of the i-th job and M_k represents the k-th machine;

[0038] The optimization objective is to minimize the maximum completion time C_max of all operations. Define the reward function as r_t:

[0039]

[0040] where the reward is computed from the estimated maximum completion time under the scheduling plan at time t and the maximum completion time under the scheduling plan at time t-1;
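One common way to build a reward from these two quantities is to take the decrease in the estimated maximum completion time between consecutive steps. The sketch below assumes that form, with the sign chosen so that a growing estimate is penalized; the patent's own formula is not reproduced in this text, and `makespan_reward` is a hypothetical name.

```python
def makespan_reward(prev_cmax, curr_cmax):
    """Reward = previous estimate minus current estimate (assumed sign)."""
    return prev_cmax - curr_cmax


# Estimated C_max before any action, then after each of four scheduling steps.
estimates = [20.0, 20.0, 23.0, 23.0, 26.0]
rewards = [makespan_reward(p, c) for p, c in zip(estimates, estimates[1:])]
total = sum(rewards)
# rewards is [0.0, -3.0, 0.0, -3.0] and total is -6.0
```

Under this assumed form the rewards telescope: their undiscounted sum equals the initial estimate minus the final makespan, so maximizing the total reward corresponds to minimizing C_max.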

[0041] (4) Model training and optimization

[0042] In the initial stage of model training, all parameters of the dual-path graph attention network and the deep reinforcement learning model are initialized. The learning rate is set to 0.001, and the discount factor for reward discounting is set to 0.1. The policy is trained for 1000 epochs to obtain the optimal solution. In each epoch, the model collects data on state, action, reward, and new state through interaction with the environment. This data is used to batch update the policy and value networks. The policy network is defined as a 3-layer fully connected network with an input dimension of 24, an output dimension of 1, and a hidden layer dimension of 256. Alternatively, a 3-layer fully connected network with an input dimension of 12, an output dimension of 1, and a hidden layer dimension of 256 is defined. During training, the model's performance is evaluated every 100 epochs to monitor learning progress and adjust the training policy. After training, a policy set Π* = {π_1, π_2, …, π_n} is obtained, where n is 50, representing the total number of operations.
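To make the stated hyperparameters concrete, the sketch below collects them as constants and shows how a discount factor of 0.1 turns a reward sequence into discounted returns. The constant names, the `discounted_returns` helper, and the toy reward sequence are assumptions. The text does not say which policy-gradient or actor-critic update is used, so none is shown.

```python
LEARNING_RATE = 0.001
DISCOUNT_FACTOR = 0.1
NUM_EPOCHS = 1000
EVAL_INTERVAL = 100


def discounted_returns(rewards, gamma=DISCOUNT_FACTOR):
    """Return G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# Epochs at which the model would be evaluated during training.
eval_epochs = [e for e in range(1, NUM_EPOCHS + 1) if e % EVAL_INTERVAL == 0]

returns = discounted_returns([0.0, -3.0, 0.0, -3.0])
```

With a discount factor this small, each return is dominated by the immediate reward: the last two returns are -0.3 and -3.0, and a reward two steps ahead is scaled by only 0.01.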

[0043] The inventiveness of this invention is mainly reflected in:

[0044] (1) The dual-path graph attention network used in this invention innovatively combines sequence dependency feature embedding and resource compatibility feature embedding, which allows the system to achieve deeper data integration and optimization when analyzing operation sequence and resource configuration. Through this method, the system can not only accurately control the dependencies in the job flow, but also optimize resource usage, greatly improving the accuracy of scheduling strategies.

[0045] (2) This invention employs a hybrid feature extraction module, which can comprehensively extract key features affecting scheduling from static and historical data, providing data support for scheduling decisions. Unlike traditional methods that rely solely on real-time data or simple historical trends, this system identifies potential patterns and trends through advanced data analysis techniques, providing a basis for formulating more scientific and robust production strategies.

Attached Figure Description

[0046] Figure 1 is a diagram showing the scheduling results of the present invention for the flexible workshop scheduling problem.

Detailed Implementation

[0047] A deep reinforcement learning-based job shop scheduling method based on a dual-path graph attention network is characterized by constructing a job shop scheduling reinforcement learning model, adjusting weights based on prediction and selection, and training and optimizing the model, including the following steps:

[0048] (1) Constructing a disjunctive graph model of the job shop scheduling environment

[0049] The disjunctive graph of the job shop scheduling environment is defined as a four-tuple G = (O, M, D, E), where O represents the set of operations, M represents the set of machines, D represents the set of directed edges, and E represents the set of undirected edges. Directed edges represent processing constraints between operations, and undirected edges represent executable constraints between operations and machines.

[0050] Define six attributes for an operation: operation processing status, number of machines that can process it, average processing time, start time, completion time, and percentage of completed operations. A machine has three attributes: machine utilization, number of operations it can process, and next available time. An undirected edge has one attribute: the processing time of the operation on the machine.

[0051] (2) Design and construct a feature extraction module based on dual-path graph attention

[0052] Two independent paths are established in the dual-path graph attention network: a sequence dependency embedding module and a resource-compatible embedding module, to handle the operation order and machine resource allocation issues in the job. Then, the two types of features are integrated through path-level feature embedding. The feature extraction method is described by the following sub-steps:

[0053] ① Embedding sequence dependency features

[0054] Initialize the embedding vectors of the operation node and of its neighboring nodes; each vector has a dimension of 8. A parameterized linear transformation W is applied to the embedding vector of each operation node. W is defined as a three-layer fully connected network with an input dimension of 6, an output dimension of 8, and a hidden layer dimension of 128. Attention scores are calculated for each pair of sequentially adjacent operation nodes O_ij and O_jp:

[0055]

[0056] where a^T is the learned attention vector, defined as a 3-dimensional vector, and || represents the vector concatenation operation. The attention scores are normalized using the softmax activation function to obtain the attention weight of each edge:

[0057]

[0058] where exp represents the exponential operation. The embedding of operation node O_ij is then updated by aggregating the weighted embeddings of all neighboring nodes:

[0059]

[0060] where σ is the Sigmoid activation function. This yields the sequence dependency features of the operation, defined as an 8-dimensional vector;

[0061] ② Embedding resource compatibility features

[0062] Initialize the embedding vectors of each operation node O_ij and its compatible machine set M_{j,i}, apply a parameterized linear transformation to each operation-machine pair (O_ij, M_k), and calculate the attention score e_ik:

[0063]

[0064] where the machine features are defined as a 3-dimensional vector; h_ijk denotes the edge feature, defined as a 1-dimensional vector representing the processing time of the operation on that machine; and a is the learned attention vector, defined as a 5-dimensional vector. The attention scores of all machines are normalized using the softmax activation function to obtain the attention weight α_ik of each machine:

[0065]

[0066] where exp represents the exponential operation. The embedding of the operation node is then updated by aggregating the weighted embeddings of all compatible machines:

[0067]

[0068] where σ is the Sigmoid activation function. This yields the resource compatibility features of the operation, defined as an 8-dimensional vector;

[0069] ③ Embedding path-level features

[0070] Integrate path-level feature pairs obtained from the sequence dependency embedding module and the resource-compatible embedding module, calculate the path-level attention weights from the output of each module, and then integrate the outputs of the two modules:

[0071]

[0072] where exp represents the exponential operation, and q, W_m, and b_m are learned parameters: q is the attention weight over the different paths, represented as a 2-dimensional vector; W_m is the weight for path-level operations, represented as a 128-dimensional vector; and b_m is the bias, represented as a 1-dimensional vector. S is defined as the sequence dependency feature embedding, R as the resource compatibility feature embedding, and tanh is the activation function. The calculated attention weight β_m, a 2-dimensional vector, is used to weight the outputs of the sequence dependency embedding module (SDEM) and the resource-compatible embedding module (RCEM) and to compute the final embedding of the operation node:

[0073]

[0074] The final embedding is defined as an 8-dimensional vector;

[0075] (3) Model training and optimization

[0076] Define the state space as a disjunctive graph model G = (O, M, D, E);

[0077] Define the action space as a two-dimensional vector a_t = {(O_ij, M_k)}, where O_ij represents the j-th operation of the i-th job and M_k represents the k-th machine;

[0078] The optimization objective is to minimize the maximum completion time C_max of all operations. Define the reward function as r_t:

[0079]

[0080] where the reward is computed from the estimated maximum completion time under the scheduling plan at time t and the maximum completion time under the scheduling plan at time t-1;

[0081] (4) Model training and optimization

[0082] In the initial stage of model training, all parameters of the dual-path graph attention network and the deep reinforcement learning model are initialized. The learning rate is set to 0.001, and the discount factor for reward discounting is set to 0.1. The policy is trained for 1000 epochs to obtain the optimal solution. In each epoch, the model collects data on state, action, reward, and new state through interaction with the environment. This data is used to batch update the policy and value networks. The policy network is defined as a 3-layer fully connected network with an input dimension of 24, an output dimension of 1, and a hidden layer dimension of 256. Alternatively, a 3-layer fully connected network with an input dimension of 12, an output dimension of 1, and a hidden layer dimension of 256 is defined. During training, the model's performance is evaluated every 100 epochs to monitor learning progress and adjust the training policy. After training, a policy set Π* = {π_1, π_2, …, π_n} is obtained, where n is 50, representing the total number of operations.

Claims

1. A deep reinforcement learning method for flexible manufacturing shop job scheduling based on dual-path graph attention, characterized in that a disjunctive graph model of the job shop scheduling environment is constructed, a feature extraction module based on dual-path graph attention is designed to extract state space features, and the model is trained and optimized, including the following steps:

(1) Constructing the disjunctive graph model of the job shop scheduling environment

The disjunctive graph of the job shop scheduling environment is defined as a four-tuple G = (O, M, D, E), where O represents the set of operations, M represents the set of machines, D represents the set of directed edges, and E represents the set of undirected edges; directed edges represent processing constraints between operations, and undirected edges represent executable constraints between operations and machines. Six attributes are defined for an operation: operation processing status, number of machines that can process the operation, average processing time of the operation, start processing time of the operation, completion time of the operation, and percentage of completed operations. A machine has three attributes: machine utilization, number of operations that can be processed, and next available time. An undirected edge has one attribute: the processing time of the operation on the machine.

(2) Designing and constructing a feature extraction module based on dual-path graph attention

Two independent paths are established in the dual-path graph attention network: a sequence dependency embedding module and a resource-compatible embedding module, to handle the operation order and machine resource allocation issues in the job. Then, the two types of features are integrated through path-level feature embedding. The feature extraction method is described by the following sub-steps:

① Embedding sequence dependency features

Initialize the embedding vectors of the operation node and of its neighboring nodes, each with a dimension of 8, and apply a parameterized linear transformation W to the embedding vector of each operation node, where W is defined as a three-layer fully connected network with an input dimension of 6, an output dimension of 8, and a hidden layer dimension of 128; attention scores are calculated for each pair of sequentially adjacent operation nodes O_ij and O_jp, where a^T is the learned attention vector, defined as a 3-dimensional vector, and || is the vector concatenation operation; the softmax activation function is used to normalize all attention scores, yielding the attention weight of each edge, where exp represents the exponential operation; the embedding of operation node O_ij is updated by aggregating the weighted embeddings of all adjacent nodes, where σ is the Sigmoid activation function, which yields the sequence dependency features of the operation, defined as an 8-dimensional vector;

② Embedding resource compatibility features

Initialize the embedding vectors of each operation node O_ij and its compatible machine set M_{j,i}, apply a parameterized linear transformation to each operation-machine pair (O_ij, M_k), and calculate the attention score e_ik, where h_ijk represents the edge feature, defined as a 1-dimensional vector representing the processing time of the operation on this machine, and a is the learned attention vector, defined as a 5-dimensional vector; the attention scores of all machines are normalized using the softmax activation function to obtain the attention weight α_ik of each machine, where exp represents the exponential operation; the embedding of the operation node is updated by aggregating the weighted embeddings of all compatible machines, where σ is the Sigmoid activation function, which yields the resource compatibility features of the operation, defined as an 8-dimensional vector;

③ Embedding path-level features

Integrate the path-level features obtained from the sequence dependency embedding module and the resource-compatible embedding module, calculate the path-level attention weights from the output of each module, and then combine the outputs of the two modules, where exp represents the exponential operation; q, W_m, and b_m are learned parameters, with q defined as the attention weight over the different paths, represented as a 2-dimensional vector, W_m defined as the weight for path-level operations, represented as a 128-dimensional vector, and b_m defined as the bias, represented as a 1-dimensional vector; S is the sequence dependency feature embedding, R is the resource compatibility feature embedding, and tanh is the activation function; the calculated attention weight β_m, a 2-dimensional vector, is used to weight the outputs of the SDEM and RCEM modules and to compute the final embedding of the operation node, defined as an 8-dimensional vector;

(3) Model training and optimization

Define the state space as the disjunctive graph model G = (O, M, D, E); define the action space as a two-dimensional vector a_t = {(O_ij, M_k)}; the optimization objective is to minimize the maximum completion time C_max of all operations; define the reward function r_t, which is computed from the estimated maximum completion time under the scheduling plan at time t and the maximum completion time under the scheduling plan at time t-1;

(4) Model training and optimization

In the initial stage of model training, all parameters of the dual-path graph attention network and the deep reinforcement learning model are initialized, with the learning rate set to 0.001 and the discount factor for reward discounting set to 0.1. The policy is trained for 1000 epochs to obtain the optimal solution. In each epoch, the model collects data on state, action, reward, and new state through interaction with the environment. This data is used to batch update the policy and value networks. The policy network is defined as a 3-layer fully connected network with an input dimension of 24, an output dimension of 1, and a hidden layer dimension of 256. The model's performance is evaluated every 100 epochs during training to monitor learning progress and adjust the training policy. After training is complete, a policy set Π* = {π_1, π_2, …, π_n} is obtained, where n is 50, representing the total number of operations.