Multi-objective flexible job shop scheduling method based on graph network and reinforcement learning
By employing graph network and reinforcement learning-based methods, the efficiency and quality issues of multi-objective optimization in flexible job shop scheduling were addressed, achieving efficient flexible job shop scheduling and generating high-quality scheduling schemes.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGDONG UNIV OF TECH
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies struggle to efficiently solve multi-objective optimization problems in flexible workshop scheduling, especially in complex production environments where computation time is long and solution quality deteriorates.
We adopt a graph network and reinforcement learning-based approach to model the flexible job shop scheduling problem as a Markov decision process. We use a dual-graph attention neural network to adaptively represent the characteristics of processes and machine nodes, train the decision model through proximal policy optimization, and combine it with the Actor-Critic framework for policy gradient optimization.
It improves the solution efficiency and quality of the model, and can find the optimal balance point in the multi-objective flexible job shop scheduling problem, generating a high-quality Pareto front solution set that is adaptable to various production scenarios.
Figure CN119918840B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the technical field of flexible job shop scheduling, and in particular to a multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning.
Background Technology
[0002] In the wave of Industry 4.0, the way companies produce, improve, and distribute products is undergoing a revolutionary transformation. This transformation is driving the manufacturing industry towards faster, smarter, and more flexible operations, thereby triggering a fundamental leap in enterprise production capabilities. To adapt to rapid market changes and the demand for flexible manufacturing systems, numerous experts and scholars have proposed a series of innovative manufacturing system models, such as multi-agent systems, flexible manufacturing systems, and agile manufacturing engineering. The emergence of these new models has undoubtedly accelerated the new process of manufacturing development.
[0003] Against this backdrop, the Flexible Job-Shop Scheduling Problem (FJSP) has become a highly representative challenge in the field of flexible manufacturing. FJSP allows a single operation to be processed on multiple different machines, involving not only the sequencing of operations but also the allocation of machines. Compared to the traditional job-shop scheduling problem, FJSP has a more complex topology and a larger solution space, making it better suited to the flexibility and diversity of task-resource relationships in new manufacturing paradigms.
[0004] In the current field of production scheduling, the main methods are exact methods and heuristic methods. Exact methods, such as branch and bound and column generation, aim to find the optimal solution, but they often require significant computation time on multi-objective optimization problems, and this time cost increases dramatically in complex production environments. Heuristic methods give up the guarantee of optimality in exchange for producing high-quality scheduling schemes in a shorter time, yet they also suffer from long computation times and decreased solution quality when handling multi-objective problems.
Summary of the Invention
[0005] The purpose of this invention is to overcome the shortcomings of the prior art and provide a multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning.
[0006] To achieve the above objectives, the technical solution provided by this invention is as follows:
[0007] A multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning includes:
[0008] The process of solving multi-objective flexible job shop scheduling is defined as a Markov decision process;
[0009] The flexible job shop scheduling problem is modeled as a heterogeneous disjunctive graph;
[0010] A decision model is obtained by adaptively representing operation and machine node features using a dual-graph attention neural network;
[0011] The decision model is trained using a proximal policy optimization method to obtain the trained decision model;
[0012] Multi-objective flexible job shop scheduling is carried out using the trained decision model.
[0013] Furthermore, solving the multi-objective flexible job shop scheduling problem is regarded as a sequential decision-making process: in each state, an operation is assigned to a machine that can process it, until all operations are scheduled.
[0014] The solving process is defined as a Markov decision process given by a tuple of a state set $S$, an action set $A$, a state transition function, a reward function $R$ and a discount factor $\gamma$;
[0015] $S$ is the set of states; the status of all operations and machines at time $t$ constitutes the state $s_t$;
[0016] $A$ is the set of actions. Operation selection and machine allocation are combined into a single decision action $a_t \in A_t$, defined as a feasible operation-machine pair $(O_{ij}, M_k)$ at time $t$, where $O_{ij}$ is the $j$-th operation of the $i$-th workpiece that is allowed to be scheduled, $M_k$ is a machine compatible with the $j$-th operation of the $i$-th workpiece, and $M$ is the entire set of machines;
[0017] The state transition function gives the probability of moving to the new state $s_{t+1}$ after the decision action $a_t$ is completed in state $s_t$;
[0018] $R$ is the reward for completing the decision action $a_t$, i.e. the reward for moving from state $s_t$ to $s_{t+1}$. The reward $r_t \in R$ is a weighted combination of the makespan (production cycle) reward $r_{ms}$ and the load-balancing reward $r_{std}$, i.e. $r_t = (1-\beta) r_{ms} + \beta r_{std}$, where $\beta$ takes values in $[0,1]$. $r_{ms}$ is defined in terms of the estimated completion time of $O_{ij}$ computed in state $s_t$: while $O_{ij}$ is unscheduled, its minimum processing time over all compatible machines is used; once it has been scheduled, the actual processing time on the corresponding machine is used. $r_{std}$ is defined in terms of the standard deviations of the processing time over all machines at time $t$ and at time $t-1$;
[0019] $\gamma$ is the discount factor, with values in $[0,1]$. Values closer to 1 place more weight on long-term cumulative rewards, while values closer to 0 place more weight on short-term rewards.
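As a minimal illustration of the role of the discount factor, the following Python sketch computes a discounted cumulative reward. The function name `discounted_return` and the sample reward values are assumptions made for illustration and are not part of the described method.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last step backwards."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# With gamma = 0 only the immediate reward counts; with gamma = 1 all steps count equally.
rewards = [1.0, 1.0, 1.0]
short_sighted = discounted_return(rewards, 0.0)
far_sighted = discounted_return(rewards, 1.0)
```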
[0020] Furthermore, the heterogeneous disjunctive graph $H$ is composed of the set of operation nodes $O$, the set of machine nodes $M$ and the set of directed arcs $C$, together with the scheduling state of the flexible job shop scheduling problem;
[0021] Each state $s_t$ is represented as a heterogeneous graph, which changes dynamically during the solution process.
[0022] Furthermore, adaptively representing operation and machine node features with the dual-graph attention neural network includes aggregating operation features based on a multi-order graph attention neural network and aggregating machine features based on a graph attention neural network.
[0023] Furthermore, aggregating operation features based on the multi-order graph attention neural network includes:
[0024] By adding an attention mechanism, the operation node features of neighbourhoods of multiple orders are adaptively aggregated, and the feature information of each hop of neighbours is weighted:
[0025]
[0026] In the formula, $m_{ij,l}$ denotes the weight of the $l$-th order feature of node $ij$, the accompanying term denotes the $l$-th order feature of node $ij$, and $L$ denotes the total number of orders;
[0027]
[0028] In the formula, $h_{ij}$ denotes the features of node $ij$, and $\sigma$ denotes the nonlinear activation function.
[0029] Furthermore, aggregating machine features based on the graph attention neural network includes:
[0030] In the machine feature embedding of the heterogeneous disjunctive graph $H$, the processing time $p_{ijk}$ of operation $O_{ij}$ on a compatible machine $M_k$ is represented as the feature $E_{ijk}$ of the operation-machine arc;
[0031] The attention coefficient of a machine node is calculated as follows:
[0032]
[0033] In the formula, $v_k$ is the initial feature of machine $M_k$, $W_M$ is the linear transformation parameter of the machine, LeakyReLU is the activation function, and the remaining term is the transpose of the trainable weight parameters;
[0034] Then, with $e_{kk}$ as the attention coefficient of machine $M_k$ to itself, all $e_{ijk}$ and $e_{kk}$ are normalized to obtain the normalized attention coefficients $\alpha_{ijk}$ and $\alpha_{kk}$;
[0035] Finally, the embedding $v'_k$ of machine $M_k$ is computed by aggregating the features of its adjacent operations and its own features. The aggregation function for the feature $v'_k$ of machine $M_k$ is:
[0036]
[0037] In the formula, the set under the summation denotes all operations that machine $M_k$ can process.
[0038] Furthermore, training the decision model with the proximal policy optimization method includes:
[0039] For each action pair $a_t = (O_{ij}, M_k) \in A_t$ at time step $t$, the corresponding operation embedding, machine embedding and average-pooled features are concatenated and input into an MLP to obtain the scalar $P(a_t, s_t)$ that scores selecting action $a_t$ in state $s_t$, as follows:
[0040]
[0041] In the formula, MLP denotes the decision network built from a multilayer perceptron, and $\omega$ denotes its network parameters; the operation and machine terms are the features obtained from $h_{ij}$ and $v'_k$ after $L$ layers of the neural network; and $h_G$ is the global feature obtained from these features by average pooling;
[0042] All $P(a_t, s_t)$ are normalized to give the probability of each $a_t$, and the corresponding probability distribution is output:
[0043]
[0044] The algorithm is based on the Actor-Critic framework: it calculates the policy gradient, and evaluates and updates the current policy. Its optimization objective function is:
[0045]
[0046] In the formula, $\pi_\theta$ is the probability that the new policy network selects action $a_t$ in state $s_t$, $\pi_{old}$ is the probability that the old policy network selects action $a_t$ in state $s_t$, $A_t$ is the advantage function of state $s_t$ and action $a_t$, clip denotes the truncation (clipping) operation, and $\epsilon$ is the clipping range.
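For reference, the symbols defined above match the standard clipped surrogate objective of proximal policy optimization. Assuming the objective takes that usual form (the ratio $\rho_t$ and the name $L^{CLIP}$ are notation introduced here, not taken from the source), it can be written as:

```latex
\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{old}(a_t \mid s_t)}, \qquad
L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(\rho_t(\theta)\,A_t,\;
\mathrm{clip}\left(\rho_t(\theta),\,1-\epsilon,\,1+\epsilon\right)A_t\right)\right]
```

The clip term keeps the probability ratio within $[1-\epsilon, 1+\epsilon]$, so a single update cannot move the new policy far from the old one.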
[0047] The Critic network, used as the evaluation module, takes the global feature $h_G$ as its input:
[0048]
[0049] In the formula, the output is the evaluation of the current state $s_t$, and $\theta$ denotes the parameters of the evaluation network.
[0050] Compared with existing technologies, the principles and advantages of this technical solution are as follows:
[0051] 1. By leveraging dual-graph attention networks to deeply mine the features of processes and machines, the key features of the production model can be effectively represented. This deep representation capability not only improves the model's performance but also enhances its adaptability to different production models, enabling it to flexibly cope with various unseen production scenarios and demonstrating excellent generalization performance.
[0052] 2. In the field of multi-objective optimization, balancing the relationships between different objectives is an important and challenging problem. This technical solution assigns different weights to different optimization objectives, enabling the model to find an optimal balance among multiple objectives. This method not only improves solution efficiency but also generates high-quality Pareto front solutions, providing an effective solution for the multi-objective flexible job shop scheduling problem.
Attached Figure Description
[0053] To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
[0054] Figure 1 is a flowchart of the multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning of the present invention;
[0055] Figure 2 is a schematic diagram illustrating the principle of the multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning of the present invention;
[0056] Figure 3 is the heterogeneous disjunctive graph for the flexible job shop scheduling problem.
Detailed Implementation
[0057] The present invention will be further described below with reference to specific embodiments:
[0058] As shown in Figure 1 and Figure 2, the multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning of this embodiment includes the following steps:
[0059] S1. Solving the multi-objective flexible job shop scheduling problem can be viewed as a sequential decision-making process: in each state, an operation is assigned to a machine that can process it, until all operations are scheduled. The solving process is therefore defined as a Markov decision process given by a tuple of a state set $S$, an action set $A$, a state transition function, a reward function $R$ and a discount factor $\gamma$;
[0060] wherein:
[0061] $S$ is the set of states; the status of all operations and machines at time $t$ constitutes the state $s_t$;
[0062] $A$ is the set of actions. Operation selection and machine allocation are combined into a single decision action $a_t \in A_t$, defined as a feasible operation-machine pair $(O_{ij}, M_k)$ at time $t$, where $O_{ij}$ is the $j$-th operation of the $i$-th workpiece that is allowed to be scheduled, $M_k$ is a machine compatible with the $j$-th operation of the $i$-th workpiece, and $M$ is the entire set of machines;
[0063] The state transition function gives the probability of moving to the new state $s_{t+1}$ after the decision action $a_t$ is completed in state $s_t$;
[0064] $R$ is the reward for completing the decision action $a_t$, i.e. the reward for moving from state $s_t$ to $s_{t+1}$. The reward $r_t \in R$ is a weighted combination of the makespan (production cycle) reward $r_{ms}$ and the load-balancing reward $r_{std}$, i.e. $r_t = (1-\beta) r_{ms} + \beta r_{std}$, where $\beta$ takes values in $[0,1]$. $r_{ms}$ is defined in terms of the estimated completion time of $O_{ij}$ computed in state $s_t$: while $O_{ij}$ is unscheduled, its minimum processing time over all compatible machines is used; once it has been scheduled, the actual processing time on the corresponding machine is used. $r_{std}$ is defined in terms of the standard deviations of the processing time over all machines at time $t$ and at time $t-1$;
[0065] $\gamma$ is the discount factor, with values in $[0,1]$. Values closer to 1 place more weight on long-term cumulative rewards, while values closer to 0 place more weight on short-term rewards.
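The weighted reward of step S1 can be sketched in Python as follows. The source does not reproduce the formulas for $r_{ms}$ and $r_{std}$, so this sketch assumes two common choices: $r_{ms}$ is the decrease in the largest estimated completion time between consecutive states, and $r_{std}$ is the decrease in the standard deviation of machine processing time between time $t-1$ and time $t$. The function and parameter names are hypothetical.

```python
import statistics


def weighted_reward(prev_completion, curr_completion, prev_load, curr_load, beta):
    """Combine a makespan reward and a load-balancing reward with weight beta.

    prev_completion / curr_completion: estimated completion time of every
        operation before and after the action (assumed inputs).
    prev_load / curr_load: accumulated processing time of every machine at
        time t-1 and at time t (assumed inputs).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # Assumption: r_ms rewards a drop in the estimated makespan.
    r_ms = max(prev_completion) - max(curr_completion)
    # Assumption: r_std rewards a drop in the spread of machine loads.
    r_std = statistics.pstdev(prev_load) - statistics.pstdev(curr_load)
    return (1.0 - beta) * r_ms + beta * r_std


# beta = 0 optimises makespan only, beta = 1 load balance only.
r = weighted_reward([5, 9], [5, 8], [0, 4], [2, 2], beta=0.5)
```

Sweeping `beta` over several values in $[0,1]$ is one way to obtain different trade-offs between the two objectives.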
[0066] S2. Model the flexible job shop scheduling problem as a heterogeneous disjunctive graph;
[0067] As shown in Figure 3, the heterogeneous disjunctive graph $H$ is composed of the set of operation nodes $O$, the set of machine nodes $M$ and the set of directed arcs $C$, together with the scheduling state of the flexible job shop scheduling problem;
[0068] Each state $s_t$ is represented as a heterogeneous graph, which changes dynamically during the solution process.
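One possible in-memory representation of such a graph is sketched below. The class name `HeteroGraph`, its fields and the three-operation instance are assumptions for illustration; the source only names the node sets, the directed arcs and the operation-machine arc feature $p_{ijk}$.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    operations: list   # operation nodes, each keyed as (i, j)
    machines: list     # machine nodes, each keyed by k
    precedence: list   # directed arcs ((i, j), (i, j + 1)) inside a workpiece
    proc_time: dict    # operation-machine arcs: (i, j, k) -> p_ijk
    assigned: dict = field(default_factory=dict)  # scheduling state: (i, j) -> k

    def feasible_pairs(self):
        """Operation-machine pairs whose predecessors are all scheduled."""
        pairs = []
        for op in self.operations:
            if op in self.assigned:
                continue
            preds = [u for (u, v) in self.precedence if v == op]
            if all(p in self.assigned for p in preds):
                pairs.extend((op, k) for k in self.machines
                             if (op[0], op[1], k) in self.proc_time)
        return pairs

    def schedule(self, op, k):
        """Apply one decision action; the graph state changes as a result."""
        if (op, k) not in self.feasible_pairs():
            raise ValueError("infeasible operation-machine pair")
        self.assigned[op] = k


g = HeteroGraph(
    operations=[(0, 0), (0, 1), (1, 0)],
    machines=[0, 1],
    precedence=[((0, 0), (0, 1))],
    proc_time={(0, 0, 0): 3, (0, 0, 1): 2, (0, 1, 1): 4, (1, 0, 0): 5},
)
initial = g.feasible_pairs()
g.schedule((0, 0), 1)
after_first = g.feasible_pairs()
```

Scheduling the first operation of workpiece 0 makes its second operation eligible, which is the sense in which the graph changes during solving.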
[0069] S3. Use a dual-graph attention neural network to adaptively represent the features of the operation and machine nodes to obtain the decision model;
[0070] This step includes aggregating operation features based on a multi-order graph attention neural network and aggregating machine features based on a graph attention neural network;
[0071] Aggregating operation features based on the multi-order graph attention neural network includes:
[0072] By adding an attention mechanism, the operation node features of neighbourhoods of multiple orders are adaptively aggregated, and the feature information of each hop of neighbours is weighted:
[0073]
[0074] In the formula, $m_{ij,l}$ denotes the weight of the $l$-th order feature of node $ij$, the accompanying term denotes the $l$-th order feature of node $ij$, and $L$ denotes the total number of orders;
[0075]
[0076] In the formula, $h_{ij}$ denotes the features of node $ij$, and $\sigma$ denotes the nonlinear activation function.
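The weighting of per-order features can be illustrated with the short Python sketch below. Because the formulas are not reproduced in the source, the sketch assumes that the weights $m_{ij,l}$ come from a softmax over one score per order and that $\sigma$ is a ReLU; both choices, and all names used, are assumptions rather than the patented formulation.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_orders(order_feats, scores):
    """Weight the l-th order features of one operation node and combine them.

    order_feats[l] is the feature vector gathered from l-hop neighbours
    (assumed to be precomputed); scores[l] is an unnormalised attention score.
    """
    weights = softmax(scores)                     # plays the role of m_{ij,l}
    dim = len(order_feats[0])
    combined = [sum(w * f[d] for w, f in zip(weights, order_feats))
                for d in range(dim)]
    return [max(0.0, c) for c in combined], weights   # ReLU as the assumed sigma


h_ij, m = aggregate_orders([[1.0, -2.0], [3.0, 0.0]], scores=[0.0, 0.0])
```

With equal scores each order contributes equally; a larger score for one order shifts the node feature toward that hop's information.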
[0077] Aggregating machine features based on the graph attention neural network includes:
[0078] In the machine feature embedding of the heterogeneous disjunctive graph $H$, the processing time $p_{ijk}$ of operation $O_{ij}$ on a compatible machine $M_k$ is represented as the feature $E_{ijk}$ of the operation-machine arc;
[0079] The attention coefficient of a machine node is calculated as follows:
[0080]
[0081] In the formula, $v_k$ is the initial feature of machine $M_k$, $W_M$ is the linear transformation parameter of the machine, LeakyReLU is the activation function, and the remaining term is the transpose of the trainable weight parameters;
[0082] Then, with $e_{kk}$ as the attention coefficient of machine $M_k$ to itself, all $e_{ijk}$ and $e_{kk}$ are normalized to obtain the normalized attention coefficients $\alpha_{ijk}$ and $\alpha_{kk}$;
[0083] Finally, the embedding $v'_k$ of machine $M_k$ is computed by aggregating the features of its adjacent operations and its own features. The aggregation function for the feature $v'_k$ of machine $M_k$ is:
[0084]
[0085] In the formula, the set under the summation denotes all operations that machine $M_k$ can process.
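The three stages described above (attention coefficients, normalization, aggregation) follow the general pattern of a graph attention layer, sketched here in plain Python. This is an assumption-laden illustration: the linear transformations are taken as already applied, each adjacent operation is summarised by one message vector that is assumed to combine $h_{ij}$ with the arc feature $E_{ijk}$, normalization is assumed to be a softmax, and no final activation is applied. All function and parameter names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_aggregate(v_k, op_messages, a):
    """Return (alphas, new embedding) for one machine node.

    v_k         -- machine feature vector (assumed already transformed)
    op_messages -- one vector per adjacent operation, same length as v_k
    a           -- attention weight vector of length 2 * len(v_k)
    """
    candidates = [v_k] + op_messages      # self term first, then neighbours
    # e_kk and e_ijk: LeakyReLU of a^T [v_k || candidate]
    scores = [leaky_relu(sum(w * x for w, x in zip(a, v_k + u)))
              for u in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]    # alpha_kk followed by alpha_ijk
    v_new = [sum(al * u[d] for al, u in zip(alphas, candidates))
             for d in range(len(v_k))]
    return alphas, v_new


alphas, v_new = attention_aggregate([1.0, 1.0], [[3.0, 1.0]], a=[0.0, 0.0, 0.0, 0.0])
```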
[0086] S4. Train the decision model using the proximal policy optimization method to obtain the trained decision model;
[0087] Training the decision model with the proximal policy optimization method includes:
[0088] For each action pair $a_t = (O_{ij}, M_k) \in A_t$ at time step $t$, the corresponding operation embedding, machine embedding and average-pooled features are concatenated and input into an MLP to obtain the scalar $P(a_t, s_t)$ that scores selecting action $a_t$ in state $s_t$, as follows:
[0089]
[0090] In the formula, MLP denotes the decision network built from a multilayer perceptron, and $\omega$ denotes its network parameters; the operation and machine terms are the features obtained from $h_{ij}$ and $v'_k$ after $L$ layers of the neural network; and $h_G$ is the global feature obtained from these features by average pooling;
[0091] All $P(a_t, s_t)$ are normalized to give the probability of each $a_t$, and the corresponding probability distribution is output:
[0092]
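The scoring and normalization of candidate actions can be sketched as follows. The sketch assumes a one-hidden-layer MLP with tanh units, a global feature formed by concatenating the mean-pooled operation and machine embeddings, and a softmax as the normalization; the names `mean_pool`, `mlp` and `action_distribution` and the toy weights are illustrative assumptions.

```python
import math


def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mlp(x, w1, b1, w2, b2):
    """One hidden tanh layer followed by a scalar output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def action_distribution(op_emb, mach_emb, pairs, params):
    """Probability of each feasible (operation, machine) pair."""
    # Global feature h_G: mean-pooled operation and machine embeddings.
    h_g = mean_pool(list(op_emb.values())) + mean_pool(list(mach_emb.values()))
    # Score P(a_t, s_t) from the concatenation [operation || machine || global].
    scores = [mlp(op_emb[o] + mach_emb[k] + h_g, *params) for o, k in pairs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


op_emb = {(0, 0): [1.0, 0.0], (1, 0): [0.0, 1.0]}
mach_emb = {0: [1.0], 1: [2.0]}
pairs = [((0, 0), 0), ((0, 0), 1), ((1, 0), 0)]
params = ([[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]], [0.0], [1.0], 0.0)
probs = action_distribution(op_emb, mach_emb, pairs, params)
```

During training an action would be sampled from `probs`; at inference the most probable pair can be taken greedily.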
[0093] The algorithm is based on the Actor-Critic framework: it calculates the policy gradient, and evaluates and updates the current policy. Its optimization objective function is:
[0094]
[0095] In the formula, $\pi_\theta$ is the probability that the new policy network selects action $a_t$ in state $s_t$, $\pi_{old}$ is the probability that the old policy network selects action $a_t$ in state $s_t$, $A_t$ is the advantage function of state $s_t$ and action $a_t$, clip denotes the truncation (clipping) operation, and $\epsilon$ is the clipping range.
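A small numeric sketch of a clipped objective built from these quantities is given below. It assumes the usual proximal policy optimization form, in which the probability ratio between the new and old policies is clipped to $[1-\epsilon, 1+\epsilon]$ and the smaller of the clipped and unclipped terms is averaged over samples; the function name and the sample values are hypothetical.

```python
def clipped_surrogate(new_probs, old_probs, advantages, eps=0.2):
    """Average of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    terms = []
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# A ratio of about 1.5 with positive advantage is capped at 1 + eps = 1.2;
# a ratio of 0.5 with negative advantage is held at 1 - eps = 0.8.
objective = clipped_surrogate([0.6, 0.2], [0.4, 0.4], [1.0, -1.0], eps=0.2)
```

In both sample cases the clipping removes any incentive to push the ratio further from 1, which is what keeps each policy update small.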
[0096] The Critic network, used as the evaluation module, takes the global feature $h_G$ as its input:
[0097]
[0098] In the formula, the output is the evaluation of the current state $s_t$, and $\theta$ denotes the parameters of the evaluation network.
[0099] S5. Finally, the trained decision-making model is used for multi-objective flexible workshop scheduling.
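The sequential decision loop in which a trained model would be used can be sketched as follows. The callable `score` stands in for the trained decision model and is replaced in the example by a shortest-processing-time rule; the data layout, the function name `rollout` and the tiny instance are assumptions for illustration only.

```python
def rollout(jobs, score):
    """Greedy scheduling loop.

    jobs[i][j] maps machine k -> processing time p_ijk of operation O_ij.
    score(i, j, k, p) is a stand-in for the trained decision model.
    Returns the schedule as (i, j, k, start, end) tuples and the makespan.
    """
    next_op = {i: 0 for i in jobs}        # index of the next unscheduled operation
    job_ready = {i: 0 for i in jobs}      # time at which each workpiece is free
    machine_ready = {}                    # time at which each machine is free
    schedule = []
    while any(next_op[i] < len(jobs[i]) for i in jobs):
        candidates = [(i, next_op[i], k, p)
                      for i in jobs if next_op[i] < len(jobs[i])
                      for k, p in jobs[i][next_op[i]].items()]
        i, j, k, p = max(candidates, key=lambda c: score(*c))
        start = max(job_ready[i], machine_ready.get(k, 0))
        end = start + p
        job_ready[i] = machine_ready[k] = end
        next_op[i] += 1
        schedule.append((i, j, k, start, end))
    makespan = max(entry[4] for entry in schedule)
    return schedule, makespan


jobs = {0: [{0: 3, 1: 2}, {1: 4}], 1: [{0: 5}]}
schedule, makespan = rollout(jobs, score=lambda i, j, k, p: -p)
```

Replacing the stand-in rule with the action probabilities of the trained policy yields the scheduling procedure of step S5.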
[0100] The above-described embodiments are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Therefore, any changes made in accordance with the shape and principle of the present invention should be covered within the protection scope of the present invention.
Claims
1. A multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning, characterized in that it includes: defining the process of solving multi-objective flexible job shop scheduling as a Markov decision process; modeling the flexible job shop scheduling problem as a heterogeneous disjunctive graph; adaptively representing operation and machine node features using a dual-graph attention neural network to obtain a decision model; training the decision model using a proximal policy optimization method to obtain the trained decision model; and performing multi-objective flexible job shop scheduling using the trained decision model; wherein adaptively representing operation and machine node features using the dual-graph attention neural network includes aggregating operation features based on a multi-order graph attention neural network and aggregating machine features based on a graph attention neural network; aggregating operation features based on the multi-order graph attention neural network includes: by adding an attention mechanism, the operation node features of neighbourhoods of multiple orders are adaptively aggregated, and the feature information of each hop of neighbours is weighted; in the formula, $m_{ij,l}$ denotes the weight of the $l$-th order feature of node $ij$, the accompanying term denotes the $l$-th order feature of node $ij$, and $L$ denotes the total number of orders; and, in the formula, $h_{ij}$ denotes the features of node $ij$ and $\sigma$ denotes a nonlinear activation function.
2. The multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning according to claim 1, characterized in that solving the multi-objective flexible job shop scheduling problem is regarded as a sequential decision-making process, in which, in each state, an operation is assigned to a machine that can process it, until all operations are scheduled; the solving process is defined as a Markov decision process given by a tuple of a state set $S$, an action set $A$, a state transition function, a reward function $R$ and a discount factor $\gamma$; $S$ is the set of states, and the status of all operations and machines at time $t$ constitutes the state $s_t$; $A$ is the set of actions, in which operation selection and machine allocation are combined into a single decision action $a_t \in A_t$; a decision action is defined as a feasible operation-machine pair $(O_{ij}, M_k)$ at time $t$, where $O_{ij}$ is the $j$-th operation of the $i$-th workpiece that is allowed to be scheduled, $M_k$ is a machine compatible with the $j$-th operation of the $i$-th workpiece, and $M$ is the entire set of machines; the state transition function gives the probability of moving to the new state $s_{t+1}$ after the decision action $a_t$ is completed in state $s_t$; $R$ is the reward for completing the decision action $a_t$, i.e. the reward for moving from state $s_t$ to $s_{t+1}$; the reward $r_t \in R$ is a weighted combination of the makespan (production cycle) reward $r_{ms}$ and the load-balancing reward $r_{std}$, i.e. $r_t = (1-\beta) r_{ms} + \beta r_{std}$, where $\beta$ takes values in $[0,1]$; $r_{ms}$ is defined in terms of the estimated completion time of $O_{ij}$ computed in state $s_t$, where, while $O_{ij}$ is unscheduled, its minimum processing time over all compatible machines is used, and once it has been scheduled, the actual processing time on the corresponding machine is used; $r_{std}$ is defined in terms of the standard deviations of the processing time over all machines at time $t$ and at time $t-1$; $\gamma$ is the discount factor, with values in $[0,1]$, where values closer to 1 place more weight on long-term cumulative rewards and values closer to 0 place more weight on short-term rewards.
3. The multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning according to claim 2, characterized in that the heterogeneous disjunctive graph $H$ is composed of the set of operation nodes $O$, the set of machine nodes $M$ and the set of directed arcs $C$, together with the scheduling state of the flexible job shop scheduling problem; each state $s_t$ is represented as a heterogeneous graph, which changes dynamically during the solution process.
4. The multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning according to claim 1, characterized in that aggregating machine features based on the graph attention neural network includes: in the machine feature embedding of the heterogeneous disjunctive graph $H$, the processing time $p_{ijk}$ of operation $O_{ij}$ on a compatible machine $M_k$ is represented as the feature $E_{ijk}$ of the operation-machine arc; the attention coefficient of a machine node is calculated by a formula in which $v_k$ is the initial feature of machine $M_k$, $W_M$ is the linear transformation parameter of the machine, LeakyReLU is the activation function, and the remaining term is the transpose of the trainable weight parameters; then, with $e_{kk}$ as the attention coefficient of machine $M_k$ to itself, all $e_{ijk}$ and $e_{kk}$ are normalized to obtain the normalized attention coefficients $\alpha_{ijk}$ and $\alpha_{kk}$; finally, the embedding $v'_k$ of machine $M_k$ is computed by an aggregation function that aggregates the features of its adjacent operations and its own features, in which the set under the summation denotes all operations that machine $M_k$ can process.
5. The multi-objective flexible job shop scheduling method based on graph networks and reinforcement learning according to claim 4, characterized in that training the decision model using the proximal policy optimization method includes: for each action pair $a_t = (O_{ij}, M_k) \in A_t$ at time step $t$, the corresponding operation embedding, machine embedding and average-pooled features are concatenated and input into an MLP to obtain the scalar $P(a_t, s_t)$ that scores selecting action $a_t$ in state $s_t$; in the formula, MLP denotes the decision network built from a multilayer perceptron, $\omega$ denotes its network parameters, the operation and machine terms are the features obtained from $h_{ij}$ and $v'_k$ after $L$ layers of the neural network, and $h_G$ is the global feature obtained from these features by average pooling; all $P(a_t, s_t)$ are normalized to give the probability of each $a_t$, and the corresponding probability distribution is output; the algorithm is based on the Actor-Critic framework, calculates the policy gradient, and evaluates and updates the current policy according to an optimization objective function in which $\pi_\theta$ is the probability that the new policy network selects action $a_t$ in state $s_t$, $\pi_{old}$ is the probability that the old policy network selects action $a_t$ in state $s_t$, $A_t$ is the advantage function of state $s_t$ and action $a_t$, clip denotes the truncation (clipping) operation, and $\epsilon$ is the clipping range; the Critic network, used as the evaluation module, takes the global feature $h_G$ as its input and outputs the evaluation of the current state $s_t$, with $\theta$ denoting the parameters of the evaluation network.