Scheduling method and system based on graph representation learning and knowledge reasoning enhancement
By constructing a dynamic heterogeneous graph and applying knowledge graph reasoning, the invention addresses the difficulty of representing implicit collaborative relationships in radiology scheduling, achieving efficient and compliant allocation of medical resources and improving both the global optimality of the schedule and the efficiency of team collaboration.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- SHAN DONG MSUN HEALTH TECH GRP CO LTD
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-12
AI Technical Summary
Existing radiology scheduling techniques suffer from problems such as insufficient rule integration, short-sighted decision-making, and inability to capture implicit collaborative relationships, resulting in a disconnect between scheduling results and actual needs, making it difficult to achieve scientific, efficient, and coordinated spatiotemporal scheduling of medical resources.
By constructing a dynamic heterogeneous graph containing doctors, positions, and rules, high-order semantic embedding features are extracted using a relational graph convolutional network. This is combined with a knowledge graph reasoning engine to perform counterfactual inference, calculate the comprehensive reward, and generate the final schedule through a proximal policy optimization algorithm.
It achieves explicit representation of implicit collaborative relationships, improves the compliance, balance and global optimality of scheduling results, solves the problem that traditional methods cannot predict future resource gaps, and enhances team collaboration efficiency and resource allocation flexibility.
Figure CN122201680A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of medical resource management and artificial intelligence technology, and in particular relates to a scheduling method and system based on graph representation learning and knowledge reasoning enhancement.
Background Technology
[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.
[0003] The radiology department is a key diagnostic and treatment department of the hospital. Its scheduling must strictly comply with multiple constraints such as personnel qualifications, job sequence, and mutual exclusion rules, while also taking into account the efficiency of doctor collaboration, workload balance, and the overall allocation of resources. Scheduling therefore places very high demands on scientific rigor, adaptability, and foresight.
[0004] Existing scheduling methods are mostly based on static rule engines, which can only perform simple matching and filtering of explicit hard rules. They cannot handle implicit collaborative relationships between doctors, such as tacit understanding and fatigue transmission, which can easily lead to a disconnect between the scheduling results and actual work needs, resulting in low team collaboration efficiency. At the same time, static rule engines lack dynamic adjustment capabilities and foresight. When faced with changes in job requirements and fluctuations in doctors' condition, scheduling conflicts or even resource allocation "deadlock" problems are likely to occur, making it difficult to adapt to the 24-hour uninterrupted work scenario of radiology departments.
[0005] Some existing technologies introduce traditional reinforcement learning into scheduling to overcome the limitations of static rule engines. However, these methods search inefficiently and converge slowly in the high-dimensional discrete feature space of radiology scheduling. Traditional reinforcement learning also treats the scheduling environment as a black box, which makes it difficult to incorporate the complex social and management logic of medical scheduling and produces a semantic gap. Moreover, without an effective immediate reward feedback mechanism, rewards are sparse and the model is prone to myopic behavior: it attends only to compliance at the current time step and fails to anticipate resource gaps in future periods, so the overall scheduling result is poor and key positions may still be left unstaffed.
[0006] Furthermore, existing scheduling technologies that combine rules with intelligent algorithms all use knowledge rules as post-processing filters: the algorithm first generates a schedule and then removes violations. This does not couple the rules deeply with the algorithm model. It lowers scheduling efficiency and deprives the model of the flexibility to search for the optimal solution at the edge of the rules, so soft goals such as team collaboration and balanced doctor workload cannot be optimized while hard compliance requirements are met.
[0007] It is evident that existing radiology scheduling techniques suffer from problems such as insufficient rule integration, short-sighted decision-making, and inability to capture implicit collaborative relationships, making it difficult to achieve scientific, efficient, and coordinated spatiotemporal scheduling of medical resources.
Summary of the Invention
[0008] To overcome the shortcomings of the prior art, this invention provides a scheduling method and system based on graph representation learning and knowledge reasoning enhancement. By deeply coupling knowledge graphs with reinforcement learning, it can effectively uncover implicit collaborative relationships among doctors, endow the model with forward-looking decision-making capabilities, solve problems such as sparse rewards, and improve the compliance, balance and global optimality of scheduling results. It is applicable to scheduling scenarios in hospital radiology departments and other similar human resources.
[0009] To achieve the above objectives, one or more embodiments of the present invention provide the following technical solutions. A first aspect of the present invention provides a scheduling method based on graph representation learning and knowledge reasoning enhancement, comprising: obtaining the basic data set for radiology department scheduling, the data set including a personnel set, a job set, and a rule set; constructing, from the basic data set, a dynamic heterogeneous graph containing personnel, positions, and rules; when a scheduling time step is reached, using a relational graph convolutional network model to extract the high-order semantic embedding feature set of the personnel nodes and job nodes in the heterogeneous graph at the current time; calculating, from the high-order semantic embedding feature set and the features of the position to be scheduled, the selection probability distribution over all candidate doctors; sampling an action from the selection probability distribution and performing counterfactual reasoning with a knowledge graph reasoning engine to calculate the reasoning reward; updating the dynamic heterogeneous graph structure according to the action to obtain the heterogeneous graph state at the next time step, and calculating the comprehensive reward; and, based on the comprehensive reward, optimizing the parameters of the relational graph convolutional network model and the policy network with the proximal policy optimization algorithm to obtain the final schedule.
[0010] As one implementation, the dynamic heterogeneous graph containing personnel, positions, and rules is constructed from the basic data set as follows: construct the node set of the dynamic heterogeneous graph, which includes doctor nodes, job nodes, and time nodes; construct the edge set of the dynamic heterogeneous graph, which includes static semantic edges and dynamic state edges; and initialize the node feature matrix, which includes personnel node features, job node features, and time node features.
[0011] As one implementation, the edge set of the dynamic heterogeneous graph is constructed as follows: static semantic edges, including qualification permission edges and doctor collaboration edges, are constructed from the rule set; the weights of the doctor collaboration edges are calculated; and dynamic state edges are generated from the current scheduling results.
[0012] As one implementation, the node feature matrix is initialized as follows: its dimensions are the total number of nodes by the feature dimension, where the features cover doctor node features, job node features, and time node features. The doctor node features consist of a job level code, the historical average fatigue score, equipment operation authorization tags, and accumulated historical overtime hours. The job node features consist of the job difficulty coefficient, the minimum qualification level required for the job, and the average number of patients seen at the job. The time node features are obtained by encoding the position of a time point within its period with sine and cosine functions.
[0013] As one implementation, when a scheduling time step is reached, the relational graph convolutional network model, which comprises multiple relational graph convolutional layers, extracts the high-order semantic embedding feature set of personnel nodes and job nodes in the heterogeneous graph at the current time as follows: for each node in the heterogeneous scheduling graph, iterate through its neighboring nodes of every relation type; then aggregate features through the stacked relational graph convolutional layers to obtain the final node embedding set, namely the high-order semantic embedding feature set of personnel nodes and job nodes.
[0014] As one implementation, the selection probability distribution over all candidate doctors is calculated from the high-order semantic embedding feature set and the features of the position to be scheduled as follows: take the embedding vector of the current job node to be scheduled as the query vector; take the set of embedding vectors of all personnel nodes as the key vectors and value vectors; traverse the rules of the heterogeneous graph to generate a knowledge mask vector; calculate the attention distribution with the knowledge mask applied; and map the knowledge-masked attention weights to the selection probability distribution over candidate doctors.
[0015] As one implementation, the knowledge graph reasoning engine performs counterfactual reasoning and calculates the reasoning reward as follows: after an action is selected, create a temporary copy of the graph; trigger the knowledge reasoning engine on the temporary copy, which iteratively checks the key positions within a preset number of future time steps against the logical constraints in the rule set; calculate the size of the candidate set for each key position within those future time steps and extrapolate; and generate the counterfactual reasoning reward from the result: if the reasoning fails, a strong penalty is returned, and if it succeeds, a positive incentive reward is returned.
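The counterfactual check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented reasoning engine: the graph state is reduced to a hypothetical `busy_until` dictionary, the only logical constraints are a minimum qualification level and a hypothetical rest rule (`rest_steps`), each key position is checked independently, and the penalty and incentive values (`PENALTY`, `BONUS`) are placeholders because the source does not state them.

```python
import copy
from typing import Dict, Tuple

PENALTY = -10.0  # hypothetical value for the "strong penalty"
BONUS = 1.0      # hypothetical value for the "positive incentive reward"

def counterfactual_reward(
    busy_until: Dict[str, int],   # doctor -> first time step at which they are free again
    levels: Dict[str, int],       # doctor -> qualification level
    key_jobs: Dict[str, int],     # key position -> minimum qualification level
    action: Tuple[str, int],      # (chosen doctor, current time step t)
    horizon: int,                 # preset number of future time steps to check
    rest_steps: int = 1,          # hypothetical rest rule after a shift
) -> float:
    """Apply `action` to a temporary copy of the state and check future key positions."""
    shadow = copy.deepcopy(busy_until)     # temporary copy; the real state is untouched
    doctor, t = action
    shadow[doctor] = t + 1 + rest_steps    # works at t, rests, free again after rest_steps
    for step in range(t + 1, t + 1 + horizon):
        for job, min_level in key_jobs.items():
            # candidate set for this key position at this future step
            candidates = [d for d, lvl in levels.items()
                          if lvl >= min_level and shadow.get(d, 0) <= step]
            if not candidates:             # empty candidate set: the deduction fails
                return PENALTY
    return BONUS
```

For example, with two qualified doctors `a` and `b`, assigning `a` now still leaves `b` for the next step, so the reward is positive; if `b` is already busy, the same action leaves the key position unstaffed and the sketch returns the penalty.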
[0016] As one implementation, the comprehensive reward is calculated as follows: obtain the workload vector of all doctors and calculate its variance to obtain the balance reward; if the selected doctor's historical shift account is in arrears, grant a positive compensation reward; and combine the balance reward, the positive compensation reward, and the positive incentive reward by weighted summation to obtain the comprehensive reward.
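A sketch of the weighted combination follows. The weights `w`, the compensation amount `comp_bonus`, the sign convention of `shift_debt` (positive meaning the doctor is in arrears), and the use of the negative variance as the balance reward are all assumptions for illustration; the source does not specify them. The counterfactual term `r_cf` is passed in as computed in the reasoning step, whether it is the incentive or the penalty.

```python
import numpy as np

def comprehensive_reward(workloads, shift_debt, chosen, r_cf,
                         w=(1.0, 0.5, 1.0), comp_bonus=1.0):
    """Weighted sum of balance, compensation and counterfactual rewards.

    workloads:  per-doctor workload vector
    shift_debt: per-doctor historical shift balance; > 0 means "in arrears" (assumed)
    chosen:     index of the selected doctor
    r_cf:       counterfactual reasoning reward
    w:          hypothetical weights (w_bal, w_comp, w_cf)
    """
    r_bal = -float(np.var(workloads))                   # more even workload -> higher reward
    r_comp = comp_bonus if shift_debt[chosen] > 0 else 0.0
    w_bal, w_comp, w_cf = w
    return w_bal * r_bal + w_comp * r_comp + w_cf * r_cf
```

With perfectly balanced workloads the balance term is zero, so the result is driven by the compensation and counterfactual terms.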
[0017] As one implementation, the parameters of the relational graph convolutional network model and the policy network are optimized with the proximal policy optimization (PPO) algorithm based on the comprehensive reward to obtain the final schedule, as follows: set the length of the scheduling cycle and collect trajectory data, including the comprehensive reward, for each time step; calculate the advantage function estimate; construct the clipped objective function of PPO from the advantage estimate; construct the overall PPO objective function from the clipped objective; maximize the overall objective until the policy network converges; and map the action sequence output by the optimized model back to the original data structure to generate the final schedule.
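The training loop above relies on two standard PPO quantities: an advantage estimate and the clipped surrogate objective. The sketch below uses generalized advantage estimation (GAE), which is an assumption since the source only refers to "an estimate of the advantage function", and the common overall objective with a value-function term and an entropy bonus, whose coefficients `c1` and `c2` are placeholder values. It uses NumPy arrays for readability; an actual training loop would compute the same quantities on autodiff tensors so that the R-GCN and policy parameters can be updated by gradient ascent.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one scheduling cycle.

    `values` has len(rewards) + 1 entries; the last one bootstraps the final state.
    """
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate: mean(min(rho * A, clip(rho, 1 - eps, 1 + eps) * A))."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # rho = pi_new / pi_old
    adv = np.asarray(adv)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))

def ppo_total_objective(l_clip, value_pred, returns, entropy, c1=0.5, c2=0.01):
    """Overall objective L = L_clip - c1 * L_value + c2 * entropy (to be maximized)."""
    l_vf = float(np.mean((np.asarray(value_pred) - np.asarray(returns)) ** 2))
    return l_clip - c1 * l_vf + c2 * float(entropy)
```

The clipping keeps a single update from moving the probability of a chosen doctor too far from the policy that collected the trajectory, which is the usual motivation for PPO over plain policy gradient.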
[0018] A second aspect of the present invention provides a scheduling system based on graph representation learning and knowledge reasoning enhancement, comprising: a data acquisition module, configured to acquire the basic data set for radiology department scheduling, which includes a personnel set, a job set, and a rule set; a graph construction module, configured to build the dynamic heterogeneous graph at the initial moment from the basic data set and to encode historical scheduling data as the initial node features; a feature extraction module, configured to input the current heterogeneous graph state into the relational graph convolutional network model when a scheduling time step is reached and to extract the high-order semantic embedding feature set of personnel nodes and job nodes; a probability calculation module, configured to calculate the selection probability distribution over all candidate doctors from the high-order semantic embedding feature set and the features of the position to be scheduled; a reasoning reward module, configured to sample an action from the selection probability distribution and to use the knowledge graph reasoning engine to perform counterfactual reasoning and calculate the reasoning reward; a comprehensive reward module, configured to update the dynamic heterogeneous graph structure according to the action, obtain the heterogeneous graph state at the next time step, and calculate the comprehensive reward; and an optimization output module, configured to optimize the parameters of the relational graph convolutional network model and the policy network with the proximal policy optimization algorithm based on the comprehensive reward to obtain the final schedule.
[0019] The above one or more technical solutions have the following beneficial effects: In this embodiment, a dynamic heterogeneous graph containing doctor nodes, job nodes, and time nodes is constructed. This dynamic heterogeneous graph modeling achieves an explicit representation of implicit collaborative relationships, quantifying the historical level of collaboration and tacit understanding among doctors into computable graph structure information. For the first time, the implicit human factor of "tacit partnership," which is difficult to formally express, is transformed into a graph structure, allowing implicit collaborative relationships to participate in decision-making alongside explicit rules within a unified graph framework. Simultaneously, a relational graph convolutional network is used to extract features from the dynamic heterogeneous graph, employing a multi-layer aggregation mechanism. This allows the weights of collaborative edges to directly participate in the node feature update process, internalizing the degree of collaborative tacit understanding as a high-dimensional embedded feature of doctor nodes. This solves the problem that existing technologies cannot capture implicit collaborative relationships, enabling the model to perceive "which doctor pairings are more efficient" at the initial decision-making stage. This achieves deep semantic fusion of implicit relationships, naturally favoring the recommendation of tacit partnership combinations during the scheduling process.
[0020] In this embodiment, knowledge graph constraints are transformed into Mask Bias in the attention mechanism, achieving deep coupling between rules and neural networks. This not only ensures the compliance of scheduling results but also gives the model the flexibility to find optimal solutions at the edge of rules, such as maximizing team collaboration without violating hard rules.
[0021] In this embodiment, counterfactual reasoning is performed with the knowledge graph reasoning engine, enabling the model to anticipate the impact of current decisions on subsequent scheduling and thereby resolving the fundamental flaw of traditional greedy scheduling methods, which "consider only the present and ignore the future." At the same time, the designed counterfactual reasoning reward function fundamentally eliminates deadlock in the scheduling process and lets the model overcome the "myopic" tendency of traditional RL methods. Like a human scheduler, the model can recognize that "if the 'firefighter' backup doctor is used today, no one will be available for the critical shift the day after tomorrow," and can therefore make globally optimal decisions across time.
[0022] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Description of the Attached Figures
[0023] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0024] Figure 1 is a flowchart of the scheduling method based on graph representation learning and knowledge reasoning enhancement in this embodiment.
Detailed Implementation
[0025] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0026] It should be noted that the terminology used herein is for the purpose of describing particular implementations only and is not intended to limit the exemplary implementations of the present invention.
[0027] Where there is no conflict, the embodiments and features in the embodiments of the present invention can be combined with each other.
[0028] Example 1: This embodiment discloses a scheduling method based on graph representation learning and knowledge reasoning enhancement.
[0029] To illustrate this embodiment more clearly, the scheduling method based on graph representation learning and knowledge reasoning enhancement comprises the following steps:
S1. Obtain the basic data set for radiology department scheduling, which includes a personnel set, a job set, and a rule set;
S2. Construct, from the basic data set, a dynamic heterogeneous graph containing personnel, positions, and rules;
S3. When a scheduling time step is reached, use the relational graph convolutional network model to extract the high-order semantic embedding feature set of personnel nodes and job nodes in the heterogeneous graph at the current time;
S4. Using a multi-head attention mechanism, calculate the selection probability distribution over all candidate doctors from the high-order semantic embedding feature set and the features of the position to be scheduled;
S5. Sample an action from the selection probability distribution, and perform counterfactual reasoning with the knowledge graph reasoning engine to calculate the reasoning reward;
S6. Update the dynamic heterogeneous graph structure according to the action, obtain the heterogeneous graph state at the next time step, and calculate the comprehensive reward;
S7. Based on the comprehensive reward, optimize the parameters of the relational graph convolutional network model and the policy network with the proximal policy optimization algorithm to obtain the final schedule.
[0030] As shown in Figure 1, in step S1, the basic data set for radiology department scheduling is obtained; it includes a personnel set, a job set, and a rule set.
[0031] In this embodiment, the basic data set $\mathcal{D}$ for radiology department scheduling is obtained and represented as:
$$\mathcal{D} = \{\mathcal{P}, \mathcal{J}, \mathcal{K}\} \tag{1}$$
where $\mathcal{P}$ denotes the set of doctors, $\mathcal{J}$ denotes the set of job positions, and $\mathcal{K}$ denotes the set of rules.
[0032] The system obtains the doctor set, job set, and rule set by connecting to the hospital's existing Hospital Information System (HIS) database or scheduling management system, or by reading manually imported configuration files. Specifically, the doctor set is obtained by extracting basic information about the doctors participating in scheduling from the personnel information database, including but not limited to the doctor identifier, job level (e.g., levels 1-4), operation authorization tags for specific equipment (e.g., MRI / CT), historical average fatigue score, and cumulative overtime hours in the past 30 days.
[0033] The acquisition of job sets involves extracting job demand information from the department's scheduling demand database, including job identifier, job difficulty coefficient (e.g., 0.1-1.0), minimum qualification level required for the job, and average number of patients seen for the job.
[0034] The rule set is obtained by extracting the scheduling constraints set by the department from the management system or configuration table, including explicit hard rules (such as no night shifts during pregnancy, and qualification-based rules for specific positions) and implicit soft preferences (such as avoiding consecutive overtime, and the degree of closeness of historical collaboration).
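The data set $\mathcal{D}$ described above can be held in simple record types, sketched below. This is one possible in-memory layout, not the system's actual schema: the class and field names are hypothetical, the `pregnant` and `is_night` attributes exist only so that the example "no night shifts during pregnancy" rule can be expressed, and only hard rules are modeled (soft preferences are omitted).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class Doctor:
    doctor_id: str
    level: int                     # job level, e.g. 1-4
    avg_fatigue: float             # historical average fatigue score
    overtime_30d: float            # cumulative overtime hours in the past 30 days
    device_auth: Set[str] = field(default_factory=set)  # e.g. {"MRI", "CT"}
    pregnant: bool = False         # hypothetical attribute used by an example hard rule

@dataclass
class Job:
    job_id: str
    difficulty: float              # job difficulty coefficient, e.g. 0.1-1.0
    min_level: int                 # minimum qualification level required
    avg_patients: float            # average number of patients seen at the job
    is_night: bool = False         # hypothetical attribute used by an example hard rule

@dataclass
class HardRule:
    name: str
    violates: Callable[[Doctor, Job], bool]  # True if the doctor-job pairing breaks the rule

@dataclass
class SchedulingDataset:
    doctors: List[Doctor]          # personnel set P
    jobs: List[Job]                # job set J
    rules: List[HardRule]          # rule set K (hard rules only in this sketch)

    def violated_rules(self, d: Doctor, j: Job) -> List[str]:
        """Names of every hard rule that the doctor-job pairing violates."""
        return [r.name for r in self.rules if r.violates(d, j)]

EXAMPLE_RULES = [
    HardRule("min_qualification", lambda d, j: d.level < j.min_level),
    HardRule("no_night_shift_during_pregnancy", lambda d, j: d.pregnant and j.is_night),
]
```

Keeping each hard rule as a named predicate makes it straightforward to reuse the same rules later when building qualification edges and the knowledge mask.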
[0035] Through the above steps, the basic data and constraints of the scheduling environment are fully acquired and digitized, providing a complete and accurate data foundation for the subsequent construction of the dynamic heterogeneous graph and for model training.
[0036] As shown in Figure 1, in step S2, a dynamic heterogeneous graph containing personnel, positions, and rules is constructed from the basic data set.
[0037] From the basic data set $\mathcal{D}$, the dynamic heterogeneous graph $G_0 = (\mathcal{V}, \mathcal{E}_0)$ at the initial moment is constructed, and historical scheduling data is encoded as the initial node features. The specific process is as follows: (1) Construct the node set of the dynamic heterogeneous graph, which includes doctor nodes, job nodes, and time nodes.
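[0038] Specifically, the node set of the graph is defined as $\mathcal{V} = \mathcal{V}_D \cup \mathcal{V}_J \cup \mathcal{V}_T$, comprising doctor nodes $\mathcal{V}_D$, job nodes $\mathcal{V}_J$, and time nodes $\mathcal{V}_T$.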
[0039] (2) Construct the edge set of the dynamic heterogeneous graph, wherein the edge set includes static semantic edges and dynamic state edges.
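[0040] Specifically, the edge set of the graph, $\mathcal{E} = \mathcal{E}_{\text{static}} \cup \mathcal{E}_{\text{dyn}}$, consists of two parts: static semantic edges $\mathcal{E}_{\text{static}}$ and dynamic state edges $\mathcal{E}_{\text{dyn}}$.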
[0041] 1) Construct static semantic edges based on the rule set, where static semantic edges include qualification permission edges and doctor collaboration edges.
[0042] Specifically, static semantic edges are constructed according to the rule set $\mathcal{K}$. For example, an edge $(d_i, \text{qualified}, j_k)$ indicates that doctor $d_i$ is qualified for position $j_k$, and an edge $(d_i, \text{collaborates}, d_m)$ indicates close historical collaboration between doctors $d_i$ and $d_m$. The static semantic edges thus include qualification permission edges and doctor collaboration edges.
[0043] 2) Calculate the weight of the doctor collaboration edge.
[0044] Specifically, the weight $w_{im}$ of the collaboration edge between doctors $d_i$ and $d_m$ is calculated from the historical collaboration efficiency matrix using formula (2), where $T_{im}$ denotes the time during which the two doctors have worked together without any complaints. The edge weight participates in the aggregation calculation of the R-GCN, so the model tends to recommend "compatible partners".
[0045] 3) Dynamically generate dynamic state edges based on the current scheduling results.
[0046] Dynamic state edges $\mathcal{E}_{\text{dyn}}$ are generated from the current scheduling results: if doctor $d_i$ is assigned to position $j_k$ at time $t$, a temporal connection edge $(d_i, j_k, t)$ is established.
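The edge construction above can be sketched as a function that returns edge lists keyed by relation type. The relation names (`qualified_for`, `collaborates_with`, `assigned_at`) are hypothetical labels, qualification is reduced to a level comparison, and the collaboration weights are taken as given inputs because the expression of formula (2) is not available here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_edges(
    doctor_levels: Dict[str, int],                  # doctor id -> job level
    job_min_levels: Dict[str, int],                 # job id -> minimum qualification level
    collab_weights: Dict[Tuple[str, str], float],   # (doctor, doctor) -> collaboration weight
    assignments: List[Tuple[str, str, int]],        # current schedule: (doctor, job, time step)
) -> Dict[str, List[tuple]]:
    edges = defaultdict(list)
    # Static semantic edges: qualification permission (doctor -> job)
    for d, lvl in doctor_levels.items():
        for j, min_lvl in job_min_levels.items():
            if lvl >= min_lvl:
                edges["qualified_for"].append((d, j))
    # Static semantic edges: weighted doctor collaboration, stored in both directions
    for (a, b), w in collab_weights.items():
        edges["collaborates_with"].append((a, b, w))
        edges["collaborates_with"].append((b, a, w))
    # Dynamic state edges: doctor assigned to job at time t; rebuilt after every step
    for d, j, t in assignments:
        edges["assigned_at"].append((d, j, t))
    return dict(edges)
```

Static edges would typically be built once per scheduling cycle, while the `assigned_at` list changes after every sampled action.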
[0047] (3) Initialize the node feature matrix.
[0048] 1) The total number of nodes and the feature dimensions are used as the dimensions of the node feature matrix. The feature dimensions include doctor node features, job node features and time node features.
[0049] Specifically, the node feature matrix $X \in \mathbb{R}^{N \times F}$ is initialized, where $N = |\mathcal{V}|$ is the total number of nodes and $F$ is the feature dimension, which covers doctor node features, job node features, and time node features.
[0050] 2) The doctor node characteristics consist of job level code, historical average fatigue score, equipment operation authorization tag and historical overtime hours.
[0051] Physician node characteristics include job level (1-4), historical average fatigue score, operation authorization tags for specific equipment (such as MRI / CT), and cumulative overtime hours in the past 30 days.
[0052] 3) The characteristics of job nodes consist of the job difficulty coefficient, the minimum qualification level required for the job, and the average number of patients received by the job.
[0053] The job node characteristics include the job difficulty coefficient (0.1-1.0), the minimum qualification level required for the job, and the average number of patients seen for the job.
[0054] 4) The position of the time point in the period is embedded by sine and cosine coding to obtain the time node features.
[0055] The time node characteristics are represented by sine and cosine encoding to indicate the position of the current time point in the weekly / monthly cycle, in order to capture the periodic pattern of the scheduling.
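The feature initialization above can be sketched as follows. The number of sine/cosine harmonics (`n_freq`) is a hypothetical choice, and zero-padding every node type to a common width $F$ is an assumption made so that all rows fit in one matrix $X$; type-specific input projections would be an equally valid alternative.

```python
import numpy as np

def time_features(t: float, period: float, n_freq: int = 2) -> np.ndarray:
    """Sine/cosine encoding of time point t within a cycle of length `period`."""
    k = np.arange(1, n_freq + 1)                 # harmonics 1..n_freq
    angle = 2 * np.pi * k * t / period
    return np.concatenate([np.sin(angle), np.cos(angle)])

def build_feature_matrix(doctor_feats, job_feats, time_feats) -> np.ndarray:
    """Stack per-type feature rows into X of shape (N, F), zero-padding to the widest type."""
    blocks = [np.atleast_2d(np.asarray(b, dtype=float))
              for b in (doctor_feats, job_feats, time_feats)]
    F = max(b.shape[1] for b in blocks)
    padded = [np.pad(b, ((0, 0), (0, F - b.shape[1]))) for b in blocks]  # pads with zeros
    return np.vstack(padded)
```

For a weekly cycle measured in days (`period=7`), day 0 and day 7 receive the same encoding, which is what lets the model pick up periodic scheduling patterns.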
[0056] Through the above steps, the implicit collaborative relationship is made explicit, and the degree of historical collaboration and tacit understanding among doctors is quantified into computable graph structure information (i.e., static semantic edges and dynamic state edges). This allows implicit human factors that are difficult to express formally to participate in decision-making together with explicit rules within a unified graph framework.
[0057] As shown in Figure 1, in step S3, when a scheduling time step is reached, the relational graph convolutional network model is used to extract the high-order semantic embedding feature set of personnel nodes and job nodes in the heterogeneous graph at the current time.
[0058] At scheduling time step $t$, the current graph state $G_t$ is input into the relational graph convolutional network (R-GCN) model to extract the high-order semantic embedding feature set $H$ of doctor nodes and job nodes.
[0059] The relational graph convolutional network model includes multiple relational graph convolutional layers. Each layer $l$ contains a transformation matrix $W_r^{(l)}$ for each specific edge relation type $r$ (such as "qualification" and "mutual exclusion"), a self-loop transformation matrix $W_0^{(l)}$ for processing the node's own features, a normalization constant $c_{i,r}$ that prevents numerical explosion during feature aggregation, and an activation function $\sigma$ that implements the nonlinear mapping. According to formula (3), the model aggregates, layer by layer, the features of each node's neighbors under the different relation types, and outputs a high-order embedding set containing structured semantic information.
[0060] The high-order semantic embedding feature set of personnel nodes and job nodes in the heterogeneous graph at the current time is extracted as follows: (1) For each node in the heterogeneous graph at the current scheduling time, traverse its neighboring nodes of all relation types.
[0061] For each node $v_i$ in graph $G_t$, iterate through its neighbor set $\mathcal{N}_i^r$ under every relation type $r \in \mathcal{R}$.
[0062] (2) Aggregate features through multiple relational graph convolutional layers to obtain the final node embedding set, namely the high-order semantic embedding feature set of personnel nodes and job nodes.
[0063] Feature aggregation is performed with multiple relational graph convolutional layers according to the following formula:
$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{u \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_u^{(l)} + W_0^{(l)} h_i^{(l)} \right) \tag{3}$$
where $r$ denotes the relation type of an edge, such as "qualification" or "mutual exclusion"; $\sigma$ denotes the activation function; $c_{i,r}$ is the normalization constant, usually defined as the number of neighbors (i.e., the degree) of node $v_i$ under relation $r$, which prevents numerical explosion during feature aggregation; $W_r^{(l)}$ and $W_0^{(l)}$ are learnable weight matrices, $W_r^{(l)}$ being the transformation matrix for a specific relation $r$ and $W_0^{(l)}$ the self-loop transformation matrix applied to the node's own features; and $l$ is the layer index of the neural network.
[0064] After $L$ layers of aggregation, the final node embedding set is obtained:
$$H = \{ h_i^{(L)} \mid v_i \in \mathcal{V} \} \tag{4}$$
Each vector $h_i^{(L)}$ has dimension $d$ and preserves the structured semantic information of the graph.
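Formulas (3) and (4) can be sketched in NumPy as below, using a row-vector convention in which $W h$ becomes `h @ W`. The choice of `tanh` as the activation and the dense adjacency-matrix representation are assumptions for illustration. Passing collaboration weights instead of 0/1 entries in `adj` would turn the degree normalization into a weight-sum normalization; that variant is likewise an assumption rather than something the text specifies.

```python
import numpy as np

def rgcn_layer(H, adj, W_rel, W_self, activation=np.tanh):
    """One relational graph convolution layer in the form of formula (3).

    H:      (N, d_in) node features at layer l (row i is h_i)
    adj:    dict relation -> (N, N) 0/1 matrix, adj[r][i, u] = 1 if u is in N_i^r
    W_rel:  dict relation -> (d_in, d_out) matrix W_r
    W_self: (d_in, d_out) self-loop matrix W_0
    """
    out = H @ W_self                                    # self-loop term W_0 h_i
    for r, A in adj.items():
        A = np.asarray(A, dtype=float)
        deg = A.sum(axis=1, keepdims=True)              # c_{i,r} = |N_i^r|
        norm = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)
        out = out + norm @ (H @ W_rel[r])               # sum_u (1 / c_{i,r}) W_r h_u
    return activation(out)

def rgcn(X, adj, layers):
    """Stack layers; `layers` is a list of (W_rel, W_self) pairs, one per layer."""
    H = X
    for W_rel, W_self in layers:
        H = rgcn_layer(H, adj, W_rel, W_self)
    return H                                            # final embeddings h_i^(L), formula (4)
```

A node with no neighbors under any relation receives only the self-loop term, so isolated nodes still keep a transformed version of their own features.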
[0065] After the above steps, the weights of the collaborative edges directly participate in the node feature update process, internalizing the graph structured and semantic information such as the degree of collaborative understanding into high-dimensional embedded features of doctor nodes. This solves the problem that existing technologies cannot capture implicit collaborative relationships, enabling the model to perceive which doctor pairings are more efficient in the early stages of decision-making.
[0066] As shown in Figure 1, in step S4, the selection probability distribution over all candidate doctors is calculated with a multi-head attention mechanism from the high-order semantic embedding feature set and the features of the position to be scheduled.
[0067] A knowledge-guided attention module combines $H$ with the features of the current position to be scheduled to calculate the selection probability distribution over all candidate doctors. The specific process is as follows: (1) Take the embedding vector of the current job node to be scheduled as the query vector.
[0068] Specifically, for query generation, the embedding vector $h_{j_k}$ of the current job node $j_k$ to be scheduled is taken as the query vector $Q$, with dimension $1 \times d$.
[0069] (2) Take the set of embedding vectors of all personnel nodes as key vectors and value vectors.
[0070] For key-value generation, the set of embedding vectors of all doctor nodes, $H_D$, is taken as the key matrix $K$ and the value matrix $V$, each with dimension $N_D \times d$, where $N_D$ is the number of candidate doctors.
[0071] (3) Traverse the heterogeneous graph rules to generate a knowledge mask vector.
[0072] To generate the knowledge mask, the knowledge graph rules are traversed: if doctor $d_i$ has a hard conflict with position $j_k$ (such as "no night shifts during pregnancy"), the corresponding entry $M_i$ of the mask vector $M$ is set to $-\infty$; otherwise it is set to $0$.
[0073] (4) Calculate the attention distribution based on the knowledge mask vector.
[0074] The attention distribution is calculated as:
$$\alpha = \operatorname{softmax}\left( \frac{Q K^{\top}}{\sqrt{d_k}} + M \right) \tag{5}$$
where $M$ is the knowledge mask vector, with the same dimension as the attention matrix, used to filter out non-compliant doctor-job combinations; $\sqrt{d_k}$ is the scaling factor, with $d_k$ usually equal to the dimension of the query vector $Q$, i.e., $d_k = d$, which prevents the softmax gradient from vanishing when the dot products become too large; and the superscript $\top$ denotes matrix transpose.
[0075] (5) Based on the attention distribution, the attention weights with knowledge masks are mapped to the selection probability distribution of candidate doctors.
[0076] Based on the attention distribution, the knowledge-masked attention weights are mapped directly to the selection probability distribution of candidate doctors:
$$P = A^{\top} \qquad (6)$$
where $A$ is the attention row vector calculated in the previous step, and $P$, the column vector obtained by transposition, is the selection probability distribution of candidate doctors, with dimension $N \times 1$ ($N$ being the number of candidate doctors). Because the mask is applied during the calculation, this distribution already excludes non-compliant options (such as positions a doctor is not allowed to hold), and high-quality candidates are weighted according to the graph relationships.
[0077] Through the above steps, the knowledge mask eliminates invalid combinations that violate hard rules, while high-quality candidates (such as "skilled partners") are weighted according to the graph relationships. This achieves deep coupling between the rules and the neural network: while meeting compliance requirements, the model keeps the flexibility to search for optimal solutions (such as maximizing collaborative rapport) at the edge of the rules.
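The masked attention of formulas (5) and (6) can be illustrated with a minimal single-head sketch in Python. It assumes plain lists for the embeddings and a hypothetical `has_hard_conflict(doctor, job)` callback standing in for the knowledge graph rule check; the multi-head variant and any learned projections used by the actual model are not shown.

```python
import math
from typing import Callable, List, Sequence

NEG_INF = float("-inf")


def knowledge_mask(doctor_ids: Sequence[str], job_id: str,
                   has_hard_conflict: Callable[[str, str], bool]) -> List[float]:
    """Mask M: -inf where a hard rule forbids the doctor-job pair, 0.0 otherwise."""
    return [NEG_INF if has_hard_conflict(d, job_id) else 0.0 for d in doctor_ids]


def masked_selection_probs(query: Sequence[float],
                           keys: Sequence[Sequence[float]],
                           mask: Sequence[float]) -> List[float]:
    """Single-head softmax(q K^T / sqrt(d_k) + M), returned as the distribution P."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) + m
              for key, m in zip(keys, mask)]
    allowed = [s for s in scores if s != NEG_INF]
    if not allowed:
        raise ValueError("every candidate doctor is excluded by hard rules")
    peak = max(allowed)  # subtract the max score for numerical stability
    weights = [0.0 if s == NEG_INF else math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def night_conflict(doctor: str, job: str) -> bool:
    """Hypothetical hard rule: doctor "B" may not take the night shift."""
    return doctor == "B" and job == "night"


# Usage with toy 2-dimensional embeddings.
doctors = ["A", "B", "C"]
mask = knowledge_mask(doctors, "night", night_conflict)
probs = masked_selection_probs([1.0, 0.0], [[2.0, 0.0], [5.0, 0.0], [0.0, 0.0]], mask)
```

Because $e^{-\infty} = 0$, a masked doctor receives exactly zero probability even when its raw score is the highest, which is how the mask removes non-compliant combinations before sampling.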
[0078] As shown in Figure 1, in step S5, an action is sampled based on the selection probability distribution, and counterfactual deduction is performed using the knowledge graph reasoning engine to calculate the deduction reward.
[0079] An action $a_t$, i.e., a specific doctor, is sampled from the selection probability distribution $P$, and the knowledge graph reasoning engine performs step-by-step counterfactual deduction to calculate the deduction reward. The specific process is as follows: (1) After selecting an action, create a temporary copy of the graph.
[0080] Specifically, a virtual rehearsal is conducted: after action $a_t$ is selected (i.e., assuming a particular doctor is chosen), a temporary copy $G'$ of the graph is created.
[0081] (2) Trigger the knowledge inference engine on the temporary graph copy. Based on the logical constraints in the rule set, the knowledge inference engine traverses and checks the key positions within a preset number of future time steps.
[0082] Specifically, logical chain detection is performed: the knowledge inference engine is triggered on the copy $G'$ and, based on the logical constraints in the rule set, iterates over and examines the key positions within a specific future time frame, such as the "Saturday night shift".
[0083] (3) Calculate the size of the candidate set for each key position within the preset number of future time steps, and perform the deduction.
[0084] Specifically, a satisfiability calculation is performed on the temporary graph copy $G'$: based on the rule set constraints, the size of the candidate set for each future key position is counted as:
$$|C_{\tau}| = \sum_{i=1}^{N} \mathbb{I}\left(d_i \models \mathcal{R}_{\tau}\right) \qquad (7)$$
where $N$ is the total number of doctors in the department; $\mathbb{I}(\cdot)$ is the indicator function, which equals 1 if and only if doctor node $d_i$ satisfies all rule constraints $\mathcal{R}_{\tau}$ required by the future key position at time step $\tau$ (i.e., the doctor's qualifications allow it and there is no scheduling conflict), and 0 otherwise; $\models$ is the logical satisfaction symbol, so $d_i \models \mathcal{R}_{\tau}$ means that doctor node $d_i$ satisfies all rule constraints required by that future key position.
[0085] If the candidate set of some future key position is empty ($|C_{\tau}| = 0$), a deadlock occurs and the deduction is considered to have failed.
[0086] (4) Generate counterfactual deduction rewards based on the deduction results. If the deduction fails, calculate strong penalty rewards; if the deduction succeeds, calculate positive incentive rewards.
[0087] Specifically, in reward generation, if the deduction fails (i.e., the candidate set of some future key position is empty), a strong penalty reward is given:
$$R_{\text{ded}} = -\beta \qquad (8)$$
where $\beta$ is the penalty intensity, for example 10.0.
[0088] If the deduction succeeds, a positive incentive reward is calculated from the resource sufficiency in the future time period:
$$R_{\text{ded}} = \alpha \cdot \sigma\left(\min_{\tau \in [t+1,\, t+k]} |C_{\tau}| - \theta\right) \qquad (9)$$
where $\alpha$ is a scaling weight that adjusts the proportion of the deduction reward in the total reward; $\sigma$ is a sigmoid-type activation function, $\sigma(x) = 1/(1+e^{-x})$, which maps the "candidate surplus" into the range (0, 1) so that the reward does not become too large when there are many candidates (reward explosion), keeping training stable; $k$ is the prediction window, a preset number of time steps counted forward from the current time step $t$ (for example, the next 3 days or 24 hours); $\min$ is the minimum operator, which finds the time step with the scarcest resources (fewest candidates) in the whole prediction window, so that $\min_{\tau} |C_{\tau}|$ is the candidate count of the key position with the fewest candidates in that window, $|C_{\tau}|$ being the candidate set size of the key position at future time step $\tau$; and $\theta$ is a safety threshold. For example, if $\theta = 2$ and fewer than two doctors are available at some future time step, the term in parentheses becomes negative, the reward drops quickly, and the model is warned that the current schedule is at risk.
[0089] Through the above steps, the deduction reward is obtained, which addresses the reward sparsity problem in reinforcement learning: the agent does not need to wait until the last day of the cycle to discover a mistake, because every step is guided by "foresight". The deduction reward also provides the basis for the subsequent total reward calculation.
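A minimal Python sketch of the deduction reward (formulas (7) to (9)) follows. It assumes the graph state can be reduced to a dictionary from hypothetical `(time_step, position)` slots to doctor IDs, and that each rule set $\mathcal{R}_{\tau}$ is available as a predicate. `no_back_to_back_nights` is an invented example rule, and the defaults $\alpha = 1$ and $\theta = 2$ are illustrative; only $\beta = 10.0$ comes from the text.

```python
import copy
import math
from typing import Callable, Dict, Sequence, Tuple

Slot = Tuple[int, str]            # hypothetical encoding: (time step, position id)
Schedule = Dict[Slot, str]        # slot -> assigned doctor id
Rule = Callable[[str, Slot, Schedule], bool]  # True if the doctor satisfies R for the slot


def candidate_count(schedule: Schedule, doctors: Sequence[str], slot: Slot,
                    satisfies: Rule) -> int:
    """Formula (7): |C| = number of doctors d_i with d_i |= R for this slot."""
    return sum(1 for d in doctors if satisfies(d, slot, schedule))


def deduction_reward(schedule: Schedule, doctors: Sequence[str],
                     action: Tuple[Slot, str], key_slots: Sequence[Slot],
                     satisfies: Rule, alpha: float = 1.0, beta: float = 10.0,
                     theta: float = 2.0) -> float:
    """Counterfactual rollout on a temporary copy; the real schedule is not modified."""
    if not key_slots:
        raise ValueError("the prediction window must contain at least one key slot")
    slot, doctor = action
    replica = copy.deepcopy(schedule)   # temporary graph copy G'
    replica[slot] = doctor              # assume the sampled doctor takes the slot
    counts = [candidate_count(replica, doctors, s, satisfies) for s in key_slots]
    tightest = min(counts)              # scarcest key slot in the window
    if tightest == 0:
        return -beta                    # formula (8): deadlock, strong penalty
    return alpha / (1.0 + math.exp(-(tightest - theta)))  # formula (9)


def no_back_to_back_nights(doctor: str, slot: Slot, schedule: Schedule) -> bool:
    """Hypothetical rule: nobody works a night right after their previous night."""
    t, position = slot
    return schedule.get((t - 1, position)) != doctor


# Usage: assign "A" to tonight's night shift and look one step ahead.
base: Schedule = {}
reward = deduction_reward(base, ["A", "B", "C"], ((0, "night"), "A"),
                          [(1, "night")], no_back_to_back_nights)
```

With three doctors the tightest future slot has two candidates, exactly the threshold, so the sigmoid term is 0.5. Removing doctors shrinks that margin and eventually triggers the penalty branch.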
[0090] As shown in Figure 1, in step S6, the dynamic heterogeneous graph structure is updated according to the action to obtain the heterogeneous graph state at the next moment, and the comprehensive reward is calculated.
[0091] Action $a_t$ is executed and the heterogeneous graph structure is updated to obtain the next heterogeneous graph state $s_{t+1}$, and the comprehensive reward is calculated. The specific process is as follows: (1) Obtain the current workload vector of all doctors and calculate its variance to obtain the balance reward.
[0092] Specifically, to calculate the balance reward, the current workload vector $W$ of all doctors is obtained and its variance $\mathrm{Var}(W)$ is calculated; the balance reward $R_{\text{bal}}$ (formula (10)) is derived from this variance, with a smaller variance, i.e., a more even workload distribution, giving a higher reward.
(2) If the selected doctor's historical shift account is in arrears (under-scheduled), a positive compensation reward is given.
[0093] If the historical scheduling account $B_i$ of the selected doctor $i$ shows under-scheduling (owed shifts), a positive compensation reward $R_{\text{comp}}$ is given. Specifically, first, based on the historical features of the doctor nodes (such as historical attendance data and overtime hours), the historical scheduling account of each candidate doctor is calculated as:
$$B_i = H_i - \bar{H}_i \qquad (11)$$
where $H_i$ is the actual cumulative working hours or number of shifts of doctor $i$ within the previously defined period, and $\bar{H}_i$ is the standard average working hours or number of shifts for doctors of the same rank within the same period.
[0094] Second, if the selected doctor's historical scheduling account satisfies $B_i < 0$, the doctor's actual workload is lower than the standard workload, i.e., the doctor is under-scheduled. To encourage the model to prioritize under-scheduled doctors and promote long-term workload balance, a positive compensation reward is given:
$$R_{\text{comp}} = \eta \cdot |B_i|, \quad B_i < 0 \qquad (12)$$
where $\eta$ is the positive compensation adjustment coefficient, a constant set to 0.5, and the absolute value $|B_i|$ measures how far the doctor is behind: the more shifts owed, the larger the compensation reward when the doctor is assigned to this position.
[0095] (3) The balanced reward, positive compensation reward and positive incentive reward are weighted and integrated to obtain the comprehensive reward.
[0096] Specifically, the total reward, i.e., the comprehensive reward, is calculated as:
$$R_t = w_1 R_{\text{bal}} + w_2 R_{\text{comp}} + w_3 R_{\text{ded}}$$
where $R_{\text{bal}}$ is the balance reward, $R_{\text{comp}}$ is the positive compensation reward, $R_{\text{ded}}$ is the positive incentive (deduction) reward, and $w_1$, $w_2$, $w_3$ are the weights of the balance reward, the positive compensation reward, and the positive incentive reward, respectively.
[0097] Through the above steps, the workload balance reward, the positive compensation reward for under-scheduled doctors, and the forward-looking positive incentive reward are weighted and combined into a global feedback signal that accounts for both current fairness and future risk, guiding the algorithm to update its policy toward multi-objective optimization.
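The reward terms of step S6 can be sketched in Python as follows. The exact form of formula (10) is not recoverable from the text, so this sketch assumes $R_{\text{bal}} = -\mathrm{Var}(W)$; the default weights $w_1 = w_2 = w_3 = 1$ are hypothetical, while $\eta = 0.5$ is taken from the description.

```python
from statistics import pvariance
from typing import Sequence, Tuple


def balance_reward(workloads: Sequence[float]) -> float:
    """ASSUMPTION: R_bal = -Var(W), so a more even workload earns a higher reward."""
    return -pvariance(workloads)


def historical_account(actual_load: float, rank_standard_load: float) -> float:
    """Formula (11): B_i = actual cumulative load minus the same-rank standard load."""
    return actual_load - rank_standard_load


def compensation_reward(b_i: float, eta: float = 0.5) -> float:
    """Formula (12): only under-scheduled doctors (B_i < 0) are compensated."""
    return eta * abs(b_i) if b_i < 0 else 0.0


def comprehensive_reward(r_bal: float, r_comp: float, r_ded: float,
                         weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """R_t = w1 * R_bal + w2 * R_comp + w3 * R_ded (default weights are illustrative)."""
    w1, w2, w3 = weights
    return w1 * r_bal + w2 * r_comp + w3 * r_ded


# Usage: the selected doctor is 10 hours behind the same-rank standard.
b = historical_account(actual_load=30.0, rank_standard_load=40.0)
r = comprehensive_reward(balance_reward([2.0, 4.0, 6.0]), compensation_reward(b), 0.5)
```

Population variance (`pvariance`) is used because the workload vector covers the whole department rather than a sample of it.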
[0098] As shown in Figure 1, in step S7, based on the comprehensive reward, the parameters of the relational graph convolutional network model and the policy network are optimized by the proximal policy optimization (PPO) algorithm to obtain the final schedule.
[0099] Using the collected trajectory data, the parameters of the model and the policy network are optimized with the PPO algorithm until the reward converges, and the final schedule is output.
[0100] (1) Using the collected trajectory data, optimize the parameters of the relational graph convolutional network model and the policy network with the PPO algorithm until the reward converges, yielding the final schedule.
[0101] 1) Set the length of the shift scheduling cycle and collect trajectory data for each time step based on comprehensive rewards.
[0102] The trajectory data collected at each time step is represented as $(s_t, a_t, R_t, s_{t+1})$, where $s_t$ is the current state of the heterogeneous graph, $a_t$ is the selected action, $R_t$ is the comprehensive reward, and $s_{t+1}$ is the state of the heterogeneous graph at the next time step.
[0103] 2) Calculate the estimated value of the advantage function.
[0104] The advantage of the current action relative to the average level is assessed, and the advantage function estimate is calculated using generalized advantage estimation (GAE):
$$\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^{l}\, \delta_{t+l}$$
where $\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t)$ is the temporal-difference error, $R_t$ being the comprehensive reward at time step $t$; $V(s_t)$ is the value network's estimate of state $s_t$; $\gamma$ is the discount factor; $\lambda$ is the GAE hyperparameter; and $T$ is the total number of time steps.
[0105] 3) Based on the advantage function estimate, construct the clipped objective function of the proximal policy optimization algorithm.
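[0106] The clipped objective function is:
$$L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t\right)\right]$$
where $r_t(\theta) = \dfrac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$ is the probability ratio of the new policy to the old policy, and $\epsilon$ is a hyperparameter that clips the ratio and thereby limits the magnitude of policy updates to prevent training instability.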
[0107] 4) Based on the clipped objective function of the proximal policy optimization algorithm, construct the total objective function of proximal policy optimization.
[0108] The clipped objective is combined with the value network loss and a policy entropy regularization term to construct the total objective function of proximal policy optimization.
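[0109] The total objective function of proximal policy optimization is:
$$L^{\text{CLIP+VF+S}}(\theta) = \hat{\mathbb{E}}_t\left[L^{\text{CLIP}}(\theta) - c_1 L^{\text{VF}}(\theta) + c_2 S[\pi_\theta](s_t)\right]$$
where $L^{\text{VF}}(\theta)$ is the value network loss, typically the squared error $L^{\text{VF}}(\theta) = \left(V_\theta(s_t) - V_t^{\text{targ}}\right)^2$ between the predicted value and the target value $V_t^{\text{targ}}$; $S[\pi_\theta](s_t)$ is the policy entropy regularization term; and $c_1$ and $c_2$ are the weight coefficients of the value network loss and the policy entropy regularization term, respectively.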
[0110] 5) Maximize the total objective function of proximal policy optimization until the policy network converges.
[0111] The parameters of the relational graph convolutional network model and the policy network are iteratively updated by an optimization algorithm (such as the Adam optimizer) that minimizes the total loss, defined as the negative of the total objective function; this is equivalent to maximizing the total objective function, and the updates continue until the policy network converges.
[0112] 6) Map the action sequence output by the optimized model back to the original data structure to generate the final schedule.
[0113] Through the above steps, end-to-end differentiable optimization of the scheduling policy is achieved, and a globally optimal scheduling sequence that satisfies multiple complex constraints can be output. In addition, the dynamic slice update design meets the long-term rolling scheduling needs of the radiology department.
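The quantities used in the PPO steps above (GAE advantages, the clipped surrogate $L^{\text{CLIP}}$, and the total objective) can be sketched without dependencies in Python. The sketch works on plain lists for a single trajectory and deliberately omits automatic differentiation and the Adam update. The defaults $\gamma = 0.99$, $\lambda = 0.95$, $\epsilon = 0.2$, $c_1 = 0.5$, $c_2 = 0.01$ are common illustrative choices, not values stated in this document. `values` is assumed to hold $V(s_0), \ldots, V(s_T)$, with $V(s_T) = 0$ if the cycle ends at a terminal state.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """A_t = sum_l (gamma * lam)^l * delta_{t+l}, computed backwards in one pass."""
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values must contain V(s_0), ..., V(s_T)")
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error delta_t
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      eps: float = 0.2) -> float:
    """Mean over t of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)."""
    if not ratios:
        raise ValueError("need at least one time step")
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


def ppo_total_objective(l_clip: float, l_vf: float, entropy: float,
                        c1: float = 0.5, c2: float = 0.01) -> float:
    """L = L_CLIP - c1 * L_VF + c2 * S; an optimizer would minimize -L."""
    return l_clip - c1 * l_vf + c2 * entropy


# Usage: with lam = 0 the advantages reduce to the one-step TD errors.
adv = gae_advantages([1.0, 2.0], [0.5, 0.5, 0.0], gamma=1.0, lam=0.0)
l_clip = clipped_surrogate([1.5, 0.5], adv, eps=0.2)
objective = ppo_total_objective(l_clip, l_vf=0.25, entropy=1.0)
```

The backward recursion $\hat{A}_t = \delta_t + \gamma\lambda\hat{A}_{t+1}$ is algebraically the same as the finite sum in the GAE formula, but it needs only one pass over the trajectory.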
[0114] (2) After outputting the final schedule, perform rolling scheduling.
[0115] 1) Set the length of the scheduling cycle (e.g., 7 days).
[0116] 2) Based on the increasing time step, the model outputs the action sequence in sequence.
[0117] As the time step $t$ increases from $1$ to $T$, where $T$ is the scheduling cycle length, the model outputs the actions $a_1, a_2, \ldots, a_T$ in sequence.
[0118] 3) Map the action sequence back to the original data structure to generate a scheduling matrix with personnel identifiers as elements.
[0119] The action sequence is mapped back to the original data structure to generate the scheduling matrix $S$, whose columns correspond to time steps and whose element values are doctor IDs.
[0120] 4) If rolling scheduling is required, take the last few columns of the scheduling matrix as new historical state inputs and repeat the prediction for the next cycle.
[0121] Dynamic slice update: if rolling scheduling is required, the last few columns of the scheduling matrix $S$ are taken as the new historical state input, and the above prediction steps are repeated for the next cycle.
[0122] Through the above steps, uninterrupted 24/7 dynamic rolling scheduling is achieved over long periods. The dynamic slice update mechanism ensures continuity and compliance across cycles: the scheduling state at the end of the previous cycle (i.e., the last few columns of the scheduling matrix) is passed on seamlessly as features and used as the initial historical state of the next cycle, so the model remembers and carries forward the doctors' accumulated fatigue and workload from the previous cycle. This effectively avoids "schedule gaps" and cross-cycle violations (such as consecutive night shifts without rest) at cycle boundaries (such as the end of last week and the start of this week). It also improves the system's spatiotemporal adaptability and global smoothness, overcoming the limitation of traditional static scheduling, which can only generate fixed time blocks (such as a single week or a single month). The model can therefore adapt to the indefinitely extending timeline of a real hospital radiology department and keep the doctor team's workload in dynamic balance over the long term.
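The rolling mechanism can be sketched in Python under simplifying assumptions: the schedule is a list of columns (one per time step), the trained model is replaced by a hypothetical `Policy` callback, and `rotate_night_shift` is a toy rule used only to make cross-cycle continuity visible.

```python
from typing import Callable, List, Sequence

Column = List[str]  # doctor IDs for one time step, one entry per position
# Hypothetical policy interface: (history columns, step index in cycle) -> next column
Policy = Callable[[Sequence[Column], int], Column]


def run_cycle(policy: Policy, history: Sequence[Column], cycle_len: int) -> List[Column]:
    """Output one column per time step, each conditioned on all earlier columns."""
    columns: List[Column] = []
    for t in range(cycle_len):
        columns.append(policy(list(history) + columns, t))
    return columns


def rolling_schedule(policy: Policy, cycle_len: int, n_cycles: int,
                     history_cols: int) -> List[List[Column]]:
    """Dynamic slice update: the tail of each cycle seeds the next cycle's history."""
    if not 1 <= history_cols <= cycle_len:
        raise ValueError("history_cols must be between 1 and cycle_len")
    history: List[Column] = []
    cycles: List[List[Column]] = []
    for _ in range(n_cycles):
        cycle = run_cycle(policy, history, cycle_len)
        cycles.append(cycle)
        history = cycle[-history_cols:]   # last few columns become the new history
    return cycles


DOCTORS = ["A", "B", "C"]


def rotate_night_shift(history: Sequence[Column], t: int) -> Column:
    """Toy stand-in for the trained policy: give the single night shift to the next doctor."""
    if not history:
        return [DOCTORS[0]]
    last = history[-1][0]
    return [DOCTORS[(DOCTORS.index(last) + 1) % len(DOCTORS)]]


cycles = rolling_schedule(rotate_night_shift, cycle_len=7, n_cycles=2, history_cols=1)
```

Without the slice, the second cycle would restart from doctor "A", who also worked the last night of the first cycle, producing back-to-back nights across the boundary. Carrying the last column forward prevents this.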
[0123] Example 2 The purpose of this embodiment is to provide a scheduling system based on graph representation learning and knowledge reasoning enhancement, including: a data acquisition module, used to acquire the basic data set for radiology department scheduling, which includes a personnel set, a job set, and a rule set; a graph construction module, used to build a dynamic heterogeneous graph containing personnel, positions, and rules from the basic data; a feature extraction module, used to extract, when the scheduling time arrives, the high-order semantic embedding feature sets of the personnel nodes and job nodes in the current heterogeneous graph using a relational graph convolutional network model; a probability calculation module, used to calculate the selection probability distribution of all candidate doctors based on the high-order semantic embedding feature set and the features of the positions to be scheduled; a deduction reward module, used to sample actions based on the selection probability distribution and to perform counterfactual deduction with the knowledge graph reasoning engine to calculate the deduction reward; a comprehensive reward module, used to update the dynamic heterogeneous graph structure based on the action, obtain the heterogeneous graph state at the next moment, and calculate the comprehensive reward; and an optimized output module, used to optimize the parameters of the relational graph convolutional network model and the policy network with the proximal policy optimization algorithm based on the comprehensive reward to obtain the final schedule.
[0124] The scheduling system based on graph representation learning and knowledge reasoning enhancement provided in this embodiment implements the method steps of Embodiment 1.
[0125] Example 3 The purpose of this embodiment is to provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the above-described method.
[0126] Example 4 The purpose of this embodiment is to provide a computer-readable storage medium.
[0127] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the steps of the above method.
[0128] Example 5 The purpose of this embodiment is to provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods and functions involved in any of the above embodiments.
[0129] The steps and methods involved in the apparatus of the above embodiments correspond to those in Embodiment 1. For specific implementation details, please refer to the relevant description section of Embodiment 1. The term "computer-readable storage medium" should be understood as a single medium or multiple media including one or more instruction sets; it should also be understood as including any medium capable of storing, encoding, or carrying an instruction set for execution by a processor and enabling the processor to perform any of the methods in this invention.
[0130] Those skilled in the art will understand that the modules or steps of the present invention described above can be implemented using general-purpose computer devices. Optionally, they can be implemented using computer-executable program code, thereby allowing them to be stored in a storage device for execution by a computer device, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. The present invention is not limited to any particular combination of hardware and software.
[0131] While the specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of the present invention are still within the scope of protection of the present invention.
Claims
1. A scheduling method based on graph representation learning and knowledge reasoning enhancement, characterized in that it comprises: obtaining the basic data set for radiology department scheduling, which includes a personnel set, a job set, and a rule set; constructing, from the basic data, a dynamic heterogeneous graph containing personnel, positions, and rules; when the scheduling time arrives, using a relational graph convolutional network model to extract the high-order semantic embedding feature set of the personnel nodes and job nodes in the heterogeneous graph at the current time; calculating the selection probability distribution of all candidate doctors based on the high-order semantic embedding feature set and the features of the positions to be scheduled; sampling an action based on the selection probability distribution and performing counterfactual reasoning with a knowledge graph reasoning engine to calculate the reasoning reward; updating the dynamic heterogeneous graph structure based on the action to obtain the heterogeneous graph state at the next moment, and calculating the comprehensive reward; and optimizing, based on the comprehensive reward, the parameters of the relational graph convolutional network model and the policy network with a proximal policy optimization algorithm to obtain the final schedule.
2. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 1, characterized in that, Based on the combination of basic data, a dynamic heterogeneous graph containing personnel, positions, and rules is constructed. The specific process is as follows: Construct a set of nodes for a dynamic heterogeneous graph, which includes doctor nodes, job nodes, and time nodes; Construct the edge set of a dynamic heterogeneous graph, where the edge set includes static semantic edges and dynamic state edges; Initialize the node feature matrix, which includes personnel node features, job node features, and time node features.
3. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 2, characterized in that, Constructing the edge set of a dynamic heterogeneous graph is specifically as follows: Static semantic edges are constructed based on the rule set, where static semantic edges include qualification permission edges and doctor collaboration edges; Calculate the weights of the doctor collaboration edges; Dynamic state edges are dynamically generated based on the current scheduling results.
4. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 2, characterized in that the node feature matrix is initialized as follows: the total number of nodes and the feature dimension are used as the dimensions of the node feature matrix, where the features include doctor node features, job node features, and time node features; the doctor node features consist of the job level code, the historical average fatigue score, the equipment operation authorization tag, and the accumulated historical overtime hours; the job node features consist of the job difficulty coefficient, the minimum qualification level required for the job, and the average number of patients received at the job; and the time node features are obtained by encoding the position of a time point within its period with sine and cosine functions.
5. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 1, characterized in that, when the scheduling time arrives, a relational graph convolutional network model is used to extract the high-order semantic embedding feature set of the personnel nodes and job nodes in the heterogeneous graph at the current time, the relational graph convolutional network model including multiple relational graph convolutional layers, and the specific process is as follows: iterate through all neighboring nodes of each type in the heterogeneous scheduling graph; and perform feature aggregation through the multiple relational graph convolutional layers to obtain the final node embedding set, namely the high-order semantic embedding feature set of the personnel nodes and job nodes.
6. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 1, characterized in that, Based on the high-order semantic embedding feature set and the features of the pending job positions, the selection probability distribution of all candidate doctors is calculated. The specific process is as follows: Take the embedding vector of the current job node to be scheduled as the query vector; Take the set of embedding vectors of all personnel nodes as the key vector and value vector; Traverse the rules of the heterogeneous graph to generate a knowledge mask vector; Calculate the attention distribution based on the knowledge mask vector; Based on the attention distribution, attention weights with knowledge masks are mapped to the selection probability distribution of candidate doctors.
7. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 1, characterized in that, The counterfactual reasoning is performed using a knowledge graph reasoning engine, and the reasoning reward is calculated. The specific process is as follows: After selecting an action, create a temporary copy of the graph; Trigger a knowledge inference engine on a temporary graph copy. The knowledge inference engine iterates and checks key positions within a future preset time step based on the logical constraints in the rule set. Calculate the size of the candidate set for each key position within the future preset time step, and perform extrapolation; Counterfactual reasoning rewards are generated based on the reasoning results. If the reasoning fails, a strong penalty reward is calculated; if the reasoning succeeds, a positive incentive reward is calculated.
8. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 1, characterized in that the comprehensive reward is calculated as follows: obtain the workload vector of all doctors and calculate its variance to obtain the balance reward; if the selected doctor's historical shift account is in arrears, give a positive compensation reward; and weight and combine the balance reward, the positive compensation reward, and the positive incentive reward to obtain the comprehensive reward.
9. The scheduling method based on graph representation learning and knowledge reasoning enhancement as described in claim 1, characterized in that, based on the comprehensive reward, the parameters of the relational graph convolutional network model and the policy network are optimized using a proximal policy optimization algorithm to obtain the final schedule, and the specific process is as follows: set the length of the scheduling cycle, and collect trajectory data for each time step based on the comprehensive reward; calculate the advantage function estimate; based on the advantage function estimate, construct the clipped objective function of the proximal policy optimization algorithm; based on the clipped objective function, construct the total objective function of proximal policy optimization; maximize the total objective function of proximal policy optimization until the policy network converges; and map the action sequence output by the optimized model back to the original data structure to generate the final schedule.
10. A scheduling system based on graph representation learning and knowledge reasoning enhancement, characterized in that it comprises: a data acquisition module, used to acquire the basic data set for radiology department scheduling, which includes a personnel set, a job set, and a rule set; a graph construction module, used to build a dynamic heterogeneous graph at the initial moment from the basic data and to encode historical scheduling data as the initial features of the nodes; a feature extraction module, used to input the current heterogeneous graph state into the relational graph convolutional network model when the scheduling time arrives and to extract the high-order semantic embedding feature set of the personnel nodes and job nodes; a probability calculation module, used to calculate the selection probability distribution of all candidate doctors based on the high-order semantic embedding feature set and the features of the positions to be scheduled; a deduction reward module, used to sample actions based on the selection probability distribution and to perform counterfactual deduction with the knowledge graph reasoning engine to calculate the deduction reward; a comprehensive reward module, used to update the dynamic heterogeneous graph structure based on the action, obtain the heterogeneous graph state at the next moment, and calculate the comprehensive reward; and an optimized output module, used to optimize the parameters of the relational graph convolutional network model and the policy network with the proximal policy optimization algorithm based on the comprehensive reward to obtain the final schedule.