A distributed resource scheduling system and method based on a large model

By using a distributed resource scheduling system based on a large model, the problems of network architecture rigidity and difficulty in federated collaborative training in large-scale heterogeneous scenarios are solved. The system achieves adaptive architecture evolution and efficient, privacy-preserving scheduling, thereby improving the generalization ability of the model and the efficiency of resource allocation.

CN122309155A · Pending · Publication Date: 2026-06-30 · WUXI UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: WUXI UNIV
Filing Date: 2026-03-31
Publication Date: 2026-06-30

Figure CN122309155A_ABST

Abstract

This invention discloses a distributed resource scheduling system and method based on a large model, belonging to the field of artificial intelligence technology. The method includes: constructing a digital twin environment that uniformly maps task flows and heterogeneous resources into a standardized heterogeneous graph; using a large language model (LLM) to perceive each client's resource congestion and task semantics and generate a personalized GNN architecture mask; the server pruning the global hypernetwork according to the mask and issuing subnet parameters adapted to the current scenario; the client fusing features such as the relative load rate into graph message passing to generate state embeddings, which are input to a policy network trained with the PPO algorithm to perform resource scheduling; and, simultaneously, introducing contrastive learning to align cross-domain features, with the server dynamically aggregating parameters and prototypes. This invention achieves real-time adaptive evolution of the network architecture, significantly improving scheduling efficiency and generalization ability in heterogeneous scenarios.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence technology, specifically relating to a distributed resource scheduling system and method based on a large model.

Background Technology

[0002] Currently, the manufacturing and cloud computing industries face the challenge of comprehensive digital transformation. Large-scale, multi-variety, and complex environments place extremely high demands on the real-time performance and generalization capability of scheduling. However, traditional mathematical programming and heuristic algorithms are mostly suited to static environments and adapt poorly to the core characteristics of real-world scenarios, namely dynamic change, heterogeneous resources, and large-scale concurrency; they therefore fail to meet the needs of standardized, large-scale application.

[0003] In recent years, the rapid development of artificial intelligence has provided many new approaches to the scheduling problem. Deep reinforcement learning models scheduling as a Markov decision process and relies on trial-and-error learning to discover the optimal scheduling strategy; graph neural networks, whose structural representation capability is naturally suited to scheduling scenarios, effectively compensate for the short-sightedness of traditional scheduling algorithms; and federated learning breaks down data silos by allowing multiple clients to train collaboratively without sharing sensitive data, making privacy-preserving scheduling possible.

[0004] Despite these advances, existing scheduling technologies still face challenges in large-scale heterogeneous scheduling scenarios, including poor adaptability of the network architecture, difficulty in making personalized components converge during training, and semantic misalignment of cross-domain features. Specifically, traditional heuristic algorithms generalize poorly, while existing graph neural network-based scheduling models usually employ fixed architectures that cannot adapt to the task dependency chain lengths and resource contention intensities specific to each client, leading to short-sighted scheduling decisions or deadlocks. Furthermore, data silos limit the training effectiveness of general-purpose scheduling models. When aggregating heterogeneous subnets, traditional federated learning suffers both from the dilution of the gradients of sparse personalized components by global averaging and from the lack of a unified standard for aligning cross-domain feature semantics, which makes it difficult to achieve efficient collaborative optimization and adaptive architectural evolution of the scheduling system while ensuring data privacy.

[0005] Existing research results still have significant limitations. For example, patent application CN120952497A discloses a distributed scheduling method for stamping resources in vehicle manufacturing under cloud-edge-device collaboration, aimed at the low utilization of multi-source heterogeneous computing power, the difficulty of ensuring data privacy, and insufficient scheduling robustness under a cloud-edge-device architecture. However, this technology relies on a preset, fixed deep reinforcement learning network architecture and on specific physical constraints for parameter fine-tuning; it cannot dynamically search for and reconstruct the network structure according to heterogeneous topological features. When the environment changes at large scale and in complex ways, the model adapts poorly and generalizes across domains only with great difficulty, which hinders architecture-level performance iteration and general extension of the scheduling system in multi-source heterogeneous scenarios. Another patent application, CN120128487A, discloses a method and system for multi-agent collaborative communication and real-time resource optimization management based on a large model, aimed at poor communication and collaboration, weak environmental awareness, and insufficient resource utilization in multi-agent systems operating in complex environments. Its core limitation, however, is the high complexity of multi-agent relationships, which makes efficient collaboration hard to maintain as the system scales up. It also lacks environmental adaptability and cannot flexibly cope with dynamically changing scheduling scenarios. Moreover, it relies on a single data source and does not effectively integrate multimodal data such as audio, video, and sensor data.
This limits the large model's understanding of complex real-world scenarios, and the model lacks a dynamic parameter adjustment mechanism; consequently, it cannot reasonably allocate computing resources to tasks of different scales and complexities, which seriously affects the system's operating efficiency and scheduling accuracy.

[0006] In summary, there is an urgent need for a distributed scheduling system that uses the logical reasoning capabilities of large models to achieve adaptive architectural evolution and combines this with a sparse aggregation mechanism to overcome the challenges of heterogeneous federated training, so as to meet the requirements of large-scale heterogeneous scenarios for efficient, general, and privacy-preserving scheduling.

Summary of the Invention

[0007] To address the aforementioned technical problems, this invention provides a distributed resource scheduling system and method based on a large model, which effectively solves the problems of rigid network architecture, insufficient dynamic adaptability, and difficulties in federated collaborative training in complex scenarios involving large scale, multiple heterogeneous sources, and dynamic changes.

[0008] The distributed resource scheduling system based on a large model described in this invention includes a scenario configuration module, a data preprocessing module, a key topology feature definition module, a node update module based on reverse inhibition, an LLM personalized GNN architecture search module, a PPO policy training module, a feature alignment and prototype generation module, a hypernetwork pruning and local dual optimization module, and a three-channel aggregation and architecture iteration module, wherein: the scenario configuration module maps user-defined tasks, resources, and constraints into unified graph topology data containing time constraints, capability constraints, and competition constraints, and outputs it to the data preprocessing module. The data preprocessing module builds a standardized channel from raw source data to the graph neural network input: it performs data cleaning and feature vectorization on the graph topology data and outputs the standardized heterogeneous graph, together with the initial feature matrices of task nodes and resource nodes for the heterogeneous graph neural network, to the key topology feature definition module and the LLM personalized GNN architecture search module. The key topology feature definition module obtains enhanced resource features and task-resource edge features with richer semantic expression through a parameterized two-stage embedding mechanism, and outputs them to the node update module based on reverse inhibition. The node update module based on reverse inhibition uses a two-stage message passing mechanism to update resource nodes and then task nodes; it outputs the global state embedding to the PPO policy training module and the resource node embeddings to the feature alignment and prototype generation module.
The LLM personalized GNN architecture search module constructs structured prompts and uses the large language model (LLM) to generate, from each client's task flexibility gradient and resource congestion pattern, an architecture mask adapted to the current heterogeneous scenario; the mask is output to the hypernetwork pruning and local dual optimization module. The PPO policy training module updates the policy network with the PPO algorithm, iteratively optimizes the resource scheduling policy, and outputs the PPO loss to the feature alignment and prototype generation module. The feature alignment and prototype generation module introduces a supervised contrastive learning loss while the policy network is updated by PPO, achieving semantic alignment of cross-domain features in heterogeneous scenarios, and outputs the prototypes and parameter updates to the three-channel aggregation and architecture iteration module. The hypernetwork pruning and local dual optimization module prunes the hypernetwork on the server side using the LLM-generated architecture mask set, extracts subnet parameters adapted to each client scenario, and outputs them to the node update module based on reverse inhibition. The three-channel aggregation and architecture iteration module aggregates client subnet parameters with a mask-aware sparse aggregation strategy, ensuring that the entire network shares a unified semantic standard for resource congestion and idle states. It also monitors the performance feedback of the LLM-generated architectures in real time and dynamically adjusts the architecture masks and aggregation strategy, realizing adaptive evolution of the system architecture; its output goes to the hypernetwork pruning and local dual optimization module, the LLM personalized GNN architecture search module, and the feature alignment and prototype generation module.

[0009] Furthermore, the scenario configuration module maps user-defined tasks, resources, and constraints into a unified graph topology, specifically: logical units with time attributes and sequential dependencies are defined as task nodes; physical carriers with capacity limitations and processing capabilities are defined as resource-class nodes; and complex business rules are structured into three types of topological connections, namely time constraints that represent the order of the task flow, capability constraints that specify the processing permissions of resource classes, and competition constraints that describe substitution among multiple resource classes or preemption among multiple tasks.
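As a minimal sketch, assuming a plain in-memory Python representation (the class `HeteroGraph` and its method names are hypothetical, not from the source), the unified topology can keep task nodes, resource-class nodes, and the three constraint types as separate typed edge sets. Competition constraints are stored unordered because substitution and preemption are symmetric relations here, which is itself an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Hypothetical container for the unified graph topology."""
    tasks: set = field(default_factory=set)              # task nodes
    resources: set = field(default_factory=set)          # resource-class nodes
    time_edges: set = field(default_factory=set)         # (before, after): task flow order
    capability_edges: set = field(default_factory=set)   # (task, resource): task may run there
    competition_edges: set = field(default_factory=set)  # frozenset pairs: substitution/preemption

    def add_time_constraint(self, before, after):
        self.tasks.update((before, after))
        self.time_edges.add((before, after))

    def add_capability_constraint(self, task, resource):
        self.tasks.add(task)
        self.resources.add(resource)
        self.capability_edges.add((task, resource))

    def add_competition_constraint(self, a, b):
        # Unordered pair: substitutable resource classes or mutually preempting tasks
        self.competition_edges.add(frozenset((a, b)))


g = HeteroGraph()
g.add_time_constraint("o11", "o21")
g.add_capability_constraint("o11", "m1")
g.add_capability_constraint("o11", "m2")
g.add_competition_constraint("m1", "m2")
```

Keeping the edge types apart mirrors the three constraint categories, so each relation can later get its own message-passing weights.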

[0010] Furthermore, the data preprocessing module first performs a data cleaning process, including outlier filtering, missing value imputation, and numerical standardization; then it performs feature vectorization, using a learnable embedding layer to map discrete task types and resource attributes into high-dimensional dense vectors, thereby constructing the initial feature matrix of task nodes and resource nodes in the heterogeneous graph neural network.
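The preprocessing chain above can be sketched as follows. This is only an illustration under stated assumptions: median imputation, z-score clipping for outliers, and z-score standardization are one common choice, not the method fixed by the source, and `clean_column`, `Embedding`, and `task_feature` are hypothetical names. The embedding table is only randomly initialized here; in the described system it would be learned jointly with the network.

```python
import random
import statistics


def clean_column(values, z_max=3.0):
    """Impute missing values (None) with the median, clip outliers, then z-score standardize."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]
    mu = statistics.mean(filled)
    sd = statistics.pstdev(filled) or 1.0
    # Outlier filtering by clipping to mu +/- z_max * sd (one of several possible rules)
    lo, hi = mu - z_max * sd, mu + z_max * sd
    clipped = [min(max(v, lo), hi) for v in filled]
    mu2 = statistics.mean(clipped)
    sd2 = statistics.pstdev(clipped) or 1.0
    return [(v - mu2) / sd2 for v in clipped]


class Embedding:
    """Lookup table mapping a discrete category to a dense vector (random init, trained in practice)."""

    def __init__(self, categories, dim, seed=0):
        rng = random.Random(seed)
        self.index = {c: i for i, c in enumerate(categories)}
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in categories]

    def __call__(self, category):
        return self.table[self.index[category]]


def task_feature(task_type, numeric, emb):
    """Initial task-node feature: embedded task type followed by standardized numeric attributes."""
    return emb(task_type) + list(numeric)
```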

[0011] Furthermore, for any resource-class node $m$, the key topology feature definition module defines three metrics: the current task load $L_m(t)$, which is the total task load allocated at the current time $t$ to this resource class and all of its child nodes; the resource capacity $C_m$, which is the total parallel processing limit of the physical computing units contained in this resource class; and the relative load rate $\rho_m(t)$, which normalizes and quantifies the congestion level of this resource class and is calculated as $\rho_m(t) = L_m(t) / C_m$. Based on a preset pair of load thresholds $(\theta_{\mathrm{low}}, \theta_{\mathrm{high}})$, where $\theta_{\mathrm{low}}$ is the low-load threshold and $\theta_{\mathrm{high}}$ is the high-load threshold, the status category label of the resource node is determined: when $\rho_m(t)$ is below $\theta_{\mathrm{low}}$, the node is in the idle state; when it lies between the two thresholds, the node is in the normal state; and when it exceeds $\theta_{\mathrm{high}}$, the node is in the congested state. The current resource capacity $C_m$ and the relative load rate $\rho_m(t)$ are normalized and used together as key state inputs to construct the enhanced resource feature vector. The task competition intensity $d_i$ of task node $o_i$ is defined as its out-degree in the current heterogeneous graph, i.e., the number of resource-class nodes with which the task is compatible. The discrete value $d_i$ is mapped by a learnable embedding layer to a dense vector, which is concatenated along a specified feature dimension with the task's original feature to form the extended task-resource edge feature.
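The task competition intensity and the extended edge feature can be sketched as below. This assumes capability edges are stored as `(task, resource)` pairs and reads the garbled edge-feature definition as a plain concatenation of the task's original feature with the embedded out-degree; `competition_intensity`, `DegreeEmbedding`, and `extended_edge_feature` are hypothetical names.

```python
import random


def competition_intensity(capability_edges):
    """Out-degree of each task toward resource classes, i.e. its number of compatible resource classes."""
    degree = {}
    for task, _resource in capability_edges:
        degree[task] = degree.get(task, 0) + 1
    return degree


class DegreeEmbedding:
    """Hypothetical learnable table mapping a discrete degree to a dense vector."""

    def __init__(self, max_degree, dim, seed=0):
        rng = random.Random(seed)
        # Row d holds the vector for degree d (row 0 unused); trained jointly in practice
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(max_degree + 1)]

    def __call__(self, d):
        return self.table[min(d, len(self.table) - 1)]  # clamp unseen large degrees


def extended_edge_feature(task_feature, degree, emb):
    """Concatenate the task's original feature with the embedded competition intensity."""
    return list(task_feature) + emb(degree)


edges = {("o11", "m1"), ("o11", "m2"), ("o21", "m3")}
d = competition_intensity(edges)
emb = DegreeEmbedding(max_degree=8, dim=4)
e = extended_edge_feature([0.3, 0.7], d["o11"], emb)
```

Embedding the degree rather than feeding the raw integer lets the network learn a non-linear notion of how "flexible" a task with two, three, or more alternatives is.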

[0012] Furthermore, the node update module based on reverse inhibition adopts a two-stage message passing mechanism, first updating resource nodes and then updating task nodes;

[0013] Phase 1: resource node update. A resource node must aggregate information from all tasks that could potentially connect to it in order to assess its future load trend. For each resource-class node in the heterogeneous graph, with its set of adjacent tasks, the attention coefficient between the resource class and each task and the self-attention coefficient of the resource node are calculated by formulas (1) and (2) according to flexible dynamic inhibition logic. In formula (1), the original matching score is based on feature similarity and is reduced by a suppression term built from a preset sensitivity coefficient (which controls the weight of the subsequent factors and can be set flexibly), an activation function of the resource load (which outputs a high value when the resource is congested), and a function of the task's competition intensity. For a rigid task, whose competition intensity is 1, this function is zero, so the suppression term vanishes: even when the resource node is extremely congested (high relative load rate), its attention weight is not reduced, which guarantees that rigid tasks can acquire resources. For a flexible task, whose competition intensity is greater than 1, the weight is suppressed when the resource is congested, and tasks with more alternative resource classes receive heavier penalties. The attention coefficients and the self-attention coefficient are normalized together by softmax and, combined with linear transformations, the embedding of the resource class is updated by formula (3).
Phase 2: task node update. A task node must aggregate not only predecessor/successor information but also the states of its optional resource classes, in order to perceive where spare capacity exists. For each task, the set of adjacent resource classes and the set of predecessor/successor tasks are defined; the attention coefficients toward the adjacent resource classes are calculated with formula (1), together with the attention coefficients toward the predecessor/successor tasks and the self-attention coefficient of the task node. After these three kinds of coefficients are normalized, the task embedding is updated by formula (4), which also aggregates the feature vectors of the predecessor/successor tasks. In intermediate layers, the embeddings of the Z heads are concatenated; in the final layer, they are averaged. After L alternating updates through the dual attention layers, the final task and resource embeddings are obtained, where Z is the preset number of attention heads and L is the preset number of graph neural network update layers. Finally, all task embeddings and resource embeddings are globally aggregated to generate the state embedding of the heterogeneous graph by formula (5), where POOL denotes the global pooling operation.
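Because formulas (1) to (3) are not reproduced, the following is only a sketch of the reverse-inhibition idea under explicit assumptions: a sigmoid stands in for the load activation function, `log(d)` stands in for the flexibility function (it is 0 for rigid tasks with `d == 1` and grows with `d`), and attention is single-head. All function names are hypothetical. The point it demonstrates is the qualitative behavior described above: on a congested resource a flexible task loses weight while a rigid task does not, and on an idle resource the penalty nearly disappears.

```python
import math


def load_activation(rho, center=0.8, sharpness=10.0):
    """Hypothetical sigmoid that approaches 1 when the relative load rate rho is high."""
    return 1.0 / (1.0 + math.exp(-sharpness * (rho - center)))


def flexibility(d):
    """Hypothetical penalty, increasing in competition intensity d and exactly 0 for rigid tasks."""
    return math.log(d)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def resource_attention(raw_scores, degrees, self_score, rho, lam=1.0):
    """Phase-1 weights for one resource class over its neighbor tasks plus itself.

    Each task score is raw_score - lam * load_activation(rho) * flexibility(d).
    """
    penalty = lam * load_activation(rho)
    task_scores = [s - penalty * flexibility(d) for s, d in zip(raw_scores, degrees)]
    weights = softmax(task_scores + [self_score])  # joint normalization with self-attention
    return weights[:-1], weights[-1]


def weighted_update(weights, task_vecs, self_weight, self_vec):
    """Single-head update: attention-weighted sum of (already linearly transformed) vectors."""
    out = [self_weight * x for x in self_vec]
    for w, v in zip(weights, task_vecs):
        out = [o + w * x for o, x in zip(out, v)]
    return out


# Two tasks with equal raw scores: one rigid (d = 1), one flexible (d = 4)
busy, _ = resource_attention([1.0, 1.0], [1, 4], 0.0, rho=1.2)
idle, _ = resource_attention([1.0, 1.0], [1, 4], 0.0, rho=0.2)
```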

[0014] The state embedding vector is used as input to the local policy network to calculate the action probability distribution. Each resource node in the constructed heterogeneous graph is uniformly denoted as node u, and its final embedding vector is recorded accordingly. These embedding vectors and their corresponding status labels serve as input to the subsequent contrastive learning, where they are used to compute the loss between local features and the global prototypes and to compute the local feature prototypes that participate in federated aggregation.

[0015] Furthermore, the LLM personalized GNN architecture search module abstracts the scheduling scenario into a triple that is input to the LLM, consisting of the client scale, the dataset features, and domain knowledge that quantifies the topological properties of the graph. The LLM reasons over a predefined GNN search space (including common components such as the number of layers and the attention mechanism), translating physical constraints into architectural configurations. To drive adaptive architectural evolution, the module introduces historical performance P as a feedback signal: in each iteration, the system collects the local scheduling performance of all clients and the performance of the global model, and calculates the system's total score by formula (6), which combines the local score of each client's personalized subnet in the previous round with the weighted score of the global hypernetwork. In the T-th optimization, the LLM integrates the scenario parameters, the preset search space, the current search strategy, and the historical performance feedback P, and through logical reasoning generates a new set of optimal architecture masks A.
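The source does not give the prompt format, so the following is a hypothetical sketch of how the scenario triple, search space, current strategy, and historical performance P might be serialized into one structured prompt. Every key name and the instruction wording are assumptions, and no LLM API call is shown; the resulting string would be sent to whatever model the deployment uses.

```python
import json


def build_architecture_prompt(client_scale, dataset_features, domain_knowledge,
                              search_space, strategy, history):
    """Assemble a structured prompt (hypothetical schema) asking an LLM for architecture masks."""
    payload = {
        "scenario": {
            "client_scale": client_scale,
            "dataset_features": dataset_features,
            "domain_knowledge": domain_knowledge,
        },
        "search_space": search_space,
        "current_strategy": strategy,
        "performance_history": history,
        "instruction": ("Return JSON mapping each client id to a binary mask over the "
                        "search-space components."),
    }
    return json.dumps(payload, indent=2, sort_keys=True)


prompt = build_architecture_prompt(
    client_scale=4,
    dataset_features={"avg_chain_length": 6, "avg_out_degree": 2.5},
    domain_knowledge={"congested_fraction": 0.3},
    search_space={"num_layers": [1, 2, 3], "attention": ["gat", "none"]},
    strategy="retain_core_components",
    history=[{"round": 1, "total_score": 0.71}],
)
```

Serializing to JSON keeps the prompt machine-checkable, which makes it easier to validate that the LLM's reply only references components present in the search space.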

[0016] Furthermore, the PPO policy training module initializes the network with the client's personalized parameters, takes the full-graph state embedding vector as input, outputs the probability distribution over scheduling actions, and samples an action for execution. A reward function oriented toward minimizing completion time is constructed from the maximum completion time of all scheduled tasks under a given system state. The system then stores the state transition tuple in the trajectory buffer of the current policy. The tuple contains the current system state at time step t; the action taken in that state; the immediate reward fed back by the heterogeneous resource environment; the next state to which the system transitions after the action; the log probability of taking that action in that state under the old policy (i.e., the policy network before this parameter update); and the resource node state label vector, defined as the set of discrete physical state labels of all resource nodes in the heterogeneous graph at time step t. The advantage function is then calculated.
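The paragraph says only that "the advantage function is calculated", so this sketch swaps in two standard choices: generalized advantage estimation (GAE) and the clipped PPO surrogate. The reward as the decrease in makespan between consecutive states is likewise an assumption about the lost reward formula. Function names are hypothetical and the code works on plain Python lists rather than tensors.

```python
import math


def makespan_reward(makespan_before, makespan_after):
    """Assumed dense reward: decrease in the makespan estimate caused by one scheduling action."""
    return makespan_before - makespan_after


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory (no terminal masking, for brevity)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_clip_loss(new_logps, old_logps, advantages, eps=0.2):
    """Clipped surrogate objective of PPO, returned as a loss to minimize."""
    terms = []
    for lp_new, lp_old, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

The stored old log probability in the transition tuple is exactly what `ppo_clip_loss` needs to form the probability ratio.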

[0017] Furthermore, for each resource node u in the current batch, the feature alignment and prototype generation module uses the node's status label to look up the positive-sample prototype in the global prototype library distributed by the server, and then calculates the contrastive loss by formula (7), whose terms are the resource-class node features, the set of all resource-class nodes in the current batch, and a temperature coefficient, with the normalizing sum iterating over all prototypes. The joint loss used to update the parameters is then calculated by formula (8), which combines the PPO loss and the contrastive loss through a dynamic weight. During the warm-up phase at the beginning of training, the dynamic weight is set to 0: the model is optimized with PPO alone and its local prototypes are uploaded while the global feature distribution stabilizes. After the warm-up period, the weight is restored to its initial setting and feature alignment is formally enabled. After the parameters are updated, to feed back the local feature distribution, the client uses the updated network to recompute the local feature prototype of each state. The prototype of the k-th state, k ∈ {0, 1, 2}, is calculated by formula (9) as a weighted average of the feature vectors of all resource nodes in the current batch whose status label is k, each node being weighted by its upper limit of physical capacity.
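Formulas (7) to (9) are not reproduced, so this sketch reconstructs them under assumptions: formula (7) is read as a prototype-based contrastive loss (negative log-softmax of the similarity to the node's own state prototype, with the denominator over all prototypes), dot-product similarity is assumed, the joint loss is read as PPO loss plus a warm-up-gated weight times the contrastive loss, and the label ids 0/1/2 for idle/normal/congested are assumed. The capacity-weighted prototype follows the description of formula (9). Function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def prototype_contrastive_loss(features, labels, prototypes, tau=0.1):
    """Mean over nodes of -log softmax(similarity / tau) at the node's own state prototype.

    prototypes: dict state id -> global prototype vector (0 idle, 1 normal, 2 congested, assumed).
    """
    total = 0.0
    for z, y in zip(features, labels):
        logits = {k: dot(z, p) / tau for k, p in prototypes.items()}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denom - logits[y]
    return total / len(features)


def joint_loss(ppo_loss, con_loss, round_idx, warmup_rounds, lam0=0.5):
    """Alignment weight is 0 during warm-up, then restored to its initial value lam0."""
    lam = 0.0 if round_idx < warmup_rounds else lam0
    return ppo_loss + lam * con_loss


def local_prototypes(features, labels, capacities):
    """Capacity-weighted mean feature per state label present in the batch."""
    sums, weights = {}, {}
    for z, y, c in zip(features, labels, capacities):
        acc = sums.setdefault(y, [0.0] * len(z))
        sums[y] = [a + c * x for a, x in zip(acc, z)]
        weights[y] = weights.get(y, 0.0) + c
    return {k: [s / weights[k] for s in v] for k, v in sums.items()}
```

Weighting by capacity means large clusters dominate the prototype of their state, which matches the text's use of physical capacity as the averaging weight.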

[0018] Furthermore, the hypernetwork pruning and local dual optimization module is responsible for generating and distributing personalized subnets and for efficient local training on the client side. On the server side, the architecture mask set generated by the LLM is applied to the hypernetwork to prune it, and subnet parameters are extracted by multiplying the global model weights element-wise by the mask; the personalized policy network parameters of client i are given by formula (10), $\theta_i = \theta \odot A_i$, where $\theta$ denotes the global hypernetwork weights and $A_i$ the architecture mask of client i. The server then distributes the generated personalized policy network architecture, its initialization parameters, and the latest global feature prototype library to the corresponding i-th client. On receipt, the client directly initializes its local scheduling policy network with these parameters and loads the prototype library into its local contrastive learning module.
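A minimal sketch of the server-side extraction, assuming parameters are stored as a dict from (hypothetical) parameter names to flat lists and the mask uses the same layout with 0/1 entries. The element-wise product zeroes out every component the LLM did not select, so the client can initialize directly from the result without running any search of its own.

```python
def extract_subnet(global_params, mask):
    """Element-wise product of global hypernetwork weights and a binary architecture mask."""
    return {name: [w * m for w, m in zip(weights, mask[name])]
            for name, weights in global_params.items()}


global_params = {"layer1.attn": [0.5, -0.2, 0.1], "layer2.attn": [0.3, 0.4, -0.6]}
mask_client_i = {"layer1.attn": [1, 1, 1], "layer2.attn": [0, 0, 0]}  # e.g. a 1-layer subnet
theta_i = extract_subnet(global_params, mask_client_i)
```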

[0019] Furthermore, the three-channel aggregation and architecture iteration module includes the following channels. Channel 1, the parameter aggregation channel, adopts a mask-aware sparse aggregation strategy given by formula (11), whose terms are the architecture mask of parameter j in client i, the total number of task nodes that client actually finished scheduling, a smoothing term that prevents a zero denominator, the local gradient update of parameter j uploaded by client i, and the global learning rate. Channel 2, the prototype aggregation channel, synchronously collects the local feature prototypes uploaded by each client (i.e., the statistical means of node features in each state) together with their sample weights; the server computes this round's statistical mean and, introducing a momentum coefficient, combines it with the previous round's global prototype to generate a new global feature prototype library, updating the global prototype of each state by formula (12). Channel 3, the performance feedback channel, drives the continuous evolution of the LLM-generated architecture. The system summarizes the total score according to formula (6), and the server instructs the LLM according to the score trend: if performance improves, core components are retained; if performance stagnates or declines, random exploration is triggered. The LLM also monitors the per-client sub-scores; once the score of a specific client is found to lag significantly, that client's subnet is judged to be mismatched and its architecture mask is regenerated. When the performance indicators stabilize or reach the target, iteration stops and the final scheduling model is output.
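Formulas (11) and (12) are lost, so this sketch reconstructs Channels 1 and 2 from the listed terms under assumptions: each parameter is updated by a weighted mean of client updates, weighted by mask times completed-task count and normalized only over clients whose mask keeps that parameter; the uploaded update is added (i.e. it is already a descent step); and the momentum coefficient weights the previous global prototype. Names are hypothetical. The usage shows why this avoids dilution: a parameter kept by only one client is not averaged down by clients that never trained it.

```python
def sparse_aggregate(global_params, client_updates, client_masks, client_counts, lr=1.0, eps=1e-8):
    """Mask-aware aggregation: each parameter is averaged only over clients whose mask keeps it."""
    new_params = []
    for j, theta in enumerate(global_params):
        num = sum(m[j] * n * u[j] for u, m, n in zip(client_updates, client_masks, client_counts))
        den = sum(m[j] * n for m, n in zip(client_masks, client_counts))
        new_params.append(theta + lr * num / (den + eps))  # eps guards a zero denominator
    return new_params


def update_prototype(prev_proto, client_protos, client_weights, beta=0.9):
    """Momentum update of one global state prototype from sample-weighted client prototypes."""
    total = sum(client_weights)
    mean = [sum(w * p[d] for p, w in zip(client_protos, client_weights)) / total
            for d in range(len(prev_proto))]
    return [beta * old + (1.0 - beta) * new for old, new in zip(prev_proto, mean)]


# Client B's mask drops parameter 0, so its zero update there must not dilute client A's
params = sparse_aggregate([0.0, 0.0],
                          client_updates=[[1.0, 1.0], [0.0, 3.0]],
                          client_masks=[[1, 1], [0, 1]],
                          client_counts=[10, 10])
```

With plain averaging, parameter 0 would move by only 0.5; the mask-aware rule moves it by the full 1.0 contributed by the one client that actually uses it.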

[0020] This invention also provides a distributed resource scheduling method based on a large model, which implements distributed resource scheduling on the above system and includes the following steps:
Step 1: The scenario configuration module collects in real time the computing power, load status, resource requirements, and timing constraints of heterogeneous resources to build a digital twin environment; the data preprocessing module then cleans and vectorizes the collected data, maps the task flow and heterogeneous resources into a standardized heterogeneous graph, and outputs the initial feature matrices.
Step 2: The LLM personalized GNN architecture search module constructs structured prompts containing client resource congestion patterns, task flexibility gradients, and historical performance feedback, and inputs them into the large language model to generate a personalized GNN architecture mask adapted to the current heterogeneous scenario.
Step 3: Through the hypernetwork pruning and local dual optimization module, the server prunes the global hypernetwork according to the architecture mask, extracts the subnet parameters adapted to each client, and distributes them to the clients together with the global feature prototype library.
Step 4: The client uses the key topology feature definition module to integrate the relative load rate and task competition intensity into enhanced resource features and task-resource edge features; the node update module based on reverse inhibition then updates the nodes with the two-stage message passing mechanism, and a global state embedding vector is generated by global pooling.
Step 5: The client's PPO policy training module feeds the global state embedding vector into the policy network to generate a sequence of resource scheduling actions, interacts with the environment to obtain rewards oriented toward minimizing completion time, and performs PPO optimization; at the same time, the feature alignment and prototype generation module introduces a supervised contrastive learning loss to align cross-domain feature semantics and outputs the local parameter updates and local feature prototypes.
Step 6: Through the three-channel aggregation and architecture iteration module, the server receives these outputs and performs mask-aware sparse parameter aggregation and prototype clustering, building an updated global hypernetwork and a unified global feature prototype library.
Step 7: Based on the performance of the local and global models on the validation set, the system calculates the total performance score for this round. The three-channel aggregation and architecture iteration module monitors the score in real time: if model performance has converged, the final scheduling strategy is output; otherwise, an architecture refactoring signal and performance feedback data are generated, and the system returns to Step 2 to trigger the large language model for the next round of architecture iteration.
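The Step 7 decision (and the Channel 3 feedback it relies on) can be sketched as a small control rule. The source does not quantify "converges", "stagnates", or "significant lag", so the tolerance, patience window, and lag ratio below are hypothetical parameters, and `feedback_signal` is a hypothetical name. It returns which instruction the server would pass to the LLM and which clients' masks should be regenerated.

```python
def feedback_signal(score_history, client_scores, tol=1e-3, patience=3, lag_ratio=0.8):
    """Hypothetical decision rule for the next architecture-search round.

    score_history: total system score per round (oldest first).
    client_scores: dict client id -> latest local score.
    Returns (action, lagging_clients) with action in {"stop", "retain", "explore"}.
    """
    mean_score = sum(client_scores.values()) / len(client_scores)
    lagging = sorted(c for c, s in client_scores.items() if s < lag_ratio * mean_score)
    recent = score_history[-(patience + 1):]
    if len(recent) == patience + 1 and max(recent) - min(recent) < tol:
        return "stop", lagging       # converged: output the final scheduling model
    if len(score_history) >= 2 and score_history[-1] > score_history[-2]:
        return "retain", lagging     # improving: keep core components
    return "explore", lagging        # stagnating or declining: trigger exploration
```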

[0021] The beneficial effects of this invention are as follows:
1) This invention uses the logical reasoning capabilities of the LLM to dynamically generate personalized architecture masks in place of a traditional fixed network architecture; the system automatically adapts the optimal number of network layers and the interaction mechanisms to the task dependency depth and resource contention intensity of each client, significantly improving the model's generalization ability in multi-source heterogeneous scenarios.
2) To address the sparsity of heterogeneous subnets in federated learning, a mask-aware sparse aggregation strategy is proposed. By eliminating the interference of inactive nodes during aggregation, it resolves the gradient dilution that rare personalized components suffer under global averaging and ensures accurate convergence of the model's fine-grained features.
3) A message passing mechanism based on reverse inhibition is constructed. When a resource is congested, the weights of highly flexible tasks are automatically reduced, steering them toward idle resources while keeping rigid tasks connected; this eliminates scheduling deadlock under high load at the mechanism level.
4) With server-side pruning and direct client-side initialization, clients are deployed at zero computational cost; together with a dynamically weighted contrastive learning mechanism, this unifies the semantic standards for congested and idle states across clients, achieving efficient collaborative optimization while protecting privacy.

Attached Figure Description

[0022] Figure 1 shows the global iterative process of the LLM-based distributed scheduling system; Figure 2 shows the client-side local training and data upload process; Figure 3 is the overall system architecture diagram; Figure 4 is a schematic diagram of the structure of the key topological feature definition module; Figure 5 is a schematic diagram of the module structure; Figure 6 is a flowchart of the method.

Detailed Implementation

[0023] To make the content of this invention easier to understand, the invention will be further described in detail below with reference to specific embodiments and accompanying drawings.

[0024] As shown in Figure 3 and Figure 5, the distributed resource scheduling system based on a large model according to the present invention includes a scenario configuration module, a data preprocessing module, a key topology feature definition module, a node update module based on reverse inhibition, an LLM personalized GNN architecture search module, a PPO policy training module, a feature alignment and prototype generation module, a hypernetwork pruning and local dual optimization module, and a three-channel aggregation and architecture iteration module. The specific contents of each module are as follows:

[0025] (1) Scene configuration module; This module aims to abstract arbitrarily complex scheduling scenarios into standardized heterogeneous graphs. By mapping user-defined tasks, resources, and constraints to a unified graph topology, it achieves rapid adaptation and generalized representation for various heterogeneous scheduling scenarios. Specifically, the system defines logical units with temporal attributes and sequential dependencies as task nodes. It defines physical carriers with capacity limitations and processing capabilities as resource nodes. Simultaneously, complex business rules are structured into three types of topological connections: temporal constraints representing the task flow order, capability constraints measuring resource class processing permissions, and competition constraints describing the substitution of multiple resource classes or the preemption relationship between multiple tasks.

[0026] (2) Data preprocessing module; This module is responsible for building a standardized channel from raw multi-source data to the input of the graph neural network. The system supports access to heterogeneous data from multiple sources such as cloud computing, model training, and power dispatching. First, it performs a data cleaning process, including outlier filtering, missing value imputation, and numerical standardization. Then, it performs feature vectorization, using a learnable embedding layer to map discrete task types and resource attributes into high-dimensional dense vectors, thereby constructing the initial feature matrices of task nodes and resource nodes in the heterogeneous graph neural network.

[0027] (3) Key topological feature definition module. Unlike traditional systems, this system no longer treats individual resources as nodes; instead, it constructs a heterogeneous graph centered on resource classes. The graph consists of a set of task nodes, a set of resource nodes, edges representing the order of tasks, and edges representing the allocation relationships between tasks and resource classes. This module uses a parameterized two-stage embedding mechanism to deeply integrate the relative load rate and the task contention intensity into the graph features. The specific structure is shown in Figure 4, in which m1, m2, and m3 represent resource nodes, and o11 to o41 represent the four tasks of the first task flow. Tasks must be completed in a specific order; for example, o11 must be completed before o21, otherwise o21 cannot be scheduled. The same applies to the other three task flows.

[0028] This system introduces two key topological feature indicators, the relative load rate and the task contention intensity, to enhance the semantic representation of features. For any resource class node, this invention defines three metrics. The first is the current task load, defined as the total task load allocated to this resource class and all of its child nodes at the current time t. The second is the resource class capacity, defined as the upper limit of total parallel processing of the physical computing units contained in this resource class; this is a key static characteristic that distinguishes large-scale clusters from edge nodes. The third is the relative load rate, calculated as the ratio of the current task load to the resource class capacity; it normalizes and quantifies the congestion level of this resource class. A state category label is then assigned to the node according to a preset pair of load thresholds, a low-load threshold and a high-load threshold: when the relative load rate is below the low-load threshold, the node is in the idle state; when it lies between the two thresholds, the node is in the normal state; and when it exceeds the high-load threshold, the node is in the congested state. Finally, the current resource capacity and the relative load rate are normalized and used together as key state inputs to construct the enhanced resource feature vector.
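
The load metrics and state labeling of [0028] can be sketched as follows. The integer codes 0/1/2 for idle/normal/congested, the boundary handling at the thresholds, and min-max normalization across resource classes are assumptions made for illustration.

```python
IDLE, NORMAL, CONGESTED = 0, 1, 2  # assumed encoding of the three state labels


def relative_load(load, capacity):
    """Relative load rate: current task load divided by resource-class capacity."""
    return load / capacity


def state_label(rho, theta_low, theta_high):
    """Map a relative load rate to a state label (strict/non-strict bounds are assumed)."""
    if rho < theta_low:
        return IDLE
    if rho <= theta_high:
        return NORMAL
    return CONGESTED


def min_max(xs):
    """Assumed normalization: min-max over all resource classes."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0
    return [(x - lo) / span for x in xs]


def enhanced_resource_features(base_feats, loads, capacities):
    """Append normalized capacity and normalized relative load rate to each resource vector."""
    rhos = [relative_load(l, c) for l, c in zip(loads, capacities)]
    cap_n, rho_n = min_max(capacities), min_max(rhos)
    return [f + [c, r] for f, c, r in zip(base_feats, cap_n, rho_n)]
```

For example, a resource class carrying 90 units of load on a capacity of 100 has a relative load rate of 0.9 and, with thresholds 0.3 and 0.8, is labeled congested.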

[0029] The task contention intensity of a task node is defined as its out-degree in the current heterogeneous graph, i.e., the number of resource classes with which the task is compatible. Rather than a simple binary classification, this invention uses it as a graded index of scheduling flexibility: a larger value means the task has more alternative paths and a higher tolerance for congestion on any single resource. The system maps this discrete value to a dense vector through a learnable embedding layer and concatenates it, along a specified dimension (for example, the last dimension), to the feature vector of the task-resource class edge. The expanded task-resource class edge feature thus combines the original task features, the original edge features, and the contention intensity embedding.
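
The following sketch shows the contention intensity as an out-degree count and the concatenation of its embedding onto an edge feature. The class name, the embedding dimension, and the concatenation order (task features, edge features, embedding) are hypothetical choices for illustration.

```python
import random


def contention_intensity(capability_edges):
    """Task contention intensity = out-degree of each task over task->resource-class edges."""
    kappa = {}
    for task, _res in capability_edges:
        kappa[task] = kappa.get(task, 0) + 1
    return kappa


class KappaEmbedding:
    """Hypothetical learnable embedding for discrete kappa values (random-initialized here)."""
    def __init__(self, max_kappa, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(max_kappa + 1)]

    def __call__(self, k):
        return list(self.table[k])


def expanded_edge_feature(task_feat, edge_feat, kappa, emb):
    """Concatenate task features, edge features and the kappa embedding along the last axis."""
    return task_feat + edge_feat + emb(kappa)
```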

[0030] (4) Node update module based on inverse inhibition. This module adopts a two-stage message passing mechanism that first updates resource nodes and then updates task nodes.

[0031] Phase 1: resource node update. Each resource class node aggregates information from all potentially connected tasks to assess its future load trend. In the heterogeneous graph, each resource class has a set of adjacent tasks, and this invention designs a flexible, dynamic inhibition logic to compute the attention coefficient between the resource class and each of these tasks, as given by formulas (1) and (2). In these formulas, the original matching score is based on feature similarity; the activation function of the resource load outputs a high value when the resource is congested; and the flexibility function is a monotonically increasing function of the task contention intensity. For a rigid task, whose contention intensity is 1, the suppression term is zero, so even when the resource is extremely congested its attention weight is not reduced, which ensures that rigid tasks can acquire the resource. For flexible tasks, whose contention intensity is greater than 1, the weight is suppressed when the resource is congested, and tasks with more options receive a heavier penalty. This differentiated mechanism encourages highly flexible tasks to yield and flow to other idle resources, while moderately flexible tasks choose after weighing their options, thereby achieving global load balancing. The computed attention coefficients and the self-attention coefficient of the resource node are normalized with softmax and combined with linear transformations to update the resource class embedding according to formula (3). Phase 2: task node update. Each task node aggregates not only information from its predecessor/successor tasks but also the states of its candidate resource classes, in order to detect where capacity is available. For each task, consider its set of adjacent resource classes and its set of predecessor/successor tasks. The attention coefficients toward the adjacent resource classes are computed with formula (1), together with the attention coefficients toward the predecessor/successor tasks and the self-attention coefficient. After these three kinds of coefficients are normalized, the task embedding is updated according to formula (4).
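
Formulas (1) and (2) are not reproduced in this text, so the sketch below assumes one concrete form that is consistent with the description: the logit is the raw matching score minus a load activation multiplied by a flexibility function. The sigmoid load activation, the flexibility function kappa - 1 (zero for rigid tasks, monotonically increasing), and all parameter values are hypothetical.

```python
import math


def load_activation(rho, sharpness=10.0, center=0.8):
    """Assumed load activation: a sigmoid that approaches 1 when rho exceeds `center`."""
    return 1.0 / (1.0 + math.exp(-sharpness * (rho - center)))


def flexibility(kappa):
    """Assumed monotonically increasing flexibility function with flexibility(1) == 0."""
    return float(kappa - 1)


def inhibited_attention(raw_scores, kappas, rho, self_score):
    """Softmax over [self] + adjacent tasks of (raw score - load activation * flexibility).
    Returns (alpha_self, [alpha_task, ...])."""
    phi = load_activation(rho)
    logits = [s - phi * flexibility(k) for s, k in zip(raw_scores, kappas)]
    all_logits = [self_score] + logits
    m = max(all_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in all_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    return weights[0], weights[1:]


def aggregate(self_vec, task_vecs, alpha_self, alphas):
    """Weighted sum of (already linearly transformed) self and neighbour vectors."""
    out = [alpha_self * x for x in self_vec]
    for a, vec in zip(alphas, task_vecs):
        out = [o + a * x for o, x in zip(out, vec)]
    return out
```

With equal raw scores on a congested resource, the rigid task keeps the same logit as the self term, while tasks with contention intensity 2 and 3 are progressively down-weighted.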

[0032] To enhance feature representation and robustness, this invention introduces a multi-head attention mechanism in which Z independent attention heads learn entity relationships in different subspaces in parallel; task node updates use the standard multi-head attention aggregation mechanism of graph attention networks. To stabilize multi-head learning, different aggregation strategies are used at different layers. In the intermediate layers (layers 1 to L-1 of the graph neural network), the feature vectors output by the Z heads are concatenated along the feature dimension, preserving the feature diversity of each subspace. In the final layer (layer L), which must output a stable node representation, the heads are no longer concatenated; instead, the feature vectors output by the Z heads are averaged element-wise and then passed through an activation function. After L layers of alternating dual attention updates, the final task embeddings and resource embeddings are obtained. All task embeddings and resource embeddings are then globally aggregated to generate the state embedding of the heterogeneous graph. The pooling function POOL(·) is a configurable component whose concrete form is determined by the subsequent LLM architecture search module; its general computational paradigm is given by formula (5). The resulting vector is the input from which the local policy network computes the action probability distribution. For convenience in the following description, each resource node in the constructed heterogeneous graph is uniformly denoted as node u. Its final embedding vector, together with its state label, serves as input to the subsequent contrastive learning: it is used to compute the loss between local features and global prototypes, and to compute the local feature prototypes that participate in federated aggregation.
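
The layer-dependent head aggregation and the configurable global pooling can be sketched as below. ELU as the final activation, pooling tasks and resources separately before concatenating, and the two pooling modes offered are assumptions; formula (5) itself is not reproduced here.

```python
import math


def combine_heads(head_outputs, final_layer):
    """head_outputs: Z vectors (one per head) for a single node.
    Intermediate layers concatenate; the final layer averages, then applies ELU (assumed)."""
    if not final_layer:
        return [x for head in head_outputs for x in head]
    z = len(head_outputs)
    mean = [sum(col) / z for col in zip(*head_outputs)]
    return [x if x > 0 else math.exp(x) - 1.0 for x in mean]


def pool_state(task_embs, res_embs, mode="mean"):
    """Global state embedding: pool task and resource embeddings separately, then concatenate.
    `mode` stands in for the configurable POOL(.) chosen by the architecture search."""
    def pool(vectors):
        cols = list(zip(*vectors))
        if mode == "mean":
            return [sum(c) / len(c) for c in cols]
        if mode == "max":
            return [max(c) for c in cols]
        raise ValueError(f"unknown pooling mode: {mode}")
    return pool(task_embs) + pool(res_embs)
```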

[0033] (5) LLM personalized GNN architecture search module. This module leverages the logical reasoning and generalization capabilities of large language models (LLMs) to automatically match the optimal graph neural network (GNN) architecture to clients in different heterogeneous environments. By constructing structured prompts, it guides the LLM to generate an adaptive architecture mask tailored to each client's distinctive task flexibility gradient and resource congestion pattern.

[0034] As shown in Figure 1, the system abstracts the scheduling scenario into a triple, consisting of the client scale, the dataset features, and domain knowledge that quantifies the topological properties of the graph, and inputs it to the LLM. Based on a predefined GNN search space (including common components such as the number of layers and the attention mechanism), the LLM performs inference that translates physical constraints into architectural configurations. To drive adaptive architectural evolution, the module introduces historical performance P as a feedback signal. In each iteration, the system collects the local scheduling performance of all clients and the performance of the global model, and calculates the overall system score by formula (6), in which one term is the local score of the i-th client's personalized subnet in the previous round and the other is the weighted score of the global supernetwork. This formula ensures that the architecture search accounts for both individual client differences and global generalization performance.

[0035] In the T-th optimization round, the LLM combines the input scenario parameters, the preset search space, the current search strategy, and the historical performance feedback P, and through logical reasoning generates a new set of optimal architecture masks A.
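
The prompt-and-mask loop of [0034] and [0035] can be sketched without calling any real model. The search space contents, the JSON prompt layout, the expected JSON reply format, and the form of the total score (sum of client scores plus a weighted global score) are all hypothetical; the reply string in the test stands in for LLM output.

```python
import json

SEARCH_SPACE = {  # hypothetical search space; each key is one mask group
    "num_layers": [2, 3, 4],
    "attention": ["gat", "inverse_inhibition"],
    "pooling": ["mean", "max"],
}


def total_score(local_scores, global_score, w_global=1.0):
    """Assumed form of formula (6): sum of client subnet scores plus a weighted global score."""
    return sum(local_scores) + w_global * global_score


def build_prompt(scale, dataset_features, domain_knowledge, history):
    """Structured prompt carrying the scenario triple, search space and score history P."""
    payload = {
        "objective": "minimize completion time",
        "client_scale": scale,
        "dataset_features": dataset_features,
        "domain_knowledge": domain_knowledge,
        "search_space": SEARCH_SPACE,
        "history_total_scores": history,
    }
    return ("Choose one option per key of search_space and reply with JSON only.\n"
            + json.dumps(payload, sort_keys=True))


def reply_to_mask(reply):
    """Convert a JSON reply into a flat binary mask over all candidate options."""
    choice = json.loads(reply)
    mask = []
    for key, options in SEARCH_SPACE.items():
        mask += [1 if opt == choice[key] else 0 for opt in options]
    return mask
```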

[0036] (6) PPO policy training module. Each client initializes its network with its personalized parameters, takes the full-graph state embedding vector as input, outputs the probability distribution over scheduling actions, and samples and executes an action. To suit scheduling scenarios, this invention constructs a reward function oriented towards minimizing completion time. The reward quantifies the contribution of the current action to the overall efficiency of the system and is computed from the change in the maximum completion time of all scheduled tasks between successive system states. The system then stores the state transition tuple in the trajectory buffer of the current policy and calculates the advantage function. The tuple also records the resource node state label vector, defined as the set of discrete physical state labels of all resource nodes in the heterogeneous graph at the current time step t.
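
A hedged sketch of the reward and advantage steps follows. The reward is taken as the previous maximum completion time minus the new one, and the advantage uses Generalized Advantage Estimation; GAE is a common PPO choice but is an assumption here, as is the field layout of the transition tuple.

```python
from collections import namedtuple

# Hypothetical layout of the stored transition, including the state label vector
Transition = namedtuple("Transition", "state action reward next_state old_log_prob labels")


def makespan_reward(cmax_before, cmax_after):
    """Completion-time-difference reward (assumed form): negative when the action
    enlarges the maximum completion time of the scheduled tasks."""
    return cmax_before - cmax_after


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory (assumed advantage estimator)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

One design point of a difference-based reward: with gamma = 1 the rewards telescope, so the undiscounted return equals the initial minus the final maximum completion time, and maximizing return is equivalent to minimizing the final makespan.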

[0037] (7) Feature alignment and prototype generation module. While the PPO algorithm updates the policy network, this module introduces a supervised contrastive learning loss to align heterogeneous features. For each resource node u in the current batch, the positive-sample prototype matching its state label is retrieved from the global prototype library distributed by the server, and the contrastive loss is computed by formula (7). In this formula, the features are those of the resource-type nodes, the batch set contains all resource-type nodes in the current batch, a temperature coefficient scales the similarities, and the normalizing sum runs over all prototypes in the global library. Finally, the joint loss of formula (8) is computed to update the parameters, where a dynamic weight scales the contrastive term. During the warm-up phase at the beginning of training, this weight is set to 0, so the client optimizes only with PPO and uploads its local prototypes while the global feature distribution stabilizes; after the warm-up period, the weight is restored to its initial setting, formally enabling feature alignment.
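
Formula (7) is not reproduced in this text; the sketch assumes a prototype-based InfoNCE form with cosine similarity, averaged over the batch, and formula (8) as PPO loss plus a warm-up-gated weight times the contrastive loss. The similarity measure, temperature, and lambda values are assumptions.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def prototype_contrastive_loss(features, labels, prototypes, tau=0.1):
    """Batch mean of -log softmax_k(sim(z_u, p_k) / tau)[y_u] over all global prototypes."""
    total = 0.0
    for z, y in zip(features, labels):
        logits = [cosine(z, p) / tau for p in prototypes]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
        total += log_norm - logits[y]
    return total / len(features)


def joint_loss(ppo_loss, con_loss, step, warmup_steps, lambda0=0.1):
    """Assumed form of (8): the contrastive weight is 0 during warm-up, then lambda0."""
    lam = 0.0 if step < warmup_steps else lambda0
    return ppo_loss + lam * con_loss
```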

[0038] After the parameters are updated, and in order to report the local feature distribution, the client uses the updated network to recalculate the local feature prototype of each state. The prototype of the k-th state (k ∈ {0, 1, 2}) is calculated as a weighted average of the feature vectors of all nodes in that category, formula (9), where the sum runs over the resource nodes in the current batch whose state label is k, and each node's weight is the upper limit of its physical capacity.
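
A capacity-weighted average per state, as described for formula (9), can be sketched as follows. Returning None for a state that does not occur in the batch is an illustrative assumption.

```python
def local_prototypes(features, labels, capacities, num_states=3):
    """For each state k: sum(cap_u * z_u) / sum(cap_u) over batch nodes labelled k.
    States absent from the batch get None (assumed handling)."""
    protos = []
    for k in range(num_states):
        idx = [i for i, y in enumerate(labels) if y == k]
        if not idx:
            protos.append(None)
            continue
        w = sum(capacities[i] for i in idx)
        dim = len(features[idx[0]])
        protos.append([sum(capacities[i] * features[i][d] for i in idx) / w for d in range(dim)])
    return protos
```

Weighting by capacity lets large clusters dominate a state's prototype, which matches the text's use of capacity as a static feature separating large clusters from edge nodes.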

[0039] (8) Supernetwork pruning and local dual optimization module. This module generates and distributes personalized subnets and performs efficient local training on the client side. On the server, the architecture mask set generated by the LLM is applied to the supernetwork, and the subnet parameters are extracted by element-wise multiplication of the supernetwork weights with the mask; the personalized policy network parameters of client i are given by formula (10). The server then distributes the generated personalized policy network architecture, its initialization parameters, and the latest global feature prototype library to the corresponding i-th client. On receipt, the client directly initializes its local scheduling policy network with these parameters and loads the prototype library into its local contrastive learning module. Training can begin immediately without local pruning, which significantly reduces computational overhead at the edge.

[0040] The client trains its local model with a dual mechanism of PPO and contrastive alignment. On the one hand, it uses the PPO algorithm to maximize scheduling rewards and optimize the policy network parameters; on the other hand, it uses the global prototypes as semantic anchors, so that the contrastive loss forces the local feature space to align with the global standard. After minimizing the joint loss, the client computes the parameter update and packages and uploads to the server only the sparse parameters of its subnet that were activated and received gradients.
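
Server-side pruning and client-side sparse upload can be illustrated on flat parameter vectors. Treating the supernetwork as one flat list, and treating "received a gradient" as "value changed beyond a tolerance", are simplifying assumptions.

```python
def prune(supernet_weights, mask):
    """Assumed form of (10): personalized parameters = element-wise supernet weights * mask."""
    return [w * m for w, m in zip(supernet_weights, mask)]


def sparse_update(old_params, new_params, mask, tol=0.0):
    """Package only subnet-active parameters that actually changed, as {index: delta}."""
    return {j: n - o for j, (o, n, m) in enumerate(zip(old_params, new_params, mask))
            if m and abs(n - o) > tol}
```

Uploading a dictionary keyed by parameter index keeps the payload proportional to the active, updated part of the subnet rather than to the full supernetwork.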

[0041] (9) Three-channel aggregation and architecture iteration module. This module breaks down data silos and drives continuous architectural evolution: the server aggregates parameters, prototypes, and performance through three channels.

[0042] Channel one is the parameter aggregation channel. In heterogeneous subnet scenarios, sparsely activated personalized components suffer from gradient dilution under traditional global weighted averaging; this invention therefore adopts a mask-aware sparse aggregation strategy instead. The global update of any supernetwork parameter is determined only by the subset of clients that activate that parameter, rather than by averaging across all clients. The aggregation is given by formula (11), whose terms are the architecture mask of parameter j in client i, the total number of task nodes whose scheduling client i actually completed, a smoothing term that prevents a zero denominator, the local gradient update of parameter j uploaded by client i, and the global learning rate.
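
A minimal sketch of mask-aware sparse aggregation follows, assuming formula (11) takes the form: new weight j = old weight j + learning rate × (sum over clients of mask × completed-task count × update) / (sum over clients of mask × completed-task count + epsilon). This exact form is an assumption assembled from the listed terms.

```python
def mask_aware_aggregate(global_w, masks, deltas, task_counts, lr=1.0, eps=1e-8):
    """masks[i][j]: 1 if client i activates parameter j; deltas[i]: {j: update} from client i;
    task_counts[i]: number of task nodes client i actually scheduled (aggregation weight)."""
    new_w = list(global_w)
    for j in range(len(global_w)):
        num = sum(m[j] * n * d.get(j, 0.0) for m, n, d in zip(masks, task_counts, deltas))
        den = sum(m[j] * n for m, n in zip(masks, task_counts))  # only activating clients
        new_w[j] = global_w[j] + lr * num / (den + eps)
    return new_w
```

In the test case below, parameter 0 is activated only by the first client, so it receives that client's full update of about 1.0; a plain weighted average over both clients would dilute it to 0.25.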

[0043] Channel two is the prototype aggregation channel. The system synchronously collects the local feature prototypes uploaded by each client (i.e., the statistical means of node features in each state) together with their sample weights. The server computes this round's statistical mean and, using a momentum coefficient, combines it with the previous round's global prototype to generate the new global feature prototype library; the update of the global prototype for each state is given by formula (12). This channel ensures a unified semantic standard across the whole network for states such as congestion and idleness. Channel three is the performance feedback channel, which drives the continuous evolution of the LLM-generated architecture. The system's total score is obtained by summation according to formula (6). Based on the trend of this score, the server instructs the LLM to retain core components if performance improves and to trigger random exploration if performance stagnates or declines. The LLM also monitors the per-client sub-scores: if a client's score lags significantly, its subnet is deemed incompatible and its architecture mask is regenerated. When the performance metrics stabilize or reach the target, iteration stops and the final scheduling model is output.
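
Channels two and three can be sketched as follows. The momentum blend (beta times the old prototype plus 1 - beta times the weighted mean), keeping the old prototype when no client observed a state, and the thresholds used to detect stagnation and lagging clients are all hypothetical choices.

```python
def aggregate_prototypes(prev_global, client_protos, client_weights, beta=0.9):
    """Assumed form of (12) for one state: weighted mean of client prototypes blended with the
    previous global prototype via momentum beta. Clients without this state pass None."""
    pairs = [(p, w) for p, w in zip(client_protos, client_weights) if p is not None]
    if not pairs:
        return list(prev_global)  # nothing observed this round: keep the old prototype
    total = sum(w for _, w in pairs)
    mean = [sum(w * p[d] for p, w in pairs) / total for d in range(len(prev_global))]
    return [beta * g + (1.0 - beta) * m for g, m in zip(prev_global, mean)]


def architecture_feedback(history, client_scores, tol=1e-3, lag_ratio=0.5):
    """Hypothetical channel-three rule: retain core components if the total score improved,
    otherwise explore; flag clients scoring below lag_ratio * (upper) median."""
    improved = len(history) < 2 or history[-1] > history[-2] + tol
    action = "retain_core" if improved else "random_explore"
    ranked = sorted(client_scores)
    median = ranked[len(ranked) // 2]
    lagging = [i for i, s in enumerate(client_scores) if s < lag_ratio * median]
    return action, lagging
```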

[0044] The following example, using heterogeneous computing power scheduling for a large language model training platform, illustrates the method described in this invention.

[0045] As shown in Figure 2, taking large-scale distributed model training as an example, tasks are operators in the computation graph and the resource cluster consists of heterogeneous computing nodes. One client in this scenario has 100 concurrent training task flows, each consisting of three sequential tasks: data cleaning (and related preprocessing) → model fine-tuning → model evaluation. The resource pool contains three resource classes: a general pool of 1000 CPU servers, a high-performance pool of 100 H800 servers, and an evaluation pool of 500 RTX 4090 servers. The resources available to each task are as follows: data cleaning can use only the CPU pool, so it is a dedicated task; model fine-tuning can use only the H800 pool because the model has a large number of parameters, so it is a rigid task with a task contention intensity of 1; model evaluation can use either the H800 pool or the 4090 pool, so it is a flexible task with a task contention intensity of 2. The resource contention intensity is the number of tasks currently queued on a specific computing cluster node, i.e., the number of tasks waiting for that resource. The task contention intensity is the number of hardware types with which a task is currently compatible. For example, a value of 1 indicates a rigid task that must use the H800, while a value of 2 indicates an elastic task that can use either the H800 or the 4090. When multiple tasks request H800 resources, the resource contention intensity of the H800 pool is high. Under the inverse inhibition mechanism of this invention, the lower a task's contention intensity (rigid), the less its attention weight decays during congestion and the higher its probability of being allocated that resource; the higher its contention intensity (elastic), the stronger the inhibition it receives and the lower its probability of being allocated that resource.
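
The example's contention intensities and the pressure on the H800 pool can be checked with a few lines. The count of candidate requests assumes, purely for illustration, that all 100 flows reach each stage at the same time, so it is an upper bound rather than a measured queue length.

```python
# Server counts per resource class, taken from the example
POOLS = {"CPU": 1000, "H800": 100, "RTX4090": 500}
# Compatible resource classes for each task of one training flow
COMPATIBLE = {
    "data_cleaning": ["CPU"],
    "fine_tuning": ["H800"],
    "evaluation": ["H800", "RTX4090"],
}


def classify(compatible):
    """Contention intensity = number of compatible classes; >1 means flexible."""
    kappa = {t: len(rs) for t, rs in compatible.items()}
    kinds = {t: ("flexible" if k > 1 else "rigid/dedicated") for t, k in kappa.items()}
    return kappa, kinds


def candidate_requests(compatible, flows, pool):
    """Upper bound on tasks listing `pool` as a candidate if all flows hit every stage at once."""
    return flows * sum(pool in rs for rs in compatible.values())
```

Under that assumption the H800 pool sees up to 200 candidate requests for 100 servers, which is why evaluation tasks are the ones steered toward the RTX 4090 pool.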

[0046] After scenario configuration and preprocessing, the system constructs the prompt input to the large language model from the client scale, domain knowledge, and historical performance, with the objective of minimizing completion time. Based on the global search space, the LLM infers an optimal architecture: the input layer must be adapted to the feature-engineered relative load rate and task contention intensity, and the intermediate layers use a multi-layer network to capture long-range dependencies and to strengthen the inverse inhibition weights for flexible tasks. The LLM encodes the generated configuration as a mask, which the server uses to prune the global supernetwork; the personalized parameters are then extracted and distributed so that the client can directly initialize its local GNN model.

[0047] Each client first performs resource node updates, concatenating the normalized relative load rate into the feature vector. It then performs task node updates, concatenating the task contention intensity embedding into the task-resource edge features, and generates the global state vector. The global state vector is fed into the policy network, which outputs the action distribution and executes the scheduling. The environment feeds back a completion-time-difference reward, the PPO algorithm updates the parameters, and the local feature prototypes are computed; at the same time, a contrastive learning loss forces the local congestion features to align with the global prototypes. This process drives the model to learn a high-level policy: when the H800 pool exhibits congestion features, the flexible evaluation tasks migrate to the idle RTX 4090 servers, leaving the scarce H800 servers to the rigid fine-tuning tasks.

[0048] After the preset number of training steps is reached, the client uploads its parameter update, local prototypes, and performance score. The server performs mask-based sparse weighted aggregation according to the cluster size, together with momentum updates of the prototypes. If global performance does not meet the target, the LLM adjusts the architecture; if it meets the target, the final model is output.

[0049] This invention also provides a distributed resource scheduling method based on a large model, which implements distributed resource scheduling on the basis of the above system. As shown in the flowchart of Figure 6, the method comprises the following steps:
Step 1: Construct a digital twin environment, collect state data in real time, and map task flows and heterogeneous resources into a standardized heterogeneous graph.
Step 2: Construct a prompt containing client information and historical performance, and input it to the large language model to generate a GNN architecture mask adapted to the current scenario.
Step 3: The server prunes the global supernetwork with the mask, extracts the personalized policy network architecture and parameters, and distributes them to the client together with the global feature prototype library.
Step 4: The client enhances the node and edge features by embedding the relative load rate and the task contention intensity, then performs graph message passing and global pooling to generate the global state embedding.
Step 5: Input the global state embedding to the policy network to generate a sequence of resource scheduling actions and apply them to the heterogeneous resource environment; obtain the environmental feedback rewards and optimize the policy network accordingly. At the same time, retrieve the global prototypes to compute the contrastive loss for feature alignment, and update the local parameters synchronously by minimizing the joint loss.
Step 6: Compute the local and global scores on the validation set and sum them to obtain the total system score for this round.
Step 7: The server aggregates parameters, prototypes, and performance through three channels. If the total score has converged, output the final model and continue executing scheduling actions; otherwise, feed the score back to Step 2, triggering the LLM to regenerate the architecture for the next iteration of optimization.

[0050] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. All equivalent changes made on the basis of the description and drawings of the present invention fall within the protection scope of the present invention.

Claims

1. A distributed resource scheduling system based on a large model, characterized in that it comprises a scenario configuration module, a data preprocessing module, a key topological feature definition module, a node update module based on inverse inhibition, an LLM personalized GNN architecture search module, a PPO policy training module, a feature alignment and prototype generation module, a supernetwork pruning and local dual optimization module, and a three-channel aggregation and architecture iteration module, wherein: The scenario configuration module maps user-defined tasks, resources, and constraints into unified graph topology data containing temporal constraints, capability constraints, and competition constraints, and outputs the data to the data preprocessing module. The data preprocessing module builds a standardized pipeline from raw source data to graph neural network input, performs data cleaning and feature vectorization on the graph topology data, and outputs the standardized heterogeneous graph and the initial feature matrices of the task nodes and resource nodes of the heterogeneous graph neural network to the key topological feature definition module and the LLM personalized GNN architecture search module. The key topological feature definition module obtains, through a parameterized two-stage embedding mechanism, semantically enhanced resource features and task-resource edge features, and outputs them to the node update module based on inverse inhibition. The node update module based on inverse inhibition adopts a two-stage message passing mechanism to update resource nodes and task nodes in sequence, outputs the global state embedding to the PPO policy training module, and outputs the resource node embeddings to the feature alignment and prototype generation module.
The LLM personalized GNN architecture search module constructs structured prompts and uses the large language model (LLM) to generate, according to the task flexibility gradient and resource congestion pattern of each client, an architecture mask adapted to the current heterogeneous scenario, and outputs the mask to the supernetwork pruning and local dual optimization module. The PPO policy training module uses the PPO algorithm to update the policy network, iteratively optimizes the resource scheduling decision policy, and outputs the PPO loss to the feature alignment and prototype generation module. The feature alignment and prototype generation module introduces a supervised contrastive learning loss while the PPO algorithm updates the policy network, so as to achieve semantic alignment of cross-domain features in heterogeneous scenarios, and outputs the prototypes and parameter updates to the three-channel aggregation and architecture iteration module. The supernetwork pruning and local dual optimization module prunes the supernetwork on the server side using the architecture mask set generated by the LLM, extracts subnet parameters adapted to the client scenario, and outputs them to the node update module based on inverse inhibition. The three-channel aggregation and architecture iteration module aggregates the client subnet parameters with a mask-aware sparse aggregation strategy and ensures that the entire network shares a unified semantic standard for resource congestion and idle states; at the same time, it monitors in real time the performance feedback of the LLM-generated architecture and dynamically adjusts the architecture mask and aggregation strategy to realize adaptive evolution of the system architecture, and it outputs to the supernetwork pruning and local dual optimization module, the LLM personalized GNN architecture search module, and the feature alignment and prototype generation module.

2. The distributed resource scheduling system based on a large model according to claim 1, characterized in that the scenario configuration module maps user-defined tasks, resources, and constraints into a unified graph topology by: defining logical units with temporal attributes and sequential dependencies as task nodes; defining physical carriers with capacity limits and processing capabilities as resource-type nodes; and structuring complex business rules into three types of topological connections, namely temporal constraints that represent the order of a task flow, capability constraints that specify the processing permissions of resource classes, and competition constraints that describe substitution among multiple resource classes or preemption among multiple tasks.

3. The distributed resource scheduling system based on a large model according to claim 2, characterized in that the key topological feature definition module defines three metrics for any resource class node: the current task load, which is the total task load allocated to this resource class and all of its child nodes at the current time t; the resource capacity, which is the upper limit of total parallel processing of the physical computing units contained in this resource class; and the relative load rate, which normalizes and quantifies the congestion level of this resource class and is calculated as the ratio of the current task load to the resource capacity. The status category label of the resource node is determined from a preset pair of load thresholds, namely a low-load threshold and a high-load threshold: a relative load rate below the low-load threshold indicates the idle state, a relative load rate between the two thresholds indicates the normal state, and a relative load rate above the high-load threshold indicates the congested state. The current resource capacity and the relative load rate are normalized and used together as key state inputs to construct the enhanced resource feature vector. The task contention intensity is defined as the out-degree of a task node in the current heterogeneous graph, i.e., the number of resource-type nodes with which the task is compatible; the system maps this discrete value to a dense vector through a learnable embedding layer and concatenates it, along a specified dimension, to the feature vector of the task-resource class edge, so that the expanded task-resource class edge feature combines the original task features, the original edge features, and the contention intensity embedding.

4. The distributed resource scheduling system based on a large model according to claim 3, characterized in that the node update module based on inverse inhibition adopts a two-stage message passing mechanism that first updates resource nodes and then updates task nodes. Phase 1, resource node update: for each resource-type node of the heterogeneous graph and its set of adjacent tasks, the attention coefficients between the resource class and its tasks, and the self-attention coefficient of the resource node, are calculated by formulas (1) and (2) based on flexible dynamic inhibition logic, where the original matching score is based on feature similarity, the activation function of the resource load outputs a high value when the resource is congested, and the flexibility function is a monotonically increasing function of the task contention intensity; the attention coefficients and the self-attention coefficient are normalized by softmax and combined with learnable linear transformation matrices to update the resource class embedding by formula (3). Phase 2, task node update: for each task node, with its set of adjacent resource classes and its set of predecessor/successor tasks, the attention coefficients toward the adjacent resource classes are calculated by formula (1), together with the attention coefficients toward the predecessor/successor tasks and the self-attention coefficient of the task node; after these three kinds of coefficients are normalized by softmax, the task embedding is updated by formula (4), which involves the predecessor/successor tasks and their feature vectors. In intermediate-layer aggregation, the embeddings of the Z heads are concatenated; in final-layer aggregation, the embeddings of the Z heads are averaged. After L layers of alternating dual attention updates, the final task embeddings and resource embeddings are obtained, where Z is the preset number of attention heads and L is the preset number of graph neural network update layers. All task embeddings and resource embeddings are globally aggregated to generate the state embedding of the heterogeneous graph, where POOL denotes the global pooling operation whose computational paradigm is given by formula (5).

5. The distributed resource scheduling system based on a large model according to claim 1, characterized in that the LLM personalized GNN architecture search module abstracts the scheduling scenario into a triple, consisting of the client scale, the dataset features, and domain knowledge quantifying the topological properties of the graph, and inputs it to the LLM; the LLM performs inference based on a predefined GNN search space to transform physical constraints into architectural configurations; historical performance P is introduced as a feedback signal; in each iteration, the system collects the local scheduling performance of all clients and the performance of the global model, and calculates the total system score by formula (6), in which one term is the local score of the i-th client's personalized subnet in the previous round and the other is the weighted score of the global supernetwork; in the T-th optimization round, the LLM combines the input scenario parameters, the preset search space, the current search strategy, and the historical performance feedback P, and through logical reasoning generates a new set of optimal architecture masks A.

6. The distributed resource scheduling system based on a large model according to claim 1, characterized in that the PPO policy training module initializes the network with the client's personalized parameters, takes the full-graph state embedding vector as input, outputs the probability distribution of scheduling actions, and samples and executes an action. A reward function oriented towards minimizing completion time is constructed from the maximum completion time of all scheduled tasks under a given system state. The system then stores the state transition tuple in the current policy trajectory buffer, the tuple comprising: the current system state at time step t; the action taken in that state; the immediate reward fed back by the heterogeneous resource environment after the action; the next state to which the system transitions as time progresses after executing the action; the log probability of taking that action in that state under the old policy (i.e., the policy network before this parameter update); and the resource node state label vector, defined as the set of discrete physical state labels of all resource nodes in the heterogeneous graph at the current time step t. The advantage function is then calculated.

7. The distributed resource scheduling system based on a large model according to claim 1, characterized in that, for each resource node u in the current batch, the feature alignment and prototype generation module uses the node's state label to look up the positive sample prototype $\mathbf{p}^{+}_{u}$ in the global prototype library $\mathcal{P}$ distributed by the server, and then calculates the contrastive loss $\mathcal{L}_{\text{con}}$:
$$\mathcal{L}_{\text{con}} = -\frac{1}{|\mathcal{U}|}\sum_{u \in \mathcal{U}} \log \frac{\exp\left(\mathrm{sim}(\mathbf{h}_u, \mathbf{p}^{+}_{u})/\tau\right)}{\sum_{\mathbf{p}_k \in \mathcal{P}}\exp\left(\mathrm{sim}(\mathbf{h}_u, \mathbf{p}_k)/\tau\right)} \quad (7)$$
where $\mathbf{h}_u$ is the feature of resource-type node u, $\mathcal{U}$ is the set of all resource-type nodes in the current batch, $\tau$ is the temperature coefficient, and k iterates over all prototypes in the global library. The joint loss used to update the parameters is:
$$\mathcal{L} = \mathcal{L}_{\text{PPO}} + \lambda\, \mathcal{L}_{\text{con}} \quad (8)$$
where $\lambda$ is a dynamic weight. During the warm-up phase at the beginning of training ($t < T_{\text{warm}}$), $\lambda$ is set to 0, so the local model is optimized with PPO alone and its local prototypes are uploaded while the global feature distribution stabilizes; after the warm-up phase ($t \ge T_{\text{warm}}$), $\lambda$ is restored to its initial value and feature alignment is formally enabled. After the parameters are updated, the client uses the updated network to recompute the local feature prototype of each state. The prototype of the k-th state, $k \in \{0, 1, 2\}$, is the weighted average of the feature vectors of all nodes in that category:
$$\mathbf{p}_{i,k} = \frac{\sum_{u \in \mathcal{U}_k} \kappa_u\, \mathbf{h}_u}{\sum_{u \in \mathcal{U}_k} \kappa_u} \quad (9)$$
where $\mathcal{U}_k$ is the set of resource nodes in the current batch whose state label is k, and $\kappa_u$ is the upper limit of the physical capacity of resource node u.
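Formulas (7) to (9) can be sketched in NumPy as below. Cosine similarity inside (7), the restored weight `lam0=0.5`, and the zero vector used for a state that is absent from the batch are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np


def proto_contrastive_loss(h, labels, prototypes, tau=0.1):
    """Formula (7) sketch. h: (|U|, d) resource-node features; labels: (|U|,)
    integer state labels; prototypes: (K, d) global library. Uses cosine
    similarity (an assumption) and the label's prototype as the positive."""
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = h @ p.T / tau                        # (|U|, K) similarity scores
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(labels)), labels].mean()


def joint_loss(ppo_loss, con_loss, step, warmup_steps, lam0=0.5):
    """Formula (8) sketch: lambda is 0 during warm-up, then restored to lam0."""
    lam = 0.0 if step < warmup_steps else lam0
    return ppo_loss + lam * con_loss


def local_prototypes(h, labels, capacity, num_states=3):
    """Formula (9) sketch: capacity-weighted mean feature for each state label.
    States absent from the batch keep a zero vector (an assumption)."""
    protos = np.zeros((num_states, h.shape[1]))
    for k in range(num_states):
        sel = labels == k
        if sel.any():
            w = capacity[sel]
            protos[k] = (w[:, None] * h[sel]).sum(axis=0) / w.sum()
    return protos
```

Weighting by the capacity upper limit lets large resource nodes dominate the prototype of their state, so the prototype reflects where most of the schedulable capacity actually sits.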

8. The distributed resource scheduling system based on a large model according to claim 1, characterized in that the hypernetwork pruning and local dual optimization module prunes the hypernetwork on the server side: the architecture mask set $\{A_i\}$ generated by the LLM is applied to the hypernetwork, and the subnet parameters are extracted by element-wise multiplication of the global hypernetwork weights $W$ with the mask. The personalized policy network parameters $\theta_i$ of client i are expressed as:
$$\theta_i = W \odot A_i \quad (10)$$
The server then distributes the generated personalized policy network architecture $A_i$, its initialization parameters $\theta_i$, and the latest global feature prototype library $\mathcal{P}$ to the corresponding i-th client. After receiving them, the client directly initializes its local scheduling policy network with $\theta_i$ and loads $\mathcal{P}$ into the local contrastive learning module.
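The masking in formula (10) amounts to an element-wise product for each parameter tensor. This NumPy sketch assumes that the hypernetwork weights and the masks are stored as dictionaries of same-shaped arrays keyed by (hypothetical) parameter names; `extract_subnet` and `dispatch` are illustrative names only.

```python
import numpy as np


def extract_subnet(global_weights, masks):
    """Formula (10) sketch: theta_i = W ⊙ A_i for every named parameter tensor."""
    return {name: w * masks[name] for name, w in global_weights.items()}


def dispatch(global_weights, client_masks, prototype_library):
    """Build each client's payload: its architecture mask, the masked
    initialization parameters, and a copy of the latest global prototype library."""
    return {
        cid: {"mask": m,
              "params": extract_subnet(global_weights, m),
              "prototypes": prototype_library.copy()}
        for cid, m in client_masks.items()
    }
```

Keeping pruned weights as zeros, rather than physically slicing tensors, keeps every client's parameter layout aligned with the hypernetwork, which simplifies the mask-aware aggregation on the server.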

9. The distributed resource scheduling system based on a large model according to claim 5, characterized in that the three-channel aggregation and architecture iteration module comprises:
Channel 1, the parameter aggregation channel, which employs a mask-aware sparse aggregation strategy with the aggregation formula:
$$W_j \leftarrow W_j + \eta\, \frac{\sum_{i} m_{i,j}\, n_i\, \Delta\theta_{i,j}}{\sum_{i} m_{i,j}\, n_i + \epsilon} \quad (11)$$
where $m_{i,j}$ is the architecture mask of parameter j in client i, $n_i$ is the total number of task nodes whose scheduling was actually completed by client i, $\epsilon$ is a smoothing term that prevents the denominator from being 0, $\Delta\theta_{i,j}$ is the local gradient update of parameter j uploaded by client i, and $\eta$ is the global learning rate;
Channel 2, the prototype aggregation channel, in which the system synchronously collects the local feature prototypes $\mathbf{p}_{i,k}$ and sample weights $w_i$ uploaded by each client; the server computes the statistical mean for this round, $\bar{\mathbf{p}}_k = \sum_i w_i\, \mathbf{p}_{i,k} / \sum_i w_i$, and, introducing the momentum coefficient $\beta$, combines it with the previous round's global prototype $\mathbf{p}_k^{(t)}$ to generate a new global feature prototype library. The global prototype of state k is updated as:
$$\mathbf{p}_k^{(t+1)} = \beta\, \mathbf{p}_k^{(t)} + (1 - \beta)\, \bar{\mathbf{p}}_k \quad (12)$$
Channel 3, the performance feedback channel, which drives the continuous evolution of the LLM architecture: the system computes the total score according to formula (6), and the server instructs the LLM based on the trend of this score. If performance improves, the core components are retained; if performance stagnates or declines, random exploration is triggered. The LLM also monitors the per-client sub-scores; once the score of a specific client is found to lag significantly, that client's subnet is judged to be incompatible and its architecture mask is regenerated. When the performance indicators stabilize or reach the target, the iteration stops and the final scheduling model is output.
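Channels 1 and 2 can be sketched in NumPy as follows, assuming flattened parameter vectors, a sample-weighted mean for the round's prototype statistic, and the sign convention that uploaded updates are added to the global weights. The function names and default coefficients are hypothetical.

```python
import numpy as np


def masked_sparse_aggregate(W, masks, updates, n_tasks, eta=1.0, eps=1e-8):
    """Formula (11) sketch. W: (P,) flattened global weights; masks, updates:
    (N, P) per-client masks m_ij and uploaded updates Δθ_ij; n_tasks: (N,)
    completed task counts n_i. Each parameter is averaged only over the
    clients whose mask kept it, weighted by n_i."""
    weights = masks * n_tasks[:, None]           # m_ij * n_i
    num = (weights * updates).sum(axis=0)
    den = weights.sum(axis=0) + eps              # eps avoids division by zero
    return W + eta * num / den


def momentum_prototypes(prev_global, local_protos, sample_weights, beta=0.9):
    """Formula (12) sketch. prev_global: (K, d); local_protos: (N, K, d);
    sample_weights: (N,). The weighted client mean is blended with last
    round's global library using momentum beta."""
    w = sample_weights / sample_weights.sum()
    mean = np.einsum("n,nkd->kd", w, local_protos)
    return beta * prev_global + (1 - beta) * mean
```

Because the denominator in (11) counts only the clients whose mask retained parameter j, a weight kept by a single client is not diluted by the zero updates of clients that pruned it; a parameter pruned by every client stays unchanged.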

10. A distributed resource scheduling method based on a large model, characterized in that distributed resource scheduling is implemented based on the system according to any one of claims 1-9, comprising the following steps: Step 1: the scene configuration module collects, in real time, the computing power, load status, resource requirements, and timing constraints of the heterogeneous resources to build a digital twin environment; the data preprocessing module then cleans and vectorizes the collected data, maps the task flow and heterogeneous resources into a standardized heterogeneous graph, and outputs the initial feature matrix. Step 2: the LLM personalized GNN architecture search module constructs structured prompts containing the client's resource congestion pattern, task flexibility gradient, and historical performance feedback, and inputs them into the large language model to generate a personalized GNN architecture mask adapted to the current heterogeneous scenario. Step 3: through the hypernetwork pruning and local dual optimization module, the server prunes the global hypernetwork based on the architecture mask, extracts the subnet parameters adapted to each client, and distributes them to the clients together with the global feature prototype library. Step 4: the client uses the key topology feature definition module to fuse the relative load rate and task contention intensity into enhanced resource features and task-resource edge features; the node update module based on reverse inhibition then updates the nodes with a two-stage message passing mechanism, and a global state embedding vector is generated by the global pooling operation.
Step 5: the client's PPO policy training module feeds the global state embedding vector into the policy network to generate a sequence of resource scheduling actions, interacts with the environment to obtain a reward oriented towards minimizing the completion time, and performs PPO optimization; at the same time, the feature alignment and prototype generation module introduces a supervised contrastive learning loss to align feature semantics across domains, and outputs the local parameter updates and local feature prototypes. Step 6: through the three-channel aggregation and architecture iteration module, the server receives these outputs and performs mask-aware sparse parameter aggregation and prototype aggregation to build an updated global hypernetwork and a unified global feature prototype library. Step 7: based on the performance of the local and global models on the validation set, the system calculates and accumulates the total system performance score for this round. The three-channel aggregation and architecture iteration module monitors this score in real time: if model performance has converged, the final scheduling strategy is output; otherwise, an architecture refactoring signal and performance feedback data are generated, and the system returns to Step 2 to trigger the large language model for the next round of architecture iteration.
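The feedback decision of Channel 3 and the convergence test of Step 7 could look like the following sketch. The claims do not define the thresholds, so the convergence rule (round-to-round score changes below `tol` for `patience` consecutive rounds), the lag rule (a client score below `lag_ratio` times the client average), and the name `feedback_signal` are all hypothetical choices.

```python
from statistics import mean


def feedback_signal(score_history, client_scores, tol=1e-3, patience=3, lag_ratio=0.8):
    """score_history: total system scores of past rounds, latest last.
    client_scores: dict mapping client id to its latest local score.
    Returns (decision, ids of clients whose architecture mask should be regenerated)."""
    # Converged: the last `patience` round-to-round changes are all below tol.
    recent = score_history[-(patience + 1):]
    if len(recent) == patience + 1 and all(
            abs(b - a) < tol for a, b in zip(recent, recent[1:])):
        return "stop", []
    # Clients scoring far below the client average are flagged as incompatible.
    avg = mean(client_scores.values())
    lagging = sorted(c for c, s in client_scores.items() if s < lag_ratio * avg)
    # Improving trend: retain core components; otherwise trigger exploration.
    improved = len(score_history) >= 2 and score_history[-1] > score_history[-2]
    return ("retain" if improved else "explore"), lagging
```

For example, a history of `[0.6, 0.6]` with client scores `{"a": 1.0, "b": 0.5}` yields `("explore", ["b"])`: the total score stagnated, and client `b` falls below 80% of the client average.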