A computing power network resource scheduling method and system
By combining real-time resource awareness and historical behavior analysis with graph neural network topology modeling and deep reinforcement learning, the problem of low resource utilization in computing network resource scheduling is solved, and intelligent scheduling and task execution efficiency are improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 江西省科技基础条件平台中心(江西省计算中心)
- Filing Date
- 2026-05-19
- Publication Date
- 2026-06-16
AI Technical Summary
Existing computing power network resource scheduling technologies suffer from lagging resource awareness, limited scheduling dimensions, and a lack of intelligent decision-making capabilities, resulting in low resource utilization and an inability to meet the multi-objective optimization needs of mixed tasks.
By using real-time resource awareness, historical behavior analysis, graph neural network topology modeling, and deep reinforcement learning decision-making, the comprehensive health status and task requirement characteristics of computing nodes are obtained, an initial fit score is generated, and the optimal target allocation node is output using a deep reinforcement learning scheduling decision network.
It achieves efficient and intelligent scheduling of computing resources, improves resource utilization and task execution efficiency, and adapts to network changes and multi-objective optimization needs.
Figure CN122220118A_ABST
Description
Technical Field
[0001] This invention belongs to the field of computing power networks and resource scheduling technology, and particularly relates to a computing power network resource scheduling method and system.
Background Technology
[0002] Computing power networks have become critical infrastructure supporting the digital economy. A computing power network must integrate distributed computing, storage, and network resources across regions and levels, enabling dynamic resource awareness and collaborative scheduling. However, existing computing power network resource scheduling technologies have the following shortcomings. First, resource awareness lags. Traditional scheduling systems collect node status through periodic polling, with update cycles typically on the order of minutes. The dynamic changes of computing nodes therefore cannot be captured in real time, scheduling decisions are made on outdated resource status, and task-resource matching accuracy falls below 60%.
[0003] Second, the scheduling dimension is too narrow. Existing scheduling strategies only focus on the CPU utilization or memory load of a single cluster, ignoring cross-domain collaborative factors such as network bandwidth fluctuations, storage node locations, and heterogeneity of edge node computing power, resulting in an "island effect" in resource scheduling and causing the overall resource utilization rate to be less than 50%.
[0004] Third, it lacks intelligent decision-making capabilities. Static routing strategies cannot respond to network congestion, and fixed scheduling rules are difficult to adapt to the diversity of task types. In particular, when faced with mixed tasks such as computationally intensive, data-intensive, and communication-intensive tasks, a single scheduling strategy cannot simultaneously meet the needs of multi-objective optimization.
[0005] Therefore, there is an urgent need for a computing power network resource scheduling method and system that can perceive resource status in real time, integrate multi-dimensional information, and make intelligent decisions.
Summary of the Invention
[0006] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method and system for scheduling computing network resources. Through real-time resource perception, historical behavior analysis, graph neural network topology modeling, and deep reinforcement learning decision-making, it achieves efficient and intelligent scheduling of computing resources.
[0007] In a first aspect, the present invention provides a method for scheduling computing power network resources, comprising:
acquiring real-time resource status data and historical behavior data of multiple computing nodes in the computing power network;
calculating the instantaneous availability of each computing node based on the real-time resource status data, calculating the historical stability score of each computing node based on the historical behavior data, and weighting and fusing the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node;
obtaining the resource requirement characteristics of the computing task to be scheduled, including expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency;
based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, using a graph neural network to model the network topology relationship between computing nodes and generate an initial fit score of each computing node relative to the computing task to be scheduled, wherein the node features of the graph neural network are the comprehensive health of each computing node, and the edge features are the network latency and bandwidth between nodes;
inputting the initial fit score into a deep reinforcement learning-based scheduling decision network, which outputs the optimal target allocation node, wherein the action of the scheduling decision network is the target node for task allocation, and the reward function is set as a weighted negative sum of task completion time, node load balancing degree, and scheduling overhead; and
sending the computing task to be scheduled to the target allocation node for execution.
[0008] In a second aspect, the present invention provides a computing power network resource scheduling system, comprising:
a first acquisition module, configured to acquire real-time resource status data and historical behavior data of multiple computing nodes in the computing power network;
a fusion module, configured to calculate the instantaneous availability of each computing node based on the real-time resource status data, calculate the historical stability score of each computing node based on the historical behavior data, and weight and fuse the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node;
a second acquisition module, configured to acquire the resource requirement characteristics of the computing task to be scheduled, wherein the resource requirement characteristics include expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency;
a generation module, configured to use a graph neural network to model the network topology relationship between computing nodes based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, and to generate an initial fit score of each computing node relative to the computing task to be scheduled, wherein the node features of the graph neural network are the comprehensive health of each computing node, and the edge features are the network latency and bandwidth between nodes;
an output module, configured to input the initial fit score into a deep reinforcement learning-based scheduling decision network, which outputs the optimal target allocation node, wherein the action of the scheduling decision network is the target node for task allocation, and the reward function is set as a weighted negative sum of task completion time, node load balancing degree, and scheduling overhead; and
a sending module, configured to send the computing task to be scheduled to the target allocation node for execution.
[0009] In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of the computing power network resource scheduling method according to any embodiment of the present invention.
[0010] In a fourth aspect, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein when the program instructions are executed by a processor, the processor performs the steps of the computing power network resource scheduling method of any embodiment of the present invention.
[0011] The computing power network resource scheduling method and system of this application acquire real-time resource status and historical behavior data of computing nodes, calculate instantaneous availability and historical stability scores, and fuse them into a comprehensive health score; analyze task resource requirement characteristics; use a graph attention network, based on the comprehensive health score and the task requirements, to model the topological relationship between nodes and generate an initial fit score; input the score into a reinforcement learning network based on the twin delayed deep deterministic policy gradient (TD3) algorithm, which outputs the optimal target allocation node, with the reward function being a weighted negative sum of task completion time, load balancing degree, and scheduling overhead; and then dispatch the task for execution. By fusing real-time status and historical behavior to evaluate node health, using a graph neural network to capture topological dependencies, and combining reinforcement learning to achieve adaptive scheduling, the method effectively improves computing power resource utilization and task execution efficiency.
Attached Figure Description
[0012] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0013] Figure 1 is a flowchart of a computing power network resource scheduling method according to an embodiment of the present invention; Figure 2 is a structural block diagram of a computing power network resource scheduling system provided in an embodiment of the present invention; Figure 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0014] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0015] Referring to Figure 1, a flowchart of a computing power network resource scheduling method according to this application is shown.
[0016] As shown in Figure 1, the computing power network resource scheduling method specifically includes the following steps: Step S101: Obtain real-time resource status data and historical behavior data of multiple computing nodes in the computing power network.
[0017] In this step, the system first deploys an embedded intelligent agent module on each computing node in the computing power network. This intelligent agent module can be a lightweight software service running at the node's operating system level, or it can be deployed in containerized form. The intelligent agent module monitors the node's hardware resources and system operating status, obtaining the resource indicators by calling low-level interfaces provided by the operating system (such as the /proc file system and the sysfs interface on Linux) or through hardware monitoring tools.
[0018] Real-time resource status data is collected at fixed time intervals. In this embodiment, the collection period is set to 1 second, meaning the intelligent agent module reads the node's current resource status every second. The collected resource status data comprises six core indicators in four categories. The first category is computing resource indicators, including CPU utilization and GPU utilization. CPU utilization is the average utilization percentage across all cores, and GPU utilization is the compute utilization of the GPU cores. The second category is storage resource indicators, including memory utilization and disk utilization. Memory utilization is the ratio of used physical memory to total physical memory, and disk utilization is the ratio of used disk space to total disk space. The third category is network resource indicators, namely the current network bandwidth utilization, which is the ratio of the current network interface transmission rate to the maximum bandwidth. The fourth category is task status indicators, namely the current task queue length, which is the total number of tasks currently executing or waiting to execute on the node. Together these indicators reflect the node's load level and remaining processing capacity.
[0019] To ensure data validity, the intelligent agent module incorporates an outlier filtering unit. After each collection of raw data, the outlier filtering unit first checks if each metric is within a reasonable range. For example, CPU utilization should be between 0 and 100. If a negative value or a value exceeding 100 is found, it is considered invalid data, discarded, and an anomaly is logged. Simultaneously, the unit also detects data jumps caused by instantaneous hardware fluctuations. For instance, if CPU utilization is 30% one second and suddenly jumps to 100% before immediately recovering, such outliers matching noise characteristics are replaced with data from the previous second or smoothed using a sliding median filter. After outlier filtering, the data validity rate reaches over 99.5%.
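The two checks described above (range validation and suppression of one-sample spikes) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the window size of 3, and the choice to shrink the window at the series edges are all assumptions made for the example.

```python
from statistics import median

def in_valid_range(value, low=0.0, high=100.0):
    """Range check: a utilization percentage must lie in [low, high]."""
    return low <= value <= high

def sliding_median(samples, window=3):
    """Smooth a series with a centered sliding median.

    The window shrinks at both ends of the series (an assumption; the
    source does not say how edges are handled).
    """
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(median(samples[lo:hi]))
    return smoothed

def filter_cpu_series(raw, window=3):
    """Discard out-of-range readings, then suppress short spikes."""
    valid = [v for v in raw if in_valid_range(v)]
    return sliding_median(valid, window)

# Hypothetical one-second CPU readings: one spike (100.0) and one invalid value (-5.0).
readings = [30.0, 31.0, 100.0, 29.0, -5.0, 30.0]
cleaned = filter_cpu_series(readings)
```

In this example the invalid reading is dropped and the isolated jump to 100 no longer appears in the smoothed series. A production agent would also log each discarded value, as the paragraph above describes.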
[0020] Next, the intelligent agent module normalizes the filtered data. Because different resource metrics have different value ranges and units (e.g., CPU utilization ranges from 0 to 100, queue length from 0 to several thousand), directly using this data for subsequent fusion calculations would introduce scale bias. Therefore, the normalization unit maps the value range of each metric to a uniform interval of 0 to 1. For metrics with clear upper limits, such as CPU utilization and memory usage, linear normalization is used, i.e., the actual value is divided by the upper limit. For metrics with uncertain theoretical upper limits, such as queue length, a reference upper limit is set based on the node's historical maximum queue length, or a sigmoid function is used for non-linear mapping. The normalized data eliminates the influence of units, allowing different resource dimensions to be compared and weighted on the same scale.
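The normalization rules above can be illustrated with a short sketch. The function names and the sigmoid parameters (`midpoint`, `scale`) are hypothetical; the source only states that a sigmoid or a historical-maximum reference may be used for unbounded indicators.

```python
import math

def normalize_bounded(value, upper):
    """Linear normalization for indicators with a clear upper limit:
    the actual value divided by the upper limit, clipped to [0, 1]."""
    return min(max(value / upper, 0.0), 1.0)

def normalize_queue_linear(length, historical_max):
    """Queue length scaled by the node's historical maximum queue length."""
    if historical_max <= 0:
        return 0.0
    return min(length / historical_max, 1.0)

def normalize_queue_sigmoid(length, midpoint=50.0, scale=10.0):
    """Non-linear mapping of queue length into (0, 1).

    midpoint and scale are illustrative assumptions, not values from the source.
    """
    return 1.0 / (1.0 + math.exp(-(length - midpoint) / scale))

cpu_norm = normalize_bounded(75.0, 100.0)        # CPU utilization of 75 percent
queue_norm = normalize_queue_linear(200, 100)    # queue longer than any seen before
queue_sig = normalize_queue_sigmoid(50)          # queue length at the assumed midpoint
```

The linear variant saturates at 1 when the queue exceeds the historical maximum, while the sigmoid variant stays strictly below 1 and therefore still distinguishes between very long queues.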
[0021] After outlier filtering and normalization, real-time resource status data is encapsulated into a standard message format. The intelligent agent module uses a message queue transmission mechanism to send the data to the global scheduling center. Specifically, the system deploys a Kafka message queue cluster in the global scheduling center. Each intelligent agent acts as a producer, pushing its own resource status data to a specified topic. The Kafka message queue supports high throughput and a multi-subscriber mode, so the global scheduling center and the regional scheduling sub-centers can all subscribe to this topic and receive resource status data from all nodes in real time. The message queue also persists data, so messages are not lost even if the scheduling center experiences a brief downtime. To keep the data timely, Kafka producers are configured in asynchronous sending mode, with sending latency controlled at the millisecond level.
[0022] In addition to real-time resource status data, step S101 also requires acquiring historical behavior data for each node. The historical behavior data originates from the system's runtime log database and the behavior monitoring module. Whenever a computational task is completed on a node, the system automatically records the task's execution status, including whether the task was successfully completed, the response latency from start to finish, whether any failures occurred during task execution, and the time spent recovering from those failures. These records are written to a distributed time-series database and indexed by node identifier and timestamp.
[0023] In this embodiment, the system aggregates and statistically analyzes historical data according to time windows (with a window length of one week). For each node, within each time window, the system calculates the completion rate of all tasks executed by that node, i.e., the ratio of the number of successfully completed tasks to the total number of tasks; calculates the average response latency of all tasks, i.e., the arithmetic mean of the response time of each task (the total time from task issuance to result return); and calculates the recovery time of all faults occurring within that window, i.e., the time elapsed from the occurrence of a fault to the node returning to normal operation. These statistical results are stored in sequence form. Each node saves the task completion rate sequence, average response latency sequence, and fault recovery time sequence for several past time windows (e.g., the most recent 12 weekly windows), forming historical behavioral data.
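As a rough sketch of this windowed aggregation, the snippet below groups task records into weekly buckets and computes the three statistics per window. The `TaskRecord` fields and function name are hypothetical, and taking the mean of the recovery times within a window is an assumption, since the text does not say how multiple faults in one window are combined.

```python
from dataclasses import dataclass
from statistics import mean

WEEK_SECONDS = 7 * 24 * 3600

@dataclass
class TaskRecord:
    timestamp: float            # seconds since an arbitrary epoch
    succeeded: bool             # whether the task completed successfully
    response_latency: float     # seconds from task issuance to result return
    recovery_time: float = 0.0  # seconds spent recovering from a fault, 0.0 if none

def aggregate_by_window(records, window=WEEK_SECONDS, keep=12):
    """Return per-window statistics for the most recent `keep` windows."""
    buckets = {}
    for rec in records:
        buckets.setdefault(int(rec.timestamp // window), []).append(rec)
    stats = []
    for idx in sorted(buckets)[-keep:]:
        recs = buckets[idx]
        faults = [r.recovery_time for r in recs if r.recovery_time > 0]
        stats.append({
            "window": idx,
            "completion_rate": sum(r.succeeded for r in recs) / len(recs),
            "avg_latency": mean(r.response_latency for r in recs),
            # Assumption: faults within one window are summarized by their mean.
            "recovery_time": mean(faults) if faults else 0.0,
        })
    return stats

records = [
    TaskRecord(10, True, 2.0),
    TaskRecord(20, False, 4.0, 30.0),
    TaskRecord(WEEK_SECONDS + 5, True, 1.0),
]
history = aggregate_by_window(records)
```

Each entry of `history` corresponds to one time window, so the three value sequences described above can be read off by collecting the same key across entries.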
[0024] The aforementioned real-time resource status data and historical behavior data are provided to subsequent steps through a unified data interface. The global scheduling center caches this data in a distributed high-speed cache in memory for rapid retrieval during scheduling decisions. Simultaneously, the system implements a data expiration policy, marking real-time data that has not been updated for a certain period (e.g., 24 hours) as invalid, and archiving historical data older than one year. Through this complete data collection, filtering, normalization, and storage mechanism, step S101 provides a high-quality, low-latency data foundation for subsequent node health assessments.
[0025] Step S102: Calculate the instantaneous availability of each computing node based on the real-time resource status data, and calculate the historical stability score of each computing node based on the historical behavior data. Then, weight and fuse the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node.
[0026] In this step, the resource dimensions of each computing node are obtained, and the remaining rate of each resource dimension is calculated. The resource dimensions include CPU utilization $u_{\mathrm{cpu}}$, GPU utilization $u_{\mathrm{gpu}}$, memory usage $u_{\mathrm{mem}}$, disk usage $u_{\mathrm{disk}}$, current bandwidth utilization $u_{\mathrm{bw}}$, and task queue length $q$. The remaining rate of each resource dimension is calculated as: $r_{\mathrm{cpu}} = 1 - u_{\mathrm{cpu}}$, $r_{\mathrm{gpu}} = 1 - u_{\mathrm{gpu}}$, $r_{\mathrm{mem}} = 1 - u_{\mathrm{mem}}$, $r_{\mathrm{disk}} = 1 - u_{\mathrm{disk}}$, $r_{\mathrm{bw}} = 1 - u_{\mathrm{bw}}$, where $r_{\mathrm{cpu}}$ is the remaining rate of CPU utilization, $r_{\mathrm{gpu}}$ is the remaining rate of GPU utilization, $r_{\mathrm{mem}}$ is the remaining rate of memory usage, $r_{\mathrm{disk}}$ is the remaining rate of disk usage, and $r_{\mathrm{bw}}$ is the remaining rate of current bandwidth utilization.
For each resource dimension, the data dispersion index of that dimension across different nodes is calculated, and the data dispersion index is converted into a weight coefficient: $w_j = D_j / \sum_{k} D_k$, where $w_j$ is the weight coefficient of the $j$-th resource dimension, $D_j$ is the data dispersion index of the $j$-th resource dimension, and $D_k$ is the data dispersion index of the $k$-th resource dimension. The dispersion index $D_j$ is computed from $p_{ij}$, the normalized value of the remaining rate of the $i$-th node in the $j$-th resource dimension, over the total number of nodes $n$.
The instantaneous availability is obtained by weighting the remaining rates of the resource dimensions by their corresponding weight coefficients and then multiplying the sum by the decay factor of the task queue length: $A = \left( \sum_{j} w_j r_j \right) \cdot \phi(q)$, where $A$ is the instantaneous availability, $r_j$ is the remaining rate of the $j$-th resource dimension, and $\phi(q)$ is the decay factor of the task queue length $q$, governed by the preset queue length sensitivity coefficient $\lambda$.
The task completion rate sequence $\{c_1, \dots, c_N\}$, the average response delay sequence $\{l_1, \dots, l_N\}$, and the fault recovery time sequence $\{f_1, \dots, f_N\}$ of each node over the past $N$ time windows are obtained, where $c_N$ is the task completion rate of the node within the $N$-th time window, $l_N$ is the average response latency of the node within the $N$-th time window, and $f_N$ is the fault recovery time of the node within the $N$-th time window.
The mean $\mu_c$ and standard deviation $\sigma_c$ of the task completion rate sequence are calculated, and the stable completion rate is calculated from the mean and the standard deviation using a preset penalty coefficient $\beta$.
The exponentially weighted moving average of the average response delay sequence is calculated as: $E_t = \alpha x_t + (1 - \alpha) E_{t-1}$, where $E_t$ is the exponentially weighted moving average of the delay at the current moment, $\alpha$ is the smoothing factor, $x_t$ is the actual average response delay of the current time window, and $E_{t-1}$ is the exponentially weighted moving average of the delay at the previous time step.
The median $m_f$ of the fault recovery time sequence is calculated, and the recovery capability value $R$ is determined from the median $m_f$ and the preset fault recovery threshold $T_f$.
The historical stability score is obtained by normalizing the stable completion rate, the exponentially weighted moving average, and the recovery capability value, and then summing them by weight.
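To make the flow of step S102 concrete, the sketch below strings the sub-calculations together. Several functional forms are not recoverable from the text and are filled in here purely as assumptions, each flagged in a comment: the coefficient of variation as the dispersion index, an exponential queue decay, a mean-minus-penalty stable completion rate, a clipped linear recovery score, and the fusion weight `omega`. All names and default values are hypothetical.

```python
import math
from statistics import mean, median, pstdev

def dispersion_weights(residuals_by_node):
    """Turn per-dimension dispersion across nodes into weights that sum to 1."""
    dims = list(zip(*residuals_by_node))
    # Assumption: coefficient of variation is used as the dispersion index.
    disp = [pstdev(d) / mean(d) if mean(d) > 0 else 0.0 for d in dims]
    total = sum(disp)
    if total == 0:
        return [1.0 / len(dims)] * len(dims)
    return [d / total for d in disp]

def instantaneous_availability(residuals, weights, queue_len, lam=0.05):
    """Weighted sum of remaining rates times a queue-length decay factor."""
    # Assumption: the decay factor is exp(-lam * queue_len).
    return sum(w * r for w, r in zip(weights, residuals)) * math.exp(-lam * queue_len)

def stable_completion_rate(rates, beta=0.5):
    # Assumption: mean penalized by beta times the standard deviation, floored at 0.
    return max(0.0, mean(rates) - beta * pstdev(rates))

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average: E_t = alpha*x_t + (1-alpha)*E_{t-1}."""
    value = values[0]
    for x in values[1:]:
        value = alpha * x + (1 - alpha) * value
    return value

def recovery_capability(recovery_times, threshold):
    # Assumption: linear score, 1 at zero recovery time, 0 at or above the threshold.
    return max(0.0, 1.0 - median(recovery_times) / threshold)

def comprehensive_health(availability, stability, omega=0.6):
    # Assumption: convex combination with a hypothetical fusion weight omega.
    return omega * availability + (1 - omega) * stability

residuals = [[0.8, 0.5], [0.6, 0.5]]   # two nodes, two resource dimensions
weights = dispersion_weights(residuals)
avail = instantaneous_availability(residuals[0], weights, queue_len=0)
smoothed_delay = ewma([10.0, 20.0], alpha=0.5)
recovery = recovery_capability([10.0, 20.0, 30.0], threshold=40.0)
```

With these assumptions, a resource dimension whose remaining rate is identical on every node receives zero weight, because it does not help distinguish one node from another.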
[0027] Step S103: Obtain the resource requirement characteristics of the computing task to be scheduled, including expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency.
[0028] In this step, the metadata of the computing task is parsed to extract the identifier of the task type, which includes compute-intensive, data-intensive, and communication-intensive tasks. Based on the task type, an initial resource requirement vector is matched from a preset resource requirement template library, and then dynamically corrected by combining the actual input data of the task to obtain three-dimensional requirement features. Obtain the service level agreement parameters for the task, and extract the maximum tolerable latency and minimum reliability requirements; combine the three-dimensional requirement features, the maximum tolerable latency, and the minimum reliability requirements into resource requirement features.
[0029] In one specific embodiment, the system first receives a computation task submitted by the user. Computation tasks are typically submitted in the form of a job description file, which contains metadata information about the task, such as the task name, execution command, input data path, output data path, expected runtime, and resource estimates. The system parses this metadata file to extract the task type identifier. Task type determination can be performed in several ways: first, based on the task type field explicitly declared by the user, such as specifying "computation-intensive" or "data-intensive" when submitting the job; second, through the characteristics of the task's executable file, for example, if the task's executable file is a deep learning training script (such as calling TensorFlow or PyTorch frameworks) and the expected computational power requirement is high, it is determined to be computation-intensive; if the task's main operations involve sorting, filtering, and aggregating large-scale data, and the input data volume is huge, it is determined to be data-intensive; if the task involves a large amount of cross-node communication (such as an MPI parallel program), it is determined to be communication-intensive. The system can also perform type matching by combining the execution characteristics of similar historical tasks.
[0030] After obtaining the task type, the system enters the resource requirement template matching stage. The resource requirement template library is a pre-built database that stores typical resource requirement configurations for different task types. For example, for compute-intensive tasks, the preset initial resource requirement vector in the template library is: expected computing power consumption of 1000 GFlops, data storage of 10GB, and network communication of 5GB. For data-intensive tasks, the preset template is: computing power consumption of 200 GFlops, data storage of 500GB, and network communication of 50GB. For communication-intensive tasks, the preset template is: computing power consumption of 500 GFlops, data storage of 20GB, and network communication of 200GB. These template values are set based on historical statistics and expert experience and can be adjusted periodically according to actual operating results.
[0031] Based on the task type identified above, the system retrieves the corresponding initial resource requirement vector from the template library. However, since the actual input data volume varies from task to task, using the template value directly would introduce a large error, so dynamic correction is required. The dynamic correction works as follows: the system reads the actual input data volume (unit: GB) and the expected output data volume (unit: GB) of the task. These two values can be obtained from the size of the datasets referenced by the "input path" and "output path" in the task description, or by executing a simple metadata query command (e.g., using Hadoop's "hdfs dfs -du" command to obtain the file size in the distributed file system).
[0032] The correction process consists of three steps: First, the data storage size is scaled proportionally based on the ratio of the actual input data size to the template's preset data storage size. For example, if the template's preset data storage size is 10GB, and the actual input data size is 50GB, the corrected data storage size will be five times the initial value. Second, network traffic is adjusted accordingly based on the data storage size correction ratio. This is because a larger data volume typically results in a higher network load during scheduling. Furthermore, the task's data dependency (e.g., whether the task needs to frequently exchange intermediate results between different nodes) must also be considered. If the task description includes a "data dependency" field (ranging from 0 to 1), the network traffic is corrected by multiplying the template value by the data storage size scaling factor and then by the data dependency. The third step, adjusting computing power consumption, is more complex. The system uses different adjustment formulas depending on the task type: for computationally intensive tasks, computing power consumption has a weak linear relationship with data volume, and the adjustment is the template value multiplied by (1 + log (data storage scaling factor)); for data-intensive tasks, computing power consumption is basically unrelated to data volume, and no significant adjustment is made; for communication-intensive tasks, computing power consumption is directly proportional to data volume, and the adjustment is made proportionally to the scaling factor. After these three steps of adjustment, the system obtains three-dimensional demand characteristics, namely the adjusted expected computing power consumption, data storage volume, and network communication volume.
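The three correction steps can be sketched as below, using the template values quoted earlier in this description. The dictionary keys, the function name, and the use of the natural logarithm are assumptions; the text writes only "log" and does not say how a scaling factor below 1 should be handled for compute-intensive tasks.

```python
import math

# Template values as quoted in the description (GFlops, GB, GB).
TEMPLATES = {
    "compute": {"compute_gflops": 1000.0, "storage_gb": 10.0, "network_gb": 5.0},
    "data": {"compute_gflops": 200.0, "storage_gb": 500.0, "network_gb": 50.0},
    "communication": {"compute_gflops": 500.0, "storage_gb": 20.0, "network_gb": 200.0},
}

def corrected_demand(task_type, input_gb, data_dependency=1.0):
    """Apply the three-step dynamic correction to a template vector."""
    template = TEMPLATES[task_type]
    if input_gb <= 0:
        raise ValueError("input data volume must be positive")
    # Step 1: scale storage by actual input size / template storage size.
    scale = input_gb / template["storage_gb"]
    storage = template["storage_gb"] * scale
    # Step 2: network volume follows the same scale, times the data dependency (0 to 1).
    network = template["network_gb"] * scale * data_dependency
    # Step 3: computing power adjustment depends on the task type.
    if task_type == "compute":
        # Assumption: natural logarithm.
        compute = template["compute_gflops"] * (1.0 + math.log(scale))
    elif task_type == "communication":
        compute = template["compute_gflops"] * scale
    else:
        compute = template["compute_gflops"]  # data-intensive: left unchanged
    return {"compute_gflops": compute, "storage_gb": storage, "network_gb": network}

demand = corrected_demand("compute", input_gb=50.0, data_dependency=0.5)
```

For the 50 GB example from the text, the scaling factor is 5, so storage becomes 50 GB, network volume becomes 5 × 5 × 0.5 = 12.5 GB, and computing power becomes 1000 × (1 + ln 5), roughly 2609 GFlops under the natural-log assumption.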
[0033] Next, the system retrieves the Service Level Agreement (SLA) parameters for the task. The SLA is typically specified by the user upon task submission or automatically assigned by the system based on the task's origin. If the user does not explicitly provide it, the system can use the default value. The SLA parameters mainly include the maximum tolerable latency and the minimum reliability requirement. The maximum tolerable latency refers to the maximum allowed time from task submission to task completion, measured in seconds, minutes, or hours. For example, a real-time data stream processing task might require a maximum tolerable latency of no more than 500 milliseconds, while an offline report generation task might allow for latency as long as several hours. The minimum reliability requirement is the lower limit of the probability of successful task completion, ranging from 0 to 1, with typical values of 0.95 or 0.99. Some critical business tasks may require extremely high reliability of 0.9999.
[0034] The system combines these service level agreement parameters with three-dimensional requirement features. Specifically, the system generates a five-dimensional resource requirement feature vector, with the dimensions as follows: expected computing power consumption (unit: GFlops or PFlops), data storage volume (unit: GB), network traffic volume (unit: GB), maximum tolerable latency (unit: ms or s), and minimum reliability requirement (dimensionless). To facilitate matching calculations with node features in subsequent steps, the system also normalizes this vector, mapping each dimension to the range of 0 to 1. Different types of tasks exhibit significantly different shapes for their requirement feature vectors. For example, computationally intensive tasks have larger values in the computing power consumption dimension, data-intensive tasks have larger values in the data storage dimension, and communication-intensive tasks have larger values in the network traffic volume dimension. This provides a discriminative basis for the subsequent matching of tasks and nodes by the graph neural network.
[0035] Finally, the system passes the generated and normalized resource requirement feature vector to the next step as input to the task representation vector in the graph neural network. Simultaneously, the system caches these features for performance evaluation and model feedback after task execution. Through this process, step S103 achieves a refined and standardized description of the resource requirements of heterogeneous computing tasks, laying a quantitative foundation for subsequent intelligent scheduling decisions.
[0036] Step S104: Based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, a graph neural network is used to model the network topology relationship between computing nodes, and an initial fit score of each computing node relative to the computing task to be scheduled is generated. The node features of the graph neural network are the comprehensive health of each computing node, and the edge features are the network latency and bandwidth between nodes.
[0037] In this step, a computing power network topology graph $G = (V, E)$ is constructed, where the initial feature vector $h_i^{(0)}$ of the $i$-th node $v_i$ in the node set $V$ contains the comprehensive health score $H_i$ of the $i$-th node, the geographical coordinates $g_i$ of the $i$-th node, and the computing power type encoding $c_i$ of the $i$-th node; and the feature vector $e_{ij}$ of the edge between node $v_i$ and node $v_j$ in the edge set $E$ contains the network latency $d_{ij}$ and the available bandwidth $b_{ij}$ between node $v_i$ and node $v_j$.
Graph attention network layers are used to perform message passing and aggregation of node features. At the $l$-th layer, node $v_i$ is updated as: $h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(l)} W^{(l)} h_j^{(l)} \right)$, $\alpha_{ij}^{(l)} = \dfrac{\exp\left( \mathrm{ReLU}\left( a^{(l)\top} \left[ W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_j^{(l)} \,\|\, W_e^{(l)} e_{ij} \right] \right) \right)}{\sum_{k \in \mathcal{N}(i)} \exp\left( \mathrm{ReLU}\left( a^{(l)\top} \left[ W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_k^{(l)} \,\|\, W_e^{(l)} e_{ik} \right] \right) \right)}$, where $\mathcal{N}(i)$ is the set of neighboring nodes of node $v_i$, $h_i^{(l)}$ is the feature vector of node $v_i$ at the $l$-th layer, $\alpha_{ij}^{(l)}$ is the attention coefficient of node $v_i$ for neighboring node $v_j$ at the $l$-th layer, $W^{(l)}$ is the node feature transformation matrix of the $l$-th layer, $W_e^{(l)}$ is the edge feature transformation matrix of the $l$-th layer, $a^{(l)}$ is the attention weight vector of the $l$-th layer, $a^{(l)\top}$ is the transpose of the attention weight vector of the $l$-th layer, $\|$ is the vector concatenation operation, $\mathrm{ReLU}$ is the rectified linear activation function, $\exp$ is the exponential function, and $\sigma$ is the activation function.
After passing through the multi-layer graph attention network, the final state representation vector $z_i$ of each node is obtained. The resource requirement characteristics of the computing task to be scheduled are mapped through a fully connected layer to the same dimension as the final state representation vector, yielding the task representation vector $z_{\mathrm{task}}$. The cosine similarity between the state representation of each node and the task representation is calculated as the raw fit score, and the raw fit scores of all nodes are normalized to obtain the initial fit score of each computing node relative to the computing task to be scheduled.
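The sketch below illustrates one attention layer with edge features followed by cosine-similarity scoring, in plain Python. It follows the common graph attention formulation and should be read as one possible interpretation rather than the patented network: the weight matrices are fixed toy values instead of learned parameters, the node features are reduced to two dimensions, softmax is assumed as the final normalization, and the handling of isolated nodes is an assumption.

```python
import math

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def relu(x):
    return max(0.0, x)

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(h, edges, W, W_e, a):
    """One attention layer. h: {node: features}; edges: {(i, j): edge features},
    where key (i, j) means j is a neighbor of i."""
    wh = {i: matvec(W, f) for i, f in h.items()}
    out = {}
    for i in h:
        nbrs = [j for (src, j) in edges if src == i]
        if not nbrs:  # Assumption: an isolated node keeps its own transformed features.
            out[i] = [relu(x) for x in wh[i]]
            continue
        scores = []
        for j in nbrs:
            concat = wh[i] + wh[j] + matvec(W_e, edges[(i, j)])  # list concatenation
            scores.append(relu(sum(p * q for p, q in zip(a, concat))))
        alphas = softmax(scores)  # attention coefficients over the neighbors of i
        agg = [sum(al * wh[j][k] for al, j in zip(alphas, nbrs))
               for k in range(len(wh[i]))]
        out[i] = [relu(x) for x in agg]
    return out

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0

# Toy graph: node features [health, type code]; edge features [latency, bandwidth].
h0 = {0: [0.9, 1.0], 1: [0.6, 0.0], 2: [0.3, 1.0]}
edges = {(0, 1): [0.02, 0.8], (1, 0): [0.02, 0.8], (1, 2): [0.05, 0.5],
         (2, 1): [0.05, 0.5], (0, 2): [0.10, 0.3], (2, 0): [0.10, 0.3]}
W = [[1.0, 0.0], [0.0, 1.0]]
W_e = [[-1.0, 1.0]]
a = [0.1, 0.1, 0.1, 0.1, 1.0]
h1 = gat_layer(h0, edges, W, W_e, a)

# Fully connected layer mapping a 5-dimensional demand vector to the node dimension.
W_task = [[1.0, 0.0, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.2, 0.0]]
task_vec = matvec(W_task, [0.8, 0.1, 0.1, 0.5, 0.9])
fit_scores = softmax([cosine(h1[i], task_vec) for i in sorted(h1)])
```

Because the attention score includes the transformed edge features, a neighbor reached over a low-latency, high-bandwidth link contributes more to a node's updated representation than one reached over a poor link.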
[0038] Step S105: Input the initial fitness score into the scheduling decision network based on deep reinforcement learning. The scheduling decision network outputs the optimal target allocation node. The action of the scheduling decision network is the target node for task allocation. The reward function is set as a weighted negative sum of task completion time, node load balancing degree and scheduling overhead.
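[0039] In this step, the expression for the reward function of the scheduling decision network is:
$$R = -\left( w_1 \frac{T_{\mathrm{actual}}}{T_{\mathrm{expected}}} + w_2 \frac{1}{N \bar{u}} \sum_{i=1}^{N} \left| u_i - \bar{u} \right| + w_3 \frac{C_{\mathrm{sched}}}{C_{\mathrm{base}}} \right),$$
In the formula, $R$ is the reward function; $w_1$, $w_2$, $w_3$ are preset weighting coefficients; $T_{\mathrm{actual}}$ is the actual time to complete the task; $T_{\mathrm{expected}}$ is the expected completion time predicted based on node resources; $u_i$ is the utilization rate of node i; $\bar{u}$ is the average utilization rate; $N$ is the total number of nodes; $C_{\mathrm{sched}}$ is the computational overhead of the scheduling decision; and $C_{\mathrm{base}}$ is the baseline cost.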
[0040] In one specific embodiment, a scheduling decision network based on deep reinforcement learning is introduced. The initial fitness score is used as the state input, and the agent outputs the optimal task allocation node according to a policy learned through interaction with the environment. The core advantage of this step lies in its ability to adaptively balance three mutually constraining optimization objectives: task completion time, node load balancing, and scheduling overhead.
[0041] The specific implementation process is as follows: First, the system constructs a scheduling decision network based on deep reinforcement learning. This embodiment uses the twin delayed deep deterministic policy gradient (TD3) algorithm as its core framework. This algorithm is designed for continuous action spaces, but by applying a softmax transformation to its output it can be used in task scheduling scenarios with discrete action spaces. The scheduling decision network consists of three sub-networks: one Actor network and two Critic networks. The Actor network is responsible for outputting scheduling decisions based on the current state, i.e., the probability distribution of task allocation over the nodes; the two Critic networks evaluate the value of the actions output by the Actor network. Using two Critic networks alleviates the overestimation problem of traditional Q-learning. All three networks adopt a multilayer perceptron structure with two hidden layers of 128 and 64 neurons respectively, and the activation function is the rectified linear unit (ReLU). The dimension of the network input layer is determined by the dimension of the state space, and the dimension of the output layer is determined by the dimension of the action space (i.e., the number of nodes).
[0042] Next, the system defines the state space, action space, and reward function.
[0043] The state space is composed of two parts: the first part is the initial fitness score vector of each computing node output in step S104, which contains the score values of all nodes, and its dimension is the total number of nodes; the second part is the current comprehensive health vector of each computing node, also with the dimension of the total number of nodes. The two parts are aligned by node index and then concatenated into a complete state vector. For example, if there are 50 nodes in the computing network, the length of the state vector is 50 (fitness score) + 50 (comprehensive health) = 100 dimensions. This state vector fully depicts all the information upon which the current scheduling decision depends: the fitness score reflects the task's preference for the node, and the comprehensive health reflects the actual usability of the node; the combination of the two can guide the scheduling network to make reasonable decisions.
[0044] The action space is defined as the probability distribution of assigning the currently scheduled computing task to the computing nodes. The scheduling decision network outputs a softmax vector with a dimension equal to the number of nodes. Each element in the vector has a value between 0 and 1, and the sum of all elements is 1, representing the probability of assigning the task to the corresponding node. The finally determined allocation node is the node with the maximum probability. To avoid the policy getting trapped in local optima, random exploration is performed with a certain probability during the training phase (i.e., sampling according to the probability distribution rather than directly taking the maximum value) to enhance the diversity of the policy.
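The state construction and action selection described above can be sketched as follows. This is an illustrative example only: the helper names `build_state`, `softmax`, and `select_node` are hypothetical, and the logits stand in for the raw outputs of the Actor network, which is not modeled here.

```python
import math
import random

def build_state(fit_scores, health):
    """Concatenate per-node fitness scores and health values, aligned by node index."""
    if len(fit_scores) != len(health):
        raise ValueError("both vectors must have one entry per node")
    return list(fit_scores) + list(health)

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_node(probs, explore=False, rng=random):
    """Greedy argmax at inference; sample from the distribution when exploring."""
    if explore:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical 4-node example with made-up values.
state = build_state([0.2, 1.0, 0.6, 0.0], [0.9, 0.8, 0.5, 0.7])
probs = softmax([0.1, 2.0, 1.0, -0.5])
target = select_node(probs)                       # greedy choice
explored = select_node(probs, explore=True, rng=random.Random(0))
```

With 4 nodes the state vector has 8 entries, mirroring the 50 + 50 = 100 example in the text. Passing a seeded `random.Random` instance keeps exploration reproducible in tests.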
[0045] The reward function is the core design element of this step. It is set as the negative of the weighted sum of task completion time, node load balancing, and scheduling overhead. A larger reward value (i.e., a negative value closer to zero) is better, which is equivalent to requiring the system to minimize the weighted sum. The specific components of the reward function are as follows: The first term is the ratio of the actual completion time to the expected completion time. Actual completion time refers to the total time elapsed from task issuance, through execution on the target node, until the result is returned to the scheduler. Expected completion time is the estimated execution time based on the comprehensive health of the target node, the current task queue length, and the task's own computational requirements. A smaller ratio indicates higher actual execution efficiency; a ratio greater than 1 indicates that actual execution is slower than expected. This term is multiplied by a preset time weighting coefficient.
[0046] The second item is the node load balancing deviation. The specific calculation method is as follows: First, calculate the utilization rate of all nodes, which can be a weighted average of CPU utilization or overall resource utilization; then calculate the arithmetic mean of the utilization rates of all nodes; next, calculate the sum of the absolute values of the deviations of each node's utilization rate from the average, and then normalize by dividing by the total number of nodes and the average utilization rate. This ratio reflects the degree of load distribution among the nodes: the smaller the ratio, the more balanced the load; the larger the ratio, the more overloaded some nodes are while others are idle. This item is multiplied by a preset load weighting coefficient.
[0047] The third term is the ratio of the computational cost of the scheduling decision to the baseline cost. The computational cost of the scheduling decision refers to the time consumed by the scheduling decision network during forward inference (from state input to action output), as well as the associated data preparation and network communication time. The baseline cost is a preset reference value, such as the time required for a simple round-robin scheduling operation under no-load conditions. The smaller this ratio, the more lightweight and efficient the decision-making process. This term is multiplied by a preset cost weighting coefficient.
[0048] The weighted sum of the three terms is negated to obtain the final reward function value. The three weighting coefficients (time weight, load weight, and overhead weight) can be adjusted according to actual business needs: for latency-sensitive tasks, increase the time weight; for scenarios that prioritize resource utilization, increase the load weight; for scenarios with extremely high scheduling frequency, increase the overhead weight. In this embodiment, the three weighting coefficients are set to 0.5, 0.3, and 0.2, respectively.
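The three reward terms and the weights of this embodiment can be checked with a short sketch. The function and parameter names are hypothetical, the input values are made up, and treating an all-idle cluster as perfectly balanced is an assumption added to avoid division by zero.

```python
def load_imbalance(utilizations):
    """Sum of absolute deviations from the mean, divided by (node count * mean)."""
    n = len(utilizations)
    mean = sum(utilizations) / n
    if mean == 0:
        return 0.0  # assumption: all nodes idle counts as perfectly balanced
    return sum(abs(u - mean) for u in utilizations) / (n * mean)

def reward(t_actual, t_expected, utilizations, c_sched, c_base,
           weights=(0.5, 0.3, 0.2)):
    """Negative weighted sum of time ratio, load imbalance, and overhead ratio."""
    w_time, w_load, w_cost = weights
    return -(w_time * t_actual / t_expected
             + w_load * load_imbalance(utilizations)
             + w_cost * c_sched / c_base)

# Made-up example: the task ran 20% slower than expected, three unevenly
# loaded nodes, and a decision that cost twice the baseline.
r = reward(t_actual=12.0, t_expected=10.0,
           utilizations=[0.9, 0.5, 0.4],
           c_sched=2.0, c_base=1.0)
```

Here the time term contributes 0.5 × 1.2 = 0.6, the load term 0.3 × (0.6 / 1.8) = 0.1, and the overhead term 0.2 × 2.0 = 0.4, so the reward is approximately -1.1.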
[0049] The training phase of the scheduling decision network is conducted offline. The system collects historical scheduling tasks and their execution results to construct an experience replay pool. Each experience sample contains a quadruple: current state, action taken, reward obtained, and next state. During training, batches of samples are drawn from the experience pool using a prioritized experience replay mechanism, in which samples with larger temporal-difference errors receive higher sampling probabilities to accelerate learning. The parameters of the Actor and Critic networks are updated using the twin delayed deep deterministic policy gradient (TD3) algorithm. The target networks use a soft update method with a smoothing coefficient set to 0.005, ensuring stability while allowing for gradual policy improvement. Training continues until the reward function value converges.
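Two mechanics of this training procedure, the soft target update and priority-weighted sampling, can be sketched as below. This is a simplified illustration and not a full training loop: the function names are hypothetical, priorities are taken proportional to the absolute error with no exponent or importance-sampling correction, and sampling is done with replacement.

```python
import random

TAU = 0.005  # smoothing coefficient stated in the embodiment

def soft_update(target_params, online_params, tau=TAU):
    """Move each target parameter a small step toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]

def sample_prioritized(buffer, td_errors, batch_size, rng=random, eps=1e-6):
    """Draw transitions with probability proportional to |TD error| + eps."""
    priorities = [abs(e) + eps for e in td_errors]
    return rng.choices(buffer, weights=priorities, k=batch_size)

# Each transition is (state, action, reward, next_state); values are placeholders.
buffer = [("s0", 1, -1.1, "s1"), ("s1", 0, -0.8, "s2"), ("s2", 2, -1.5, "s3")]
batch = sample_prioritized(buffer, td_errors=[0.1, 2.0, 0.5], batch_size=2,
                           rng=random.Random(0))
new_target = soft_update([1.0, 0.0], [0.0, 1.0])
```

With a coefficient of 0.005, one update moves the target parameters only 0.5% of the way toward the online parameters, which is what keeps the value targets stable between updates.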
[0050] During the inference phase (i.e., actual scheduling), the system no longer performs random exploration but directly calls the pre-trained Actor network. The specific execution flow is as follows: the current state vector (composed of the initial fitness score and the comprehensive health score calculated in real time) is input into the Actor network. After forward propagation, the network outputs a softmax vector with a dimension equal to the number of nodes. The system finds the node index with the highest probability value in this vector and uses it as the target allocation node for this scheduling operation. To ensure decision reliability, the system also sets a confidence threshold (e.g., 0.6). If the maximum probability value is lower than this threshold, the network lacks confidence in the current decision. In this case, the system triggers a manual review mechanism or switches to an alternative scheduling strategy (such as round-robin or minimum-load-first scheduling), and saves this record for subsequent model fine-tuning.
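The confidence-threshold logic can be illustrated with a small sketch. The function name `choose_target` is hypothetical, and minimum-load-first is chosen here as the fallback only for concreteness. The text equally allows round-robin or manual review.

```python
def choose_target(probs, loads, threshold=0.6):
    """Return (node_index, used_fallback) for one scheduling decision."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, False
    # Low confidence: fall back to the node with the lowest current load.
    fallback = min(range(len(loads)), key=loads.__getitem__)
    return fallback, True

# Made-up probability vectors and node loads.
confident = choose_target([0.1, 0.8, 0.1], loads=[0.3, 0.9, 0.5])
uncertain = choose_target([0.4, 0.35, 0.25], loads=[0.7, 0.6, 0.2])
```

In the first call the Actor's choice (node 1, probability 0.8) is accepted. In the second the highest probability is 0.4, below the 0.6 threshold, so the least-loaded node 2 is used. The returned flag marks the record for later fine-tuning.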
[0051] Finally, the system transmits the optimal target allocation node information obtained from the decision to step S106, whereby step S106 completes the actual task assignment and execution. The complete record of the decision-making process (including status, actions, and rewards) is stored in the log system for subsequent model evaluation and online incremental learning.
[0052] Through the deep reinforcement learning scheduling decision network in step S105, the system achieves an intelligent leap from static scoring to dynamic decision-making, and can adaptively select the optimal computing power node under multi-objective constraints, significantly improving the adaptability and robustness of the scheduling strategy.
[0053] Step S106: The computation task to be scheduled is sent to the target allocation node for execution.
[0054] In summary, the method of this application acquires real-time resource status and historical behavior data of computing nodes, calculates instantaneous availability and historical stability scores, and fuses them into a comprehensive health score; analyzes task resource requirement characteristics; uses a graph attention network, based on the comprehensive health score and task requirements, to model the topological relationships between nodes and generate an initial fitness score; inputs the score into a reinforcement learning network based on the twin delayed deep deterministic policy gradient (TD3) algorithm, which outputs the optimal target allocation node, with the reward function being a weighted negative sum of task completion time, load balancing, and scheduling overhead; and then executes the task. This invention improves computing resource utilization and task execution efficiency by fusing real-time status and historical behavior to evaluate node health, using graph neural networks to capture topological dependencies, and combining reinforcement learning to achieve adaptive scheduling.
[0055] Please refer to Figure 2, which shows a structural block diagram of a computing power network resource scheduling system according to this application.
[0056] As shown in Figure 2, the computing power network resource scheduling system 200 includes a first acquisition module 210, a fusion module 220, a second acquisition module 230, a generation module 240, an output module 250, and a sending module 260.
[0057] The first acquisition module 210 is configured to acquire real-time resource status data and historical behavior data of multiple computing nodes in the computing power network; the fusion module 220 is configured to calculate the instantaneous availability of each computing node based on the real-time resource status data, calculate the historical stability score of each computing node based on the historical behavior data, and perform weighted fusion of the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node; the second acquisition module 230 is configured to acquire the resource requirement characteristics of the computing task to be scheduled, including expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency; the generation module 240 is configured to use a graph neural network, based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, to model the network topology relationship between computing nodes and generate an initial fitness score of each computing node relative to the computing task to be scheduled, wherein the node features of the graph neural network are the comprehensive health of each computing node and the edge features are the network latency and bandwidth between nodes; the output module 250 is configured to input the initial fitness score into a deep reinforcement learning-based scheduling decision network, which outputs the optimal target allocation node, wherein the action of the scheduling decision network is the target node for task allocation and the reward function is set as a weighted negative sum of task completion time, node load balancing degree, and scheduling overhead; and the sending module 260 is configured to send the computing task to be scheduled to the target allocation node for execution.
[0058] It should be understood that the modules shown in Figure 2 correspond to the steps of the method described with reference to Figure 1. Therefore, the operations, features, and corresponding technical effects described above for the method also apply to the modules in Figure 2 and will not be described in detail here.
[0059] In other embodiments, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein when the program instructions are executed by a processor, the processor performs the computing power network resource scheduling method in any of the above method embodiments. In one embodiment, the computer-readable storage medium of the present invention stores computer-executable instructions, which are configured to: acquire real-time resource status data and historical behavior data of multiple computing nodes in the computing power network; calculate the instantaneous availability of each computing node based on the real-time resource status data, calculate the historical stability score of each computing node based on the historical behavior data, and perform weighted fusion of the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node; obtain the resource requirement characteristics of the computing task to be scheduled, including expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency; based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, use a graph neural network to model the network topology relationship between computing nodes and generate an initial fitness score of each computing node relative to the computing task to be scheduled, wherein the node features of the graph neural network are the comprehensive health of each computing node and the edge features are the network latency and bandwidth between nodes; input the initial fitness score into a deep reinforcement learning-based scheduling decision network, which outputs the optimal target allocation node, wherein the action of the scheduling decision network is the target node for task allocation and the reward function is set as a weighted negative sum of task completion time, node load balancing degree, and scheduling overhead; and send the computing task to be scheduled to the target allocation node for execution.
[0060] The computer-readable storage medium may include a stored program area and a stored data area, wherein the stored program area may store an operating system and an application program required for at least one function, and the stored data area may store data created through use of the computing power network resource scheduling system. Furthermore, the computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the computer-readable storage medium may optionally include memory remotely disposed relative to the processor, and these remote memories may be connected to the computing power network resource scheduling system via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0061] Figure 3 is a schematic diagram of the structure of the electronic device provided in an embodiment of the present invention. As shown in Figure 3, the device includes a processor 310 and a memory 320. The electronic device may also include an input device 330 and an output device 340. The processor 310, memory 320, input device 330, and output device 340 can be connected via a bus or other means; Figure 3 takes a bus connection as an example. The memory 320 is the computer-readable storage medium described above. The processor 310 executes various server functions and data processing by running non-volatile software programs, instructions, and modules stored in the memory 320, thereby implementing the computing power network resource scheduling method described in the above embodiment. The input device 330 can receive input digital or character information and generate key signal inputs related to user settings and function control of the computing power network resource scheduling system. The output device 340 may include a display screen or other display device.
[0062] The aforementioned electronic device can execute the method provided in the embodiments of the present invention, and has the corresponding functional modules and beneficial effects for executing the method. Technical details not described in detail in this embodiment can be found in the method provided in the embodiments of the present invention.
[0063] In one implementation, the above-described electronic device is applied in a computing power network resource scheduling system for a client, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: acquire real-time resource status data and historical behavior data of multiple computing nodes in the computing power network; calculate the instantaneous availability of each computing node based on the real-time resource status data, calculate the historical stability score of each computing node based on the historical behavior data, and perform weighted fusion of the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node; obtain the resource requirement characteristics of the computing task to be scheduled, including expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency; based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, use a graph neural network to model the network topology relationship between computing nodes and generate an initial fitness score of each computing node relative to the computing task to be scheduled, wherein the node features of the graph neural network are the comprehensive health of each computing node and the edge features are the network latency and bandwidth between nodes; input the initial fitness score into a deep reinforcement learning-based scheduling decision network, which outputs the optimal target allocation node, wherein the action of the scheduling decision network is the target node for task allocation and the reward function is set as a weighted negative sum of task completion time, node load balancing degree, and scheduling overhead; and send the computing task to be scheduled to the target allocation node for execution.
[0064] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods of various embodiments or some parts of embodiments.
[0065] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A computing power network resource scheduling method, characterized by comprising:
acquiring real-time resource status data and historical behavior data of multiple computing nodes in the computing power network;
calculating the instantaneous availability of each computing node based on the real-time resource status data, calculating the historical stability score of each computing node based on the historical behavior data, and performing weighted fusion of the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node;
obtaining the resource requirement characteristics of the computing task to be scheduled, including expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency;
based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, using a graph neural network to model the network topology relationship between computing nodes and generate an initial fitness score of each computing node relative to the computing task to be scheduled, wherein the node features of the graph neural network are the comprehensive health of each computing node, and the edge features are the network latency and bandwidth between nodes;
inputting the initial fitness score into a deep reinforcement learning-based scheduling decision network, which outputs the optimal target allocation node, wherein the action of the scheduling decision network is the target node for task allocation, and the reward function is set as a weighted negative sum of task completion time, node load balancing degree, and scheduling overhead;
sending the computing task to be scheduled to the target allocation node for execution.
2. The computing power network resource scheduling method according to claim 1, characterized in that calculating the instantaneous availability of each computing node based on the real-time resource status data, and calculating the historical stability score of each computing node based on the historical behavior data, comprises:
obtaining the resource dimensions of the computing node, including CPU utilization $U_{cpu}$, GPU utilization $U_{gpu}$, memory usage $U_{mem}$, disk usage $U_{disk}$, current bandwidth utilization $U_{bw}$, and task queue length $Q$, and calculating the remaining rate of each resource dimension as:
$$R_{cpu} = 1 - U_{cpu}, \quad R_{gpu} = 1 - U_{gpu}, \quad R_{mem} = 1 - U_{mem}, \quad R_{disk} = 1 - U_{disk}, \quad R_{bw} = 1 - U_{bw},$$
where $R_{cpu}$ is the remaining rate of CPU utilization, $R_{gpu}$ is the remaining rate of GPU utilization, $R_{mem}$ is the remaining rate of memory usage, $R_{disk}$ is the remaining rate of disk usage, and $R_{bw}$ is the remaining rate of current bandwidth utilization;
for each resource dimension, calculating the data dispersion index of that resource dimension across the nodes from the normalized remaining rates of the nodes in that dimension and the total number of nodes, and converting the data dispersion index into a weighting coefficient:
$$w_j = \frac{D_j}{\sum_{k} D_k},$$
where $w_j$ is the weighting coefficient of the j-th resource dimension, $D_j$ is the data dispersion index of the j-th resource dimension, and $D_k$ is the data dispersion index of the k-th resource dimension;
weighting and summing the remaining rates of the resource dimensions according to their corresponding weighting coefficients, and multiplying the sum by a decay factor of the task queue length, which is governed by a preset queue length sensitivity coefficient, to obtain the instantaneous availability;
obtaining, for each node over the past N time windows, the task completion rate sequence $\{c_1, \dots, c_N\}$, the average response delay sequence $\{d_1, \dots, d_N\}$, and the fault recovery time sequence $\{f_1, \dots, f_N\}$, where $c_N$ is the task completion rate of the node within the N-th time window, $d_N$ is the average response delay of the node within the N-th time window, and $f_N$ is the fault recovery time of the node within the N-th time window;
calculating the mean and standard deviation of the task completion rate sequence, and calculating the stable completion rate from the mean and the standard deviation using a preset penalty coefficient;
calculating the exponentially weighted moving average of the average response delay sequence as:
$$\mathrm{EWMA}_t = \alpha \, d_t + (1 - \alpha) \, \mathrm{EWMA}_{t-1},$$
where $\mathrm{EWMA}_t$ is the exponentially weighted moving average delay at the current moment, $\alpha$ is the smoothing factor, $d_t$ is the actual average response delay of the current time window, and $\mathrm{EWMA}_{t-1}$ is the exponentially weighted moving average delay at the previous moment;
calculating the median of the fault recovery time sequence, and determining the recovery capability value from the median and a preset fault recovery threshold;
normalizing the stable completion rate, the exponentially weighted moving average, and the recovery capability value, and summing them by weight to obtain the historical stability score.
3. The computing power network resource scheduling method according to claim 1, characterized in that obtaining the resource requirement characteristics of the computing task to be scheduled comprises:
parsing the metadata of the computing task and extracting the task type identifier, the task type being one of computing-intensive, data-intensive, and communication-intensive;
matching an initial resource requirement vector from a preset resource requirement template library based on the task type, and dynamically correcting it according to the actual input data of the task to obtain three-dimensional requirement features;
obtaining the service level agreement parameters of the task, and extracting the maximum tolerable latency and the minimum reliability requirement;
combining the three-dimensional requirement features, the maximum tolerable latency, and the minimum reliability requirement into the resource requirement characteristics.
4. The computing power network resource scheduling method according to claim 1, characterized in that using a graph neural network, based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, to model the network topology relationship between computing nodes and generate an initial fitness score of each computing node relative to the computing task to be scheduled comprises:
constructing a computing power network topology graph $G = (V, E)$, where the initial feature vector $h_i^{(0)}$ of the i-th node $v_i$ in the node set $V$ contains the comprehensive health score $H_i$ of the i-th node, the geographical coordinates $(x_i, y_i)$ of the i-th node, and the computing power type encoding $c_i$ of the i-th node, and the feature vector $e_{ij}$ of the edge between node $v_i$ and node $v_j$ in the edge set $E$ contains the network latency $d_{ij}$ and the available bandwidth $b_{ij}$ between node $v_i$ and node $v_j$;
using graph attention network layers to perform message passing and aggregation of node features, where at layer $l$ the update formula for node $v_i$ is:
$$h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(l)} W^{(l)} h_j^{(l)} \right),$$
$$\alpha_{ij}^{(l)} = \frac{\exp\left( \mathrm{ReLU}\left( a^{(l)\top} \left[ W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_j^{(l)} \,\|\, W_e^{(l)} e_{ij} \right] \right) \right)}{\sum_{k \in \mathcal{N}(i)} \exp\left( \mathrm{ReLU}\left( a^{(l)\top} \left[ W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_k^{(l)} \,\|\, W_e^{(l)} e_{ik} \right] \right) \right)},$$
where $\mathcal{N}(i)$ is the set of neighboring nodes of node $v_i$; $h_i^{(l)}$ is the feature vector of node $v_i$ at layer $l$; $\alpha_{ij}^{(l)}$ is the attention coefficient of node $v_i$ for neighboring node $v_j$ at layer $l$; $W^{(l)}$ is the node feature transformation matrix of layer $l$; $W_e^{(l)}$ is the edge feature transformation matrix of layer $l$; $a^{(l)}$ is the attention weight vector of layer $l$ and $a^{(l)\top}$ is its transpose; $\|$ is the vector concatenation operation; $\mathrm{ReLU}$ is the linear rectified activation function; $\exp$ is the exponential function; and $\sigma$ is the activation function;
obtaining the final state representation vector $z_i$ of each node after the multi-layer graph attention network;
mapping the resource requirement characteristics of the computing task to be scheduled through a fully connected layer to the same dimension as the final state representation vector to obtain the task representation vector $z_t$;
calculating the cosine similarity between the state representation of each node and the task representation as the initial fitness score, and normalizing the initial fitness scores of all nodes to obtain the initial fitness score of each computing node relative to the computing task to be scheduled.
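5. The computing power network resource scheduling method according to claim 1, characterized in that the expression for the reward function of the scheduling decision network is:
$$R = -\left( w_1 \frac{T_{\mathrm{actual}}}{T_{\mathrm{expected}}} + w_2 \frac{1}{N \bar{u}} \sum_{i=1}^{N} \left| u_i - \bar{u} \right| + w_3 \frac{C_{\mathrm{sched}}}{C_{\mathrm{base}}} \right),$$
where $R$ is the reward function; $w_1$, $w_2$, $w_3$ are preset weighting coefficients; $T_{\mathrm{actual}}$ is the actual time to complete the task; $T_{\mathrm{expected}}$ is the expected completion time predicted based on node resources; $u_i$ is the utilization rate of node i; $\bar{u}$ is the average utilization rate; $N$ is the total number of nodes; $C_{\mathrm{sched}}$ is the computational overhead of the scheduling decision; and $C_{\mathrm{base}}$ is the baseline cost.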
6. A computing power network resource scheduling system, characterized by comprising:
a first acquisition module configured to acquire real-time resource status data and historical behavior data of multiple computing nodes in the computing power network;
a fusion module configured to calculate the instantaneous availability of each computing node based on the real-time resource status data, calculate the historical stability score of each computing node based on the historical behavior data, and perform weighted fusion of the instantaneous availability and the historical stability score to obtain the comprehensive health of each computing node;
a second acquisition module configured to acquire the resource requirement characteristics of the computing task to be scheduled, wherein the resource requirement characteristics include expected computing power consumption, data storage volume, network communication volume, and maximum tolerable latency;
a generation module configured to use a graph neural network, based on the comprehensive health of each computing node and the resource requirement characteristics of the computing task to be scheduled, to model the network topology relationship between computing nodes and generate an initial fitness score of each computing node relative to the computing task to be scheduled, wherein the node features of the graph neural network are the comprehensive health of each computing node, and the edge features are the network latency and bandwidth between nodes;
an output module configured to input the initial fitness score into a deep reinforcement learning-based scheduling decision network, which outputs the optimal target allocation node, wherein the action of the scheduling decision network is the target node for task allocation, and the reward function is set as a weighted negative sum of task completion time, node load balancing degree, and scheduling overhead;
a sending module configured to send the computing task to be scheduled to the target allocation node for execution.
7. An electronic device, characterized by comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 5.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that when the program is executed by a processor, it implements the method according to any one of claims 1 to 5.