Global topology-aware GPU resource cooperative scheduling method and system for computing power networks

By constructing a global GPU topology model for the computing power network and a collaborative scheduling method built on it, the invention addresses insufficient global topology awareness in GPU resource scheduling, achieving accurate resource matching and load balancing and improving task execution efficiency and resource utilization.

CN122240262A · Pending · Publication Date: 2026-06-19 · JINAN JIEMING ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
JINAN JIEMING ELECTRONICS CO LTD
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies lack global topology awareness in GPU resource scheduling, resulting in inaccurate resource matching, unbalanced load, and low computing power utilization, which cannot meet the scheduling requirements of complex computing tasks with high real-time performance and high throughput.

Method used

A global topology model of the computing power network GPU is constructed. By quantifying node connections and task requirements, the resource status is evaluated in real time. Based on the global topology model, collaborative scheduling analysis is performed to determine the target GPU resource combination and optimize scheduling.

Benefits of technology

It achieves precise matching and load balancing of GPU resources, improves communication efficiency and task execution efficiency, and enhances the resource utilization of the computing network.



Abstract

This invention discloses a globally topology-aware GPU resource collaborative scheduling method and system for computing power networks, belonging to the field of GPU resource collaborative scheduling technology. The method includes: constructing a global GPU topology model for the computing power network; determining target task requirement parameters; performing availability assessment on GPU resource usage status data to obtain GPU resource status parameters; performing collaborative scheduling analysis on the target task requirement parameters and GPU resource status parameters to determine the target GPU resource combination; and performing task decision execution and feedback scheduling optimization based on the target GPU resource combination. This invention solves the technical problems of existing technologies, such as lack of global topology awareness in GPU scheduling, inaccurate resource matching, unbalanced load, and low computing power utilization and task execution efficiency. It achieves precise GPU resource matching, improved communication efficiency, and optimized load balancing, thereby improving the task execution efficiency and resource utilization of the computing power network.

Description

Technical Field

[0001] This invention relates to the field of GPU resource collaborative scheduling technology, specifically to a method and system for global topology-aware GPU resource collaborative scheduling in computing power networks.

Background Technology

[0002] With the rapid development of artificial intelligence, cloud computing, and distributed computing, the scale of GPU nodes in computing networks is constantly expanding, and the types of tasks are becoming increasingly complex. Traditional GPU resource scheduling methods often rely on simple allocation based on local resource status, lacking awareness of the global topology and inter-node communication relationships, making it difficult to achieve cross-node collaborative optimization. Existing technologies typically only select based on basic hardware indicators such as computing power and memory, without comprehensively considering key factors such as inter-node communication latency, link bandwidth, and topology distance. This easily leads to problems such as low resource matching accuracy, high communication overhead, and uneven cluster load, resulting in low utilization of computing resources, increased task execution latency, and low collaborative efficiency of large-scale parallel tasks, failing to meet the scheduling requirements of complex computing tasks with high real-time performance and high throughput.

[0003] Existing technologies suffer from technical problems such as lack of global topology awareness in GPU scheduling, inaccurate resource matching, unbalanced load, and low computing power utilization and task execution efficiency.

Summary of the Invention

[0004] This application provides a globally topology-aware GPU resource collaborative scheduling method and system for computing power networks, which is used to address the technical problems in the prior art such as lack of global topology awareness in GPU scheduling, inaccurate resource matching, unbalanced load, and low computing power utilization and task execution efficiency.

[0005] In view of the above problems, this application provides a method and system for global topology-aware GPU resource collaborative scheduling in computing power networks.

[0006] The first aspect of this application provides a method for collaborative scheduling of GPU resources in a globally topology-aware computing network, the method comprising:

[0007] For each GPU node in the computing power network, connection communication quantification and global topology analysis are performed to construct a global GPU topology model for the computing power network. Tasks to be processed are acquired through the computing power network, and the requirements of these tasks are analyzed and quantified to determine the target task requirement parameters. GPU resource usage status data in the computing power network is collected in real time, and availability assessment is performed on this data to obtain GPU resource status parameters. Based on the global GPU topology model of the computing power network, the target task requirement parameters and the GPU resource status parameters are analyzed for collaborative scheduling to determine the target GPU resource combination. Task decision-making, execution, and feedback scheduling optimization are then performed using the target GPU resource combination.

[0008] A second aspect of this application provides a globally topology-aware GPU resource collaborative scheduling system for computing power networks, the system comprising:

[0009] The topology model construction module is used to quantify the connection communication of each GPU node in the computing power network and perform global topology analysis to construct a global GPU topology model for the computing power network. The target task requirement parameter determination module is used to obtain tasks to be processed through the computing power network, perform requirement parsing and quantification on the tasks to be processed, and determine the target task requirement parameters. The resource status parameter acquisition module is used to collect GPU resource usage status data in the computing power network in real time, perform availability assessment on the GPU resource usage status data, and obtain GPU resource status parameters. The collaborative scheduling parsing module is used to perform collaborative scheduling parsing on the target task requirement parameters and the GPU resource status parameters based on the global GPU topology model of the computing power network, determine the target GPU resource combination, and perform task decision execution and feedback scheduling optimization through the target GPU resource combination.

[0010] One or more technical solutions provided in this application have at least the following technical effects or advantages:

[0011] A global GPU topology model for a computing power network is constructed. Tasks to be processed are acquired through the computing power network, and their requirements are analyzed and quantified to determine target task requirement parameters. GPU resource usage status data in the computing power network is collected in real time, and availability assessment is performed to obtain GPU resource status parameters. Based on the global GPU topology model of the computing power network, the target task requirement parameters and GPU resource status parameters are collaboratively scheduled and analyzed to determine the target GPU resource combination. Task decision-making, execution, and feedback scheduling optimization are then performed using the target GPU resource combination. This achieves the technical effects of precise GPU resource matching, improved communication efficiency, and optimized load balancing, thereby improving the task execution efficiency and resource utilization of the computing power network.

Attached Figure Description

[0012] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a schematic diagram of the global topology-aware GPU resource collaborative scheduling method for computing power networks provided in an embodiment of this application;

[0014] Figure 2 is a schematic structural diagram of the global topology-aware GPU resource collaborative scheduling system for computing power networks provided in an embodiment of this application.

[0015] Figure labeling: topology model construction module 10, target task requirement parameter determination module 20, resource status parameter acquisition module 30, and collaborative scheduling parsing module 40.

Detailed Implementation

[0016] This application provides a globally topology-aware GPU resource collaborative scheduling method and system for computing power networks, which addresses the technical problems in existing technologies such as lack of global topology awareness in GPU scheduling, inaccurate resource matching, unbalanced load, and low computing power utilization and task execution efficiency.

[0017] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.

[0018] Example 1: As shown in Figure 1, this application provides a globally topology-aware GPU resource collaborative scheduling method for computing power networks, the method comprising:

[0019] Step S100: Perform connection communication quantization and global topology analysis on each GPU node in the computing power network to construct a global GPU topology model of the computing power network.

[0020] Specifically, the process first detects and collects GPU connection information such as connection methods and connection parameters of each GPU node in the computing power network. Based on this information, the node topology connection is completed and a GPU node connection topology graph is constructed. Then, the topology graph is traversed to perform quantitative calculations of connection communication capabilities and generate GPU device communication sub-tables. Finally, the communication sub-tables are aggregated to the computing power network control plane to complete global topology modeling and verification optimization, and finally construct a global GPU topology model for the computing power network.

[0021] Step S200: Obtain the task to be processed through the computing power network, perform requirement analysis and quantification on the task to be processed, and determine the target task requirement parameters.

[0022] Specifically, after obtaining the task to be processed from the computing network, the task is first structured and extracted to accurately obtain key information. Then, the key information is broken down into multiple sub-tasks, and the dependencies between the sub-tasks are analyzed to determine the target task execution information. Finally, the target task execution information is analyzed, quantified, and integrated to determine the target task requirement parameters, including computing resource requirement parameters, storage resource requirement parameters, and network resource requirement parameters.

[0023] Step S300: Collect GPU resource usage status data in the computing power network in real time, perform availability assessment on the GPU resource usage status data, and obtain GPU resource status parameters.

[0024] Specifically, a GPU resource status indicator set is first constructed, which includes computing resource indicators, storage resource indicators, and network resource indicators. The availability impact of each indicator in the indicator set is evaluated, and the resource status indicators are assigned weight coefficients based on the evaluation results. Then, based on the weight coefficients, the availability level weighted evaluation and label fitting of the GPU resource status indicator set are carried out to construct a GPU availability evaluation model. Finally, the model is used to evaluate the availability of GPU resource usage status data in the real-time collected computing power network to obtain GPU resource status parameters.

[0025] Step S400: Based on the GPU global topology model of the computing power network, perform collaborative scheduling analysis on the target task requirement parameters and the GPU resource status parameters to determine the target GPU resource combination, and perform task decision execution and feedback scheduling optimization through the target GPU resource combination.

[0026] Specifically, the process begins by initially screening the GPU resource status parameters against the target task requirement parameters to obtain a set of available GPU resources. Then, based on the global GPU topology model of the computing power network, the communication efficiency of this set is evaluated to obtain a GPU communication efficiency coefficient. Based on this coefficient, the available GPU resources are collaboratively scheduled and optimized to construct multiple candidate GPU combinations. Subsequently, a GPU resource scheduling objective function is constructed; the candidate GPU combinations are optimized and expanded into a GPU combination selection space, and this space is globally evaluated and optimized to determine the target GPU resource combination. Finally, the target GPU resource combination is used for task decision-making, execution, and real-time monitoring to obtain task execution feedback parameters and GPU status feedback parameters. Based on these two types of feedback parameters, the GPU scheduling strategy is optimized and tasks continue to be executed.

[0027] In one possible implementation, step S100 further includes:

[0028] Step S110: Detect and collect GPU connection information of each GPU node in the computing power network. The GPU connection information includes connection method and connection parameters.

[0029] Step S120: Based on the GPU connection information of each GPU node, perform node topology connection to construct a GPU node connection topology graph.

[0030] Step S130: Traverse the GPU node connection topology graph to quantify the connection communication capabilities and generate a GPU device communication sub-table.

[0031] Step S140: The GPU device communication sub-table is aggregated into the control plane of the computing power network for global topology modeling, verification and optimization, and a global topology model of the computing power network GPU is constructed.

[0032] Specifically, a full scan of all GPU nodes in the network is performed using a computing power network node detection protocol. Combined with the GPU device management interface, hardware-level and network-level connection information of each node is collected in real time. The hardware-level connection methods include onboard direct connection, PCIe bridging, and other types, as well as corresponding connection parameters such as the number of channels and transmission rate. The network-level connection methods include fiber optic direct connection, switch networking, and other types, as well as corresponding connection parameters such as bandwidth, latency, and packet loss rate. This completes the collection and structured storage of full-dimensional GPU connection information.

[0033] Based on the collected information such as the connection methods and parameters of each GPU node, a graph-theoretic topology modeling method is adopted: each GPU node is mapped to a vertex of the topology graph, and the GPU connection relationships between nodes and within nodes are mapped to edges. Parameters such as connection bandwidth and transmission rate are assigned as attribute values of the edges. The association between vertices and edges is completed through graph structure visualization and data modeling tools to generate a GPU node connection topology graph with attribute annotations.
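As a non-limiting illustration of the vertex-and-edge mapping described above, the following Python sketch stores GPU devices as vertices and connections as attributed, undirected edges. The class names, field names, identifiers, and numeric values are assumptions chosen for illustration and do not appear in the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """Edge attributes (hypothetical field names)."""
    kind: str              # e.g. "onboard", "pcie", "fiber", "switch"
    bandwidth_gbps: float
    latency_us: float


@dataclass
class TopologyGraph:
    # Adjacency map: gpu_id -> {neighbor_gpu_id: Link}
    adj: dict = field(default_factory=dict)

    def add_gpu(self, gpu_id):
        self.adj.setdefault(gpu_id, {})

    def add_link(self, a, b, link):
        # Connections are treated as undirected, so both directions are stored.
        self.add_gpu(a)
        self.add_gpu(b)
        self.adj[a][b] = link
        self.adj[b][a] = link

    def edges(self):
        # Yield each undirected edge exactly once.
        seen = set()
        for a, neighbors in self.adj.items():
            for b, link in neighbors.items():
                key = tuple(sorted((a, b)))
                if key not in seen:
                    seen.add(key)
                    yield key[0], key[1], link


graph = TopologyGraph()
graph.add_link("nodeA-GPU1", "nodeA-GPU2", Link("onboard", 600.0, 2.0))
graph.add_link("nodeA-GPU2", "nodeB-GPU1", Link("fiber", 100.0, 15.0))
```

Storing both directions keeps neighbor lookups constant-time, while `edges()` still reports each physical connection only once.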

[0034] First, a unified quantization standard containing GPU connection type weights and communication performance indicators is preset. Then, a topology traversal algorithm traverses every connection pair in the GPU node connection topology graph, including inter-node GPU connection pairs and intra-node GPU connection pairs. Based on the quantization standard, the communication performance of each connection pair, such as transmission rate, bandwidth, and latency, is weighted and quantized to obtain the communication score of each connection pair, forming a GPU connection pair communication score set. Next, the unique identification codes of all GPU nodes in the topology graph are extracted, and each GPU identification code is associated with, matched to, and integrated in structured form with the corresponding communication score set. Finally, a GPU device communication score table is generated, indexed by GPU node and containing connection pair information and communication quantization scores.

[0035] Through the unified data interaction interface of the computing network control plane, the communication tables of GPU devices of each distributed node are standardized, summarized and stored in a structured manner. Based on the graph computing engine, global topology modeling is carried out on the summarized communication tables to generate an initial global topology model of the computing network GPU that integrates node attributes, connection relationships and communication quantification data. Then, the effectiveness of the initial model is verified by simulating different communication scenarios. Parameter calibration and topology optimization are performed to address model deviations. Finally, a global topology model of the computing network GPU that accurately reflects the global connectivity and communication capabilities of the computing network GPU nodes is constructed.

[0036] In one possible implementation, step S130 further includes:

[0037] Step S131: Construct a GPU connection quantization standard, which includes GPU connection type weights and communication performance metrics.

[0038] Step S132: Traverse the GPU node connection pairs and internal GPU connection pairs in the GPU node connection topology graph, and perform communication capability quantification calculation on the GPU node connection pairs and internal GPU connection pairs according to the GPU connection quantization standard to obtain the GPU connection pair communication score set.

[0039] Step S133: Extract the GPU identification code from the GPU node connection topology map, associate and combine the GPU identification code with the GPU connection communication score set to generate a GPU device communication score table.

[0040] Specifically, based on the actual communication application scenarios and scheduling requirements of GPU nodes in the computing power network, a standardized GPU connection quantification standard is constructed. This standard includes two core dimensions: GPU connection type weight and communication performance indicators. For different GPU connection types such as onboard direct connection, PCIe bridging, fiber direct connection, and switch networking, differentiated weight coefficients are assigned according to their inherent communication transmission characteristics. At the same time, core parameters that can intuitively reflect communication quality, such as transmission rate, communication bandwidth, network latency, and packet loss rate, are selected as communication performance indicators, and the quantification calculation rules and value ranges of each indicator are clearly defined.

[0041] The Depth-First Search (DFS) algorithm is used as the core topology traversal framework. First, the traversal stack is initialized, and GPU nodes connected to the root node of the topology graph are pushed onto the stack. By iteratively executing the DFS core logic of "pop, visit, push adjacent nodes," all inter-node GPU connection pairs and intra-node GPU connection pairs in the topology graph are fully traversed and uniquely identified, for example (node A-GPU1, node B-GPU2). During the traversal, a predefined communication capability quantification function is called. This function follows a three-step structure of indicator normalization, weighting, and total score calculation. The first step normalizes the communication performance indicators of each connection pair, such as transmission rate, bandwidth, latency, and packet loss rate, to the range of 0 to 1, eliminating differences in units. The second step multiplies the normalized indicator values by the corresponding GPU connection type weight, such as 0.8 for onboard direct connection and 0.6 for PCIe bridging. The third step sums the weighted indicator values to obtain the communication capability quantification score of a single connection pair. Finally, the unique identifiers of all connection pairs and their corresponding quantification scores are stored in a structured key-value format to form a GPU connection pair communication score set.
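The traversal and three-step scoring described in this paragraph can be sketched as follows. This is only one possible reading: min-max normalization, the inversion of latency so that lower values score higher, the weights for fiber and switch links, and all identifiers and sample values are assumptions. Only the 0.8 and 0.6 weights come from the text.

```python
# 0.8 (onboard) and 0.6 (PCIe) are taken from the text; the other weights are assumed.
TYPE_WEIGHT = {"onboard": 0.8, "pcie": 0.6, "fiber": 0.4, "switch": 0.3}

# Hypothetical connection pairs and their measured indicators.
LINKS = {
    ("A-GPU1", "A-GPU2"): {"kind": "onboard", "bandwidth_gbps": 600.0, "latency_us": 2.0},
    ("A-GPU2", "B-GPU1"): {"kind": "fiber", "bandwidth_gbps": 100.0, "latency_us": 20.0},
    ("B-GPU1", "B-GPU2"): {"kind": "pcie", "bandwidth_gbps": 64.0, "latency_us": 5.0},
}


def build_adjacency(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def dfs_pairs(adj, root):
    """Iterative DFS (pop, visit, push neighbors); returns each pair once."""
    stack, visited, seen, pairs = [root], set(), set(), []
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        for nbr in adj[node]:
            pair = tuple(sorted((node, nbr)))
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
            if nbr not in visited:
                stack.append(nbr)
    return pairs


def normalize(value, lo, hi, higher_is_better):
    """Min-max scale to [0, 1]; invert when a lower raw value is better."""
    if hi == lo:
        return 1.0
    x = (value - lo) / (hi - lo)
    return x if higher_is_better else 1.0 - x


def score_pairs(links, root):
    links = {tuple(sorted(k)): v for k, v in links.items()}
    adj = build_adjacency(links)
    bw = [m["bandwidth_gbps"] for m in links.values()]
    lat = [m["latency_us"] for m in links.values()]
    scores = {}
    for pair in dfs_pairs(adj, root):
        m = links[pair]
        # Step 1: normalize each indicator to the range 0..1.
        indicators = [
            normalize(m["bandwidth_gbps"], min(bw), max(bw), True),
            normalize(m["latency_us"], min(lat), max(lat), False),
        ]
        # Steps 2 and 3: multiply by the connection-type weight, then sum.
        weight = TYPE_WEIGHT[m["kind"]]
        scores[pair] = sum(weight * x for x in indicators)
    return scores


scores = score_pairs(LINKS, "A-GPU1")
```

Because the weighted indicators are summed rather than averaged, a score in this sketch can exceed 1. Only connection pairs reachable from the chosen root are scored, so a disconnected topology would need one traversal per component.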

[0042] A unique identifier for each GPU device in the GPU node connection topology graph is extracted using a topology graph metadata parsing algorithm. This identifier contains feature information such as the node's physical address and GPU hardware serial number to ensure uniqueness. Subsequently, a hash mapping algorithm framework is constructed, using the GPU identifier as the key and all connection pairs belonging to the same GPU device, together with their corresponding communication quantization scores, as the values to complete the key-value mapping. Finally, following the column structure of "GPU identifier - connection pair identifier - connection type - communication performance index value - quantization score", the associated key-value data is structurally transformed and formatted in the row and column dimensions to generate a GPU device communication sub-table containing the communication quantization information of all GPU devices. At the same time, the integrity of the sub-table data is verified to ensure that each GPU identifier matches the corresponding communication score and that there is no missing or redundant data.
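A minimal sketch of the hash mapping and integrity check described above, assuming the connection pair scores are already available as a dictionary. The function names, row fields, and sample data are hypothetical.

```python
from collections import defaultdict


def build_score_table(pair_scores, pair_kinds):
    """Group connection pairs and their scores under each GPU identifier."""
    table = defaultdict(list)
    for (a, b), score in pair_scores.items():
        row = {"pair": f"{a}<->{b}", "kind": pair_kinds[(a, b)], "score": score}
        # A connection pair belongs to both of its endpoint GPUs.
        table[a].append(row)
        table[b].append(row)
    return dict(table)


def verify(table, expected_gpu_ids):
    """Return (missing, redundant) GPU identifiers relative to the expected set."""
    missing = sorted(set(expected_gpu_ids) - set(table))
    redundant = sorted(set(table) - set(expected_gpu_ids))
    return missing, redundant


pair_scores = {("A-GPU1", "A-GPU2"): 1.6, ("A-GPU2", "B-GPU1"): 0.03}
pair_kinds = {("A-GPU1", "A-GPU2"): "onboard", ("A-GPU2", "B-GPU1"): "fiber"}
table = build_score_table(pair_scores, pair_kinds)
missing, redundant = verify(table, ["A-GPU1", "A-GPU2", "B-GPU1", "B-GPU2"])
```

In this example `B-GPU2` is reported as missing because no connection pair references it, which is the kind of gap the integrity check is meant to surface.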

[0043] In one possible implementation, step S200 further includes:

[0044] Step S210: Perform structured identification and extraction analysis on the task to be processed to obtain key task information.

[0045] Step S220: Decompose the key information of the task into multiple sub-tasks, and perform dependency analysis on the multiple sub-tasks to determine the execution information of the target task.

[0046] Step S230: Perform requirement analysis, quantification, and integration on the target task execution information to determine the target task requirement parameters, which include computing resource requirement parameters, storage resource requirement parameters, and network resource requirement parameters.

[0047] Specifically, the system performs full-dimensional structured identification and extraction analysis on the multi-source input data of the task to be processed. First, it collects structured configuration files, semi-structured execution scripts, and unstructured descriptive texts of the task and performs format normalization processing. Then, it uses NLP entity recognition, keyword matching, and feature parsing technology to accurately extract core information such as task type, execution priority, data processing scale, parallelism requirements, and computing architecture adaptability. At the same time, it performs deduplication, verification, and information completion on the parsing results. Finally, it integrates the results to form standardized and structured key task information, providing complete and accurate basic data support for subsequent task process decomposition.

[0048] Based on standardized key task information, the modular decomposition tool in the process decomposition engine is invoked. Following the principles of minimizing functional granularity and keeping execution logic independent, regular expression matching and semantic analysis techniques are used to decompose the overall task into multiple subtasks, either by data processing stage (such as data preprocessing, model training, and result output) or by computational logic unit. Each subtask is assigned a unique subtask ID, an estimated execution time, and basic computing power requirements. Subsequently, a Directed Acyclic Graph (DAG) construction algorithm maps each subtask to a DAG node by parsing the input-output data relationships and resource call order of the subtasks. The system identifies and marks serial dependencies between subtasks, such as subtask B waiting for subtask A to complete; parallel dependencies, such as subtasks C and D being able to execute simultaneously; and resource mutual exclusion dependencies, such as subtasks E and F not being able to share the same GPU. These dependencies are then converted into directed edges of the DAG and labeled with dependency types and constraints. Finally, the completed DAG model is logically validated to remove abnormal relationships such as circular dependencies. Target task execution information, including a list of subtasks, attribute labels, a DAG dependency topology, and execution constraint rules, is generated and stored in the computing power network task database in structured JSON format.
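The DAG validation and the identification of subtasks that may run in parallel can be illustrated with Python's standard `graphlib` module (Python 3.9 or later). The subtask names and the JSON layout are assumptions, and resource mutual exclusion constraints are not modeled in this sketch.

```python
import json
from graphlib import CycleError, TopologicalSorter


def execution_waves(deps):
    """deps maps each subtask to the set of subtasks it must wait for.

    Returns a list of waves; subtasks within one wave have no pending
    dependencies on each other and may run in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if a circular dependency exists
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def has_cycle(deps):
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return True
    return False


deps = {
    "preprocess": set(),
    "train_a": {"preprocess"},
    "train_b": {"preprocess"},
    "output": {"train_a", "train_b"},
}
waves = execution_waves(deps)

# Serialize the execution information as structured JSON (layout is assumed).
task_info = json.dumps({
    "subtasks": sorted(deps),
    "dependencies": {name: sorted(pre) for name, pre in deps.items()},
    "waves": waves,
})
```

Here `train_a` and `train_b` land in the same wave, reflecting a parallel relationship, while `output` waits for both, reflecting serial dependencies.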

[0049] Based on the subtask attributes and execution dependencies in the target task execution information, the requirements for computing, storage, and network resources are analyzed and quantitatively evaluated. Computing resource requirements, such as the number of computing cores, GPU memory usage, and floating-point performance, are determined according to the subtask's computation type, parallel scale, and execution duration. Storage resource requirements, such as storage space and I/O bandwidth, are determined according to the subtask's data read/write volume, intermediate data cache size, and persistent storage requirements. Network resource requirements, such as network bandwidth, communication latency threshold, and data throughput, are determined according to the data interaction volume between subtasks, transmission latency requirements, and inter-node communication frequency. Finally, the various resource requirements are aggregated, peak-calibrated, and integrated according to the subtask execution order and parallel constraints, ultimately forming the target task requirement parameters, which include computing resource requirements, storage resource requirements, and network resource requirements.
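One possible interpretation of the aggregation and peak calibration step is sketched below: requirements of subtasks that run in parallel are summed, and the task-level requirement is the maximum over all parallel groups. This interpretation, the resource names, and the figures are assumptions. Non-additive quantities such as latency thresholds would need a different rule (for example, taking the strictest value) and are omitted.

```python
def peak_requirements(waves, per_task):
    """Sum additive requirements within each parallel wave, then take the peak."""
    peak = {}
    for wave in waves:
        wave_total = {}
        for task in wave:
            for resource, amount in per_task[task].items():
                wave_total[resource] = wave_total.get(resource, 0) + amount
        for resource, amount in wave_total.items():
            peak[resource] = max(peak.get(resource, 0), amount)
    return peak


# Hypothetical execution order: each inner list runs in parallel.
waves = [["preprocess"], ["train_a", "train_b"], ["output"]]
per_task = {
    "preprocess": {"gpu_mem_gb": 8, "tflops": 5, "net_gbps": 1},
    "train_a": {"gpu_mem_gb": 40, "tflops": 60, "net_gbps": 25},
    "train_b": {"gpu_mem_gb": 40, "tflops": 60, "net_gbps": 25},
    "output": {"gpu_mem_gb": 4, "tflops": 2, "net_gbps": 1},
}
requirements = peak_requirements(waves, per_task)
```

With these figures the peak is set by the second wave, where both training subtasks run at once.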

[0050] In one possible implementation, step S300 further includes:

[0051] Step S310: Construct a GPU resource status indicator set, which includes computing resource indicators, storage resource indicators, and network resource indicators.

[0052] Step S320: Assess the availability impact of each indicator in the GPU resource status indicator set, and assign weight coefficients to the resource status indicators based on the assessment results.

[0053] Step S330: Based on the resource status index weight coefficients, perform availability level weighted evaluation and label fitting on the GPU resource status index set to construct a GPU availability evaluation model.

[0054] Step S340: Use the GPU availability assessment model to assess the availability of the GPU resource usage status data to obtain GPU resource status parameters.

[0055] Specifically, based on the operating characteristics and scheduling requirements of GPU nodes in the computing network, a multi-dimensional and comprehensive set of GPU resource status indicators is constructed. This set of indicators is divided into three categories: computing resource indicators, storage resource indicators, and network resource indicators. The computing resource indicators include GPU computing power utilization, CUDA core load rate, video memory utilization, and floating-point operation capability margin. The storage resource indicators include local video memory cache utilization, data read and write bandwidth, and persistent storage available capacity. The network resource indicators include GPU node uplink/downlink bandwidth utilization, link communication latency, data packet loss rate, and port connection status. All indicators adopt standardized definitions and unified units, forming a complete indicator system that can be directly used for subsequent availability assessment.

[0056] The Analytic Hierarchy Process (AHP) is used to assess the availability impact of various indicators in the GPU resource status index set, including computing, storage, and network. Based on the actual impact of each indicator on GPU task execution stability, communication quality, and resource availability, its importance level is determined. Then, through pairwise comparison matrix calculation and normalization, differentiated resource status index weight coefficients are assigned to different indicators to ensure that high-impact indicators correspond to higher weights and low-impact indicators correspond to lower weights, forming an objective and quantifiable index weight system.
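The pairwise comparison and normalization step of AHP can be illustrated as follows. The sketch uses the common column-normalization approximation of the priority vector and Saaty's random index values for the consistency ratio. The three-category comparison matrix is an assumed example, not data from the application.

```python
def ahp_weights(matrix):
    """Approximate AHP priorities: normalize each column, then average each row."""
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [
        sum(matrix[r][c] / col_sums[c] for c in range(n)) / n
        for r in range(n)
    ]


def consistency_ratio(matrix, weights):
    """Saaty consistency ratio; values below about 0.1 are usually accepted."""
    n = len(matrix)
    if n <= 2:
        return 0.0
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's tabulated values
    return ci / random_index


# Assumed judgments: compute is 2x as important as storage and 4x as network.
comparison = [
    [1.0, 2.0, 4.0],   # computing
    [0.5, 1.0, 2.0],   # storage
    [0.25, 0.5, 1.0],  # network
]
weights = ahp_weights(comparison)
cr = consistency_ratio(comparison, weights)
```

This example matrix is perfectly consistent, so the weights come out as 4/7, 2/7, and 1/7 and the consistency ratio is zero. Real expert judgments are rarely this clean, which is why the ratio is checked before the weights are used.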

[0057] Using the set of GPU resource status indicators as input and the obtained weight coefficient of each indicator as the basis for calculation, a multi-indicator weighted fusion algorithm is used to evaluate the availability level of GPU operation. First, the original monitoring values of each indicator are normalized to eliminate differences in units. Then, the normalized results are weighted and summed with the corresponding indicator weight coefficients to obtain a comprehensive availability score. Next, the score is mapped to the high availability, available, and unavailable levels according to preset thresholds, and the level label is fitted. Finally, a GPU availability evaluation model integrating indicator normalization, weighted calculation, level determination, and label output is formed.

[0058] The real-time collected GPU resource usage status data is input into the constructed GPU availability assessment model. First, the raw data of each indicator is normalized and preprocessed. Then, the data is weighted and summed according to the weight coefficients of the corresponding resource status indicators in the model to obtain the comprehensive availability score of a single GPU node. Based on the score, the availability level identifier is fitted. Finally, the GPU resource status parameters containing the unique GPU identifier, comprehensive availability score, availability level, resource status of each dimension, and anomaly marker information are output.
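The normalize, weight, sum, and label sequence of the availability evaluation model can be sketched as below. The indicator names, value ranges, weights, and the 0.7 and 0.3 thresholds are all assumptions chosen for illustration; the application does not specify them.

```python
# name: (weight, lower bound, upper bound, higher_is_better) -- all assumed.
INDICATORS = {
    "gpu_util_pct": (0.4, 0.0, 100.0, False),
    "mem_util_pct": (0.3, 0.0, 100.0, False),
    "link_latency_us": (0.2, 0.0, 500.0, False),
    "packet_loss_pct": (0.1, 0.0, 5.0, False),
}


def availability_score(sample):
    """Weighted sum of normalized indicators; 1.0 means fully available."""
    score = 0.0
    for name, (weight, lo, hi, higher_is_better) in INDICATORS.items():
        x = (sample[name] - lo) / (hi - lo)
        x = min(max(x, 0.0), 1.0)  # clamp out-of-range readings
        score += weight * (x if higher_is_better else 1.0 - x)
    return score


def availability_level(score, high=0.7, low=0.3):
    """Map the score to one of the three levels using preset thresholds."""
    if score >= high:
        return "high availability"
    if score >= low:
        return "available"
    return "unavailable"


def assess(gpu_id, sample):
    score = availability_score(sample)
    return {"gpu_id": gpu_id, "score": round(score, 4), "level": availability_level(score)}


idle = {"gpu_util_pct": 0, "mem_util_pct": 0, "link_latency_us": 0, "packet_loss_pct": 0}
half = {"gpu_util_pct": 50, "mem_util_pct": 50, "link_latency_us": 250, "packet_loss_pct": 2.5}
busy = {"gpu_util_pct": 100, "mem_util_pct": 100, "link_latency_us": 500, "packet_loss_pct": 5}
results = [assess("gpu-idle", idle), assess("gpu-half", half), assess("gpu-busy", busy)]
```

All four indicators here are "lower is better", so each normalized value is inverted before weighting. An indicator such as free memory would instead set `higher_is_better` to `True`.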

[0059] In one possible implementation, step S400 further includes:

[0060] Step S410: Perform preliminary screening of the GPU resource status parameters according to the target task requirement parameters to obtain a set of available GPU resources.

[0061] Step S420: Based on the GPU global topology model of the computing power network, evaluate the communication efficiency of the available GPU resource set to obtain the GPU communication efficiency coefficient.

[0062] Step S430: Based on the GPU communication efficiency coefficient, perform collaborative scheduling analysis on the available GPU resource set to determine the target GPU resource combination.

[0063] Specifically, the computing resources, storage resources, and network resources required in the target task requirements are used as the screening criteria. The GPU resource status parameters are matched and verified with the task requirements item by item. Threshold judgments are made on key indicators such as computing power performance, memory capacity, bandwidth conditions, and availability level. GPU devices that do not meet the minimum resource requirements of the task, are overloaded, or are in abnormal status are filtered out. All GPU devices that meet the conditions are uniformly collected to form a set of available GPU resources that meet the basic operating conditions of the task.
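The preliminary screening in [0063] amounts to a per-GPU threshold filter. A minimal sketch follows, assuming hypothetical field names (`tflops`, `free_memory_gb`, `bandwidth_gbps`, `load`) and requirement keys that are not defined in this application.

```python
def screen_gpus(status_params, requirement):
    """Return the IDs of GPUs that satisfy every minimum task requirement."""
    available = []
    for gpu in status_params:
        # Drop GPUs that are abnormal or rated unavailable.
        if gpu["anomaly"] or gpu["level"] == "unavailable":
            continue
        # Item-by-item threshold checks on the key indicators.
        if gpu["tflops"] < requirement["min_tflops"]:
            continue
        if gpu["free_memory_gb"] < requirement["min_memory_gb"]:
            continue
        if gpu["bandwidth_gbps"] < requirement["min_bandwidth_gbps"]:
            continue
        # Drop overloaded GPUs.
        if gpu["load"] > requirement["max_load"]:
            continue
        available.append(gpu["gpu_id"])
    return available

requirement = {"min_tflops": 50.0, "min_memory_gb": 16.0,
               "min_bandwidth_gbps": 100.0, "max_load": 0.8}
status_params = [
    {"gpu_id": "g0", "level": "high availability", "anomaly": False,
     "tflops": 120.0, "free_memory_gb": 40.0, "bandwidth_gbps": 300.0, "load": 0.2},
    {"gpu_id": "g1", "level": "available", "anomaly": False,
     "tflops": 120.0, "free_memory_gb": 8.0, "bandwidth_gbps": 300.0, "load": 0.5},
    {"gpu_id": "g2", "level": "available", "anomaly": True,
     "tflops": 120.0, "free_memory_gb": 40.0, "bandwidth_gbps": 300.0, "load": 0.3},
]
available_set = screen_gpus(status_params, requirement)
```

In this example only `g0` passes: `g1` lacks memory and `g2` carries an anomaly marker.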

[0064] Based on the global GPU topology model of the computing power network, the physical location, link connection method, topological distance, and communication bandwidth of each GPU node in the available GPU resource set are analyzed. A topology path evaluation algorithm quantifies the communication latency, data transmission rate, and link stability between nodes. The GPU communication efficiency coefficient is then obtained by weighting these communication performance indicators; it characterizes the quality of collaborative communication between GPU nodes and within node combinations.
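One way to picture the communication efficiency coefficient of [0064] is as a weighted blend of normalized latency, bandwidth, and stability per GPU pair, averaged over all pairs in a combination. The sketch below assumes that form; the reference values, the 0.4/0.4/0.2 weights, and the link data are hypothetical, and this application does not disclose the actual formula.

```python
from itertools import combinations

REF_LATENCY_US = 1.0        # assumed best-case latency
REF_BANDWIDTH_GBPS = 600.0  # assumed best-case bandwidth
W_LATENCY, W_BANDWIDTH, W_STABILITY = 0.4, 0.4, 0.2  # assumed weights

# Hypothetical link metrics taken from a global topology model.
LINKS = {
    frozenset({"g0", "g1"}): {"latency_us": 2.0, "bandwidth_gbps": 600.0, "stability": 0.99},
    frozenset({"g0", "g2"}): {"latency_us": 10.0, "bandwidth_gbps": 100.0, "stability": 0.95},
    frozenset({"g1", "g2"}): {"latency_us": 10.0, "bandwidth_gbps": 100.0, "stability": 0.95},
}

def pair_coefficient(link):
    """Weighted blend of normalized latency, bandwidth, and stability in [0, 1]."""
    latency = min(REF_LATENCY_US / link["latency_us"], 1.0)
    bandwidth = min(link["bandwidth_gbps"] / REF_BANDWIDTH_GBPS, 1.0)
    return (W_LATENCY * latency + W_BANDWIDTH * bandwidth
            + W_STABILITY * link["stability"])

def group_coefficient(gpus, links):
    """Mean pairwise coefficient of a GPU combination (1.0 for a single GPU)."""
    pairs = list(combinations(sorted(gpus), 2))
    if not pairs:
        return 1.0
    return sum(pair_coefficient(links[frozenset(p)]) for p in pairs) / len(pairs)

same_node = group_coefficient({"g0", "g1"}, LINKS)
mixed = group_coefficient({"g0", "g1", "g2"}, LINKS)
```

Averaging rewards combinations that are well connected overall; taking the minimum over pairs instead would penalize a single slow link more strongly, which may suit tightly synchronized workloads.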

[0065] With the GPU communication efficiency coefficient as the core optimization index, and subject to task parallelism, subtask dependency, and resource load balancing constraints, a collaborative scheduling optimization algorithm performs a global analysis of the available GPU resources and searches for the optimal combination. By quantitatively comparing the communication overhead, execution efficiency, and load distribution of different GPU combinations, the GPU node combination that offers the best communication efficiency and the highest resource adaptability while meeting the overall task execution requirements is selected as the target GPU resource combination for task execution.

[0066] In one possible implementation, step S430 further includes:

[0067] Step S431: Based on the GPU communication efficiency coefficient, perform cooperative scheduling optimization on the available GPU resource set to construct multiple candidate GPU combinations.

[0068] Step S432: Construct a GPU resource scheduling objective function, and use the GPU resource scheduling objective function to perform global optimization on the multiple candidate GPU combinations to determine the target GPU resource combination.

[0069] Specifically, based on the GPU communication efficiency coefficient, a collaborative scheduling optimization method combining combination generation and constraint verification is adopted. The available GPU resource set is traversed, and GPU nodes with a high communication efficiency coefficient, short topological distance, and sufficient link bandwidth are grouped into combinations according to task parallelism, subtask dependency, and node load balancing requirements. Each candidate combination is also checked against constraints such as resource conflicts, communication bottlenecks, and execution compatibility, and combinations that do not meet the scheduling requirements are eliminated. The result is multiple compliant, efficient, and feasible candidate GPU combinations that serve as alternatives for the subsequent global optimization.
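Combination generation with constraint verification, as described in [0069], can be sketched by enumerating fixed-size subsets and discarding those that violate a constraint. The pair coefficients, load figures, thresholds, and the function name `candidate_combinations` are hypothetical. Exhaustive enumeration is used only for clarity; it grows combinatorially and would need pruning on a large network.

```python
from itertools import combinations

def candidate_combinations(loads, pair_coeff, size, min_coeff, max_load, top_k=3):
    """Return up to top_k (mean_coefficient, combo) tuples that pass all checks."""
    candidates = []
    for combo in combinations(sorted(loads), size):
        coeffs = [pair_coeff[frozenset(p)] for p in combinations(combo, 2)]
        # Communication bottleneck check: no pair may fall below min_coeff.
        if coeffs and min(coeffs) < min_coeff:
            continue
        # Load check: no member may exceed the load ceiling.
        if any(loads[g] > max_load for g in combo):
            continue
        mean = sum(coeffs) / len(coeffs) if coeffs else 1.0
        candidates.append((mean, combo))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]

# Hypothetical data: g0/g1 share a node, g2/g3 share another node.
loads = {"g0": 0.2, "g1": 0.3, "g2": 0.4, "g3": 0.95}
pair_coeff = {
    frozenset({"g0", "g1"}): 0.9, frozenset({"g2", "g3"}): 0.9,
    frozenset({"g0", "g2"}): 0.3, frozenset({"g0", "g3"}): 0.3,
    frozenset({"g1", "g2"}): 0.3, frozenset({"g1", "g3"}): 0.3,
}
candidates = candidate_combinations(loads, pair_coeff, size=2,
                                    min_coeff=0.5, max_load=0.8)
```

Here the cross-node pairs fail the bottleneck check and the pair containing the overloaded `g3` fails the load check, leaving a single candidate.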

[0070] With the optimization objectives of minimizing communication overhead, maximizing resource utilization, and optimizing load balancing, a multi-objective weighted GPU resource scheduling objective function is constructed, which includes GPU communication efficiency coefficient, computing power matching degree, node load variance, and topology link cost. The real-time state parameters of each candidate GPU combination are substituted into the function for quantitative calculation and fitness scoring. A global optimization algorithm is used to traverse all candidate combinations and compare the optimal value of the objective function. The candidate combination with the highest score and the best overall scheduling efficiency is selected as the final target GPU resource combination.
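A multi-objective weighted objective of the kind described in [0070] could take the following shape: reward communication efficiency and computing power matching, penalize load variance and topology link cost. The weights, field names, and candidate values below are hypothetical, and the linear form is an assumption rather than the function disclosed in this application.

```python
from statistics import pvariance

# Assumed weights for the four terms named in the text.
W = {"comm": 0.4, "match": 0.3, "load_var": 0.2, "link_cost": 0.1}

def objective(candidate):
    """Higher is better: benefits minus weighted penalties."""
    return (W["comm"] * candidate["comm_efficiency"]
            + W["match"] * candidate["compute_match"]
            - W["load_var"] * pvariance(candidate["loads"])
            - W["link_cost"] * candidate["link_cost"])

# Hypothetical candidate combinations with real-time state parameters.
candidates = [
    {"name": "A", "comm_efficiency": 0.9, "compute_match": 0.8,
     "loads": [0.3, 0.4], "link_cost": 0.1},
    {"name": "B", "comm_efficiency": 0.5, "compute_match": 0.9,
     "loads": [0.1, 0.9], "link_cost": 0.5},
]
best = max(candidates, key=objective)
```

Candidate A wins here because its higher communication efficiency and balanced load outweigh B's slightly better computing power match.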

[0071] In one possible implementation, step S432 further includes:

[0072] Step S4321: Optimize and expand the multiple candidate GPU combinations to construct a GPU combination selection space.

[0073] Step S4322: Use the GPU resource scheduling objective function to perform a global evaluation and optimization of the GPU combination selection space to determine the target GPU resource combination.

[0074] Specifically, a neighborhood search and node perturbation strategy is adopted to optimize and expand multiple candidate GPU combinations. By locally replacing, fine-tuning the number of GPU nodes in the candidate combinations, and reconstructing the topology, an expanded combination with better communication performance and more reasonable resource allocation is generated. Redundant, conflicting, and invalid combinations that do not meet task constraints are eliminated. The original candidate combinations and the optimized and expanded effective combinations are integrated to construct a GPU combination selection space that covers multiple scheduling schemes and has global representativeness.
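The neighborhood search and node perturbation of [0074] can be sketched with three moves: replace one member, remove one member, and add one member. The function name `expand_selection_space`, the size bounds, and the validity predicate are hypothetical; topology reconstruction is not modeled in this sketch.

```python
def expand_selection_space(candidates, pool, min_size, max_size,
                           is_valid=lambda combo: True):
    """Return the original combinations plus their valid one-move neighbors."""
    space = {frozenset(c) for c in candidates}
    for combo in list(space):          # iterate over a copy while adding
        outside = pool - combo
        for g in combo:
            for r in outside:
                space.add((combo - {g}) | {r})   # local replacement
            if len(combo) > min_size:
                space.add(combo - {g})           # shrink by one node
        if len(combo) < max_size:
            for r in outside:
                space.add(combo | {r})           # grow by one node
    # Using a set removes duplicates; the predicate removes invalid combinations.
    return {c for c in space if is_valid(c)}

pool = {"g0", "g1", "g2", "g3"}
overloaded = {"g3"}  # hypothetical constraint: g3 must not be used
space = expand_selection_space([("g0", "g1")], pool, min_size=2, max_size=3,
                               is_valid=lambda c: not (c & overloaded))
```

Starting from the single candidate `{g0, g1}`, the moves produce seven combinations, of which four remain after the combinations containing `g3` are eliminated.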

[0075] A GPU resource scheduling objective function is constructed with the core optimization goals of minimizing communication overhead, maximizing resource utilization, and optimizing load balancing. This function is constructed by assigning differentiated weights to key indicators such as communication efficiency coefficient, computing power matching degree, and node load variance and then summing them up. At the same time, a constraint term is introduced to limit the resource usage boundary. Then, all candidate combination parameters in the GPU combination selection space are substituted into the objective function, and a genetic algorithm is used for global traversal and iterative optimization to quantitatively calculate the objective function value of each combination. By comparing the comprehensive performance of different combinations in terms of communication overhead, resource utilization, and load balancing, the combination with the optimal objective function value is selected, and finally, the target GPU resource combination that adapts to the task execution requirements is determined.
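The genetic search mentioned in [0075] is not specified in detail, so the following is only a minimal sketch under stated assumptions: a chromosome is a fixed-size set of GPU IDs, crossover samples from the union of two parents, mutation swaps one member, the best individual is carried over unchanged, and the constraint term is a penalty on load above a ceiling. All data and parameter values are hypothetical.

```python
import random
from itertools import combinations

# Hypothetical topology: two nodes with three GPUs each.
NODE = {"g0": "A", "g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B"}
COEFF = {frozenset(p): 0.9 if NODE[p[0]] == NODE[p[1]] else 0.3
         for p in combinations(NODE, 2)}
LOAD = {"g0": 0.2, "g1": 0.3, "g2": 0.95, "g3": 0.1, "g4": 0.2, "g5": 0.3}

def fitness(combo, penalty=1.0, max_load=0.8):
    """Mean pair coefficient minus a penalty for load above max_load."""
    pairs = list(combinations(sorted(combo), 2))
    comm = sum(COEFF[frozenset(p)] for p in pairs) / len(pairs)
    overload = sum(max(0.0, LOAD[g] - max_load) for g in combo)
    return comm - penalty * overload

def genetic_search(pool, size, score, generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    pool = sorted(pool)
    population = [frozenset(rng.sample(pool, size)) for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]                       # elitism
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = set(rng.sample(sorted(a | b), size))   # crossover
            if rng.random() < 0.2:                   # mutation: swap one member
                outside = [g for g in pool if g not in child]
                if outside:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(frozenset(child))
        population = children
        best = max(population + [best], key=score)
    return best

best = genetic_search(NODE, size=3, score=fitness)
```

On a search space this small, exhaustive evaluation would be simpler; a heuristic such as this one is more relevant when the selection space is too large to enumerate.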

[0076] In one possible implementation, step S400 further includes:

[0077] Step S440: Perform task decision-making, execution, and monitoring through the target GPU resource combination to obtain task execution feedback parameters and GPU status feedback parameters.

[0078] Step S450: Optimize the GPU scheduling strategy and continue task execution based on the task execution feedback parameters and GPU status feedback parameters.

[0079] Specifically, the task to be executed is dispatched to the corresponding GPU nodes in the target GPU resource combination, and task computation and collaborative execution are started. A real-time monitoring module tracks the running task and the GPU node status throughout execution. This yields task execution feedback parameters such as task execution progress, execution latency, computation accuracy, and task completion rate, as well as GPU status feedback parameters such as GPU computing power utilization, memory usage, communication bandwidth, node load, and operating temperature.

[0080] A closed-loop adaptive iterative algorithm analyzes the task execution feedback parameters and GPU status feedback parameters in real time to identify communication bottlenecks, load imbalances, and resource waste in the current scheduling scheme. It dynamically adjusts the weight coefficients, node selection thresholds, and combination optimization rules in the GPU resource scheduling objective function, thereby adaptively optimizing the GPU scheduling strategy, and applies the optimized strategy to task allocation and node collaboration in real time. Provided that the GPU nodes continue to operate stably, task execution proceeds without interruption, achieving closed-loop optimization of resource scheduling and task operation.
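One simple reading of the weight adjustment in [0080] is a multiplicative update: each objective weight is raised in proportion to the problem it addresses, and the weights are then renormalized. The feedback keys (`comm_wait_ratio`, `load_imbalance`, `idle_ratio`), the update rule, and the learning rate are assumptions for illustration; this application does not disclose the actual adaptation rule.

```python
def adapt_weights(weights, feedback, rate=0.1):
    """Raise the weights tied to observed problems, then renormalize to sum 1."""
    # Assumed mapping from observed symptoms to objective terms.
    pressure = {
        "comm": feedback["comm_wait_ratio"],     # communication bottleneck
        "load_var": feedback["load_imbalance"],  # uneven load
        "match": feedback["idle_ratio"],         # wasted resources
    }
    raised = {name: w * (1.0 + rate * pressure.get(name, 0.0))
              for name, w in weights.items()}
    total = sum(raised.values())
    return {name: value / total for name, value in raised.items()}

weights = {"comm": 0.4, "match": 0.3, "load_var": 0.3}
feedback = {"comm_wait_ratio": 0.5, "load_imbalance": 0.0, "idle_ratio": 0.0}
updated = adapt_weights(weights, feedback)
```

With only a communication bottleneck reported, the communication weight rises and the other two shrink slightly after renormalization, so the next scheduling round favors better-connected combinations.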

[0081] Example 2: Based on the same inventive concept as the globally topology-aware GPU resource collaborative scheduling method for computing power networks in the foregoing examples, and as shown in Figure 2, this application provides a globally topology-aware GPU resource collaborative scheduling system for computing power networks. The system includes:

[0082] The topology model construction module 10 is used to perform connection communication quantization and global topology analysis on each GPU node in the computing power network, and to construct a global topology model of the computing power network GPU.

[0083] The target task requirement parameter determination module 20 is used to obtain the task to be processed through the computing power network, perform requirement analysis and quantification on the task to be processed, and determine the target task requirement parameters.

[0084] The resource status parameter acquisition module 30 is used to collect GPU resource usage status data in the computing power network in real time, perform availability assessment on the GPU resource usage status data, and obtain GPU resource status parameters.

[0085] The collaborative scheduling parsing module 40 is used to perform collaborative scheduling parsing on the target task requirement parameters and the GPU resource status parameters based on the GPU global topology model of the computing power network, determine the target GPU resource combination, and perform task decision execution and feedback scheduling optimization through the target GPU resource combination.

[0086] Furthermore, the system is also used to implement the following functions:

[0087] The system detects and collects GPU connection information for each GPU node in the computing power network, including connection methods and connection parameters; it then performs node topology connections based on the GPU connection information of each GPU node to construct a GPU node connection topology graph; it traverses the GPU node connection topology graph to quantify connection communication capabilities and generate a GPU device communication sub-table; finally, it aggregates the GPU device communication sub-table to the control plane of the computing power network for global topology modeling, verification, and optimization to construct a global GPU topology model for the computing power network.

[0088] Furthermore, the system is also used to implement the following functions:

[0089] A GPU connection quantization standard is constructed, which includes GPU connection type weights and communication performance indicators. The GPU node connection pairs and intra-node GPU connection pairs in the GPU node connection topology graph are traversed, and their communication capabilities are quantified according to the GPU connection quantization standard to obtain a GPU connection pair communication score set. The GPU identification codes in the GPU node connection topology graph are extracted and associated with the GPU connection pair communication score set to generate the GPU device communication sub-table.
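The quantization in [0089] can be sketched as a per-link score built from a connection type weight and normalized performance indicators, stored in a table keyed by GPU identification code. The connection types, their weights, the reference values, and the ID format below are hypothetical and are not taken from this application.

```python
# Assumed connection type weights and reference values.
TYPE_WEIGHT = {"nvlink": 1.0, "pcie": 0.6, "infiniband": 0.5, "ethernet": 0.2}
REF_BANDWIDTH_GBPS = 600.0
REF_LATENCY_US = 1.0

def pair_score(link):
    """Type weight times the mean of normalized bandwidth and latency."""
    bandwidth = min(link["bandwidth_gbps"] / REF_BANDWIDTH_GBPS, 1.0)
    latency = min(REF_LATENCY_US / link["latency_us"], 1.0)
    return TYPE_WEIGHT[link["type"]] * (0.5 * bandwidth + 0.5 * latency)

def build_sub_table(links):
    """Map each GPU ID to {peer ID: score}; scores are stored symmetrically."""
    table = {}
    for (a, b), link in links.items():
        score = round(pair_score(link), 4)
        table.setdefault(a, {})[b] = score
        table.setdefault(b, {})[a] = score
    return table

# One intra-node pair and one cross-node pair (hypothetical measurements).
links = {
    ("n0/gpu0", "n0/gpu1"): {"type": "nvlink", "bandwidth_gbps": 600.0, "latency_us": 1.0},
    ("n0/gpu0", "n1/gpu0"): {"type": "infiniband", "bandwidth_gbps": 200.0, "latency_us": 5.0},
}
sub_table = build_sub_table(links)
```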

[0090] Furthermore, the system is also used to implement the following functions:

[0091] The task to be processed is structured, identified, extracted, and parsed to obtain key task information; the key task information is then broken down into multiple sub-tasks, and dependency analysis is performed on these sub-tasks to determine the target task execution information; the target task execution information is then parsed, quantified, and integrated to determine the target task requirement parameters, which include computing resource requirement parameters, storage resource requirement parameters, and network resource requirement parameters.
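The decomposition and dependency analysis in [0091] can be illustrated with a small dependency graph and a topological ordering. The subtask names and figures are hypothetical, and taking the peak across subtasks as the requirement parameter assumes the subtasks run one after another, which this application does not state.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical subtasks with their dependencies and per-subtask demands.
SUBTASKS = {
    "preprocess": {"deps": [], "tflops": 5.0, "memory_gb": 8.0, "bandwidth_gbps": 10.0},
    "train": {"deps": ["preprocess"], "tflops": 120.0, "memory_gb": 64.0, "bandwidth_gbps": 200.0},
    "evaluate": {"deps": ["train"], "tflops": 30.0, "memory_gb": 16.0, "bandwidth_gbps": 20.0},
}

def execution_order(subtasks):
    """Order subtasks so that every dependency runs before its dependents."""
    graph = {name: set(spec["deps"]) for name, spec in subtasks.items()}
    return list(TopologicalSorter(graph).static_order())

def requirement_parameters(subtasks):
    """Peak demand per resource dimension (assumes sequential execution)."""
    return {
        "computing": max(s["tflops"] for s in subtasks.values()),
        "storage": max(s["memory_gb"] for s in subtasks.values()),
        "network": max(s["bandwidth_gbps"] for s in subtasks.values()),
    }

order = execution_order(SUBTASKS)
params = requirement_parameters(SUBTASKS)
```

`TopologicalSorter` raises `CycleError` if the dependencies are circular, which doubles as a basic validity check on the decomposition.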

[0092] Furthermore, the system is also used to implement the following functions:

[0093] A GPU resource status indicator set is constructed, including computing resource indicators, storage resource indicators, and network resource indicators. The availability impact of each indicator in the GPU resource status indicator set is assessed, and weight coefficients are assigned to the resource status indicators based on the assessment results. Based on the weight coefficients, an availability level weighted assessment and label fitting are performed on the GPU resource status indicator set to construct a GPU availability assessment model. The GPU availability assessment model is then used to assess the availability of the GPU resource usage status data to obtain GPU resource status parameters.

[0094] Furthermore, the system is also used to implement the following functions:

[0095] The GPU resource status parameters are initially screened according to the target task requirements parameters to obtain a set of available GPU resources; the communication efficiency of the available GPU resource set is evaluated based on the global GPU topology model of the computing power network to obtain a GPU communication efficiency coefficient; and the available GPU resource set is collaboratively scheduled and analyzed based on the GPU communication efficiency coefficient to determine the target GPU resource combination.

[0096] Furthermore, the system is also used to implement the following functions:

[0097] Based on the GPU communication efficiency coefficient, the available GPU resources are optimized through collaborative scheduling to construct multiple candidate GPU combinations; a GPU resource scheduling objective function is constructed, and the GPU resource scheduling objective function is used to perform global optimization on the multiple candidate GPU combinations to determine the target GPU resource combination.

[0098] Furthermore, the system is also used to implement the following functions:

[0099] The multiple candidate GPU combinations are optimized and expanded to construct a GPU combination selection space; the GPU resource scheduling objective function is used to perform a global evaluation and optimization of the GPU combination selection space to determine the target GPU resource combination.

[0100] Furthermore, the system is also used to implement the following functions:

[0101] Task decision-making and execution are performed and monitored using the target GPU resource combination to obtain task execution feedback parameters and GPU status feedback parameters; GPU scheduling strategy optimization and continuous task execution are then performed based on the task execution feedback parameters and GPU status feedback parameters.

[0102] It should be noted that the order of the embodiments described above is for descriptive purposes only and does not represent the superiority or inferiority of the embodiments. Specific embodiments of this specification have been described above. Furthermore, the processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired results. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0103] The above description is only a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

[0104] This specification and accompanying drawings are merely illustrative examples of this application and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. Clearly, those skilled in the art can make various alterations and modifications to this application without departing from its scope. Therefore, if such modifications and variations fall within the scope of this application and its equivalents, this application intends to include such modifications and variations.

Claims

1. A globally topology-aware GPU resource collaborative scheduling method for computing power networks, characterized in that the method includes: performing connection communication quantization and global topology analysis on each GPU node in the computing power network to construct a global GPU topology model of the computing power network; obtaining a task to be processed through the computing power network, and performing requirement analysis and quantification on the task to be processed to determine target task requirement parameters; collecting GPU resource usage status data in the computing power network in real time, and performing availability assessment on the GPU resource usage status data to obtain GPU resource status parameters; and, based on the global GPU topology model of the computing power network, performing collaborative scheduling analysis on the target task requirement parameters and the GPU resource status parameters to determine a target GPU resource combination, and performing task decision execution and feedback scheduling optimization through the target GPU resource combination.

2. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 1, characterized in that constructing the global GPU topology model of the computing power network includes: detecting and collecting GPU connection information of each GPU node in the computing power network, the GPU connection information including connection methods and connection parameters; performing node topology connection based on the GPU connection information of each GPU node to construct a GPU node connection topology graph; traversing the GPU node connection topology graph to quantify connection communication capabilities and generate a GPU device communication sub-table; and aggregating the GPU device communication sub-table to the control plane of the computing power network for global topology modeling, verification, and optimization to construct the global GPU topology model of the computing power network.

3. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 2, characterized in that generating the GPU device communication sub-table includes: constructing a GPU connection quantization standard, which includes GPU connection type weights and communication performance indicators; traversing the GPU node connection pairs and intra-node GPU connection pairs in the GPU node connection topology graph, and quantifying the communication capabilities of the GPU node connection pairs and intra-node GPU connection pairs according to the GPU connection quantization standard to obtain a GPU connection pair communication score set; and extracting the GPU identification codes in the GPU node connection topology graph, and associating the GPU identification codes with the GPU connection pair communication score set to generate the GPU device communication sub-table.

4. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 1, characterized in that determining the target task requirement parameters includes: performing structuring, identification, extraction, and parsing on the task to be processed to obtain key task information; decomposing the key task information into multiple subtasks, and performing dependency analysis on the multiple subtasks to determine target task execution information; and parsing, quantifying, and integrating the target task execution information to determine the target task requirement parameters, which include computing resource requirement parameters, storage resource requirement parameters, and network resource requirement parameters.

5. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 1, characterized in that obtaining the GPU resource status parameters includes: constructing a GPU resource status indicator set, which includes computing resource indicators, storage resource indicators, and network resource indicators; assessing the availability impact of each indicator in the GPU resource status indicator set, and assigning resource status indicator weight coefficients according to the assessment results; performing availability level weighted assessment and label fitting on the GPU resource status indicator set based on the resource status indicator weight coefficients to construct a GPU availability assessment model; and using the GPU availability assessment model to assess the availability of the GPU resource usage status data to obtain the GPU resource status parameters.

6. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 1, characterized in that determining the target GPU resource combination includes: performing preliminary screening of the GPU resource status parameters according to the target task requirement parameters to obtain an available GPU resource set; evaluating the communication efficiency of the available GPU resource set based on the global GPU topology model of the computing power network to obtain a GPU communication efficiency coefficient; and performing collaborative scheduling analysis on the available GPU resource set based on the GPU communication efficiency coefficient to determine the target GPU resource combination.

7. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 6, characterized in that performing collaborative scheduling analysis on the available GPU resource set based on the GPU communication efficiency coefficient to determine the target GPU resource combination includes: performing collaborative scheduling optimization on the available GPU resource set based on the GPU communication efficiency coefficient to construct multiple candidate GPU combinations; and constructing a GPU resource scheduling objective function, and using the GPU resource scheduling objective function to perform global optimization on the multiple candidate GPU combinations to determine the target GPU resource combination.

8. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 7, characterized in that using the GPU resource scheduling objective function to perform global optimization on the multiple candidate GPU combinations to determine the target GPU resource combination includes: optimizing and expanding the multiple candidate GPU combinations to construct a GPU combination selection space; and using the GPU resource scheduling objective function to perform global evaluation and optimization of the GPU combination selection space to determine the target GPU resource combination.

9. The globally topology-aware GPU resource collaborative scheduling method for computing power networks as described in claim 1, characterized in that performing task decision execution and feedback scheduling optimization through the target GPU resource combination includes: performing task decision-making, execution, and monitoring through the target GPU resource combination to obtain task execution feedback parameters and GPU status feedback parameters; and performing GPU scheduling strategy optimization and continuous task execution based on the task execution feedback parameters and the GPU status feedback parameters.

10. A globally topology-aware GPU resource collaborative scheduling system for computing power networks, characterized in that the system is used to implement the globally topology-aware GPU resource collaborative scheduling method for computing power networks according to any one of claims 1-9, and the system comprises: a topology model construction module, used to perform connection communication quantization and global topology analysis on each GPU node in the computing power network, and to construct a global GPU topology model of the computing power network; a target task requirement parameter determination module, used to obtain the task to be processed through the computing power network, perform requirement analysis and quantification on the task to be processed, and determine the target task requirement parameters; a resource status parameter acquisition module, used to collect GPU resource usage status data in the computing power network in real time, perform availability assessment on the GPU resource usage status data, and obtain GPU resource status parameters; and a collaborative scheduling parsing module, used to perform collaborative scheduling parsing on the target task requirement parameters and the GPU resource status parameters based on the global GPU topology model of the computing power network, determine the target GPU resource combination, and perform task decision execution and feedback scheduling optimization through the target GPU resource combination.