Distributed task scheduling method and device based on multi-tenant cluster, equipment and medium

By identifying task bottleneck types based on network status and performance metrics in a multi-tenant cluster and designing a scheduling scheme, the problem of low resource utilization in existing technologies is solved, and a dynamic balance between cluster resources and task efficiency is achieved.

CN122247984A · Pending · Publication Date: 2026-06-19 · PENG CHENG LAB

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
PENG CHENG LAB
Filing Date
2026-03-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, AI workloads deployed based on cloud-native containerization use a preset resource quota mode, which is difficult to adapt to dynamically changing cluster loads, resulting in low cluster resource utilization.

Method used

Based on the network status of the multi-tenant cluster and the performance indicators of distributed tasks, the bottleneck types of tasks are determined. Candidate scheduling schemes are designed using bandwidth matrices and network interference diagrams. Finally, the scheduling scheme is determined by estimating the speedup ratio and resource utilization changes to achieve scaling up or down scheduling.

Benefits of technology

In complex multi-tenant environments, it adapts to dynamically changing cluster loads and collaboratively achieves a dynamic balance between cluster resource utilization and task execution efficiency.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122247984A_ABST

Abstract

This application discloses a method, apparatus, device, and medium for distributed task scheduling based on a multi-tenant cluster, relating to the field of task scheduling. The method includes: determining the bandwidth matrix and network interference diagram between cluster nodes based on the network status of the multi-tenant cluster; determining the task bottleneck type of each distributed task based on the performance indicators of each distributed task in the multi-tenant cluster; determining candidate scheduling schemes for each distributed task using the task bottleneck type and performance indicators, and based on the bandwidth matrix and network interference diagram, and estimating the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after executing the candidate scheduling schemes; determining the target comprehensive score corresponding to the candidate scheduling schemes based on the expected speedup ratio and the change in resource utilization; and determining the final target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task based on the comprehensive score, so as to expand or shrink the task replicas of the corresponding distributed tasks during scheduling.

Description

Technical Field

[0001] This invention relates to the field of task scheduling, and in particular to a distributed task scheduling method, apparatus, device, and medium based on a multi-tenant cluster.

Background Technology

[0002] With the evolution of cloud computing technology, cloud-native computing paradigms, represented by containerization, microservices, and Kubernetes (K8s, an open-source container orchestration system), have become the mainstream of enterprise IT (Information Technology) infrastructure. At the same time, the rapid development of artificial intelligence technologies such as large language models and computer vision has led to an exponential increase in the scale of AI (Artificial Intelligence) training and inference tasks. To improve resource delivery efficiency and management agility, more and more enterprises and research institutions are choosing to deploy AI workloads in cloud-native environments, especially in complex cluster environments with multi-tenant shared data centers and edge cloud collaboration.

[0003] Currently, AI workloads deployed on cloud-native containerization typically run using a pre-defined resource quota model. In this model, users must statically declare the required number of CPUs (Central Processing Units), memory, and GPUs (Graphics Processing Units) when submitting a task. However, this static binding method struggles to adapt to dynamically changing cluster loads, resulting in low cluster resource utilization.

Summary of the Invention

[0004] In view of this, the purpose of this invention is to provide a distributed task scheduling method, apparatus, device, and medium based on a multi-tenant cluster, which can simultaneously consider the network status of the multi-tenant cluster and the performance indicators of the distributed tasks, thereby adapting to dynamically changing cluster loads in complex multi-tenant environments and collaboratively achieving a dynamic balance between cluster resource utilization and task execution efficiency. The specific solution is as follows:
Firstly, this application provides a distributed task scheduling method based on a multi-tenant cluster, including:
determining the bandwidth matrix and the network interference graph between the cluster nodes based on the network status of the multi-tenant cluster, and determining the task bottleneck type of each distributed task based on the performance indicators of each distributed task in the multi-tenant cluster;
determining a candidate scheduling scheme for each distributed task using the task bottleneck type and the performance indicators, and based on the bandwidth matrix and the network interference graph, and estimating the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after executing the candidate scheduling scheme;
determining the target comprehensive score corresponding to the candidate scheduling scheme based on the expected speedup ratio and the change in resource utilization;
determining, based on the comprehensive score, the target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task, and using the target scheduling scheme to scale up or scale down the task replicas of the corresponding distributed tasks.

[0005] Optionally, determining the network interference graph between the cluster nodes based on the network status of the multi-tenant cluster includes:
determining the latency matrix and the port bandwidth utilization between the cluster nodes based on the network status of the multi-tenant cluster;
constructing the adjacency relationship of the cluster nodes based on the location information of each cluster node;
determining the amount of network interference between the cluster nodes using the location information of each cluster node, and based on the latency matrix and the port bandwidth utilization between the cluster nodes;
determining the network interference graph between the cluster nodes using the adjacency relationship of the cluster nodes, and based on the amount of network interference between the cluster nodes.

[0006] Optionally, determining the task bottleneck type of each distributed task based on the performance indicators of each distributed task in the multi-tenant cluster includes:
collecting the performance indicators of each distributed task in the multi-tenant cluster, the performance indicators including the total execution time of a single execution and the communication time during that execution;
determining the communication ratio based on the communication time and the total execution time of the single execution;
determining the ideal total execution time based on the historical execution data of each distributed task, and determining the time difference between the total execution time of the single execution and the ideal total execution time;
if the communication ratio is greater than a preset ratio threshold and the time difference is greater than a preset difference, determining that the task bottleneck type of the distributed task is a network bottleneck; otherwise, determining that the task bottleneck type of the distributed task is a performance bottleneck.

[0007] Optionally, determining a candidate scheduling scheme for each distributed task using the task bottleneck type and the performance indicators, and based on the bandwidth matrix and the network interference graph, and estimating the expected speedup ratio of each distributed task after executing the candidate scheduling scheme, includes:
if the task bottleneck type is a network bottleneck, determining, based on the bandwidth matrix and the network interference graph, a target expansion node that meets a preset node affinity condition from the idle cluster nodes, and determining an expansion scheme based on the target expansion node to obtain the candidate scheduling scheme of the distributed task;
if the task bottleneck type is a performance bottleneck, determining the target edge replica with the longest communication path from the task replicas of the distributed task, and determining a scaling-down scheme based on the target edge replica to obtain the candidate scheduling scheme of the distributed task;
after obtaining the candidate scheduling scheme of each distributed task, estimating, based on the network sensitivity, the expected total execution time of each distributed task after executing the candidate scheduling scheme, where the network sensitivity is obtained by fitting the total execution time of a single execution against the bandwidth matrix;
determining the expected speedup ratio of each distributed task based on the ratio of the total execution time of a single execution to the expected total execution time.
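The speedup estimate described above can be illustrated with a minimal sketch. It assumes a linear sensitivity model in which the expected total execution time changes by the network sensitivity (in s/Gbps) times the bandwidth change a candidate scheme would bring; the source only states that the estimate is "based on network sensitivity", so the linear form, the function name `expected_speedup`, and the parameter `t_floor` are hypothetical.

```python
def expected_speedup(t_single: float, s_net: float, delta_bw_gbps: float,
                     t_floor: float = 1e-6) -> float:
    """Estimate the expected speedup ratio of one distributed task.

    t_single      -- measured total execution time of a single execution (s)
    s_net         -- network sensitivity in s/Gbps (negative when more
                     bandwidth shortens the execution)
    delta_bw_gbps -- bandwidth change expected from the candidate scheme
    t_floor       -- hypothetical lower bound that keeps the estimate positive
    """
    # ASSUMPTION: a linear model; the source does not give the exact estimator.
    t_expected = max(t_single + s_net * delta_bw_gbps, t_floor)
    # Speedup ratio = single-execution total time / expected total time.
    return t_single / t_expected


if __name__ == "__main__":
    print(expected_speedup(t_single=3.0, s_net=-0.6, delta_bw_gbps=1.0))
```

A ratio above 1 indicates that the candidate scheme is expected to shorten the execution; a ratio below 1 indicates an expected slowdown.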

[0008] Optionally, the step of scaling up or down the task replicas of the corresponding distributed task using the target scheduling scheme includes: If the target scheduling scheme is an expansion scheme, then the task replicas of the corresponding distributed tasks are expanded using the target scheduling scheme to obtain new replicas, and the new replicas are placed in the corresponding expansion nodes based on the target scheduling scheme. If the target scheduling scheme is a scaling-down scheme, then the edge replicas to be scaled down are determined based on the target scheduling scheme, the edge replicas to be scaled down are removed, and the cluster nodes where the edge replicas to be scaled down are located are marked as idle.

[0009] Optionally, determining the target comprehensive score corresponding to the candidate scheduling scheme based on the expected speedup ratio and the change in resource utilization includes: Based on the network interference graph, the change in network interference of the multi-tenant cluster after executing the candidate scheduling scheme is estimated. The expected speedup ratio, the change in resource utilization, and the change in network interference are weighted and calculated to obtain the target comprehensive score corresponding to the candidate scheduling scheme.

[0010] Optionally, determining the final target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task based on the comprehensive score includes:
sorting the scheduling schemes corresponding to each distributed task by comprehensive score from high to low to obtain the sorted scheduling schemes, and taking the first of the sorted scheduling schemes as the current scheduling scheme;
performing a preset judgment operation to determine whether the current scheduling scheme has a resource conflict with the scheduling schemes in the current result set, and whether the total required resources exceed the resource limit of the multi-tenant cluster, where the current result set is initially empty and the total required resources are the resources required by the current scheduling scheme together with the scheduling schemes in the current result set;
if there is no resource conflict and the resource limit of the multi-tenant cluster is not exceeded, adding the current scheduling scheme to the current result set, taking the next of the sorted scheduling schemes as the new current scheduling scheme, and returning to the step of performing the preset judgment operation;
if there is a resource conflict or the resource limit of the multi-tenant cluster is exceeded, discarding the current scheduling scheme, taking the next of the sorted scheduling schemes as the new current scheduling scheme, and returning to the step of performing the preset judgment operation;
once the last of the sorted scheduling schemes has been judged, determining the final target scheduling scheme to be executed based on the latest current result set.
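The selection procedure above amounts to a greedy pass over the schemes in descending score order. The following sketch is one possible reading, under two loudly stated assumptions that the source does not specify: a "resource conflict" is modeled as two schemes touching the same cluster node, and the "resource limit" is modeled as a single GPU budget. The `Scheme` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Scheme:
    task: str              # distributed task the scheme belongs to
    score: float           # comprehensive score of the scheme
    nodes: FrozenSet[str]  # ASSUMPTION: nodes the scheme occupies or frees
    gpus: int              # ASSUMPTION: GPUs the scheme additionally needs


def select_schemes(schemes: List[Scheme], gpu_limit: int) -> List[Scheme]:
    """Greedy selection: highest score first, skipping conflicting schemes."""
    result: List[Scheme] = []  # the "current result set", initially empty
    used_nodes = set()
    total_gpus = 0
    for s in sorted(schemes, key=lambda x: x.score, reverse=True):
        conflict = bool(used_nodes & s.nodes)
        over_limit = total_gpus + s.gpus > gpu_limit
        if conflict or over_limit:
            continue  # discard the current scheme and move to the next one
        result.append(s)
        used_nodes |= s.nodes
        total_gpus += s.gpus
    return result


if __name__ == "__main__":
    candidates = [
        Scheme("job-a", 0.9, frozenset({"n1"}), 2),
        Scheme("job-b", 0.8, frozenset({"n1"}), 1),  # conflicts with job-a
        Scheme("job-c", 0.5, frozenset({"n2"}), 3),  # would exceed the budget
        Scheme("job-d", 0.4, frozenset({"n3"}), 1),
    ]
    print([s.task for s in select_schemes(candidates, gpu_limit=4)])
```

Because each scheme is judged exactly once, the pass is linear in the number of schemes after sorting; it does not revisit a discarded scheme even if a later choice would have made room for it.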

[0011] Secondly, this application provides a distributed task scheduling device based on a multi-tenant cluster, comprising: The task bottleneck determination module is used to determine the bandwidth matrix and network interference graph between cluster nodes based on the network status of the multi-tenant cluster, and to determine the task bottleneck type of each distributed task based on the performance indicators of each distributed task in the multi-tenant cluster. The scheduling scheme determination module is used to determine a candidate scheduling scheme for each distributed task by utilizing the task bottleneck type and the performance indicators, and based on the bandwidth matrix and the network interference diagram, and to estimate the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after executing the candidate scheduling scheme. The comprehensive score determination module is used to determine the target comprehensive score corresponding to the candidate scheduling scheme based on the expected speedup ratio and the change in resource utilization. The task replica scheduling module is used to determine the target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task based on the comprehensive score, and to use the target scheduling scheme to expand or shrink the task replicas of the corresponding distributed tasks.

[0012] Thirdly, this application provides an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the aforementioned distributed task scheduling method based on a multi-tenant cluster.

[0013] Fourthly, this application provides a computer-readable storage medium for storing a computer program, which, when executed by a processor, implements the aforementioned distributed task scheduling method based on a multi-tenant cluster.

[0014] In this application, based on the network status of a multi-tenant cluster, a bandwidth matrix and network interference graph are determined among the cluster nodes. Based on the performance metrics of each distributed task in the multi-tenant cluster, the bottleneck type of each distributed task is determined. Using the bottleneck type and the performance metrics, and based on the bandwidth matrix and the network interference graph, candidate scheduling schemes are determined for each distributed task. The expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after executing the candidate scheduling schemes are estimated. Based on the expected speedup ratio and the change in resource utilization, a target comprehensive score corresponding to the candidate scheduling scheme is determined. Based on the comprehensive score, the final target scheduling scheme to be executed is determined from the scheduling schemes corresponding to each distributed task. The target scheduling scheme is then used to scale up or down the task replicas of the corresponding distributed tasks.

[0015] Therefore, this application not only emphasizes the crucial role of network resources in the distributed task load of a multi-tenant cluster environment, but also designs performance indicators for distributed tasks to accurately reflect the bottleneck type of distributed tasks. By simultaneously considering the network status of the multi-tenant cluster and the performance indicators of distributed tasks, it can determine whether a distributed task is limited by a network bottleneck or by a computing performance bottleneck, thereby designing candidate scheduling schemes suitable for each distributed task. Subsequently, while enriching the scaling decision dimensions based on network resources and computing resources, it comprehensively determines whether a scheduling scheme should be executed by considering the impact of the candidate scheduling schemes on the execution efficiency of each distributed task and on the resource utilization of the multi-tenant cluster. Thus, in a complex multi-tenant environment, it adapts to dynamically changing cluster loads and collaboratively achieves a dynamic balance between cluster resource utilization and task execution efficiency.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0017] Figure 1 is a flowchart of a distributed task scheduling method based on a multi-tenant cluster provided in an embodiment of this application;
Figure 2 is a scheduling scheme decision-making flowchart provided in an embodiment of this application;
Figure 3 is an architecture diagram of a preset scheduling system provided in an embodiment of this application;
Figure 4 is a specific flowchart of distributed task scheduling based on a multi-tenant cluster provided in an embodiment of this application;
Figure 5 is a structural diagram of related components of a scheduling system provided in an embodiment of this application;
Figure 6 is a schematic diagram of a distributed task scheduling device based on a multi-tenant cluster provided in an embodiment of this application;
Figure 7 is a structural diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0019] Currently, AI workloads deployed on cloud-native containerization typically operate using a pre-defined resource quota model. In this model, users must statically declare the required number of CPUs (Central Processing Units), memory, and GPUs (Graphics Processing Units) when submitting a task. However, this static binding method struggles to adapt to dynamically changing cluster loads, resulting in low cluster resource utilization. To address this, this application provides a distributed task scheduling method based on multi-tenant clusters. This method simultaneously considers the network status of the multi-tenant cluster and the performance metrics of distributed tasks, thereby adapting to dynamically changing cluster loads in complex multi-tenant environments and collaboratively achieving a dynamic balance between cluster resource utilization and task execution efficiency.

[0020] As shown in Figure 1, an embodiment of this invention discloses a distributed task scheduling method based on a multi-tenant cluster, including:
Step S11: Based on the network status of the multi-tenant cluster, determine the bandwidth matrix and network interference graph between the cluster nodes, and based on the performance indicators of each distributed task in the multi-tenant cluster, determine the task bottleneck type of each distributed task.

[0021] The method in this application embodiment is applied to a preset scheduling system, which mainly includes three functional modules: a network metrics collector, a task performance profiler, and an elastic decision-maker. The network metrics collector is responsible for collecting and maintaining node-level network information in real time from the underlying network infrastructure and operating environment of the cluster. The task performance profiler is responsible for collecting and analyzing task-level performance information in real time from the runtime environment of distributed tasks. The elastic decision-maker receives multi-dimensional input data from the network metrics collector and the task performance profiler, and generates network-aware elastic scheduling decisions for each distributed task.

[0022] Specifically, the network metrics collector determines the bandwidth matrix and network interference graph between cluster nodes based on the network status of the multi-tenant cluster; the task performance profiler determines the task bottleneck type of each distributed task based on the performance metrics of each distributed task in the multi-tenant cluster.

[0023] According to one example, determining the network interference graph between cluster nodes based on the network state of a multi-tenant cluster can specifically include: determining the delay matrix and port bandwidth utilization between cluster nodes based on the network state of the multi-tenant cluster; constructing the adjacency relationship between cluster nodes based on the location information of each cluster node; determining the amount of network interference between cluster nodes using the location information of each cluster node and based on the delay matrix and port bandwidth utilization between each cluster node; and determining the network interference graph between cluster nodes using the adjacency relationship between each cluster node and based on the amount of network interference between each cluster node.

[0024] Specifically, this application embodiment takes a Kubernetes cluster with 8 nodes as an example to demonstrate how the network metrics collector generates network snapshots (including the bandwidth matrix, latency matrix, and port bandwidth utilization between the cluster nodes) and the network interference graph, specifically including the following steps:
1. Active inter-node probing: A monitoring agent deployed on each node as a DaemonSet periodically runs ping and iperf3 (network probing tools) to collect latency and bandwidth data between nodes, which is used to construct the bandwidth matrix $B_{8\times 8}$ and the latency matrix $L_{8\times 8}$ between the cluster nodes.

[0025] 2. Passive traffic statistics: An eBPF (extended Berkeley Packet Filter) program captures bidirectional traffic on the veth (Virtual Ethernet, a pair of virtual network interfaces) interfaces and calculates the port bandwidth utilization between cluster nodes within a sliding time window as $u_{ij}$ = bidirectional traffic / historical peak bandwidth, where i and j represent cluster nodes.

[0026] 3. Topology label parsing: Obtain the node labels of the cluster nodes: rack (indicating the rack where the cluster node is located) and zone (indicating the area where the cluster node is located). Based on the location information reflected by the node labels of the cluster nodes, construct the adjacency relationship of each cluster node.

[0027] 4. Network interference graph construction algorithm: If the node label rack of cluster node i is equal to the node label rack of cluster node j (meaning cluster nodes i and j are located in the same rack), and the port bandwidth utilization u[i][j] between cluster nodes i and j is greater than a preset threshold, such as 0.7, then the network interference w between cluster node i and cluster node j is calculated according to a formula (omitted in the source text), where L[i][j] represents the network latency between cluster node i and cluster node j.

[0028] If the node label zone of cluster node i is not equal to the node label zone of cluster node j, meaning cluster node i and cluster node j are not located in the same region (e.g., the same rack, the same data center, or the same area), and the normalized value of the network latency L[i][j] between cluster node i and cluster node j is greater than 1.0, then the network interference w between cluster node i and cluster node j is calculated according to a second formula (also omitted in the source text).

[0029] Furthermore, after calculating the network interference w between cluster node i and cluster node j, it is necessary to take the minimum value of w and 1.0, i.e., min(w,1.0), as the final network interference between cluster node i and cluster node j.

[0030] Finally, the network interference graph between cluster nodes is determined by utilizing the adjacency relationships of each cluster node and the amount of network interference between each cluster node; that is, the relationship edges between each cluster node in the network interference graph are determined based on the adjacency relationships of each cluster node, and the relationship weights of the relationship edges between each cluster node in the network interference graph are determined based on the amount of network interference between each cluster node.
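The construction steps above can be sketched as follows. The two weight formulas are not recoverable from the text, so the functions `placeholder_same_rack_weight` and `placeholder_cross_zone_weight` are explicit placeholders chosen only to make the sketch runnable; they are not the formulas of this application. The thresholds 0.7 and 1.0 and the final `min(w, 1.0)` clamp follow the description; the directed edge representation and all names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

WeightFn = Callable[[float, float], float]


def placeholder_same_rack_weight(u_ij: float, lat_ij: float) -> float:
    # PLACEHOLDER ONLY: the source's same-rack formula is missing.
    return u_ij * lat_ij


def placeholder_cross_zone_weight(u_ij: float, lat_ij: float) -> float:
    # PLACEHOLDER ONLY: the source's cross-zone formula is missing.
    return lat_ij - 1.0


def build_interference_graph(
    labels: List[Dict[str, str]],   # per-node labels with "rack" and "zone"
    u: List[List[float]],           # port bandwidth utilization u[i][j]
    lat_norm: List[List[float]],    # normalized latency for the pair (i, j)
    util_threshold: float = 0.7,
    latency_threshold: float = 1.0,
    same_rack_weight: WeightFn = placeholder_same_rack_weight,
    cross_zone_weight: WeightFn = placeholder_cross_zone_weight,
) -> Dict[Tuple[int, int], float]:
    """Return directed edges (i, j) -> interference weight in [.., 1.0]."""
    edges: Dict[Tuple[int, int], float] = {}
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_rack = labels[i]["rack"] == labels[j]["rack"]
            cross_zone = labels[i]["zone"] != labels[j]["zone"]
            if same_rack and u[i][j] > util_threshold:
                w = same_rack_weight(u[i][j], lat_norm[i][j])
            elif cross_zone and lat_norm[i][j] > latency_threshold:
                w = cross_zone_weight(u[i][j], lat_norm[i][j])
            else:
                continue  # no interference edge for this pair
            edges[(i, j)] = min(w, 1.0)  # clamp as described in the text
    return edges


if __name__ == "__main__":
    labels = [
        {"rack": "r1", "zone": "z1"},
        {"rack": "r1", "zone": "z1"},
        {"rack": "r2", "zone": "z2"},
    ]
    u = [[0.0, 0.8, 0.1], [0.5, 0.0, 0.1], [0.1, 0.1, 0.0]]
    lat = [[0.0, 2.0, 1.5], [2.0, 0.0, 0.9], [1.5, 0.9, 0.0]]
    print(build_interference_graph(labels, u, lat))
```

Passing the weight functions as parameters keeps the edge-selection logic, which the text does describe, separate from the weight formulas, which it does not.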

[0031] It should be noted that the network snapshots (including the bandwidth matrix, latency matrix, and port bandwidth utilization between each cluster node) are stored in the time series database in JSON (JavaScript Object Notation, a lightweight data exchange format) format. The network interference graph is also serialized and stored in the time series database for the elastic decision-maker to query.

[0032] In addition, the network metrics collector also features an adaptive degradation mechanism. Specifically, considering the overhead of network measurement, a lightweight network measurement mechanism can be switched to in clusters with low performance or high cluster load. This mechanism acquires necessary network state information through a combination of active latency probing and passive bandwidth inference. Specifically, the scheduling system primarily focuses on latency and bandwidth as network state metrics. It retains latency measurement probes, while bandwidth, as a time-sensitive metric, is inferred from the recent iteration durations of the distributed task (i.e., the distributed training task). When the number of replicas in the distributed task is constant, the amount of data transmitted per iteration is also constant. The estimated or relative value of network bandwidth is obtained by calculating the amount of data transmitted per unit time.
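The passive bandwidth inference described above can be illustrated with a short sketch. It assumes that the per-iteration data volume is known and constant, as the paragraph states for a fixed replica count, and that the mean of the recent iteration durations is used as the time base; the function name, the averaging choice, and the unit conversion to Gbps are assumptions. Because the full iteration duration includes computation, the result is best read as a relative value, not an absolute link bandwidth.

```python
from typing import Sequence


def infer_bandwidth_gbps(bytes_per_iteration: float,
                         iteration_durations_s: Sequence[float]) -> float:
    """Infer a relative bandwidth value from recent iteration durations."""
    if not iteration_durations_s:
        raise ValueError("at least one iteration duration is required")
    mean_duration = sum(iteration_durations_s) / len(iteration_durations_s)
    if mean_duration <= 0:
        raise ValueError("iteration durations must be positive")
    # Data transmitted per unit time, converted from bytes/s to Gbit/s.
    return bytes_per_iteration * 8 / mean_duration / 1e9


if __name__ == "__main__":
    # 1.25 GB per iteration over 1-second iterations.
    print(infer_bandwidth_gbps(1.25e9, [1.0, 1.0, 1.0]))
```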

[0033] According to one example, determining the bottleneck type of each distributed task based on its performance metrics in a multi-tenant cluster can specifically include: collecting performance metrics for each distributed task in the multi-tenant cluster; performance metrics include the total execution time and communication time during a single execution; determining the communication ratio based on the communication time and the total execution time; determining the ideal total execution time based on the historical execution data of each distributed task, and determining the time difference between the total execution time and the ideal total execution time; if the communication ratio is greater than a preset ratio threshold and the time difference is greater than a preset difference, then the network bottleneck is determined as the bottleneck type of each distributed task; otherwise, the performance bottleneck is determined as the bottleneck type of each distributed task.

[0034] Specifically, this application uses a distributed task (4 Pods, using the Ring All-Reduce communication mode) running in a Kubernetes cluster as an example to illustrate in detail how the task performance profiler determines whether the current task bottleneck is a performance bottleneck or a network bottleneck by collecting the performance metrics of the distributed task, building a performance model, calculating the communication ratio and the residual (i.e., the time difference), and combining the network sensitivity. The specific steps include:
1. Collecting the performance metrics of the distributed task by injecting hooks into PyTorch (an open-source deep learning framework). The collected performance metrics are shown in Table 1 below:
Table 1 Performance Indicators

[0035] In the case of distributed tasks, such as image classification and object detection, the total execution time of a single execution is equal to the total iteration time of a single step, the pure computation time in a single execution is equal to the pure computation time in a single iteration, the communication time in a single execution is equal to the communication time in a single iteration, and the amount of data in a single communication is equal to the amount of data in a single communication.

[0036] 2. Calculation of communication ratio: In one approach, the communication ratio is calculated by a formula (omitted in the source text) from the communication time and the total execution time of a single execution.

[0037] In another approach, stability is improved by using a sliding-window mean (e.g., the mean of the last 5 iterations): the average communication ratio is calculated over the historical sliding window W to smooth out metric fluctuations in a single iteration. The calculation formula is omitted in the source text.

[0038] 3. Performance model prediction and residual (i.e., time difference) calculation: Based on the historical execution data and micro-perturbation experiments of the distributed task, a baseline model that depends on network and computing resources is constructed: ideal total execution time = $f(G_{GPU}, M_{mem}, B_{bw}, L_{latency})$, where $G_{GPU}$ and $M_{mem}$ refer to computing resources, specifically GPU resources and memory resources, and $B_{bw}$ and $L_{latency}$ refer to network resources, specifically bandwidth resources and latency resources. The output of the model represents the ideal total execution time.

[0039] The residual (i.e., the time difference) is the difference between the total execution time of a single execution and the ideal total execution time; the formula itself is omitted in the source text.

[0040] 4. Calculation of network sensitivity coefficient: Through online micro-perturbation experiments (briefly limiting the Pod network egress by 10% and observing the change in $T_{iter}$, the total time of a single iteration), a relation (formula omitted in the source text) is fitted, where $S_{net}$ indicates the network sensitivity and $B_{bw}$ indicates bandwidth resources.

[0041] If $S_{net}$ is -0.60 s/Gbps, this means that for every 1 Gbps increase in bandwidth, the total execution time (i.e., the total time of a single iteration) decreases by 0.6 seconds.
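As an illustration of how such a sensitivity value could be obtained, the sketch below fits a straight line to (bandwidth, iteration time) samples by ordinary least squares and returns its slope in s/Gbps. The fitted relation is missing from the text, so the linear form is an assumption, and the function name and sample values are hypothetical.

```python
from typing import Sequence


def fit_network_sensitivity(bandwidths_gbps: Sequence[float],
                            iter_times_s: Sequence[float]) -> float:
    """Least-squares slope of iteration time versus bandwidth (s/Gbps)."""
    if len(bandwidths_gbps) != len(iter_times_s) or len(bandwidths_gbps) < 2:
        raise ValueError("need at least two paired samples")
    n = len(bandwidths_gbps)
    mean_b = sum(bandwidths_gbps) / n
    mean_t = sum(iter_times_s) / n
    var_b = sum((b - mean_b) ** 2 for b in bandwidths_gbps)
    if var_b == 0:
        raise ValueError("bandwidth samples must not all be equal")
    cov = sum((b - mean_b) * (t - mean_t)
              for b, t in zip(bandwidths_gbps, iter_times_s))
    return cov / var_b


if __name__ == "__main__":
    # Hypothetical micro-perturbation: egress limited from 10 to 9 Gbps.
    print(fit_network_sensitivity([10.0, 9.0], [3.0, 3.6]))
```

With these hypothetical samples the slope is about -0.6 s/Gbps, which matches the interpretation given in the paragraph above: one additional Gbps shortens an iteration by roughly 0.6 seconds.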

[0042] 5. Bottleneck Diagnosis Algorithm: If the communication ratio exceeds a preset threshold and the time difference exceeds a preset difference, then the network bottleneck is identified as the task bottleneck type for this distributed task, and it is marked. , "Suspected network bandwidth limitation." The theoretically required bandwidth increment can also be calculated based on time difference and network sensitivity. And, based on the amount of data in a single communication and the communication duration during a single execution, estimate the current bandwidth requirement of the task: .

[0043] If the communication percentage is not greater than a preset percentage threshold or the time difference is not greater than a preset difference, then the performance bottleneck is identified as the task bottleneck type for each distributed task, and is then marked. , "Computational performance is limited or the model has not converged", and set as well as .

[0044] 6. Result encapsulation and reporting: The results of determining the bottleneck type of the distributed task are encapsulated into a task performance report, which is written into the time-series database.

[0045] In addition, the task performance profiler takes monitoring overhead into account: when the distributed task load is high, it can switch to a lightweight sampling mechanism, reducing the Hook injection frequency or enabling statistical sampling while still obtaining the necessary performance status information.
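One way to picture the lightweight sampling switch is a hook that records every N-th iteration, with N raised under high load. The load threshold, the interval values, and both function names are hypothetical.

```python
def hook_sampling_interval(load: float, base_interval: int = 1,
                           light_interval: int = 10,
                           high_load: float = 0.8) -> int:
    """Return N such that the hook records every N-th iteration.

    Assumption: load is a normalized value in [0, 1].
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be within [0, 1]")
    return light_interval if load >= high_load else base_interval


def should_sample(iteration: int, interval: int) -> bool:
    """True when the given iteration falls on the sampling grid."""
    return iteration % interval == 0


# Under 90% load only iterations 0, 10, 20, ... are recorded.
interval = hook_sampling_interval(load=0.9)
sampled = [i for i in range(25) if should_sample(i, interval)]
```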

[0046] Step S12: Using the task bottleneck type and the performance indicators, and based on the bandwidth matrix and the network interference diagram, determine the candidate scheduling scheme for each distributed task, and estimate the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after executing the candidate scheduling scheme.

[0047] Step S13: Determine the target comprehensive score corresponding to the candidate scheduling scheme based on the expected speedup ratio and the change in resource utilization.

[0048] Step S14: Based on the comprehensive score, determine the target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task, and use the target scheduling scheme to expand or shrink the task replicas of the corresponding distributed tasks.

[0049] In this embodiment, the elastic decision-maker pulls, periodically or in an event-driven manner, the task performance reports (including residuals, communication ratios, and network sensitivity), the network snapshots (including the bandwidth matrix, the latency matrix, and the port bandwidth utilization between cluster nodes), and the network interference graph from the time-series database. Then, using the task bottleneck types and performance indicators, and based on the bandwidth matrix and the network interference graph, a candidate scheduling scheme is determined for each distributed task, and the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after executing the candidate scheduling scheme are estimated. At the same time, the change in network interference of the multi-tenant cluster after executing the candidate scheduling scheme is estimated based on the network interference graph. A weighted calculation is performed on the expected speedup ratio, the change in resource utilization, and the change in network interference to obtain the target comprehensive score of the candidate scheduling scheme. After the comprehensive scores have been calculated for all distributed tasks in the multi-tenant cluster, the target scheduling schemes to be executed are determined from the scheduling schemes of all distributed tasks based on the comprehensive scores, and are then used to scale up or down the task replicas of the corresponding distributed tasks.

[0050] According to one example, to determine candidate scheduling schemes for each distributed task based on task bottleneck type and performance metrics, and using a bandwidth matrix and network interference graph, and to estimate the expected speedup ratio of each distributed task after executing the candidate scheduling schemes, the specific steps may include: If the task bottleneck type is a network bottleneck, then based on the bandwidth matrix and network interference graph, a target scaling node that meets preset node affinity conditions is determined from the idle cluster nodes, and a scaling scheme is determined based on the target scaling node to obtain a candidate scheduling scheme for each distributed task. If the task bottleneck type is a performance bottleneck, then a target edge replica with the longest communication path is determined from the task replicas of each distributed task, and a scaling-down scheme is determined based on the target edge replica to obtain a candidate scheduling scheme for each distributed task. After obtaining the candidate scheduling schemes for each distributed task, the expected total execution time of each distributed task after executing the candidate scheduling schemes is estimated based on network sensitivity; where network sensitivity is the sensitivity obtained by fitting the total execution time of a single execution to the bandwidth matrix; and the expected speedup ratio of each distributed task is determined based on the ratio of the total execution time of a single execution to the expected total execution time.

[0051] According to one example, the target scheduling scheme is used to scale up or down the task replicas of the corresponding distributed task. Specifically, this may include: if the target scheduling scheme is an expansion scheme, then the task replicas of the corresponding distributed task are scaled up using the target scheduling scheme to obtain new replicas, and the new replicas are placed in the corresponding expansion nodes based on the target scheduling scheme. If the target scheduling scheme is a reduction scheme, then the edge replicas to be reduced are identified based on the target scheduling scheme, and the edge replicas to be reduced are removed, and the cluster nodes where the edge replicas to be reduced reside are marked as idle.

[0052] As shown in Figure 2, determining the final target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task based on the comprehensive score can specifically include the following steps:
Step S141: Sort the scheduling schemes corresponding to each distributed task by comprehensive score from high to low to obtain the sorted scheduling schemes, and take the first of the sorted scheduling schemes as the current scheduling scheme.
Step S142: Perform a preset judgment operation to determine whether there is a resource conflict between the current scheduling scheme and the scheduling schemes in the current result set, and whether the total required resources stay within the resource limit of the multi-tenant cluster. The current result set is initially empty; the total required resources are the resources required by the current scheduling scheme together with those required by the scheduling schemes in the current result set.
Step S143: If there is no resource conflict and the resource limit of the multi-tenant cluster is not exceeded, add the current scheduling scheme to the current result set, take the next of the sorted scheduling schemes as the new current scheduling scheme, and jump back to the step of performing the preset judgment operation.
Step S144: If there is a resource conflict or the resource limit of the multi-tenant cluster is exceeded, discard the current scheduling scheme, take the next of the sorted scheduling schemes as the new current scheduling scheme, and jump back to the step of performing the preset judgment operation.
Step S145: If the current scheduling scheme is the last of the sorted scheduling schemes, determine the final target scheduling scheme to be executed based on the latest current result set.

[0053] Specifically, this application uses a multi-tenant Kubernetes cluster (16 GPU nodes running 3 distributed training tasks) as an example to demonstrate how the elastic decision-maker balances global resource contention against individual task efficiency and generates scaling recommendations that take both cluster utilization and training speedup into account. The decision is based on the "suspected network bottleneck" hypothesis from the task performance profiler, the network interference graph and network snapshot from the network metric collector, and the cluster-level resource status, and achieves fair and efficient collaborative scheduling through a multi-objective optimization function. The specific steps include: 1. Initialize the candidate scheduling scheme set.

[0054] 2. Generate candidate scheduling schemes for each distributed task: For each distributed task t in the multi-tenant cluster, if the bottleneck type is a network bottleneck (i.e., suspected=True), then, based on the bandwidth matrix and the network interference graph, a target expansion node with high bandwidth and low interference that satisfies the preset node affinity conditions (such as geographical proximity or shortest communication path) is determined from the idle cluster nodes. An expansion scheme is determined based on the target expansion node to obtain the candidate scheduling scheme for the distributed task. At the same time, the expected total execution time $T_{\mathrm{new}}$ of the distributed task after executing the candidate scheduling scheme is estimated from the network sensitivity $S_{\mathrm{net}}$ and the available bandwidth of the target expansion node $n_{\mathrm{candidate}}$, and the expected speedup ratio of the distributed task is determined as the ratio of the total execution time of a single execution $T_{\mathrm{old}}$ to the expected total execution time $T_{\mathrm{new}}$, i.e., $\mathrm{accel} = T_{\mathrm{old}} / T_{\mathrm{new}}$. The candidate scheduling scheme is then added to the candidate scheduling scheme set to update the set.

[0055] If the task bottleneck type is a performance bottleneck (suspected=False), the target edge replica Pod with the longest communication path is determined from the task replicas of the distributed task, and a scaling-down scheme is determined based on the target edge replica to obtain the candidate scheduling scheme for the distributed task. At the same time, the expected total execution time of the distributed task after executing the candidate scheduling scheme is estimated based on the network sensitivity, and the expected speedup ratio of the distributed task is determined as the ratio of the total execution time of a single execution to the expected total execution time. The candidate scheduling scheme is then added to the candidate scheduling scheme set to update the set.
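The two candidate-generation branches and the speedup estimate can be sketched as below. This is a minimal illustration under stated assumptions: the ranking rule for expansion nodes, the definition of "longest communication path" as summed latency to the other replicas, and the linear form of the expected execution time are not given in the text, and node affinity filtering is omitted. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def pick_expansion_node(task_nodes: List[str], idle_nodes: List[str],
                        bandwidth_gbps: Dict[Pair, float],
                        interference: Dict[str, float]) -> Optional[str]:
    """Scale-up branch: choose an idle node for the new replica.

    Assumed ranking: highest worst-case bandwidth to the task's current
    nodes first, then lowest interference. task_nodes must be non-empty.
    """
    def rank(node: str) -> Tuple[float, float]:
        worst_bw = min(bandwidth_gbps.get((node, peer), 0.0)
                       for peer in task_nodes)
        return (worst_bw, -interference.get(node, 0.0))

    return max(idle_nodes, key=rank) if idle_nodes else None


def pick_edge_replica(replica_nodes: Dict[str, str],
                      latency_ms: Dict[Pair, float]) -> str:
    """Scale-down branch: choose the replica with the longest communication path.

    Assumption: path length is the summed latency to every other replica.
    """
    def path_length(replica: str) -> float:
        node = replica_nodes[replica]
        return sum(latency_ms.get((node, other_node), 0.0)
                   for other, other_node in replica_nodes.items()
                   if other != replica)

    return max(replica_nodes, key=path_length)


def expected_speedup(t_old_s: float, s_net: float,
                     delta_bw_gbps: float) -> float:
    """accel = T_old / T_new, with T_new = T_old + S_net * delta_bw (assumed)."""
    t_new_s = t_old_s + s_net * delta_bw_gbps
    if t_new_s <= 0:
        raise ValueError("linear model predicts a non-positive execution time")
    return t_old_s / t_new_s
```

For example, with T_old = 5.0 s, S_net = -0.6 s/Gbps, and 2 Gbps of additional bandwidth, the assumed linear model gives T_new = 3.8 s and a speedup of 5.0 / 3.8.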

[0056] 3. Calculation of the comprehensive score of each scheduling scheme: For each scheduling scheme in the candidate scheduling scheme set, a weighted calculation is performed on its expected speedup ratio (accel), its resource utilization change, and its network interference change to obtain the comprehensive score (Score) of that scheduling scheme, where α, β, and γ are the preset weighting coefficients.
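A weighted score of this kind might look like the sketch below. The sign convention (interference change subtracted as a penalty) and the default weights are assumptions; the text only states that the three quantities are combined with preset coefficients α, β, and γ.

```python
def comprehensive_score(accel: float, delta_utilization: float,
                        delta_interference: float,
                        alpha: float = 0.5, beta: float = 0.3,
                        gamma: float = 0.2) -> float:
    """Weighted combination of speedup, utilization change, and interference change.

    Assumption: a rise in interference lowers the score, so it is subtracted.
    """
    return (alpha * accel
            + beta * delta_utilization
            - gamma * delta_interference)


# Hypothetical scheme: 1.2x speedup, +10% utilization, +5% interference.
score = comprehensive_score(1.2, 0.10, 0.05)
```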

[0057] 4. Resource conflict verification and resource limit verification: A greedy strategy is employed. All candidate scheduling schemes are sorted in descending order of their comprehensive score (Score) and evaluated one by one. For each scheme, it is determined whether it conflicts with any scheme already in result set A (e.g., both require the same cluster node) and whether the total resource requirement stays within the resource limit of the multi-tenant cluster (e.g., the total number of available GPUs). If there is no resource conflict and the resource limit is not exceeded, the scheduling scheme is added to result set A; otherwise, it is discarded. The scheduling schemes in the final result set A are the target scheduling schemes to be executed.
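The greedy pass can be sketched as follows. The `Scheme` fields are hypothetical, and the sketch assumes that a conflict means two schemes claiming the same node and that the resource limit is a single GPU count.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Scheme:
    task: str
    score: float
    nodes: FrozenSet[str]  # cluster nodes the scheme would claim
    gpus: int              # GPUs the scheme requires


def select_schemes(candidates: List[Scheme], gpu_limit: int) -> List[Scheme]:
    """Accept schemes in descending score order, skipping conflicts and overruns."""
    result: List[Scheme] = []
    used_nodes: Set[str] = set()
    used_gpus = 0
    for scheme in sorted(candidates, key=lambda s: s.score, reverse=True):
        conflict = bool(used_nodes & scheme.nodes)
        over_limit = used_gpus + scheme.gpus > gpu_limit
        if conflict or over_limit:
            continue  # discard and move on to the next scheme
        result.append(scheme)
        used_nodes |= scheme.nodes
        used_gpus += scheme.gpus
    return result


candidates = [
    Scheme("task-a", 0.9, frozenset({"n1"}), 2),
    Scheme("task-b", 0.8, frozenset({"n1"}), 1),  # conflicts with task-a on n1
    Scheme("task-c", 0.7, frozenset({"n2"}), 3),  # would exceed the GPU limit
    Scheme("task-d", 0.6, frozenset({"n3"}), 1),
]
selected = [s.task for s in select_schemes(candidates, gpu_limit=3)]
```

A greedy pass is simple and fast but does not guarantee the best total score; here task-c is rejected even though dropping task-a could have admitted it.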

[0058] 5. Decision Injection and Execution: For each target scheduling scheme in result set A, placement preference suggestions are generated for each scheme. For example, for a scaling-up scheme, the suggestion is to prioritize placing new replicas on high-bandwidth, low-latency neighboring nodes, which are the scaling-up nodes included in each target scheduling scheme. For a scaling-down scheme, the suggestion is to remove edge replicas with longer communication paths, which are the edge replicas included in each target scheduling scheme. These placement preference suggestions are then injected into the corresponding distributed task's Kubernetes resource object in the form of standard annotations or ConfigMaps (Configuration Maps, a type of resource object). The native Kubernetes scheduler then scales up or down the corresponding distributed task's task replica Pods and binds or unbinds them from the corresponding cluster nodes.
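As an illustration of the injection step, the sketch below builds a merge-patch body that carries placement preferences as annotations. The annotation keys under `example.com/` are hypothetical, and sending the patch to the API server is left out; the text does not name the keys or the client used.

```python
import json
from typing import Dict, Sequence


def build_placement_patch(action: str,
                          preferred_nodes: Sequence[str] = (),
                          replicas_to_remove: Sequence[str] = ()) -> Dict:
    """Build a metadata.annotations patch body (annotation values are strings)."""
    if action not in ("scale-up", "scale-down"):
        raise ValueError("action must be 'scale-up' or 'scale-down'")
    annotations = {"example.com/elastic-action": action}  # hypothetical key
    if action == "scale-up":
        annotations["example.com/preferred-nodes"] = ",".join(preferred_nodes)
    else:
        annotations["example.com/remove-replicas"] = ",".join(replicas_to_remove)
    return {"metadata": {"annotations": annotations}}


patch = build_placement_patch("scale-up", preferred_nodes=["node-3", "node-7"])
body = json.dumps(patch)  # payload that a client would submit as a merge patch
```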

[0059] Furthermore, this application can obtain network metrics for the multi-tenant cluster without using eBPF probes. Instead, it can connect directly to the telemetry interface of a commercial-grade cloud-native application performance management tool or a service mesh (such as Istio) to obtain the network data of the multi-tenant cluster. Moreover, when a severe network bottleneck is identified, this application can, instead of adding distributed communication nodes, directly increase the GPU / memory quota of the existing containers, thereby avoiding an increase in distributed communication overhead.

[0060] Based on the above scheme, this application has the following key technical features:
1. Network as a resource: This application quantifies network bandwidth, latency, topology, and other information into measurable resources and deeply integrates them into the elastic scaling decision-making loop of the K8s multi-tenant cluster. This is a fundamental conceptual innovation.
2. Dynamic network sensitivity modeling method based on online micro-perturbation experiments: This application breaks through the limitations of traditional methods that rely solely on underlying hardware indicators. Without interrupting the main business, it dynamically calculates the network sensitivity coefficient by injecting small network bandwidth constraints and monitoring changes in iteration duration, transforming scaling from a reactive behavior based on simple thresholds into a proactive optimization behavior.
3. Lightweight monitoring degradation strategy based on adaptive environment awareness: This application can automatically and smoothly switch between "active detection" and "passive inference" according to the overall load pressure of the cluster, constructing a closed monitoring loop that balances accuracy and low overhead.
4. Dynamic anti-affinity and interference avoidance mechanism with multi-objective trade-offs: This application proposes the quantitative concept of a "network interference graph" and designs a multi-objective optimization function that integrates cluster utilization, task speedup ratio, and an interference penalty term. By dynamically generating Pod affinity annotations, network conflicts are proactively prevented.

[0061] Therefore, this application not only emphasizes the crucial role of network resources in the distributed task load of a multi-tenant cluster environment, but also designs performance indicators for distributed tasks to accurately reflect the bottleneck type of distributed tasks. By simultaneously considering the network status of the multi-tenant cluster and the performance indicators of distributed tasks, it can determine whether the distributed task is a network bottleneck or a performance computing bottleneck, thereby designing candidate scheduling schemes suitable for each distributed task. Subsequently, while enriching the scaling decision dimensions based on network resources and performance computing resources, it comprehensively determines whether the scheduling scheme should be executed by considering the impact of candidate scheduling schemes on the execution efficiency of each distributed task and on the resource utilization of the multi-tenant cluster. Thus, in a complex multi-tenant environment, it adapts to dynamically changing cluster loads and collaboratively achieves a dynamic balance between cluster resource utilization and task execution efficiency.

[0062] As shown in Figures 3 and 4, taking a Kubernetes cluster as the multi-tenant cluster, the distributed tasks in the multi-tenant cluster include, but are not limited to, distributed training tasks for image classification models and distributed training tasks for industrial defect detection models. The distributed task scheduling method based on a multi-tenant cluster provided by this embodiment of the invention is described in detail below. The method of this application is applied to a preset scheduling system, which includes three functional modules: a network metric collector, a task performance profiler, and an elastic decision-maker.

[0063] (1) Environment-aware network metric collector: responsible for collecting and maintaining node-level network status metrics in real time from the cluster's underlying network infrastructure and operating environment, including but not limited to: ① Multidimensional state acquisition:
Inter-node bandwidth and latency: network probes (such as ping and iperf) are run periodically between nodes to collect the bandwidth and latency between them, from which a bandwidth matrix B and a latency matrix L between the cluster nodes are constructed.
Node network egress / ingress rate: the real-time throughput of each node's network interface card is monitored.
Network device status: by integrating with the CNI (Container Network Interface) plugin, information such as the port utilization of the underlying network switches is obtained to determine the port bandwidth utilization between cluster nodes.
Network topology changes: changes in the network topology of Services and Pods within the K8s multi-tenant cluster are monitored and obtained.

[0064] ② Construction of the network interference graph: The collected raw network metrics are preprocessed and standardized, and the competition for network link resources between nodes is analyzed to generate the network interference graph.

[0065] Network status snapshots (including bandwidth matrices, latency matrices, and port bandwidth utilization among cluster nodes) and network interference graphs are stored in a time-series database to provide data support for subsequent analysis and decision-making. Network emergencies, such as traffic surges and network outages, are reported.
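The collector's two outputs can be sketched as below: a bandwidth matrix averaged from repeated probe results, and an interference graph derived from shared, heavily used links. The edge definition (two nodes interfere when they share a link whose utilization is at or above a threshold), the threshold value, and all names are assumptions; the text does not define how the graph is built.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]


def build_bandwidth_matrix(nodes: List[str],
                           probes: Iterable[Tuple[str, str, float]]
                           ) -> List[List[Optional[float]]]:
    """Average repeated probe results (src, dst, Gbps) into an N x N matrix.

    Unmeasured pairs stay None.
    """
    index = {name: i for i, name in enumerate(nodes)}
    sums: Dict[Pair, float] = defaultdict(float)
    counts: Dict[Pair, int] = defaultdict(int)
    for src, dst, gbps in probes:
        sums[(src, dst)] += gbps
        counts[(src, dst)] += 1
    matrix: List[List[Optional[float]]] = [[None] * len(nodes) for _ in nodes]
    for (src, dst), total in sums.items():
        matrix[index[src]][index[dst]] = total / counts[(src, dst)]
    return matrix


def build_interference_graph(node_links: Dict[str, Set[str]],
                             link_utilization: Dict[str, float],
                             busy_threshold: float = 0.7
                             ) -> Dict[Pair, float]:
    """Edge weight = highest utilization among busy links shared by two nodes."""
    graph: Dict[Pair, float] = {}
    for a, b in combinations(sorted(node_links), 2):
        shared = node_links[a] & node_links[b]
        busy = [link_utilization.get(link, 0.0) for link in shared
                if link_utilization.get(link, 0.0) >= busy_threshold]
        if busy:
            graph[(a, b)] = max(busy)
    return graph


# Hypothetical probe results and topology.
matrix = build_bandwidth_matrix(["n1", "n2"],
                                [("n1", "n2", 9.0), ("n1", "n2", 11.0)])
graph = build_interference_graph(
    {"n1": {"sw1-p1", "uplink-1"}, "n2": {"sw1-p2", "uplink-1"}, "n3": {"sw2-p1"}},
    {"uplink-1": 0.85})
```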

[0066] (2) Task performance profiler: responsible for collecting and analyzing task-level performance metrics in real time from the runtime environment of distributed training tasks, including but not limited to: ① Iteration speed and single-step time: periodically collecting the number of iterations per second, total single-step time, and computation / communication stage division by embedding lightweight Hooks in the training framework or using eBPF bypass monitoring; ② Communication behavior characteristics: monitoring the call frequency, data volume, and synchronization wait time of distributed communication primitives (such as All-Reduce and All-Gather); ③ Resource sensitivity metrics: inferring the response curve of task performance to the number of GPUs, memory capacity, and network bandwidth / latency based on historical running trajectories and online micro-perturbation experiments.

[0067] Its specific functions include: By embedding training code or using bypass monitoring, key performance indicators such as iteration speed, gradient synchronization time, computation time, communication time, communication ratio, and single-step data transmission volume of distributed training tasks are collected.

[0068] Automatically identify the communication mode used in distributed training tasks (such as Ring All-Reduce, ParameterServer) and construct a communication traffic feature vector.

[0069] By combining historical data and online micro-perturbation observations, an approximate functional model $f(G_{\mathrm{GPU}}, M_{\mathrm{mem}}, B_{\mathrm{bw}}, L_{\mathrm{latency}})$ of task performance with respect to computing and network resources is constructed, and the network sensitivity coefficient $S_{\mathrm{net}}$ is calculated in real time.

[0070] Real-time diagnosis of the bottleneck type (network bottleneck or performance bottleneck), generating a structured task performance report that includes the residual (i.e., time difference), the network sensitivity $S_{\mathrm{net}}$, the communication ratio, and resource demand forecasts (theoretical bandwidth increment, current task bandwidth demand, etc.).

[0071] The collected raw performance indicators and modeling results are preprocessed and standardized, and then stored in a time series database to provide a unified analytical input for the elastic decision-maker.

[0072] (3) Elastic Decision Maker: The core component of this invention. It receives multi-dimensional input data from the network metric collector and the task performance profiler, and generates network-aware elastic scheduling decisions for each distributed training task. Its decision logic is based on a multi-objective optimization function, which aims to minimize the execution time of the distributed task, while the constraint is the total resources of the cluster. It includes, but is not limited to: ① Network impact assessment: determining whether the bottleneck type of the distributed task is a network bottleneck or a performance bottleneck; ② Scaling up / down decision: determining the number of task replicas of the distributed task that need to be adjusted and the corresponding cluster nodes; ③ Placement preference suggestion: outputting recommended node placement strategies (such as prioritizing the same rack, the same availability zone, or the low-interference domain) to guide subsequent K8s scheduling.

[0073] Its specific functions include: Retrieving, periodically or in an event-driven manner, the task performance reports (including residuals, communication ratios, and network sensitivity) and the network status snapshots (including topology, link load, and interference graphs) from the time-series database.

[0074] Decision-making for scaling up and down: If a network bottleneck exists and bandwidth improvement is limited, it is recommended to increase the number of replicas to distribute the communication load in parallel; if bandwidth improvement is insufficient, it is recommended to reduce the number of replicas. Note that the scaling down operation must be rigorously evaluated by a multi-objective optimization function. Scaling down should only be performed when the cluster utilization improvement and communication overhead reduction brought about by reducing replicas can substantially compensate for the loss of increased computing time per card, so as to avoid memory overflow or performance drop caused by blind scaling down.

[0075] Output placement preference suggestions: Based on scaling decisions, placement preference suggestions are generated. For example, it is recommended to prioritize the placement of new replicas on neighboring nodes with high bandwidth and low latency, or to suggest removing edge replicas with long communication paths.

[0076] Placement preference suggestions are injected into the corresponding distributed task's K8s resource object via API-Server in the form of standard annotations or ConfigMaps. The K8s native scheduler is responsible for the final node binding and replica adjustment, and writes the adjusted result status back to the time series database.

[0077] The API-Server not only serves as the system interface for obtaining distributed tasks from the user, but also provides resource adjustment feedback and performance indicator feedback. It is also the channel for the scheduler to obtain cluster status and submit scheduling decisions.

[0078] Furthermore, this invention focuses on network-aware intelligent decision-making logic and integrates deeply with the Kubernetes elasticity mechanism. The functional modules of the system are embedded in the following system components. As shown in Figure 5, this invention uses Elastic Operator (an intelligent cluster management tool designed for Kubernetes environments) as the core control plane, which integrates the network metric collector, the task performance profiler, and the elastic decision-maker. Through the Metrics Provider in the Custom Metrics server, custom metrics obtained from the time-series database are exposed to the HPA (Horizontal Pod Autoscaler) via the Metrics API, achieving seamless integration with native Kubernetes elasticity. All collected data is stored uniformly in the time-series database, and the decision results are injected into the HPA resource object via the Kubernetes API Server in the form of HPA.spec.replicas and affinity annotations, with the Kubernetes scheduler completing the final execution. This architecture requires no modification to the core Kubernetes code; it achieves network-aware elastic scheduling of distributed task replica Pods solely through standard extension points (the Metrics API, etc.), and therefore offers high portability and production availability.

[0079] Through a hierarchical design in which the task performance profiler proposes hypotheses and the elastic decision-maker makes decisions, this invention centralizes the permission to consume network indicators at the decision-making hub, avoiding the functional overlap between the perception module and the scheduling module found in traditional systems and improving the modularity and maintainability of the system.

[0080] As shown in Figure 6, this embodiment of the invention discloses a distributed task scheduling device based on a multi-tenant cluster, comprising:
The task bottleneck determination module 11, used to determine the bandwidth matrix and network interference diagram between cluster nodes based on the network status of the multi-tenant cluster, and to determine the task bottleneck type of each distributed task based on the performance indicators of each distributed task in the multi-tenant cluster.
The scheduling scheme determination module 12, used to determine a candidate scheduling scheme for each distributed task by utilizing the task bottleneck type and the performance indicators, and based on the bandwidth matrix and the network interference diagram, and to estimate the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after executing the candidate scheduling scheme.
The comprehensive score determination module 13, used to determine the target comprehensive score corresponding to the candidate scheduling scheme based on the expected speedup ratio and the change in resource utilization.
The task replica scheduling module 14, used to determine the target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task based on the comprehensive score, and to use the target scheduling scheme to expand or shrink the task replicas of the corresponding distributed tasks.

[0081] Since the embodiments of the device part correspond to the embodiments of the aforementioned method part, the specific implementation steps of the embodiments of the device part can be referred to the relevant steps of the embodiments of the aforementioned method part, and will not be repeated here.

[0083] Furthermore, embodiments of this application also disclose an electronic device. Figure 7 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the diagram should not be construed as limiting the scope of this application.

[0084] Figure 7 is a schematic diagram of the structure of an electronic device 20 provided in an embodiment of this application. Specifically, the electronic device 20 may include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input / output interface 25, and a communication bus 26. The memory 22 stores a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the distributed task scheduling method based on a multi-tenant cluster disclosed in any of the foregoing embodiments. Optionally, the electronic device 20 in this embodiment may specifically be an electronic computer.

[0085] In this embodiment, the power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, and is not specifically limited here; the input / output interface 25 is used to acquire external input data or output data to the outside world, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.

[0086] In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, random access memory, disk or optical disk, etc. The resources stored thereon can include operating system 221, computer program 222, etc., and the storage method can be temporary storage or permanent storage.

[0087] The operating system 221, which may be Windows Server, Netware, Unix, Linux, etc., is used to manage and control the various hardware devices on the electronic device 20 and the computer program 222. In addition to including a computer program capable of performing the distributed task scheduling method based on a multi-tenant cluster executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs capable of performing other specific tasks.

[0088] Furthermore, this application also discloses a computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the aforementioned distributed task scheduling method based on a multi-tenant cluster. Specific steps of this method can be found in the corresponding content disclosed in the foregoing embodiments, and will not be repeated here.

[0089] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0090] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0091] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0092] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0093] The technical solutions provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A distributed task scheduling method based on a multi-tenant cluster, characterized in that it comprises: determining, based on the network status of the multi-tenant cluster, a bandwidth matrix and a network interference graph between cluster nodes, and determining, based on the performance indicators of each distributed task in the multi-tenant cluster, the task bottleneck type of each distributed task; determining, using the task bottleneck type and the performance indicators and based on the bandwidth matrix and the network interference graph, a candidate scheduling scheme for each distributed task, and estimating the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after the candidate scheduling scheme is executed; determining, based on the expected speedup ratio and the change in resource utilization, the target comprehensive score corresponding to the candidate scheduling scheme; and determining, based on the comprehensive score, the target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task, and using the target scheduling scheme to expand or shrink the task replicas of the corresponding distributed tasks.

2. The distributed task scheduling method based on a multi-tenant cluster according to claim 1, characterized in that determining, based on the network status of the multi-tenant cluster, the network interference graph between cluster nodes comprises: determining, based on the network status of the multi-tenant cluster, the latency matrix and the port bandwidth utilization between cluster nodes; constructing the adjacency relationships of the cluster nodes based on the location information of each cluster node; determining the amount of network interference between cluster nodes using the location information of each cluster node and based on the latency matrix and the port bandwidth utilization between cluster nodes; and determining the network interference graph between cluster nodes using the adjacency relationships of the cluster nodes and based on the amount of network interference between cluster nodes.
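The construction in claim 2 can be illustrated with a minimal sketch. The claim does not give a formula for the amount of network interference or a rule for adjacency, so both are assumptions here: two nodes are treated as adjacent when they share a rack label, and the interference on an edge is the normalized latency multiplied by the mean port bandwidth utilization of the two endpoints. The function name and parameter names are hypothetical.

```python
def build_interference_graph(racks, latency, port_util):
    """Return {node: {neighbor: interference}} for adjacent node pairs.

    racks:     rack label per node (stand-in for "location information")
    latency:   square latency matrix between nodes
    port_util: port bandwidth utilization per node, in [0, 1]
    """
    n = len(racks)
    # Normalize latency by the largest entry; fall back to 1.0 if all are zero.
    max_lat = max(max(row) for row in latency) or 1.0
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed adjacency rule: same rack means adjacent.
            if racks[i] != racks[j]:
                continue
            # Assumed interference measure, not taken from the application.
            weight = (latency[i][j] / max_lat) * (port_util[i] + port_util[j]) / 2
            graph[i][j] = weight
            graph[j][i] = weight
    return graph
```

Nodes without any adjacent peer keep an empty neighbor map, so they still appear as vertices of the graph.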

3. The distributed task scheduling method based on a multi-tenant cluster according to claim 1, characterized in that determining the task bottleneck type of each distributed task based on the performance indicators of each distributed task in the multi-tenant cluster comprises: collecting the performance indicators of each distributed task in the multi-tenant cluster, the performance indicators including the total execution time of a single execution and the communication time during that execution; determining the communication ratio based on the communication time and the total execution time of a single execution; determining the ideal total execution time based on the historical execution data of each distributed task, and determining the time difference between the total execution time of a single execution and the ideal total execution time; and if the communication ratio is greater than a preset ratio threshold and the time difference is greater than a preset difference, determining a network bottleneck as the task bottleneck type of the distributed task, and otherwise determining a performance bottleneck as the task bottleneck type of the distributed task.
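The decision rule of claim 3 can be sketched as follows. The threshold values are placeholders chosen for illustration only, and the class and function names are hypothetical; the application specifies only that both thresholds are preset.

```python
from dataclasses import dataclass


@dataclass
class TaskMetrics:
    total_time: float  # total execution time of a single execution (seconds)
    comm_time: float   # communication time within that execution (seconds)
    ideal_time: float  # ideal total execution time from historical data (seconds)


def classify_bottleneck(m, ratio_threshold=0.4, diff_threshold=5.0):
    """Return "network" or "performance" for one distributed task."""
    if m.total_time <= 0:
        raise ValueError("total_time must be positive")
    comm_ratio = m.comm_time / m.total_time
    time_diff = m.total_time - m.ideal_time
    # Both conditions must hold for a network bottleneck; otherwise the
    # task is classified as having a performance bottleneck.
    if comm_ratio > ratio_threshold and time_diff > diff_threshold:
        return "network"
    return "performance"
```

A task that spends most of its time communicating but already runs close to its ideal time is therefore still classified as a performance bottleneck, because the second condition fails.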

4. The distributed task scheduling method based on a multi-tenant cluster according to claim 3, characterized in that determining, using the task bottleneck type and the performance indicators and based on the bandwidth matrix and the network interference graph, a candidate scheduling scheme for each distributed task, and estimating the expected speedup ratio of each distributed task after the candidate scheduling scheme is executed, comprises: if the task bottleneck type is a network bottleneck, determining, based on the bandwidth matrix and the network interference graph, a target expansion node that meets a preset node affinity condition from the idle cluster nodes, and determining an expansion scheme based on the target expansion node to obtain the candidate scheduling scheme for the distributed task; if the task bottleneck type is a performance bottleneck, determining the target edge replica with the longest communication path from the task replicas of the distributed task, and determining a scaling-down scheme based on the target edge replica to obtain the candidate scheduling scheme for the distributed task; after the candidate scheduling scheme for each distributed task is obtained, estimating, based on network sensitivity, the expected total execution time of each distributed task after the candidate scheduling scheme is executed, wherein the network sensitivity is obtained by fitting the total execution time of a single execution against the bandwidth matrix; and determining the expected speedup ratio of each distributed task based on the ratio of the total execution time of a single execution to the expected total execution time.
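The speedup estimate at the end of claim 4 can be sketched under explicit assumptions. The application does not state the form of the fit, so this sketch assumes a linear model T = a + s / B, where B is a single effective bandwidth value summarizing the relevant entries of the bandwidth matrix and s plays the role of the network sensitivity. The function names are hypothetical.

```python
def fit_sensitivity(bandwidths, times):
    """Least-squares fit of times = a + s / bandwidth; returns (a, s)."""
    xs = [1.0 / b for b in bandwidths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need observations at two or more distinct bandwidths")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, times))
    s = sxy / sxx          # assumed network sensitivity
    a = mean_y - s * mean_x  # bandwidth-independent part of the execution time
    return a, s


def expected_speedup(current_time, new_bandwidth, a, s):
    """Ratio of the current single-execution time to the expected time."""
    expected_time = a + s / new_bandwidth
    return current_time / expected_time
```

With this model, a candidate scheme that raises the effective bandwidth yields a ratio above 1, and a scheme that lowers it yields a ratio below 1.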

5. The distributed task scheduling method based on a multi-tenant cluster according to claim 4, characterized in that using the target scheduling scheme to expand or shrink the task replicas of the corresponding distributed tasks comprises: if the target scheduling scheme is an expansion scheme, expanding the task replicas of the corresponding distributed task using the target scheduling scheme to obtain new replicas, and placing the new replicas on the corresponding expansion nodes based on the target scheduling scheme; and if the target scheduling scheme is a scaling-down scheme, determining the edge replicas to be scaled down based on the target scheduling scheme, removing those edge replicas, and marking the cluster nodes on which those edge replicas were located as idle.
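The two branches of claim 5 can be sketched with plain data structures. The scheme layout (a dict with "task", "kind", and "nodes" keys) and the one-replica-per-node bookkeeping are assumptions made for illustration, not structures described in the application.

```python
def apply_scheme(placement, idle_nodes, scheme):
    """Apply one target scheduling scheme in place.

    placement:  {task name: list of nodes hosting its replicas}
    idle_nodes: set of currently idle cluster nodes
    scheme:     {"task": str, "kind": "expand" | "shrink", "nodes": [node, ...]}
    """
    replicas = placement.setdefault(scheme["task"], [])
    if scheme["kind"] == "expand":
        for node in scheme["nodes"]:
            if node not in idle_nodes:
                raise ValueError(f"expansion node {node!r} is not idle")
            # Place the new replica on the expansion node.
            idle_nodes.remove(node)
            replicas.append(node)
    elif scheme["kind"] == "shrink":
        for node in scheme["nodes"]:
            # Remove the edge replica and mark its node as idle.
            replicas.remove(node)
            idle_nodes.add(node)
    else:
        raise ValueError(f"unknown scheme kind {scheme['kind']!r}")
```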

6. The distributed task scheduling method based on a multi-tenant cluster according to claim 1, characterized in that determining the target comprehensive score corresponding to the candidate scheduling scheme based on the expected speedup ratio and the change in resource utilization comprises: estimating, based on the network interference graph, the change in network interference of the multi-tenant cluster after the candidate scheduling scheme is executed; and performing a weighted calculation on the expected speedup ratio, the change in resource utilization, and the change in network interference to obtain the target comprehensive score corresponding to the candidate scheduling scheme.
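The weighted calculation of claim 6 can be sketched as a single expression. The weight values and the choice to subtract the interference term (so that added interference lowers the score) are assumptions; the application states only that the three quantities are weighted.

```python
def comprehensive_score(speedup, util_change, interference_change,
                        w_speedup=0.5, w_util=0.3, w_interference=0.2):
    """Weighted score of one candidate scheduling scheme (illustrative weights)."""
    return (w_speedup * speedup
            + w_util * util_change
            - w_interference * interference_change)
```

For example, a speedup ratio of 1.5, a utilization change of 0.1, and an interference change of 0.2 give 0.75 + 0.03 - 0.04 = 0.74 with these weights.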

7. The distributed task scheduling method based on a multi-tenant cluster according to any one of claims 1 to 6, characterized in that determining, based on the comprehensive score, the target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task comprises: sorting the scheduling schemes corresponding to each distributed task by comprehensive score from high to low to obtain sorted scheduling schemes, and determining the first of the sorted scheduling schemes as the current scheduling scheme; performing a preset judgment operation to determine whether there is a resource conflict between the current scheduling scheme and the scheduling schemes in the current result set, and whether the total required resources exceed the resource limit of the multi-tenant cluster, wherein the current result set is initially an empty set and the total required resources are the total resources required by the current scheduling scheme and the scheduling schemes in the current result set; if there is no resource conflict and the resource limit of the multi-tenant cluster is not exceeded, adding the current scheduling scheme to the current result set, determining the next of the sorted scheduling schemes as the new current scheduling scheme, and returning to the step of performing the preset judgment operation; if there is a resource conflict or the resource limit of the multi-tenant cluster is exceeded, discarding the current scheduling scheme, determining the next of the sorted scheduling schemes as the new current scheduling scheme, and returning to the step of performing the preset judgment operation; and if the current scheduling scheme is the last of the sorted scheduling schemes, determining the target scheduling scheme to be executed based on the latest current result set.
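The selection loop of claim 7 is a greedy pass over the schemes in descending score order, which can be sketched as follows. The `Scheme` fields are hypothetical, resources are reduced to a single scalar, and a "resource conflict" is assumed to mean that two schemes touch the same cluster node; the application does not define these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    task: str
    score: float       # target comprehensive score
    nodes: frozenset   # nodes the scheme would occupy or release (assumed)
    demand: int        # resource units required (assumed scalar)


def select_schemes(schemes, capacity):
    """Greedily build the result set from the highest-scoring schemes."""
    result = []          # the current result set, initially empty
    used_nodes = set()
    used = 0
    for s in sorted(schemes, key=lambda x: x.score, reverse=True):
        conflict = bool(s.nodes & used_nodes)
        if conflict or used + s.demand > capacity:
            continue     # discard the scheme and move to the next one
        result.append(s)
        used_nodes |= s.nodes
        used += s.demand
    return result
```

Because a discarded scheme is never revisited, the pass runs once over the sorted list and does not guarantee the set with the highest total score; it follows the order of steps in the claim rather than solving an optimization problem.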

8. A distributed task scheduling device based on a multi-tenant cluster, characterized in that it comprises: a task bottleneck determination module, configured to determine the bandwidth matrix and the network interference graph between cluster nodes based on the network status of the multi-tenant cluster, and to determine the task bottleneck type of each distributed task based on the performance indicators of each distributed task in the multi-tenant cluster; a scheduling scheme determination module, configured to determine a candidate scheduling scheme for each distributed task using the task bottleneck type and the performance indicators and based on the bandwidth matrix and the network interference graph, and to estimate the expected speedup ratio of each distributed task and the change in resource utilization of the multi-tenant cluster after the candidate scheduling scheme is executed; a comprehensive score determination module, configured to determine the target comprehensive score corresponding to the candidate scheduling scheme based on the expected speedup ratio and the change in resource utilization; and a task replica scheduling module, configured to determine the target scheduling scheme to be executed from the scheduling schemes corresponding to each distributed task based on the comprehensive score, and to use the target scheduling scheme to expand or shrink the task replicas of the corresponding distributed tasks.

9. An electronic device, characterized in that it comprises: a memory for storing a computer program; and a processor for executing the computer program to implement the distributed task scheduling method based on a multi-tenant cluster as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it is used to store a computer program which, when executed by a processor, implements the distributed task scheduling method based on a multi-tenant cluster as described in any one of claims 1 to 7.