Heterogeneous computing dynamic scheduling method and device

By acquiring multidimensional perception vectors and feature data from heterogeneous devices, and using a preset time calculation formula and reinforcement learning model to dynamically schedule computing tasks, the problem of insufficient adaptability of heterogeneous scheduling methods in dealing with complex AI loads is solved, achieving efficient resource utilization and dynamic performance optimization.

CN122195652A · Pending · Publication Date: 2026-06-12 · BEIJING UNIV OF POSTS & TELECOMM


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING UNIV OF POSTS & TELECOMM
Filing Date: 2026-03-09
Publication Date: 2026-06-12



Abstract

The application provides a heterogeneous computing power dynamic scheduling calculation method and device, and belongs to the computing power scheduling field. The method comprises the following steps: obtaining first characteristic data corresponding to a target calculation task to be executed on a heterogeneous device and a multi-dimensional perception vector; calculating a basic calculation time based on the first characteristic data and a dynamic coefficient related to the hardware of the heterogeneous device; constructing a multi-dimensional state vector based on the basic calculation time, second characteristic data of the task and the multi-dimensional perception vector, and performing a task allocation step based on the multi-dimensional state vector; the task allocation step comprises the following steps: inputting the multi-dimensional state vector into a reinforcement learning model to obtain an initial continuous action vector; processing the initial continuous action vector based on a preset constraint condition and a multi-objective optimization condition to obtain a target continuous action vector; mapping the target continuous action vector into a control instruction and sending the control instruction to the heterogeneous device for execution. The application can improve the adaptability and working efficiency of device heterogeneous calculation, and effectively reduce the waste of computing resources.


Description

Technical Field

[0001] This application belongs to the field of computing power scheduling technology, and more specifically, it relates to a method and apparatus for dynamic scheduling of heterogeneous computing power.

Background Technology

[0002] With the rapid development of artificial intelligence, especially large language models (LLMs), the demand for computing power has exploded. Single-type computing chips can no longer meet the extreme requirements of modern AI applications for high performance, low latency, and high energy efficiency. Therefore, heterogeneous computing platforms, composed of computing units from different architectures and manufacturers (such as central processing units, graphics processing units, neural network processing units, and data center computing units), have become the core infrastructure supporting the continuous breakthroughs in AI technology.

[0003] Current heterogeneous scheduling methods generally suffer from insufficient adaptability when dealing with highly dynamic and complex AI workloads, leading to a waste of computing resources. For example, static scheduling schemes, which divide and fix computing tasks all at once before the task begins, cannot perceive or respond to real-time fluctuations in hardware performance. This results in weaker or poorly performing devices becoming the bottleneck of the entire computing process, causing a huge waste of computing resources.

Summary of the Invention

[0004] The purpose of this application is to provide a method and apparatus for dynamic scheduling of heterogeneous computing power, so as to improve the adaptability and efficiency of heterogeneous computing and effectively reduce the waste of computing resources.

[0005] A first aspect of this application provides a method for dynamic scheduling of heterogeneous computing power, comprising: obtaining first feature data and a multi-dimensional perception vector corresponding to a target computing task to be executed on a heterogeneous device, where the multi-dimensional perception vector represents the hardware status, network status, load trend, and performance characteristics of the heterogeneous device, and the first feature data represents the inherent attributes of the task itself; calculating, based on the first feature data and dynamic coefficients related to the hardware of the heterogeneous device, the basic computation time to complete the target computing task using a preset time calculation formula, where the dynamic coefficients are determined according to the historical performance indicators of different heterogeneous devices; and constructing a multi-dimensional state vector based on the basic computation time, second feature data of the target computing task, and the multi-dimensional perception vector, and executing a task allocation step based on the multi-dimensional state vector, where the second feature data represents the scheduling and management attributes of the task. The task allocation step includes: inputting the multi-dimensional state vector into a reinforcement learning model to obtain an initial continuous action vector; filtering the initial continuous action vector based on preset constraints to obtain a filtered initial continuous action vector; optimizing the filtered initial continuous action vector based on preset multi-objective optimization conditions to obtain a target continuous action vector; and mapping the target continuous action vector into control commands and sending them to the heterogeneous device for execution.

[0006] A second aspect of this application provides a heterogeneous computing power dynamic scheduling computing device, comprising: a data acquisition module, used to acquire the first feature data and the multi-dimensional perception vector corresponding to the target computing task to be executed on the heterogeneous device, where the multi-dimensional perception vector represents the hardware status, network status, load trend, and performance characteristics of the heterogeneous device; a data calculation module, used to calculate the basic computation time to complete the target computing task based on the first feature data and the dynamic coefficients related to the heterogeneous device hardware, using a preset time calculation formula, where the dynamic coefficients are determined according to the historical performance indicators of different heterogeneous devices; and a task execution module, used to construct a multi-dimensional state vector based on the basic computation time, the second feature data of the target computing task, and the multi-dimensional perception vector, and to execute the task allocation step based on the multi-dimensional state vector. When assigning tasks, the task execution module is specifically used for: inputting the multi-dimensional state vector into the reinforcement learning model to obtain the initial continuous action vector; filtering the initial continuous action vector based on preset constraints to obtain the filtered initial continuous action vector; optimizing the filtered initial continuous action vector based on preset multi-objective optimization conditions to obtain the target continuous action vector; and mapping the target continuous action vector into control commands and sending them to the heterogeneous device for execution.

[0007] A third aspect of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to implement the steps of the above-described heterogeneous computing power dynamic scheduling calculation method.

[0008] In a fourth aspect of this application, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the above-described heterogeneous computing power dynamic scheduling calculation method.

[0009] The beneficial effects of the heterogeneous computing power dynamic scheduling calculation method and apparatus provided in this application are as follows: This embodiment calculates the basic computation time for the target computation task based on the first feature data and dynamic coefficients related to the heterogeneous device hardware. Subsequent calculations are then performed based on this basic computation time and multi-dimensional perception vectors. This allows scheduling decisions to perceive hardware performance fluctuations and load changes in real time, avoiding performance bottlenecks caused by fixed task allocation in static scheduling and effectively improving adaptability to dynamic and complex AI workloads. Furthermore, this embodiment utilizes a reinforcement learning model to generate initial continuous action vectors and optimizes them through preset constraint conditions and multi-objective optimization conditions. Under the premise of meeting hardware resource limitations, it balances objectives such as task completion latency, throughput, energy consumption, and load balancing, reducing resource waste and improving the overall resource utilization efficiency of heterogeneous devices.

Attached Figure Description

[0010] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 is a flowchart illustrating a heterogeneous computing power dynamic scheduling calculation method provided in an embodiment of this application; Figure 2 is a flowchart illustrating a heterogeneous computing power dynamic scheduling calculation method provided in another embodiment of this application; Figure 3 is a structural block diagram of a heterogeneous computing power dynamic scheduling computing device provided in an embodiment of this application; Figure 4 is a schematic block diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0012] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.

[0013] To make the objectives, technical solutions, and advantages of this application clearer, the following description will be provided in conjunction with the accompanying drawings and specific embodiments.

[0014] Please refer to Figure 1, which is a flowchart illustrating a heterogeneous computing power dynamic scheduling calculation method provided in an embodiment of this application. The method can be executed by an electronic device and may include steps S101 to S103.

[0015] S101: Obtain the first feature data and multidimensional perception vector corresponding to the target computing task to be executed on the heterogeneous device. The multidimensional perception vector is used to represent the hardware status, network status, load trend and performance characteristics of the heterogeneous device.

[0016] In one embodiment, the first feature data corresponding to the target computing task is used to represent the inherent attributes of the task itself, including the number of task processing tokens, the number of AI model parameters used by the task, and the task complexity factor.

[0017] In this embodiment, the above three feature data are the main indicators for measuring the computational load.

[0018] In this context, for computational tasks related to large language models, the number of task processing tokens represents the total number of tokens that need to be processed; for other types of AI tasks, the number of task processing tokens represents the size of the input data or the length of the sequence. The number of task processing tokens directly reflects the input scale of the target computational task.

[0019] The number of AI model parameters used by the task represents the total number of parameters of the AI model used in the target computation task, which directly reflects the scale and complexity of the model itself.

[0020] The task complexity factor is a correction factor that describes the inherent computational complexity of the target computation task. For example, for large language models, the computational complexity of the Prefill and Decode stages differs; for image models, the computational complexity also varies with the network architecture, such as Convolutional Neural Networks (CNNs) and Transformers. The task complexity factor compensates for differences in computational cost that the number of tokens processed and the number of AI model parameters used by the task cannot fully capture, which makes the later calculation of the basic computation time for the target computation task more accurate.

[0021] In this embodiment, in response to receiving a target computing task, the target computing task is parsed to obtain the task type, the number of tokens processed, the number of AI model parameters used in the task, and the task complexity factor. The task types mainly include the following four categories:

The first category: LLM training tasks. For example, using the Megatron-LM framework to perform full pre-training or fine-tuning of the GPT-3 model.

[0022] The second category: LLM inference-prefill tasks. For example, a user inputs a 4k-long text prompt, and the model needs to process these tokens in parallel to generate a KVCache. This type of task is computationally intensive.

[0023] The third category: LLM inference-decode tasks. For example, the model generates responses token by token. This type of task is memory and bandwidth intensive.

[0024] The fourth category: Traditional deep learning tasks (Non-LLM). For example: ResNet-50 image classification (CNN architecture) or pre-trained language model BERT text classification (Encoder-only architecture).

[0025] The number of task processing tokens, the number of AI model parameters used in the task, and the task complexity factor are different for different task types.

[0026] In this embodiment, the multidimensional perception vector includes a hardware status perception vector, a network topology perception vector, a load trend perception vector, and a performance variance perception vector, all of which are 32-dimensional feature vectors. The multidimensional perception vector is a direct concatenation of the above four types of feature vectors.
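The direct concatenation described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `build_perception_vector` and the constant `SUBVECTOR_DIM` are hypothetical, and the concatenation order simply follows the order in which the text lists the four sub-vectors.

```python
from typing import List, Sequence

SUBVECTOR_DIM = 32  # each perception sub-vector is 32-dimensional per the text


def build_perception_vector(
    hardware: Sequence[float],
    network: Sequence[float],
    load_trend: Sequence[float],
    perf_variance: Sequence[float],
) -> List[float]:
    """Concatenate four 32-dimensional sub-vectors into one 128-dimensional vector."""
    parts = (hardware, network, load_trend, perf_variance)
    for part in parts:
        if len(part) != SUBVECTOR_DIM:
            raise ValueError(f"expected {SUBVECTOR_DIM} values, got {len(part)}")
    # Direct concatenation, in the order the text lists the sub-vectors.
    return [float(x) for part in parts for x in part]


perception = build_perception_vector([0.1] * 32, [0.2] * 32, [0.3] * 32, [0.4] * 32)
```

Rejecting sub-vectors of the wrong length early keeps the downstream state vector at a fixed size, which a fixed-input-size model requires.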

[0027] The hardware status awareness vector is a 32-dimensional numerical representation of the real-time operating status and health of the computing and storage units of various heterogeneous devices. This 32-dimensional feature vector includes core indicators such as GPU / NPU utilization, GPU / NPU memory usage, GPU / NPU core temperature, memory temperature, hardware power consumption, CPU utilization, memory usage, and port operating status. This embodiment comprehensively depicts the status of single / multiple hardware nodes through 32 dimensions, each dimension being a normalized continuous value that reflects the operating level of the hardware in various dimensions.

[0028] A network topology awareness vector is a 32-dimensional numerical representation of the link attributes, topological connection characteristics, and network operating status between heterogeneous devices or between electronic devices and heterogeneous devices. The network topology awareness vector includes metrics such as link bandwidth (available bandwidth / total bandwidth), end-to-end latency, one-way transmission latency, link packet loss rate, network congestion level, link transmission rate, as well as link connectivity, network jitter, port throughput, and cross-subnet transmission latency. This embodiment integrates the static structural features and dynamic operating attributes of the network topology across 32 dimensions, enabling a characterization of the network layer status of heterogeneous devices.

[0029] The load trend perception vector is a 32-dimensional numerical representation of the core features of future load changes extracted based on the LSTM time series forecasting model. The load trend perception vector includes core features such as load forecast values for different future time windows, load peak / valley forecast values, load fluctuation cycle characteristics, and correlation prediction coefficients between load and traffic volume. All 32 features are quantified output features of the LSTM model, directly representing the future load change patterns and key attributes.

[0030] The performance variance perception vector is a 32-dimensional numerical representation of the fluctuation and dispersion of core performance indicators of heterogeneous devices during historical operation. The performance variance perception vector includes the variances of indicators such as response latency, throughput, computational efficiency, data transmission rate, and task completion rate.

[0031] S102: Based on the first feature data and the dynamic coefficients related to the heterogeneous device hardware, calculate the basic computing time to complete the target computing task using a preset time calculation formula; the dynamic coefficients are determined according to the historical performance indicators of different heterogeneous devices.

[0032] In this embodiment, the dynamic coefficient related to the heterogeneous device hardware is an adjustment factor tied to a specific hardware type and is expressed in seconds (s). The specific hardware type can be a general-purpose parallel computing accelerator (Graphics Processing Unit, GPU), a neural network-specific accelerator (Neural Processing Unit, NPU), or a data center-level heterogeneous computing accelerator (Data Center Unit, DCU). In this embodiment, the dynamic coefficient has the following characteristics: Hardware awareness: the dynamic coefficients for different hardware types are different and can be determined based on the historical performance indicators of different heterogeneous devices. These historical performance indicators represent the performance differences of different heterogeneous devices when handling the same task. For example, the coefficients of two different hardware types have different values.

[0033] Data-driven: the dynamic coefficients for different hardware types are not set manually. They are learned and updated automatically through periodic regression analysis of historical performance data from heterogeneous devices, and this data can be stored in the electronic device's memory. This means that the longer the device operates, the more data accumulates, and the closer the coefficient gets to the performance of the real hardware.

[0034] Continuously updated: heterogeneous devices may experience performance changes due to aging or firmware updates, which causes the dynamic coefficient to change. In this embodiment, the dynamic coefficient is not static but is updated periodically based on time and data. It can therefore optimize and update itself, providing a reliable basis for the accuracy of the basic computation time calculation.
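One way to picture the periodic regression described above is a least-squares fit through the origin. This is a sketch under a stated assumption: it supposes that measured time is proportional to a scalar workload measure (time ≈ coefficient × workload), which the text does not confirm. The function name `fit_dynamic_coefficient` and the workload measure are hypothetical.

```python
from typing import Sequence


def fit_dynamic_coefficient(
    workloads: Sequence[float], measured_times: Sequence[float]
) -> float:
    """Least-squares estimate of k in: measured_time ~= k * workload (assumed model)."""
    if not workloads or len(workloads) != len(measured_times):
        raise ValueError("need equally sized, non-empty histories")
    denom = sum(w * w for w in workloads)
    if denom == 0:
        raise ValueError("workloads must not all be zero")
    # Closed-form solution of min_k sum((t - k * w) ** 2).
    return sum(w * t for w, t in zip(workloads, measured_times)) / denom


# Hypothetical history for one hardware type: times are exactly 2 s per workload unit.
k_estimate = fit_dynamic_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Re-running the fit on a growing or sliding history is what would let the estimate track aging or firmware changes. The window length is a design choice the text leaves open.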

[0035] In this embodiment, based on the first feature data and dynamic coefficients related to the heterogeneous device hardware, the basic computation time for completing the target computation task is calculated using a preset time calculation formula, including: The basic computation time to complete the target computation task is calculated using the following formula:

[0036] Here, the basic computation time to complete the target computation task is expressed in seconds; the number of tokens processed by the task and the number of AI model parameters used by the task are usually normalized values when used in the calculation; and the task complexity factor is dimensionless.
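As a hedged illustration of how these quantities could combine, the sketch below assumes a purely multiplicative form: coefficient × tokens × parameters × complexity. That form is an assumption chosen because it is dimensionally consistent with a coefficient in seconds and otherwise dimensionless inputs. It is not the application's preset formula, and the coefficient values are made up.

```python
def basic_compute_time(
    k_hw: float, tokens_norm: float, params_norm: float, complexity: float
) -> float:
    """Estimate time in seconds under an ASSUMED multiplicative model."""
    if min(k_hw, tokens_norm, params_norm, complexity) < 0:
        raise ValueError("all inputs must be non-negative")
    # k_hw carries the unit (seconds); the other three factors are dimensionless.
    return k_hw * tokens_norm * params_norm * complexity


# Hypothetical coefficients for two hardware types (illustrative values only).
coefficients = {"GPU": 2.0, "NPU": 3.0}
estimates = {
    hw: basic_compute_time(k, tokens_norm=0.5, params_norm=0.5, complexity=1.0)
    for hw, k in coefficients.items()
}
```

Evaluating the same task against each hardware type's coefficient gives one estimate per device type.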

[0037] In this embodiment, because the first feature data represents the inherent attributes of the task itself, the computation-time estimate for heterogeneous devices is more targeted and better meets the needs of large language models.

[0038] S103: Construct a multidimensional state vector based on the basic computing time, the second feature data of the target computing task, and the multidimensional perception vector, and execute the task allocation steps based on the multidimensional state vector.

[0039] In this embodiment, the basic computation time can be understood as a 4-dimensional vector whose components include normalized values. The second feature data of the target computation task represents the task's scheduling and management attributes, including task priority, task dependencies, historical statistical features, and remaining budget features. Specifically, task priority indicates the urgency of the target computation task; task dependencies represent the IDs or number of preceding tasks (the total number of preceding tasks on which the current task directly depends); historical statistical features represent the past average completion time and average energy consumption of tasks of the same type as the target computation task; and the remaining budget feature represents the remaining time budget of the target computation task. In this embodiment, the second feature data can be a 124-dimensional vector.

[0040] In this embodiment, a multidimensional state vector is constructed based on the basic computing time, the second feature data of the target computing task, and the multidimensional perception vector. This can be understood as fusing the 4-dimensional basic computing time, the 128-dimensional multidimensional perception vector, and the 124-dimensional second feature data and then normalizing them to obtain the multidimensional state vector.
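The fusion step can be sketched as below. The dimensions (4 + 128 + 124 = 256) come from the text. The normalization method is not specified there, so min-max scaling over the fused vector is used purely as a stand-in assumption, and the function name is hypothetical.

```python
from typing import List, Sequence

BASE_DIM, PERCEPTION_DIM, SECOND_DIM = 4, 128, 124  # dimensions stated in the text


def build_state_vector(
    base_time_features: Sequence[float],
    perception: Sequence[float],
    second_features: Sequence[float],
) -> List[float]:
    """Fuse the three inputs and min-max scale the result (assumed normalization)."""
    expected = (BASE_DIM, PERCEPTION_DIM, SECOND_DIM)
    parts = (base_time_features, perception, second_features)
    for part, dim in zip(parts, expected):
        if len(part) != dim:
            raise ValueError(f"expected {dim} values, got {len(part)}")
    fused = [float(x) for part in parts for x in part]
    lo, hi = min(fused), max(fused)
    if hi == lo:
        return [0.0] * len(fused)  # constant input: avoid division by zero
    return [(x - lo) / (hi - lo) for x in fused]


state = build_state_vector([2.0, 0.5, 0.5, 1.0], [0.2] * 128, [0.0] * 124)
```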

[0041] In one embodiment, the task allocation step based on the multidimensional state vector includes S1031 to S1033.

[0042] S1031: Input the multidimensional state vector into the reinforcement learning model to obtain the initial continuous action vector.

[0043] In this embodiment, the reinforcement learning model is a Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning model (hereinafter referred to as the TD3 model). The multidimensional state vector is input into the reinforcement learning model, and an initial continuous action vector is obtained based on a multi-objective reward function. The initial continuous action vector represents the "distribution ratio of task load across different heterogeneous devices". It is an N-dimensional vector (N is the number of device types or device nodes of the heterogeneous devices). For example, a vector such as (0.8, 0.2) means that 80% of the target computing task is allocated to the GPU and 20% to the NPU.

[0044] The multi-objective reward function is a weighted function of task completion delay, throughput, energy consumption, and load imbalance.

[0045] In this function, the inputs are the actually measured task completion delay and the maximum task completion delay; the actually measured throughput and the maximum throughput; the actual energy consumption and the maximum energy consumption; and the degree of load imbalance. The four weight coefficients correspond to the respective objectives and can be set according to the current business type, for example, 0.25, 0.25, 0.25, and 0.25.
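To make the roles of these quantities concrete, the sketch below assumes one plausible form: each measured value is divided by its maximum, throughput is rewarded, and delay, energy, and imbalance are penalized. The signs and the normalization are assumptions inferred from the stated optimization goals, not the application's expression. The function name is hypothetical.

```python
from typing import Sequence


def multi_objective_reward(
    delay: float, max_delay: float,
    throughput: float, max_throughput: float,
    energy: float, max_energy: float,
    imbalance: float,
    weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """ASSUMED form: reward throughput; penalize delay, energy, and imbalance."""
    w_delay, w_tput, w_energy, w_imb = weights
    return (
        -w_delay * delay / max_delay
        + w_tput * throughput / max_throughput
        - w_energy * energy / max_energy
        - w_imb * imbalance
    )


baseline = multi_objective_reward(50.0, 100.0, 80.0, 100.0, 20.0, 100.0, 0.1)
faster = multi_objective_reward(25.0, 100.0, 80.0, 100.0, 20.0, 100.0, 0.1)
```

Under this assumed form, halving the delay while holding everything else fixed raises the reward, which is the direction a scheduler minimizing delay would want.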

[0046] S1032: Based on preset constraints, the initial continuous action vector is filtered to obtain the filtered initial continuous action vector; based on preset multi-objective optimization conditions, the filtered initial continuous action vector is optimized to obtain the target continuous action vector.

[0047] In one embodiment, the heterogeneous device includes multiple computing power execution devices, and the constraints include: a video memory capacity constraint, meaning that the video memory required by the amount of tasks allocated to a computing power execution device is less than or equal to the remaining video memory of that device; a device availability constraint, meaning that the computing power execution device is online and in a non-faulty state; and a minimum granularity constraint, meaning that the size of each subtask after the target computation task is divided is not less than the minimum amount of tasks to be executed.

[0048] In this embodiment, the initial continuous action vectors are filtered, and only those that satisfy the above constraints are kept as the filtered initial continuous action vectors.
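The three constraints can be sketched as a filter over candidate allocation ratios. Several details here are assumptions for illustration: memory demand is taken to scale linearly with a device's share, a share of exactly zero is treated as "device unused" and exempt from the checks, and the `Device` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Device:
    name: str
    free_memory_gb: float
    online: bool  # online and not faulty


def satisfies_constraints(
    ratios: Sequence[float], devices: Sequence[Device],
    task_memory_gb: float, task_units: int, min_units: int,
) -> bool:
    if len(ratios) != len(devices):
        return False
    for ratio, dev in zip(ratios, devices):
        if ratio == 0:
            continue  # assumption: an unused device is exempt from the checks
        if not dev.online:
            return False  # device availability constraint
        if ratio * task_memory_gb > dev.free_memory_gb:
            return False  # video memory capacity constraint
        if ratio * task_units < min_units:
            return False  # minimum granularity constraint
    return True


def filter_actions(candidates, devices, task_memory_gb, task_units, min_units) -> List:
    return [a for a in candidates
            if satisfies_constraints(a, devices, task_memory_gb, task_units, min_units)]


devices = [Device("GPU", 24.0, True), Device("NPU", 8.0, True)]
candidates = [[0.8, 0.2], [0.5, 0.5], [0.95, 0.05], [1.0, 0.0]]
kept = filter_actions(candidates, devices, task_memory_gb=20.0, task_units=100, min_units=10)
```

In this example `[0.5, 0.5]` is dropped because the NPU share would need 10 GB against 8 GB free, and `[0.95, 0.05]` is dropped because the NPU subtask would be 5 units, below the minimum of 10.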

[0049] In one embodiment, the preset multi-objective optimization conditions are minimum task completion delay, maximum throughput, minimum energy consumption, and minimum load imbalance. This embodiment can use the NSGA-II genetic algorithm to optimize the filtered initial continuous action vectors based on the preset multi-objective optimization conditions and obtain the target continuous action vector. The specific steps are as follows: The filtered initial continuous action vectors serve as the initial population, so all individuals satisfy the constraints. Each individual (action vector) in the population is ranked by non-dominated sorting on the four objective function values, producing successive Pareto front levels, with individuals on earlier fronts being superior. For individuals at the same level, the crowding distance is calculated to keep individuals evenly distributed along the front and avoid local optima. Parents are selected based on non-dominated level and crowding distance. New offspring are generated through crossover (combining the allocation schemes of two action vectors) and mutation (fine-tuning the task allocation ratio). The constraints are checked on the offspring individuals, and those that do not meet the constraints are removed. The parent and offspring populations are merged, and non-dominated sorting and crowding distance calculations are performed again to select the best individuals to form the next-generation population. Once the iteration termination conditions are met, the target continuous action vector is obtained.
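The ranking part of the procedure above (non-dominated sorting plus crowding) can be sketched as follows. This covers only the selection machinery, not crossover, mutation, or the iteration loop. It assumes every objective is expressed as a value to minimize, so throughput would be negated before use. Function names are hypothetical, and the simple repeated-extraction sort is used for readability rather than speed.

```python
from typing import Dict, List, Sequence, Tuple

Objectives = Tuple[float, ...]  # all objectives expressed as "smaller is better"


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: Sequence[Objectives]) -> List[List[int]]:
    """Return index lists per Pareto front, best front first."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs: Sequence[Objectives], front: List[int]) -> Dict[int, float]:
    """Larger distance means a less crowded region of the front."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist


# Two-objective toy example, e.g. (delay, energy) per candidate action vector.
points = [(1.0, 4.0), (2.0, 3.0), (3.0, 3.0), (4.0, 1.0), (4.0, 4.0)]
fronts = non_dominated_sort(points)
distances = crowding_distance(points, fronts[0])
```

Parent selection would then prefer a lower front index first and, within a front, a larger crowding distance.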

[0050] S1033: Map the target continuous action vector into control commands and send them to the heterogeneous devices for execution.

[0051] In this embodiment, the target continuous action vector represents the "distribution ratio of task load across different heterogeneous devices" that meets the constraints and optimization conditions of heterogeneous devices. This embodiment first parses the target continuous action vector to obtain information such as task distribution ratio and partitioning granularity. Through predefined mapping rules, it generates structured control instructions containing device selection, task fragment size, and scheduling timing. These instructions are then sent to the corresponding heterogeneous devices via a unified device communication protocol, enabling them to complete task loading, resource scheduling, and parallel computing according to the instructions. Simultaneously, a status feedback field is embedded in the instructions to achieve real-time monitoring and anomaly handling of the heterogeneous device execution process.
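A minimal sketch of such a mapping is shown below. The command fields (`device`, `fragment_units`, `order`, `report_status`) and the largest-remainder rounding rule are hypothetical stand-ins for the predefined mapping rules, which the text does not spell out.

```python
from typing import Dict, List, Sequence


def map_to_commands(
    ratios: Sequence[float], device_names: Sequence[str], total_units: int
) -> List[Dict[str, object]]:
    """Turn allocation ratios into per-device commands with integer fragment sizes."""
    if len(ratios) != len(device_names):
        raise ValueError("one ratio per device is required")
    if abs(sum(ratios) - 1.0) > 1e-6 or min(ratios) < 0:
        raise ValueError("ratios must be non-negative and sum to 1")
    raw = [r * total_units for r in ratios]
    sizes = [int(x) for x in raw]
    leftover = max(total_units - sum(sizes), 0)
    # Largest-remainder rule: hand leftover units to the biggest fractional parts.
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    commands = []
    for name, size in zip(device_names, sizes):
        if size > 0:  # skip devices that receive no work
            commands.append({
                "device": name,
                "fragment_units": size,
                "order": len(commands),   # simple dispatch order
                "report_status": True,    # stands in for the status feedback field
            })
    return commands


commands = map_to_commands([0.8, 0.2], ["GPU", "NPU"], total_units=10)
```

Rounding with the largest-remainder rule keeps the fragment sizes summing exactly to the total, so no unit of work is lost or duplicated when ratios do not divide evenly.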

[0052] As can be seen from the above, this embodiment calculates the basic computation time for the target computation task based on the first feature data and dynamic coefficients related to the heterogeneous device hardware. Subsequent calculations are then performed based on this basic computation time and multi-dimensional perception vectors. This allows the scheduling decision to perceive hardware performance fluctuations and load changes in real time, avoiding performance bottlenecks caused by fixed task allocation in static scheduling and effectively improving adaptability to dynamic and complex AI workloads. This embodiment also utilizes a reinforcement learning model to generate initial continuous action vectors and optimizes them through preset constraint conditions and multi-objective optimization conditions. Under the premise of meeting hardware resource limitations, it can balance objectives such as task completion latency, throughput, energy consumption, and load balancing, reducing resource waste and improving the overall resource utilization efficiency of heterogeneous devices.

[0053] In one embodiment of this application, the heterogeneous computing power dynamic scheduling calculation method further includes: obtaining performance metrics of the heterogeneous devices; and, if the performance metrics do not meet the preset conditions, executing the task allocation step again until the performance metrics meet the preset conditions.

[0054] In one embodiment, the performance indicators of heterogeneous devices include throughput, latency, device power consumption, and load utilization. These four indicators have been described above and will not be repeated here.

[0055] The preset conditions include: the weighted deviation value of throughput, latency, device power consumption, and load utilization is less than the preset weighted deviation value; and / or, the deviation value of each performance indicator is less than the preset deviation value of that performance indicator.

[0056] In this embodiment, if the weighted deviation values of throughput, latency, device power consumption, and load utilization are greater than or equal to a preset weighted deviation value, or if the deviation value of any performance indicator is greater than or equal to the preset deviation value of that performance indicator, it indicates that the allocation ratio of the target continuous action vector is unreasonable and needs to be readjusted. The adjustment method involves re-executing the task allocation step until the performance indicators meet the preset conditions. This can be understood as follows: when the performance indicators do not meet the preset conditions, the distribution of currently unfinished subtasks is immediately stopped, the task allocation step is re-executed, a new multidimensional state vector is generated based on the latest multidimensional perception vector, and finally a new target continuous action vector is obtained. This new target continuous action vector is then mapped to control commands and sent to heterogeneous devices for execution. For example, if it is found that the NPU processing speed is far lower than expected, rescheduling will immediately reduce the amount of data distributed to the NPU and instead distribute it to the GPU.
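The trigger described above can be sketched as a single predicate. The text does not define how a deviation value is computed, so relative deviation from an expected value is assumed here. The metric keys, thresholds, and function names are hypothetical.

```python
from typing import Dict

METRICS = ("throughput", "latency", "power", "utilization")


def relative_deviations(
    measured: Dict[str, float], expected: Dict[str, float]
) -> Dict[str, float]:
    """ASSUMED deviation definition: |measured - expected| / expected."""
    return {m: abs(measured[m] - expected[m]) / expected[m] for m in METRICS}


def needs_reschedule(
    measured: Dict[str, float], expected: Dict[str, float],
    weights: Dict[str, float], max_weighted_dev: float, max_dev: Dict[str, float],
) -> bool:
    """True if the weighted deviation or any single deviation reaches its threshold."""
    devs = relative_deviations(measured, expected)
    weighted = sum(weights[m] * devs[m] for m in METRICS)
    return weighted >= max_weighted_dev or any(devs[m] >= max_dev[m] for m in METRICS)


expected = {"throughput": 100.0, "latency": 10.0, "power": 200.0, "utilization": 0.8}
weights = {m: 0.25 for m in METRICS}
limits = {m: 0.2 for m in METRICS}
on_target = needs_reschedule(expected, expected, weights, 0.1, limits)
degraded = needs_reschedule(dict(expected, throughput=50.0), expected, weights, 0.1, limits)
```

When the predicate returns true, the scheduler would stop distributing unfinished subtasks and rerun the task allocation step with a fresh perception vector.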

[0057] In this embodiment, because an adaptive adjustment mechanism is set up, heterogeneous devices can quickly adapt to sudden changes in environment, model or load within minutes without manual intervention, which greatly improves the robustness of the devices and the efficiency of operation and maintenance.

[0058] In one embodiment of this application, after obtaining the performance indicators of heterogeneous devices, the heterogeneous computing power dynamic scheduling calculation method further includes: If the performance indicators meet the preset conditions, the multidimensional state vector, the target continuous action vector, the multidimensional state vector after each action vector is executed, and the reward value will be stored as storage experience units. In response to receiving the completion instruction for the target computation task, the dynamic coefficients are updated based on the data in the stored experience unit and the preset time update level, and the reinforcement learning model is updated based on the data in the stored experience unit and the preset time update level.

[0059] In this embodiment, if the performance indicators meet the preset conditions, feedback information is collected after the target computation task is completed. This feedback information includes the multidimensional state vector, the target continuous action vector, the multidimensional state vector after each action vector is executed, and the reward value. The reward value is the one used by the reinforcement learning model; it represents the reward for the action, that is, a post-execution evaluation of the action's effect.

[0060] In one embodiment, the reward value corresponding to the reinforcement learning model is calculated as follows: The reward value is calculated based on the throughput change rate, latency change rate, energy consumption change rate of heterogeneous devices and their respective weighting coefficients.

[0061] $$R_t = w_1 \cdot \Delta_{\mathrm{thr}} + w_2 \cdot \Delta_{\mathrm{lat}} + w_3 \cdot \Delta_{\mathrm{eng}} - P$$

[0062] where $R_t$ denotes the reward value; $\Delta_{\mathrm{thr}}$ denotes the percentage increase in throughput compared to the baseline (or the previous time step $t-1$); $\Delta_{\mathrm{lat}}$ denotes the percentage reduction in latency; $\Delta_{\mathrm{eng}}$ denotes the percentage reduction in energy consumption; $P$ denotes the penalty term; and $w_1$, $w_2$, and $w_3$ denote the weight coefficients of the respective indicators, which can be set according to the task type, for example 0.5, 0.25, and 0.25 respectively.
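As a minimal sketch of the reward calculation, the Python below combines the three change rates with the example weights 0.5, 0.25, and 0.25. The function names, the use of fractions rather than percentages, and the choice to subtract the penalty term are assumptions made for illustration; the application only states that the reward is computed from the change rates, their weights, and a penalty item.

```python
from typing import Tuple


def relative_increase(current: float, baseline: float) -> float:
    """Fractional increase over the baseline (positive when current is higher)."""
    return (current - baseline) / baseline


def relative_reduction(current: float, baseline: float) -> float:
    """Fractional reduction from the baseline (positive when current is lower)."""
    return (baseline - current) / baseline


def reward(
    throughput_gain: float,
    latency_reduction: float,
    energy_reduction: float,
    penalty: float = 0.0,
    weights: Tuple[float, float, float] = (0.5, 0.25, 0.25),
) -> float:
    """Weighted sum of the three change rates minus an (assumed) penalty."""
    w_thr, w_lat, w_eng = weights
    return (
        w_thr * throughput_gain
        + w_lat * latency_reduction
        + w_eng * energy_reduction
        - penalty
    )


# Example: throughput 1000 -> 1200, latency 50 -> 40, energy 300 -> 330.
example_reward = reward(
    relative_increase(1200.0, 1000.0),
    relative_reduction(40.0, 50.0),
    relative_reduction(330.0, 300.0),
)
```

In this example the energy term is negative because consumption rose, so it lowers the reward even though throughput and latency improved.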

[0063] This embodiment can update the dynamic coefficients and the reinforcement learning model separately based on the collected feedback information and the preset time update level. For example, the time update level for the dynamic coefficients is daily, meaning they are updated every day. The time update level for the reinforcement learning model is hourly, meaning they are updated every hour, and the updated content is the TD3 model parameters.
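The hierarchical update levels can be illustrated with a small Python sketch that decides which components are due for an update. The hourly and daily intervals come from the example above; the component names and the `due_updates` helper are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Example update levels from the text: hourly for the TD3 model parameters,
# daily for the dynamic coefficients.
UPDATE_INTERVALS: Dict[str, timedelta] = {
    "rl_model": timedelta(hours=1),
    "dynamic_coefficients": timedelta(days=1),
}


def due_updates(now: datetime, last_updated: Dict[str, datetime]) -> List[str]:
    """Return the components whose update interval has elapsed."""
    return [
        name
        for name, interval in UPDATE_INTERVALS.items()
        if now - last_updated[name] >= interval
    ]
```

Keeping the cheaper, faster-changing model on the short interval and the hardware coefficients on the long one is one way to read the balance between computational overhead and learning efficiency mentioned in the text.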

[0064] This embodiment updates parameters based on the execution status of each task. As device runtime increases and data accumulates, the generated continuous action vector of the target becomes increasingly accurate. This embodiment utilizes multi-dimensional feedback to optimize computing power scheduling strategies and balances computational overhead and learning efficiency through hierarchical updates, continuously improving the efficiency, low latency, and low energy consumption of heterogeneous computing power scheduling, thereby enhancing the device's dynamic optimization capabilities.

[0065] Referring to Figure 2, this embodiment provides an overall flowchart of the heterogeneous computing power dynamic scheduling calculation method.

[0066] Phase 1: System Initialization. At this point, the pre-trained model, the perceptron models, and the TD3 model can be loaded.

[0067] The pre-trained model can be a Reinforcement-Augmented and Variance-aware Execution Network (RAVEN) model that supports the dynamic scheduling method for heterogeneous computing power. The RAVEN model includes an adaptive basic estimation layer, a multi-dimensional real-time perception layer, a reinforcement learning decision layer, an adaptive scheduling execution layer, and a continuous learning and model evolution layer. The adaptive basic estimation layer calculates the basic computation time; the multi-dimensional real-time perception layer generates a 128-dimensional multi-dimensional perception vector using perceptrons related to hardware status, network status, load trends, and performance characteristics; the reinforcement learning decision layer generates the initial continuous action vector using the TD3 model; the adaptive scheduling execution layer filters the initial continuous action vector based on preset constraints; and the continuous learning and model evolution layer updates the model based on the data in the stored experience units and the preset time update levels.

[0068] Phase 2: Task Reception. The received target computation task is parsed to obtain the number of task processing tokens (TokenNum), the number of AI model parameters used in the task (ParamNum), the task complexity factor (ComplexityFactor), and the task type.

[0069] Phase 3: Basic Calculation Time. The adaptive time calculation formula is invoked according to the task type, and the basic calculation time T_base is calculated from TokenNum, ParamNum, ComplexityFactor, and the dynamic coefficients related to the heterogeneous device hardware.

[0070] Phase 4: Multidimensional Perception. A hardware state sensor, a network state sensor, a load trend sensor, and a performance characteristic sensor each generate a 32-dimensional vector. These four 32-dimensional vectors are then concatenated to obtain the 128-dimensional multidimensional perception vector.

[0071] Phase 5: Construction of the Multidimensional State Vector. A 256-dimensional state vector s_t is constructed by integrating the basic computation time T_base (a 4-dimensional vector), the 128-dimensional multidimensional perception vector, and the second feature data (task priority, task dependency, historical statistical features, and remaining budget features; a 124-dimensional vector).
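The dimension bookkeeping of Phases 4 and 5 (four 32-dimensional parts giving 128 dimensions, then 4 + 128 + 124 = 256) can be checked with a short Python sketch. The concatenation order, the plain-list representation, and the function names are assumptions; the application does not specify how T_base is encoded into its 4 dimensions.

```python
from typing import List, Sequence

PERCEPTION_PART_DIM = 32   # each of the four perception parts
BASE_TIME_DIM = 4          # T_base features
TASK_FEATURE_DIM = 124     # second feature data
STATE_DIM = 256            # final state vector s_t


def build_perception_vector(
    hardware: Sequence[float],
    network: Sequence[float],
    load_trend: Sequence[float],
    performance: Sequence[float],
) -> List[float]:
    """Concatenate the four 32-dimensional parts into a 128-dimensional vector."""
    parts = (hardware, network, load_trend, performance)
    for part in parts:
        if len(part) != PERCEPTION_PART_DIM:
            raise ValueError("each perception part must have 32 dimensions")
    return [value for part in parts for value in part]


def build_state_vector(
    base_time: Sequence[float],
    perception: Sequence[float],
    task_features: Sequence[float],
) -> List[float]:
    """Assemble s_t from T_base (4), perception (128), and task features (124)."""
    if len(base_time) != BASE_TIME_DIM:
        raise ValueError("base_time must have 4 dimensions")
    if len(perception) != 4 * PERCEPTION_PART_DIM:
        raise ValueError("perception must have 128 dimensions")
    if len(task_features) != TASK_FEATURE_DIM:
        raise ValueError("task_features must have 124 dimensions")
    state = list(base_time) + list(perception) + list(task_features)
    assert len(state) == STATE_DIM
    return state
```

Validating the sizes at construction time catches a mismatched perception part before the vector reaches the reinforcement learning model.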

[0072] Phase 6: Reinforcement Learning (RL) Policy Decision. The TD3 model outputs an initial continuous action vector a_t, which indicates the task allocation ratio among devices such as the GPU, NPU, and DCU.

[0073] Phase 7: Constraint Checking and Multi-Objective Optimization. The initial continuous action vector is processed through the preset constraints and the preset multi-objective optimization conditions to obtain the target continuous action vector. The constraints include memory capacity constraints, device availability constraints, and minimum partitioning granularity constraints. The preset multi-objective optimization conditions are minimum task completion latency, maximum throughput, minimum energy consumption, and minimum load imbalance.
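The constraint-checking half of Phase 7 can be sketched in Python as follows. This is a simplified illustration, not the application's procedure: it drops any device whose share violates a constraint and renormalizes the remaining shares until the allocation is stable, and it does not attempt the multi-objective optimization. The `Device` fields, the units, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Device:
    name: str
    online: bool            # device availability constraint
    free_memory_gb: float   # remaining memory for the capacity constraint


def satisfies_constraints(
    dev: Device, ratio: float, task_memory_gb: float, total_units: int, min_units: int
) -> bool:
    """Check availability, memory capacity, and minimum granularity for one share."""
    return (
        dev.online
        and ratio * task_memory_gb <= dev.free_memory_gb
        and ratio * total_units >= min_units
    )


def filter_allocation(
    ratios: Dict[str, float],
    devices: Dict[str, Device],
    task_memory_gb: float,
    total_units: int,
    min_units: int,
) -> Dict[str, float]:
    """Drop infeasible shares and renormalize until every share is feasible."""
    shares = {name: r for name, r in ratios.items() if r > 0}
    while shares:
        total = sum(shares.values())
        shares = {name: r / total for name, r in shares.items()}
        bad = [
            name
            for name, r in shares.items()
            if not satisfies_constraints(
                devices[name], r, task_memory_gb, total_units, min_units
            )
        ]
        if not bad:
            return shares
        for name in bad:
            del shares[name]
    raise RuntimeError("no device satisfies the constraints")
```

The recheck after each renormalization matters because removing one device enlarges the shares of the others, which can push a previously feasible device over its memory limit. A real scheduler might cap an oversized share instead of dropping the device.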

[0074] Phase 8: Execution of Allocation. The target continuous action vector generated in Phase 7 is mapped into control commands and sent to the heterogeneous devices for execution, and asynchronous performance monitoring is started.

[0075] Phase 9: Real-time Monitoring. The system monitors the task completion latency, throughput, energy consumption, and load imbalance of the heterogeneous devices in real time, comparing the actual measured values with the preset indicator values to calculate the deviation or weighted deviation value. If the deviation or weighted deviation value satisfies the preset conditions, i.e., no indicator limit is exceeded, the system proceeds to Phase 10; otherwise, online rescheduling is triggered, and Phases 5-7 are executed again until the performance indicators meet the preset conditions.

[0076] Phase 10: Task Completion and Feedback Collection. The task execution results are collected, and the multidimensional state vector, the target continuous action vector, the multidimensional state vector after each action vector is executed, and the reward value are stored as storage experience units. A storage experience unit can be (s_t, ... ,r,s_t+1), where s_t+1 represents the multidimensional state vector after the action vector is executed, and t represents the current action execution time.
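A storage experience unit holding the four items listed in Phase 10 could be represented as follows. This Python sketch is an assumption-laden illustration: the class names, the bounded capacity, and the uniform random sampling are not stated in the application, although replay buffers of this kind are commonly used with TD3.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class ExperienceUnit:
    state: List[float]       # multidimensional state vector s_t
    action: List[float]      # target continuous action vector
    reward: float            # reward value r
    next_state: List[float]  # state vector after the action is executed


class ExperienceStore:
    """Bounded store; the oldest units are discarded once capacity is reached."""

    def __init__(self, capacity: int = 10000) -> None:
        self._units: Deque[ExperienceUnit] = deque(maxlen=capacity)

    def add(self, unit: ExperienceUnit) -> None:
        self._units.append(unit)

    def sample(self, batch_size: int) -> List[ExperienceUnit]:
        """Draw up to batch_size units uniformly at random without replacement."""
        units = list(self._units)
        return random.sample(units, min(batch_size, len(units)))

    def __len__(self) -> int:
        return len(self._units)
```

The feedback learning phase would then draw batches from such a store when updating the reinforcement learning model and the dynamic coefficients.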

[0077] Phase 11: Feedback Learning. The reinforcement learning model and the dynamic coefficients are updated based on the data in the stored experience units and the preset time update levels, so that the RAVEN model is continuously optimized.

[0078] As can be seen from the overall process of the heterogeneous computing power dynamic scheduling calculation method described above, this embodiment achieves real-time perception and dynamic adaptation to hardware status, network topology, load trends, and device performance variance through a dual-loop mechanism of online rescheduling and global feedback learning. Compared with existing technologies, this embodiment can reduce the load imbalance rate of heterogeneous device clusters, control the prediction error within 5%, shorten the adaptation time from several hours to several minutes, and achieve an optimal balance among multiple objectives such as throughput, latency, and energy consumption, significantly improving the resource utilization efficiency of AI computing power and reducing cost.

[0079] Corresponding to the heterogeneous computing power dynamic scheduling calculation method in the above embodiment, Figure 3 is a structural block diagram of a heterogeneous computing power dynamic scheduling computing device provided in one embodiment of this application. For ease of explanation, only the parts relevant to the embodiment of this application are shown. Referring to Figure 3, the heterogeneous computing power dynamic scheduling computing device 20 includes: a data acquisition module 21, a data calculation module 22, and a task execution module 23.

[0080] The data acquisition module 21 is used to acquire the first feature data and the multidimensional perception vector corresponding to the target computing task to be executed on the heterogeneous device. The multidimensional perception vector represents the hardware status, network status, load trend, and performance characteristics of the heterogeneous device; the first feature data represents the inherent attributes of the task itself. The data calculation module 22 is used to calculate, using a preset time calculation formula, the basic calculation time needed to complete the target calculation task based on the first feature data and the dynamic coefficients related to the heterogeneous device hardware; the dynamic coefficients are determined according to the historical performance indicators of different heterogeneous devices. The task execution module 23 is used to construct a multidimensional state vector based on the basic calculation time, the second feature data of the target computing task, and the multidimensional perception vector, and to execute the task allocation step based on the multidimensional state vector; the second feature data represents the scheduling and management attributes of the task. When allocating tasks, the task execution module 23 is specifically used to: input the multidimensional state vector into the reinforcement learning model to obtain the initial continuous action vector; filter the initial continuous action vector based on the preset constraints to obtain the filtered initial continuous action vector; optimize the filtered initial continuous action vector based on the preset multi-objective optimization conditions to obtain the target continuous action vector; and map the target continuous action vector into control commands and send them to the heterogeneous devices for execution.
In one embodiment of this application, the first feature data includes the number of task processing tokens, the number of AI model parameters used in the task, and the task complexity factor.

[0081] In one embodiment of this application, the heterogeneous computing power dynamic scheduling computing device 20 further includes a performance judgment module for obtaining the performance indicators of heterogeneous devices. If the performance metrics do not meet the preset conditions, the task allocation step will be executed again until the performance metrics meet the preset conditions.

[0082] In one embodiment of this application, the heterogeneous computing power dynamic scheduling computing device 20 further includes an update module, which is used, after the performance indicators of the heterogeneous devices are obtained, to: store the multidimensional state vector, the target continuous action vector, the multidimensional state vector after each action vector is executed, and the reward value as storage experience units if the performance indicators meet the preset conditions; and, in response to receiving the completion instruction for the target computation task, update the dynamic coefficients based on the data in the storage experience units and the preset time update level, and update the reinforcement learning model based on the data in the storage experience units and the preset time update level.

[0083] In one embodiment of this application, the reward value corresponding to the reinforcement learning model is calculated as follows: The reward value is calculated based on the throughput change rate, latency change rate, energy consumption change rate of heterogeneous devices and their respective weighting coefficients.

[0084] In one embodiment of this application, the heterogeneous device includes multiple computing power execution devices, and the constraints include: a video memory capacity constraint, meaning that the video memory required for the amount of tasks allocated to a computing power execution device is less than or equal to the remaining video memory of that device; a device availability constraint, meaning that the computing power execution device is online and in a non-faulty state; and a minimum granularity constraint, meaning that the task amount of each subtask obtained by dividing the target computation task is not less than the minimum executable task amount.

[0085] In one embodiment of this application, the performance indicators of heterogeneous devices include throughput, latency, device energy consumption, and load utilization. The preset conditions include: the weighted deviation of throughput, latency, device energy consumption, and load utilization is less than the preset weighted deviation; and / or, the deviation of any performance indicator is less than the preset deviation value of that performance indicator.

[0086] In summary, the heterogeneous computing power dynamic scheduling calculation method and heterogeneous computing power dynamic scheduling calculation device provided in this application have at least the following beneficial effects: 1. The embodiments of this application can increase the overall resource utilization rate of heterogeneous device clusters from less than 50% in traditional methods to more than 85%, reduce the load imbalance by more than 60%, and achieve a 40%-60% increase in task throughput or a 30%-50% reduction in task completion latency.

[0087] 2. The online rescheduling and global feedback learning dual-loop mechanism of this application embodiment realizes real-time perception and dynamic adaptation of hardware status, network topology, load trend and device performance variance, solving the problem that the existing technology cannot perceive or respond to real-time fluctuations in hardware performance, which leads to devices with weak performance or poor status becoming the technical bottleneck of the entire computing process.

[0088] 3. As the running time increases and experience data accumulates, the parameters of each model in the embodiments of this application will become more in line with reality, and the overall performance of the equipment will continue to improve, providing enterprises with long-term value preservation and appreciation capabilities for their investments in AI infrastructure.

[0089] Figure 4 is a schematic block diagram of an electronic device provided according to an embodiment of this application. Referring to Figure 4, the electronic device 300 in this embodiment may include one or more processors 301, one or more input devices 302, one or more output devices 303, and one or more memories 304. The processors 301, input devices 302, output devices 303, and memories 304 communicate with each other via a communication bus 305. The memories 304 store computer programs, including program instructions. The processors 301 execute the program instructions stored in the memories 304. Specifically, the processors 301 are configured to invoke the program instructions to perform the functions of the modules in the aforementioned device embodiments, for example, the functions of the data acquisition module 21, data calculation module 22, and task execution module 23 shown in Figure 3.

[0090] It should be understood that, in the embodiments of this application, the processor 301 may be a central processing unit (CPU), but it may also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor.

[0091] Input device 302 may include a touchpad, a fingerprint sensor (for collecting the user's fingerprint information and fingerprint orientation information), a microphone, etc., and output device 303 may include a display (LCD, etc.), a speaker, etc.

[0092] The memory 304 may include read-only memory and random access memory, and provides instructions and data to the processor 301. A portion of the memory 304 may also include non-volatile random access memory.

[0093] In specific implementations, the processor 301, input device 302, and output device 303 described in the embodiments of this application can execute the implementation method described in the heterogeneous computing power dynamic scheduling calculation method provided in the embodiments of this application, or they can execute the implementation method of the electronic device described in the embodiments of this application, which will not be repeated here.

[0094] In another embodiment of this application, a computer-readable storage medium is provided. This computer-readable storage medium stores a computer program, which includes program instructions. When executed by a processor, the program instructions implement all or part of the processes in the methods described above. Alternatively, the computer program can instruct related hardware to complete the process. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, etc.

[0095] The computer-readable storage medium can be an internal storage unit of the electronic device in any of the foregoing embodiments, such as a hard disk or memory of the electronic device. The computer-readable storage medium can also be an external storage device of the electronic device, such as a plug-in hard disk, smart media card (SMC), secure digital card (SD), flash card, etc., equipped on the electronic device. Furthermore, the computer-readable storage medium can include both internal and external storage units of the electronic device. The computer-readable storage medium is used to store computer programs and other programs and data required by the electronic device. The computer-readable storage medium can also be used to temporarily store data that has been output or will be output.

[0096] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

[0097] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the electronic devices and units described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0098] In the several embodiments provided in this application, it should be understood that the disclosed electronic devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. For instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces or modules, or it may be an electrical, mechanical, or other form of connection.

[0099] The modules described as separate components may or may not be physically separate. Similarly, the components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of the embodiments of this application, depending on actual needs.

[0100] Furthermore, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated modules described above can be implemented in hardware or as software functional modules.

[0101] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for dynamic scheduling of heterogeneous computing power, characterized in that it comprises: acquiring first feature data and a multidimensional perception vector corresponding to a target computing task to be executed on heterogeneous devices, wherein the multidimensional perception vector is used to represent the hardware status, network status, load trend, and performance characteristics of the heterogeneous devices, and the first feature data is used to represent the inherent attributes of the task itself; calculating, based on the first feature data and dynamic coefficients related to the heterogeneous device hardware, the basic computing time to complete the target computing task using a preset time calculation formula, wherein the dynamic coefficients are determined based on the historical performance indicators of different heterogeneous devices; constructing a multidimensional state vector based on the basic computing time, second feature data of the target computing task, and the multidimensional perception vector, and executing a task allocation step based on the multidimensional state vector, wherein the second feature data is used to represent the scheduling and management attributes of the task; wherein the task allocation step includes: inputting the multidimensional state vector into a reinforcement learning model to obtain an initial continuous action vector; filtering the initial continuous action vector based on preset constraints to obtain a filtered initial continuous action vector; optimizing the filtered initial continuous action vector based on preset multi-objective optimization conditions to obtain a target continuous action vector; and mapping the target continuous action vector into control commands and sending them to the heterogeneous devices for execution.

2. The heterogeneous computing power dynamic scheduling calculation method as described in claim 1, characterized in that the first feature data includes the number of task processing tokens, the number of AI model parameters used in the task, and the task complexity factor.

3. The heterogeneous computing power dynamic scheduling calculation method as described in claim 1, characterized in that the method further includes: obtaining performance indicators of the heterogeneous devices; and, if the performance indicators do not meet the preset conditions, executing the task allocation step again until the performance indicators meet the preset conditions.

4. The heterogeneous computing power dynamic scheduling calculation method as described in claim 3, characterized in that, after obtaining the performance indicators of the heterogeneous devices, the method further includes: if the performance indicators meet the preset conditions, storing the multidimensional state vector, the target continuous action vector, the multidimensional state vector after each action vector is executed, and the reward value as storage experience units; and, in response to receiving the completion instruction for the target computation task, updating the dynamic coefficients based on the data in the storage experience units and the preset time update level, and updating the reinforcement learning model based on the data in the storage experience units and the preset time update level.

5. The heterogeneous computing power dynamic scheduling calculation method as described in claim 1, characterized in that the calculation method for the reward value corresponding to the reinforcement learning model includes: calculating the reward value based on the throughput change rate, latency change rate, and energy consumption change rate of the heterogeneous devices and their respective weighting coefficients.

6. The heterogeneous computing power dynamic scheduling calculation method as described in claim 1, characterized in that the heterogeneous devices include multiple computing power execution devices, and the constraints include: a video memory capacity constraint, meaning that the video memory required for the amount of tasks allocated to a computing power execution device is less than or equal to the remaining video memory of that device; a device availability constraint, meaning that the computing power execution device is online and in a non-faulty state; and a minimum granularity constraint, meaning that the task amount of each subtask obtained by dividing the target computation task is not less than the minimum executable task amount.

7. The heterogeneous computing power dynamic scheduling calculation method as described in claim 3, characterized in that the performance indicators of the heterogeneous devices include throughput, latency, device energy consumption, and load utilization, and the preset conditions include: the weighted deviation of the throughput, the latency, the device energy consumption, and the load utilization is less than a preset weighted deviation; and / or, the deviation of any performance indicator is less than the preset deviation value of that performance indicator.

8. A heterogeneous computing power dynamic scheduling computing device, characterized in that it comprises: a data acquisition module, used to acquire first feature data and a multidimensional perception vector corresponding to a target computing task to be executed on heterogeneous devices, wherein the multidimensional perception vector is used to represent the hardware status, network status, load trend, and performance characteristics of the heterogeneous devices, and the first feature data is used to represent the inherent attributes of the task itself; a data calculation module, used to calculate, using a preset time calculation formula, the basic calculation time to complete the target calculation task based on the first feature data and dynamic coefficients related to the heterogeneous device hardware, wherein the dynamic coefficients are determined based on the historical performance indicators of different heterogeneous devices; and a task execution module, used to construct a multidimensional state vector based on the basic calculation time, second feature data of the target computing task, and the multidimensional perception vector, and to execute a task allocation step based on the multidimensional state vector, wherein the second feature data is used to represent the scheduling and management attributes of the task; wherein, when allocating tasks, the task execution module is specifically used to: input the multidimensional state vector into a reinforcement learning model to obtain an initial continuous action vector; filter the initial continuous action vector based on preset constraints to obtain a filtered initial continuous action vector; optimize the filtered initial continuous action vector based on preset multi-objective optimization conditions to obtain a target continuous action vector; and map the target continuous action vector into control commands and send them to the heterogeneous devices for execution.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method as described in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 7.