Method, system and device for realizing elastic scheduling of AI computing resources based on service awareness, processor and readable storage medium thereof
By defining criticality levels and iteration cycle stages for business projects, combining weight coefficients and historical usage quotas, dynamically calculating priority scores, and adopting a smooth window preemption mechanism and serialized storage, the problem of insufficient business awareness in existing AI computing power scheduling is solved, and a scheduling strategy with efficient resource utilization and low loss is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUOTAI JUNAN SECURITIES CO LTD
- Filing Date
- 2026-05-06
- Publication Date
- 2026-06-30
AI Technical Summary
Existing AI computing power scheduling strategies lack business awareness. As a result, the service quality of core businesses is hard to guarantee during resource competition, scheduling falls out of step with the R&D rhythm, preemption and recovery are costly, resource estimates are inaccurate, and the existing preemption mechanism loses computing progress and wastes resources.
By defining business criticality levels and iteration cycle stages for each business project, combining weighting coefficients and historical usage quotas, priority scores are dynamically calculated, and a smooth window preemption mechanism and serialized storage are adopted to achieve flexible resource scheduling and preemption decisions.
The approach improves resource utilization, reduces preemption losses, protects the service quality and computing progress of core businesses, and raises the overall efficiency of the cluster.
Figure CN122309175A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer technology, and more particularly to the intersection of cloud computing, distributed computing and artificial intelligence technologies. Specifically, it refers to a method, system, device, processor and computer-readable storage medium for achieving elastic scheduling of AI computing resources based on business awareness.
Background Technology
[0002] Currently, in the field of AI computing power scheduling, the main scheduling strategies include: (1) Static allocation based on quotas: A fixed resource quota (such as the number of CPU / GPU cores, memory size) is allocated to each business project or user. This method is simple to configure, but it is easy to cause resource waste (quota not being fully utilized) or resource bottlenecks (insufficient resources during peak periods), and lacks flexibility; (2) Load-based elastic scaling: By monitoring resource utilization (such as CPU utilization and GPU memory usage), the number of instances or the size of resources can be automatically scaled up or down. This method only focuses on the underlying physical metrics and cannot distinguish the value of the business. When resources are scarce, it may lead to non-core businesses preempting resources from core businesses; (3) Priority-based preemptive scheduling: Allows high-priority tasks to preempt resources from low-priority tasks. Existing priorities are usually statically set by the administrator and cannot be dynamically changed according to the life cycle stage of the task. Moreover, the preemption process often adopts the method of "forced stop", which leads to the loss of the calculation progress of the preempted task and the high cost of recovery.
[0003] The existing technology has the following problems: (1) Lack of business awareness: Existing schedulers have difficulty understanding the strategic value of upper-layer services and cannot guarantee the quality of service (QoS) of core services when resources are contended. (2) Disconnection from the R&D rhythm: The scheduling system cannot perceive the R&D stage of a task (such as design, development, or testing), so test tasks at key milestones are delayed by queuing and delivery schedules slip. (3) High preemption recovery cost: Traditional preemption mechanisms usually terminate low-priority processes directly without considering the recovery cost of the interrupted tasks (such as lost model training progress and the need to reload data into memory), which lowers the overall efficiency of the cluster. (4) Inaccurate resource estimates: Users often overestimate their resource requirements, and the system lacks an effective dynamic correction mechanism to curb this waste.
Summary of the Invention
[0004] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method, system, device, processor and computer-readable storage medium for elastic scheduling of AI computing resources based on business awareness, which has high resource utilization, low preemption loss and wide applicability.
[0005] To achieve the above objectives, the present invention provides the following method, system, device, processor, and computer-readable storage medium for elastic scheduling of AI computing resources based on business awareness. The method for elastic scheduling of AI computing resources based on business awareness is mainly characterized in that it includes the following steps:
(1) Predefine a business criticality level BK for each business project, and map the business criticality level BK to a corresponding weight coefficient Vbk;
(2) Connect to the API of the project management platform to obtain the task information of the business project in real time or periodically, automatically identify the current iteration cycle stage of the task, and predefine a corresponding iteration cycle dynamic weight coefficient C for each iteration cycle stage;
(3) For routine non-urgent tasks, generate a dynamically changing priority score PF through a preset priority calculation function, based on the business criticality level weight coefficient Vbk, the iteration cycle dynamic weight coefficient C, the expected computing power consumption quota EPC, the project's historical average usage quota HQ, and the total resource quota THQ of the computing power cluster;
(4) Set a basic quota BHQ and an elastic quota FHQ for each business project. Based on the dynamic priority score PF, dynamically calculate and allocate the elastic quota FHQ of each business project in every preset adjustment period CA to obtain the total quota AHQ. At the same time, monitor the actual computing power consumption. For business projects whose actual computing power consumption is lower than the average computing power consumption, reduce the expected computing power consumption quota EPC of the business project in the next adjustment period;
(5) When an emergency task is received, perform a resource preemption assessment. If it is determined that preemption should be performed, execute a smooth window preemption mechanism;
(6) Repeat steps (1) to (5) until all business tasks are completed.
[0006] Preferably, the business criticality level BK is divided based on at least one of the following factors: business strategic importance, revenue contribution, and user influence, and is divided into no fewer than three levels.
[0007] Preferably, the iteration cycle is divided into no less than three stages, including at least a development stage and a system testing stage, and the iteration cycle dynamic weight coefficient C corresponding to the system testing stage is higher than that of the development stage.
[0008] Preferably, in step (3), the priority calculation function generates the dynamically changing priority score PF according to the following formula: [formula not reproduced in this text] where α, β, and γ are preset weighting coefficients that satisfy α+β+γ=1, and HQ is the arithmetic average of the historical usage quota of the business project.
[0009] Preferably, the elastic quota FHQ in step (4) is calculated using the following formula: [formula not reproduced in this text] where PF_i is the priority score of the i-th task, ΣPF is the sum of the priority scores of all tasks, BHQ is the preset basic quota, and N is the total number of tasks.
[0010] Preferably, the reduction in expected computing power consumption quota (EPC) is a preset value between 10% and 50%.
[0011] Preferably, step (5) specifically includes the following steps: When a new task marked as an urgent task is received, a resource preemption assessment is performed, and a preemption decision is generated based on a comparison of the priority and benefits of the urgent task and the potentially preempted task. If preemption is determined, a smooth window preemption mechanism is executed, which gradually reduces the resource limit of the preempted task within a smooth time window, while serializing and saving the current running data of the preempted task to persistent storage media.
[0012] Preferably, step (5) further includes an emergency task reporting and control mechanism, specifically: Each business project can only submit a maximum of EM emergency tasks within a preset period, where EM is a preset positive integer.
[0013] Preferably, the smooth window preemption mechanism executed in step (5) specifically includes the following steps: (1) Send an early warning command to the potentially preempted task to trigger the serialization and saving of its running data; (2) Within the preset smooth time window, reduce the computing power resource limit of the preempted task in graduated steps, and allocate the released resources to the emergency task in real time.
[0014] The system for elastic scheduling of AI computing resources based on business awareness is characterized by the following: The business awareness module is used to store and maintain the business criticality level BK and weight coefficient Vbk of each business project, identify the iteration cycle stage of the task and maintain the corresponding dynamic weight coefficient C of the iteration cycle. The dynamic priority calculation module is used to generate a dynamic priority score PF for regular tasks based on the business criticality level weight coefficient Vbk, the iteration cycle dynamic weight coefficient C, the expected computing power consumption quota EPC, the project's historical average usage quota HQ, and the total resource quota of the computing power cluster THQ, through a preset priority calculation function. The elastic quota scheduling module is used to dynamically adjust the elastic quota FHQ and total quota AHQ of each business project based on the priority score PF, and to perform negative feedback adjustment according to the actual computing power consumption of the business project, and dynamically correct the expected computing power consumption quota EPC. The preemption decision module is used to receive emergency task requests and perform resource preemption assessment, and generate a preemption decision based on priority benefit comparison. If it is determined to perform preemption, a smooth window preemption mechanism is executed to gradually reduce the resource limit of the preempted task within a smooth time window, while triggering the serialization and saving of the running data of the preempted task. The scheduling and execution module is used to schedule business project tasks and allocate computing resources, and coordinate the cyclical operation of various modules.
[0015] The device for elastic scheduling of AI computing resources based on business awareness is characterized in that the device includes: A processor is configured to execute computer-executable instructions; The memory stores one or more computer-executable instructions, which, when executed by the processor, implement the various steps of the above-described method for elastic scheduling of AI computing resources based on business awareness.
[0016] The processor for elastic scheduling of AI computing resources based on business awareness is characterized in that the processor is configured to execute computer-executable instructions, and when the computer-executable instructions are executed by the processor, the various steps of the above-mentioned method for elastic scheduling of AI computing resources based on business awareness are implemented.
[0017] The computer-readable storage medium is characterized in that it stores a computer program that can be executed by a processor to implement the various steps of the above-described method for elastic scheduling of AI computing resources based on business awareness.
[0018] The method, system, device, processor, and computer-readable storage medium of this invention for elastic scheduling of AI computing resources based on business awareness achieve a deep integration of business and technology. By introducing business criticality levels and iteration cycle stages, the computing power scheduling strategy can perceive the strategic value and timeliness requirements of the business, and gives priority to the service quality of core businesses and critical testing phases during resource competition. The invention improves resource utilization: a negative feedback adjustment mechanism based on actual consumption dynamically lowers the expected computing power consumption quota of inflated task requests, which suppresses over-reporting of resource needs and raises the overall utilization of the cluster. The invention also reduces preemption losses: the "serialized saving" and "smooth time window" mechanisms preserve the computation progress of preempted tasks before preemption takes effect, avoiding the waste of computing resources caused by the traditional "forced termination" approach and improving the effective computing efficiency of the cluster.
Description of the Drawings
[0019] Figure 1 is a schematic diagram of the overall process of the method for elastic scheduling of AI computing resources based on business awareness according to the present invention.
[0020] Figure 2 is a module structure and data flow diagram of the method for elastic scheduling of AI computing resources based on business awareness according to the present invention.
Detailed Implementation
[0021] To more clearly describe the technical content of the present invention, the following description is provided in conjunction with specific embodiments.
[0022] The present invention provides a method for elastic scheduling of AI computing resources based on business awareness, comprising the following steps:
(1) Predefine a business criticality level BK for each business project, and map the business criticality level BK to a corresponding weight coefficient Vbk;
(2) Connect to the API of the project management platform to obtain the task information of the business project in real time or periodically, automatically identify the current iteration cycle stage of the task, and predefine a corresponding iteration cycle dynamic weight coefficient C for each iteration cycle stage;
(3) For routine non-urgent tasks, generate a dynamically changing priority score PF through a preset priority calculation function, based on the business criticality level weight coefficient Vbk, the iteration cycle dynamic weight coefficient C, the expected computing power consumption quota EPC, the project's historical average usage quota HQ, and the total resource quota THQ of the computing power cluster;
(4) Set a basic quota BHQ and an elastic quota FHQ for each business project. Based on the dynamic priority score PF, dynamically calculate and allocate the elastic quota FHQ of each business project in every preset adjustment period CA to obtain the total quota AHQ. At the same time, monitor the actual computing power consumption. For business projects whose actual computing power consumption is lower than the average computing power consumption, reduce the expected computing power consumption quota EPC of the business project in the next adjustment period;
(5) When an emergency task is received, perform a resource preemption assessment. If it is determined that preemption should be performed, execute a smooth window preemption mechanism;
(6) Repeat steps (1) to (5) until all business tasks are completed.
[0023] As a preferred embodiment of the present invention, the business criticality level BK is divided based on at least one of the following factors: business strategic importance, revenue contribution, and user influence scope, and is divided into no less than three levels.
[0024] In a preferred embodiment of the present invention, the iteration cycle is divided into no less than three stages, including at least a development stage and a system testing stage, and the iteration cycle dynamic weight coefficient C corresponding to the system testing stage is higher than that of the development stage.
[0025] In a preferred embodiment of the present invention, the priority calculation function in step (3) generates the dynamically changing priority score PF according to the following formula: [formula not reproduced in this text] where α, β, and γ are preset weighting coefficients that satisfy α+β+γ=1, and HQ is the arithmetic average of the historical usage quota of the business project.
[0026] In a preferred embodiment of the present invention, the elastic quota FHQ in step (4) is calculated using the following formula: [formula not reproduced in this text] where PF_i is the priority score of the i-th task, ΣPF is the sum of the priority scores of all tasks, BHQ is the preset basic quota, and N is the total number of tasks.
[0027] As a preferred embodiment of the present invention, the reduction of the expected computing power consumption quota EPC is a preset value between 10% and 50%.
[0028] In a preferred embodiment of the present invention, step (5) specifically includes the following steps: When a new task marked as an urgent task is received, a resource preemption assessment is performed, and a preemption decision is generated based on a comparison of the priority and benefits of the urgent task and the potentially preempted task. If preemption is determined, a smooth window preemption mechanism is executed, which gradually reduces the resource limit of the preempted task within a smooth time window, while serializing and saving the current running data of the preempted task to persistent storage media.
[0029] In a preferred embodiment of the present invention, step (5) further includes an emergency task reporting and control mechanism, specifically: Each business project can only submit a maximum of EM emergency tasks within a preset period, where EM is a preset positive integer.
[0030] In a preferred embodiment of the present invention, the smooth window preemption mechanism executed in step (5) specifically includes the following steps: (1) Send an early warning command to the potentially preempted task to trigger the serialization and saving of its running data; (2) Within the preset smooth time window, reduce the computing power resource limit of the preempted task in graduated steps, and allocate the released resources to the emergency task in real time.
[0031] The present invention discloses a system for elastic scheduling of AI computing resources based on business awareness, wherein the system comprises: The business awareness module is used to store and maintain the business criticality level BK and weight coefficient Vbk of each business project, identify the iteration cycle stage of the task and maintain the corresponding dynamic weight coefficient C of the iteration cycle. The dynamic priority calculation module is used to generate a dynamic priority score PF for regular tasks based on the business criticality level weight coefficient Vbk, the iteration cycle dynamic weight coefficient C, the expected computing power consumption quota EPC, the project's historical average usage quota HQ, and the total resource quota of the computing power cluster THQ, through a preset priority calculation function. The elastic quota scheduling module is used to dynamically adjust the elastic quota FHQ and total quota AHQ of each business project based on the priority score PF, and to perform negative feedback adjustment according to the actual computing power consumption of the business project, and dynamically correct the expected computing power consumption quota EPC. The preemption decision module is used to receive emergency task requests and perform resource preemption assessment, and generate a preemption decision based on priority benefit comparison. If it is determined to perform preemption, a smooth window preemption mechanism is executed to gradually reduce the resource limit of the preempted task within a smooth time window, while triggering the serialization and saving of the running data of the preempted task. The scheduling and execution module is used to schedule business project tasks and allocate computing resources, and coordinate the cyclical operation of various modules.
[0032] The apparatus of the present invention for elastic scheduling of AI computing resources based on business awareness, wherein the apparatus includes: A processor is configured to execute computer-executable instructions; The memory stores one or more computer-executable instructions, which, when executed by the processor, implement the various steps of the above-described method for elastic scheduling of AI computing resources based on business awareness.
[0033] The processor of the present invention for elastic scheduling of AI computing resources based on business awareness is configured to execute computer-executable instructions. When the computer-executable instructions are executed by the processor, the various steps of the above-mentioned method for elastic scheduling of AI computing resources based on business awareness are implemented.
[0034] The computer-readable storage medium of the present invention stores a computer program thereon, which can be executed by a processor to implement the various steps of the above-described method for elastic scheduling of AI computing resources based on business awareness.
[0035] This invention is a method for efficient and intelligent resource scheduling and management of computationally intensive tasks such as machine learning model training and inference in a shared AI computing power pool (such as a GPU / NPU cluster).
[0036] This invention discloses a business-aware AI computing resource elastic scheduling method and system, belonging to the interdisciplinary field of cloud computing, distributed computing, and artificial intelligence technologies. Addressing the characteristics of heterogeneous AI computing clusters, this invention proposes an intelligent scheduling scheme that integrates business value awareness and cost awareness. The method includes: constructing a multi-dimensional dynamic priority evaluation model based on the strategic importance of the business and the iteration cycle stage, enabling resource allocation towards high-value, high-time-sensitivity tasks; introducing a negative feedback adjustment mechanism based on actual consumption to dynamically correct the expected computing power consumption coefficient and suppress resource overstatement; and conducting preemption assessment for urgent task demands, achieving lossless or low-loss preemption through serialization and smoothing time windows. This invention can significantly improve the resource utilization of AI computing clusters, ensure the quality of core business services, and effectively reduce computing progress losses caused by task preemption, making it suitable for large-scale AI training and inference scenarios.
[0037] The business-aware AI computing resource elastic scheduling method of the present invention includes the following steps: Step 1: Predefine a business criticality level BK for each business project, and map BK to a specific weight coefficient Vbk. The business criticality level BK is divided into multiple preset levels based on at least one of the following factors: the strategic importance of the business project, its revenue contribution, and the scope of user influence.
[0038] Step 2: By connecting to the project management platform's API interface, obtain task information of business projects in real time or periodically, automatically identify the current iteration cycle stage of the task, and predefine the dynamic weight coefficient C of the iteration cycle for different stages.
[0039] Step 3: For routine, non-urgent tasks, generate a dynamically changing priority score PF through a preset priority calculation function, based on the aforementioned business criticality level weight coefficient Vbk, the iteration cycle dynamic weight coefficient C, the expected computing power consumption quota EPC, the project's historical average usage quota HQ, and the computing power cluster's total resource quota THQ.
[0040] Step 4: Set a basic quota (BHQ) and an elastic quota (FHQ) for each business project, and dynamically adjust the elastic quota (FHQ) and the total quota (AHQ) of each business project based on the priority score (PF) at a preset adjustment period (CA). At the same time, monitor the actual computing power consumption of each business project in real time. If the monitoring finds that the actual computing power consumption of a business project is lower than the average computing power consumption of all business projects, then reduce the expected computing power consumption quota (EPC) of that business project in the next adjustment period.
[0041] Step 5: When a new task marked as an emergency task is received, a resource preemption assessment is performed. If preemption is determined, a smooth window preemption mechanism is executed. Specifically: when a new task marked as an emergency task is received, the cost of serializing and recovering the resource state of at least one potentially preempted task is evaluated, and a preemption decision is generated based on a priority benefit comparison between the emergency task and the potentially preempted task. If preemption is determined, the resource limit of the preempted task is gradually reduced over a smooth time window, and the current running state data of the preempted task is serialized and saved to a persistent storage medium.
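The preemption assessment in step 5 can be sketched as a comparison between what the emergency task gains and what interrupting a candidate costs. The text does not give a concrete benefit function, so the net-benefit expression below is an assumption for illustration; `RunningTask`, `choose_victim`, and `cost_weight` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningTask:
    name: str
    priority: float        # current priority score PF
    recovery_cost: float   # estimated cost of serializing and later restoring state


def choose_victim(urgent_priority: float,
                  candidates: list[RunningTask],
                  cost_weight: float = 1.0) -> Optional[RunningTask]:
    """Pick the task to preempt, or None if no preemption pays off.

    ASSUMPTION: net benefit = urgent PF - victim PF - cost_weight * recovery cost,
    with recovery_cost expressed on the same scale as PF.
    """
    best, best_gain = None, 0.0
    for task in candidates:
        gain = urgent_priority - task.priority - cost_weight * task.recovery_cost
        if gain > best_gain:
            best, best_gain = task, gain
    return best
```

Returning `None` models the "no preemption" branch: the emergency task then waits for capacity like any other task, which is one possible policy rather than something the text prescribes.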
[0042] Step 6: Iterate through steps 1 to 5 until all business tasks are completed.
[0043] The business criticality level BK in step 1 is divided into no fewer than three levels.
[0044] The iteration cycle in step 2 is divided into no fewer than three stages, including at least the development stage and the system testing stage, wherein the dynamic weight coefficient C of the testing stage is higher than that of the development stage.
[0045] In step 3, the expected computing power consumption quota (EPC) is declared by the business project itself when it submits the task. The priority score calculation function takes into account the business criticality level (BK), the dynamic weight coefficient (C), the expected computing power consumption quota (EPC), the project's historical average usage quota (HQ), and the total resource quota of the computing power cluster (THQ). The specific calculation formula is as follows: [formula not reproduced in this text] where α, β, and γ are preset weighting coefficients that satisfy α+β+γ=1, and HQ is the arithmetic average of the historical usage quota of the business project. For a new business project with no usage history, HQ is set to THQ / N, where N is the total number of tasks that currently require computing resources.
[0046] The expected computing power consumption quota EPC is declared independently by the business project when it submits the task.
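Because the priority formula is not reproduced in this text, the sketch below only illustrates the shape such a function could take: a weighted sum with α+β+γ=1 and the THQ / N fallback for projects without history. The third term, which combines EPC, HQ, and THQ, is purely an assumption and is not the source formula; `priority_score` and its parameter names are hypothetical, and the default weights are the ones used in Example 1.

```python
def priority_score(v_bk, c, epc, hq, thq, n_tasks,
                   alpha=0.3, beta=0.3, gamma=0.4):
    """Illustrative priority score PF (assumed form, not the source formula)."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    if hq is None:
        # New project with no usage history: HQ = THQ / N, as stated in the text.
        hq = thq / n_tasks
    # ASSUMPTION: penalize requests that exceed historical usage, scaled by THQ.
    overshoot = max(epc - hq, 0.0) / thq
    resource_term = max(1.0 - overshoot, 0.0)
    return alpha * v_bk + beta * c + gamma * resource_term
```

Under this assumed form, a request that stays within historical usage keeps the full γ share, while an inflated EPC lowers the score in proportion to the overshoot.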
[0047] The formula for calculating the elastic quota (FHQ) in step 4 is as follows: [formula not reproduced in this text] where PF_i is the priority score of the i-th task and ΣPF is the sum of the priority scores of all tasks. The total quota of the business project in the current period is AHQ = BHQ + FHQ.
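The elastic quota formula is likewise missing from this text. One reading consistent with the quantities it names (the priority score of each task, the sum of all priority scores, the basic quota BHQ, and the task count) is a proportional split of whatever capacity remains after every task receives its basic quota. That reading is an assumption, and `allocate_quotas` is a hypothetical name.

```python
def allocate_quotas(pf_scores, thq, bhq):
    """Return {task: (FHQ, AHQ)} under an ASSUMED proportional split."""
    n = len(pf_scores)
    pool = thq - n * bhq  # capacity left after every task gets its basic quota
    if pool < 0:
        raise ValueError("basic quotas exceed the total cluster quota")
    total_pf = sum(pf_scores.values())
    if total_pf <= 0:
        raise ValueError("priority scores must sum to a positive value")
    result = {}
    for task, pf in pf_scores.items():
        fhq = pool * pf / total_pf
        result[task] = (fhq, bhq + fhq)  # AHQ = BHQ + FHQ, as stated in the text
    return result
```

With this split the total quotas always add up to THQ, so the cluster is neither oversubscribed nor left partly unallocated.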
[0048] In step 4, the expected computing power consumption quota (EPC) for this business project is reduced by a preset value between 10% and 50%.
[0049] The specific method for reducing the expected computing power consumption coefficient (EPC) of the business project is as follows: if the actual computing power consumption of the business project is lower than the average computing power consumption of each project in the current cluster, then the expected computing power consumption quota (EPC) of the business project will be reduced in the next adjustment cycle, and the reduction will be a preset value between 10% and 50%.
[0050] Step 5 also includes a mechanism for controlling emergency task applications: regular tasks and new tasks in the current queue can be applied for as emergency tasks as needed, but each business project can only apply for a maximum of EM emergency tasks within a preset period, where EM is a preset positive integer.
[0051] In step 5, the resource limit of the preempted task is gradually reduced by setting a smooth time window. Specifically, within the smooth time window, the available resources of the preempted task are reduced step by step from the current value to the target value to coordinate with the execution of the serialization and saving operation of the running data.
[0052] The number of emergency task applications is limited. Each business project is allowed to apply for a maximum of EM emergency tasks within a preset calendar month or iteration cycle, where EM is a preset positive integer.
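A minimal sketch of the EM cap, assuming the preset period is a calendar month (the text also allows an iteration cycle). `EmergencyLimiter` and `try_apply` are hypothetical names.

```python
from collections import Counter
from datetime import date


class EmergencyLimiter:
    """Caps emergency task applications per project per calendar month."""

    def __init__(self, em: int):
        if em < 1:
            raise ValueError("EM must be a positive integer")
        self.em = em
        self._used = Counter()  # (project, year, month) -> applications granted

    def try_apply(self, project: str, day: date) -> bool:
        key = (project, day.year, day.month)
        if self._used[key] >= self.em:
            return False  # cap reached for this period
        self._used[key] += 1
        return True
```

Keying the counter on the project and the month means the allowance resets by itself when a new month starts, with no separate cleanup job.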
[0053] The smooth window preemption mechanism includes: (1) Send an early warning command to the potentially preempted task to trigger the serialization and saving of its running data; (2) Within a preset smooth time window, reduce the computing power resource limit of the preempted task in graduated steps, and allocate the released resources to the emergency task in real time.
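The two parts of the smooth window mechanism can be sketched as a warning followed by a stepwise ramp. Equal-sized decrements are an assumption, since the text only requires a gradual reduction; `ramp_schedule`, `preempt`, and the three callbacks are hypothetical, and the waiting time between steps is left out.

```python
def ramp_schedule(current: float, target: float, steps: int) -> list[float]:
    """Resource limits to apply at each step of the smooth time window.

    ASSUMPTION: equal-sized decrements from the current limit to the target.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if target > current:
        raise ValueError("target must not exceed the current limit")
    delta = (current - target) / steps
    return [current - delta * k for k in range(1, steps + 1)]


def preempt(current, target, steps, save_state, set_limit, grant):
    """Warn first (triggering serialization), then shrink and hand over."""
    save_state()                  # early warning: the task serializes its running data
    previous = current
    for limit in ramp_schedule(current, target, steps):
        set_limit(limit)          # tighten the preempted task's resource limit
        grant(previous - limit)   # pass the freed share to the emergency task
        previous = limit
```

A real scheduler would space the steps across the window and would likely confirm that serialization finished before the final step; both are omitted here to keep the sketch short.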
[0054] This invention provides a business-aware AI computing resource elastic scheduling system, comprising:
Business awareness module: used to store and maintain the business criticality level (BK) and iteration cycle stage information of each business project;
Dynamic priority calculation module: used to execute step 3 above and generate the dynamic priority score PF;
Elastic quota scheduling module: used to execute step 4 above and generate the quotas FHQ and AHQ of each business project;
Preemption decision module: used to execute step 5 above, make preemption decisions, apply progressive resource limits, and serialize and save running data;
Scheduling and execution module: used to schedule business project tasks and resources based on the outputs of the above modules.
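One way to picture how these modules hand data to each other in a single adjustment period is the small wiring sketch below. It is illustrative only: `SchedulerCycle` and its fields are hypothetical, the scoring callable is simplified to take just (Vbk, C), and the preemption decision module is left out.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass
class SchedulerCycle:
    """Wires the modules together for one adjustment period (illustrative only)."""
    weights_for: Callable[[str], tuple[float, float]]   # business awareness: task -> (Vbk, C)
    score: Callable[[float, float], float]              # dynamic priority: (Vbk, C) -> PF
    allocate: Callable[[Mapping[str, float]], Mapping[str, float]]  # elastic quota: PF -> AHQ
    dispatch: Callable[[Mapping[str, float]], None]     # scheduling and execution

    def run(self, tasks: list[str]) -> Mapping[str, float]:
        pf = {t: self.score(*self.weights_for(t)) for t in tasks}
        quotas = self.allocate(pf)
        self.dispatch(quotas)
        return quotas
```

Passing each module in as a callable keeps the loop independent of how any single module is implemented, which matches the modular split described above.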
[0055] Specific embodiments of the present invention are given below. Example 1: This example provides a business-aware AI computing resource elastic scheduling method. As shown in Figure 1, the method includes the following steps:
Step (1): Predefine the business criticality level.
[0056] The administrator of the computing power scheduling platform defines a business criticality level (BK) for each connected business project based on the enterprise's strategic plan. In this embodiment, BK is divided into four levels: core business (Vbk set to 1), important business (Vbk set to 0.8), general business (Vbk set to 0.5), and edge business (Vbk set to 0.2).
[0057] Step (2): Automatically sense the iteration cycle stage and assign weights.
[0058] The system connects to project management platforms (such as Jira and ZenTao) via API interfaces to obtain the iteration stage of tasks under each business project in real time. The testing stage has a higher weight than the design and development stages. This embodiment identifies three stages: design, development, and system testing. A dynamic weight coefficient C is assigned to each stage; for example, C=0.3 for the design stage, C=0.3 for the development stage, and C=0.4 for the system testing stage.
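The lookups in steps (1) and (2) can be sketched as two plain tables. The numeric values are the ones given in this embodiment; the names `BK_WEIGHTS`, `STAGE_WEIGHTS`, and `task_weights` and the string keys are hypothetical, and the call to the project management platform is omitted because the text does not specify it.

```python
# Weight coefficient Vbk per business criticality level (values from this embodiment).
BK_WEIGHTS = {"core": 1.0, "important": 0.8, "general": 0.5, "edge": 0.2}

# Dynamic weight coefficient C per iteration cycle stage (values from this embodiment).
STAGE_WEIGHTS = {"design": 0.3, "development": 0.3, "system_testing": 0.4}


def task_weights(bk_level: str, stage: str) -> tuple[float, float]:
    """Return (Vbk, C) for a task; unknown keys raise KeyError on purpose."""
    return BK_WEIGHTS[bk_level], STAGE_WEIGHTS[stage]


print(task_weights("core", "system_testing"))  # (1.0, 0.4)
```

Failing loudly on an unknown level or stage is a design choice of this sketch, so that a stage name the platform introduces later is not silently scored with a default weight.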
[0059] Step (3): Calculate the dynamic priority score for regular tasks.
[0060] For non-urgent, routine tasks, the system requires the user to declare an expected computing power consumption quota (EPC) at submission. The system also records the historical average usage quota (HQ) of the corresponding business project. The preset weighting coefficients are α=0.3, β=0.3, and γ=0.4. Suppose the cluster currently has three tasks, T1, T2, and T3, each with its own business criticality weight coefficient Vbk, iteration cycle dynamic weight coefficient C, expected computing power consumption quota EPC, and historical average usage quota HQ (the per-task values are not reproduced in this text). With a total computing power resource quota of THQ=100, the dynamic priority score (PF) of each of the three tasks is calculated from the priority formula (the formula and the resulting PF values are likewise not reproduced in this text).
Step (4): Elastic quota dynamic adjustment and negative feedback regulation.
[0061] The system sets a basic quota (BHQ) of 2 and an elastic quota (FHQ) for each project. At every adjustment period (CA; 25 hours in this example), the system recalculates and allocates the elastic quota (FHQ) based on the PF scores of all tasks. The elastic quota and the total allocated quota (AHQ) of the three tasks are calculated from the quota formula and rounded to 2 significant figures (the formula and the resulting values are not reproduced in this text). Meanwhile, the monitoring module collects the actual computing power consumption of each project in real time. If the actual consumption of a project is lower than its declared value for two consecutive cycles and is also lower than the cluster average, then in the next adjustment cycle the system automatically reduces the project's expected computing power consumption quota EPC by 20% (a preset value within the range of 10% to 50%) and recalculates the dynamic priority score PF, elastic quota FHQ, and total quota AHQ according to steps (3) and (4).
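The negative feedback rule in step (4) can be illustrated with a minimal sketch. It assumes that both conditions (consumption below the declared EPC and below the cluster average) must hold in each of two consecutive cycles, and that the counter resets after a reduction; the source does not spell out either detail. The `Project` class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    epc: float                 # declared expected computing power consumption quota
    under_use_streak: int = 0  # consecutive cycles of under-consumption (assumed bookkeeping)


def apply_negative_feedback(project, actual, cluster_avg, reduction=0.20):
    """Record one adjustment cycle and cut EPC after two consecutive under-use cycles."""
    if not 0.10 <= reduction <= 0.50:
        raise ValueError("reduction must lie within the 10%-50% range")
    if actual < project.epc and actual < cluster_avg:
        project.under_use_streak += 1
    else:
        project.under_use_streak = 0
    if project.under_use_streak >= 2:
        project.epc *= 1.0 - reduction
        project.under_use_streak = 0  # assumption: start counting afresh after a cut
    return project.epc
```

After such a reduction, PF, FHQ, and AHQ would be recomputed with the lowered EPC, as the paragraph above describes.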
[0062] Step (5): Perception and preemptive decision-making for urgent tasks.
[0063] Suppose an emergency hotfix model test task T4 is received at this point. The system checks how many emergency task applications the business project has submitted and confirms that the count for the current cycle is 2, which does not exceed the preset limit EM (e.g., EM=3). The system therefore determines that preemption may be performed. It iterates over all currently executing tasks, finds that task T1 has the lowest priority score PF, and preempts the computing resources of T1.
[0064] When executing the preemption, the system does not terminate task T1 immediately but sets a smooth time window (e.g., 300 seconds). Within these 300 seconds, the computing resource limit of T1 is gradually reduced from 100% to 0%; at the same time, the current running data (such as model weights, data cache, and intermediate calculation results) is serialized and written to persistent storage (for example, distributed storage such as HDFS or Ceph). The corresponding computing resources are released only after the save has completed.
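The admission check and victim selection described above can be sketched as follows. The function name and the dictionary layout are hypothetical, and the PF values in the usage note are made-up placeholders, since the actual scores of T1 to T3 are not given in this text.

```python
def choose_victim(running_pf, urgent_count, em=3):
    """Return the id of the running task to preempt, or None if preemption is refused.

    running_pf   -- dict mapping task id to its current priority score PF
    urgent_count -- emergency tasks submitted by the project in this cycle,
                    counting the one being evaluated
    em           -- preset per-cycle limit on emergency tasks
    """
    if urgent_count > em or not running_pf:
        return None
    # The task with the lowest PF gives up its resources.
    return min(running_pf, key=running_pf.get)
```

With placeholder scores such as `{"T1": 0.4, "T2": 0.6, "T3": 0.7}` and a count of 2, the sketch selects T1; with a count of 4 it refuses preemption.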
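A minimal sketch of the smooth time window and the serialized save follows. It assumes a linear ramp from 100% to 0% (the source only says the limit is reduced gradually) and uses Python's `pickle` with a local file as a stand-in for persistent storage such as HDFS or Ceph. All function names are hypothetical.

```python
import pickle
from pathlib import Path


def resource_limit(elapsed_s: float, window_s: float = 300.0) -> float:
    """Fraction of the original resource limit still granted to the preempted task.

    Assumes a linear ramp from 1.0 at the start of the window to 0.0 at its end.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return max(0.0, min(1.0, 1.0 - elapsed_s / window_s))


def checkpoint(state: dict, path: Path) -> None:
    """Serialize the running data, then rename so readers never see a partial file."""
    tmp = path.parent / (path.name + ".tmp")
    with open(tmp, "wb") as fh:
        pickle.dump(state, fh)
    tmp.replace(path)


def restore(path: Path) -> dict:
    """Load a previously saved checkpoint."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

In this sketch the scheduler would release the resources only after `checkpoint` returns, matching the ordering described above. Halfway through a 300-second window, `resource_limit(150)` yields 0.5.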
[0065] Repeat steps (2) to (5), and continuously adjust the scheduling strategy dynamically according to changes in business stages, resource usage and new task requests until all tasks have finished running.
[0066] Example 2: This example provides a business-aware AI computing resource elastic scheduling system to implement the above method. As shown in Figure 2, the system includes: Business Awareness Module: Responsible for receiving administrator configurations, defining business criticality levels, and maintaining the mapping relationship between iteration cycle stages and weight coefficients by connecting with the project management platform.
[0067] Dynamic Priority Calculation Module: Receives task information from the Business Awareness Module and combines it with the user-submitted EPC, the total resource pool quota THQ, and the HQ from the historical database to calculate the dynamic priority score PF of each regular task in real time.
[0068] Quota allocation module: Responsible for receiving the PF values output by the dynamic priority calculation module and periodically recalculating the elastic quota FHQ and total quota AHQ of each project.
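The priority formula itself does not appear in this text, so the following is only a hypothetical weighted sum that shows how the listed inputs (Vbk, C, EPC, HQ, THQ, and coefficients α, β, γ summing to 1) might flow into a single score. In particular, the third term, which rewards lower resource demand relative to THQ, is a placeholder invented for this sketch and should not be read as the patented formula.

```python
import math


def priority_score(v_bk, c, epc, hq, thq, alpha=0.3, beta=0.3, gamma=0.4):
    """Hypothetical PF: weighted sum of criticality, stage weight, and a demand term."""
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha + beta + gamma must equal 1")
    if thq <= 0:
        raise ValueError("THQ must be positive")
    # Placeholder demand term (assumption): mean of declared and historical
    # usage as a share of the total pool, capped at 1.
    demand = min(1.0, ((epc + hq) / 2.0) / thq)
    return alpha * v_bk + beta * c + gamma * (1.0 - demand)
```

Under this assumed form, a core-business task in system testing scores higher than an edge-business task in design with the same EPC and HQ, which is the qualitative behaviour the description calls for.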
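The claims describe the elastic quota formula in terms of each task's priority score, the sum of all priority scores, the basic quota, and the number of tasks, but the formula itself does not appear in this text. The sketch below assumes one plausible reading: after every task receives its basic quota BHQ, the remainder of the total quota THQ is shared in proportion to PF. The function name and the PF values used in the usage note are illustrative assumptions.

```python
def allocate_quotas(pf, thq=100.0, bhq=2.0):
    """Return (FHQ, AHQ) dicts under an assumed proportional-share rule.

    pf  -- dict mapping task id to priority score PF
    thq -- total resource quota of the cluster
    bhq -- basic quota guaranteed to every task
    """
    n = len(pf)
    total_pf = sum(pf.values())
    pool = thq - n * bhq  # what is left once every basic quota is reserved
    if n == 0 or total_pf <= 0 or pool < 0:
        raise ValueError("need at least one task, positive PF sum, and THQ >= N * BHQ")
    fhq = {task: pool * score / total_pf for task, score in pf.items()}
    ahq = {task: bhq + quota for task, quota in fhq.items()}
    return fhq, ahq
```

With placeholder scores of 0.2, 0.3, and 0.5 for three tasks, THQ=100, and BHQ=2, the shared pool is 94 and the total quotas add up to THQ.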
[0069] Preemption Decision Module: Responsible for handling urgent tasks. It receives urgent task requests and evaluates whether to preempt the task. If preemption is decided, resource constraints with a smooth time window are implemented; simultaneously, this module is responsible for serializing and saving runtime data and recording the saved metadata in the database.
[0070] Scheduling and execution module: Based on the output results of the above modules, it schedules business project tasks and resources, monitors cluster resource status, coordinates the cyclical operation of each module, and ensures the continuous progress of the scheduling process.
[0071] For the specific implementation scheme of this embodiment, please refer to the relevant descriptions in the above embodiments, which will not be repeated here.
[0072] It is understood that the same or similar parts in the above embodiments can be referred to each other, and the contents not described in detail in some embodiments can be referred to the same or similar contents in other embodiments.
[0073] It should be noted that in the description of this invention, the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Furthermore, in the description of this invention, unless otherwise stated, "a plurality of" means at least two.
[0074] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process, and the scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which embodiments of the invention pertain.
[0075] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0076] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The corresponding program can be stored in a computer-readable storage medium. When the program is executed, it includes one or a combination of the steps of the method embodiments.
[0077] Furthermore, the functional units in the various embodiments of the present invention can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
[0078] The storage media mentioned above can be read-only memory, disk, or optical disk, etc.
[0079] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0080] This invention employs a method, system, device, processor, and computer-readable storage medium for elastic scheduling of AI computing resources based on business awareness. This achieves deep integration of business and technology. By introducing business criticality levels and iteration cycle stages, the computing power scheduling strategy can perceive the strategic value and timeliness requirements of the business, prioritizing the service quality of core businesses and critical testing phases during resource competition. This invention improves resource utilization by dynamically reducing the expected coefficient of inflated task requests through a negative feedback adjustment mechanism based on actual consumption, effectively suppressing users' false reporting of resource needs and thus improving the overall utilization of the cluster. This invention reduces preemption losses by introducing "serialized preservation" and "smoothing time windows" mechanisms to preserve the computation progress of preempted tasks before preemption occurs, avoiding the waste of computing resources caused by traditional "violent termination" methods and maximizing the effective computing efficiency of the cluster.
[0081] In this specification, the invention has been described with reference to specific embodiments thereof. However, it will be apparent that various modifications and variations can be made without departing from the spirit and scope of the invention. Therefore, the specification and drawings should be considered illustrative rather than restrictive.
Claims
1. A method for implementing AI computing resource elastic scheduling based on service awareness, characterized in that the method includes the following steps:
(1) defining a business criticality level BK for each business project, and mapping the business criticality level BK to a corresponding weight coefficient Vbk;
(2) connecting to the API of the project management platform to obtain the task information of the business project in real time or periodically, automatically identifying the current iteration cycle stage of each task, and predefining a corresponding iteration cycle dynamic weight coefficient C for each iteration cycle stage;
(3) for routine non-urgent tasks, generating a dynamically changing priority score PF through a preset priority calculation function, based on the business criticality level weight coefficient Vbk, the iteration cycle dynamic weight coefficient C, the expected computing power consumption quota EPC, the project historical usage average quota HQ, and the computing power cluster total resource quota THQ;
(4) setting a basic quota BHQ and an elastic quota FHQ for each business project; based on the dynamic priority score PF, dynamically calculating and allocating the elastic quota FHQ of each business project in the preset adjustment period CA to obtain the total quota AHQ; at the same time, monitoring the actual computing power consumption and, for business projects whose actual computing power consumption is lower than the average computing power consumption, reducing the expected computing power consumption quota EPC of the business project in the next adjustment period;
(5) when an emergency task is received, performing a resource preemption assessment and, if it is determined that preemption should be performed, executing a smooth window preemption mechanism;
(6) iterating through steps (1) to (5) until all business tasks are completed.
2. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that, The business criticality level BK is classified based on at least one of the following factors: strategic importance of business, revenue contribution, and scope of user influence, and is divided into no fewer than three levels.
3. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that, The iteration cycle is divided into no fewer than three stages, including at least a development stage and a system testing stage, and the dynamic weight coefficient C of the iteration cycle corresponding to the system testing stage is higher than that of the development stage.
4. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that, in step (3), the priority calculation function generates the dynamically changing priority score PF according to the following formula (the formula is not reproduced in this text), where α, β, and γ are preset weighting coefficients that satisfy α+β+γ=1, and HQ is the arithmetic average of the historical usage quota of the business project.
5. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that, in step (4), the elastic quota FHQ is calculated using the following formula (the formula is not reproduced in this text), whose terms denote, respectively, the priority score of the i-th task, the sum of the priority scores of all tasks, the preset basic quota, and the total number of tasks.
6. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that, The reduction in expected computing power consumption quota (EPC) is a preset value between 10% and 50%.
7. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that, Step (5) specifically includes the following steps: When a new task marked as an urgent task is received, a resource preemption assessment is performed, and a preemption decision is generated based on a comparison of the priority and benefits of the urgent task and the potentially preempted task. If preemption is determined, a smooth window preemption mechanism is executed, which gradually reduces the resource limit of the preempted task within a smooth time window, while serializing and saving the current running data of the preempted task to persistent storage media.
8. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that, Step (5) also includes an emergency task reporting and control mechanism, specifically: Each business project can only submit a maximum of EM emergency tasks within a preset period, where EM is a preset positive integer.
9. The method for elastic scheduling of AI computing resources based on business awareness according to claim 1, characterized in that the smooth window preemption mechanism implemented in step (5) includes the following steps: (1) sending an early warning command to the potentially preempted task to trigger the serialization and saving of its running data; (2) within the preset smooth time window, reducing the computing power resource limit of the preempted task progressively in graded steps, and allocating the released resources to the emergency task in real time.
10. A system for elastic scheduling of AI computing resources based on business awareness, characterized in that the system includes:
a business awareness module, used to store and maintain the business criticality level BK and weight coefficient Vbk of each business project, identify the iteration cycle stage of each task, and maintain the corresponding iteration cycle dynamic weight coefficient C;
a dynamic priority calculation module, used to generate a dynamic priority score PF for regular tasks through a preset priority calculation function, based on the business criticality level weight coefficient Vbk, the iteration cycle dynamic weight coefficient C, the expected computing power consumption quota EPC, the project's historical average usage quota HQ, and the total resource quota of the computing power cluster THQ;
an elastic quota scheduling module, used to dynamically adjust the elastic quota FHQ and total quota AHQ of each business project based on the priority score PF, to perform negative feedback adjustment according to the actual computing power consumption of the business project, and to dynamically correct the expected computing power consumption quota EPC;
a preemption decision module, used to receive emergency task requests, perform resource preemption assessment, and generate a preemption decision based on priority benefit comparison; if it is determined to perform preemption, a smooth window preemption mechanism is executed to gradually reduce the resource limit of the preempted task within a smooth time window, while triggering the serialization and saving of the running data of the preempted task;
a scheduling and execution module, used to schedule business project tasks, allocate computing resources, and coordinate the cyclical operation of the modules.
11. A device for elastic scheduling of AI computing resources based on business awareness, characterized in that, The device includes: A processor is configured to execute computer-executable instructions; The memory stores one or more computer-executable instructions, which, when executed by the processor, implement the steps of the method for elastic scheduling of AI computing resources based on business awareness, as described in any one of claims 1 to 9.
12. A processor for elastic scheduling of AI computing resources based on business awareness, characterized in that, The processor is configured to execute computer-executable instructions, which, when executed by the processor, implement each step of the method for elastic scheduling of AI computing resources based on business awareness, as described in any one of claims 1 to 9.
13. A computer-readable storage medium, characterized in that, It stores a computer program that can be executed by a processor to implement the steps of the method for elastic scheduling of AI computing resources based on business awareness, as described in any one of claims 1 to 9.